eHumanities Desktop

The eHumanities Desktop has been developed as an online system to support research in the growing field of the digital humanities. It provides an intuitive desktop environment, much like the Windows desktop, that supports uploading and organizing resources as well as sharing them with other users. Building on this basic resource management, it offers a growing number of applications for processing, analyzing, and exploring documents online. These applications range from the linguistic preprocessing of text into TEI P5 [?] and the management of lexical resources to text classification and image databases.

eHumanities Desktop


Figure 1: The architecture of the eHumanities Desktop.

The eHumanities Desktop [?], which is under development at the Text Technology Lab of Goethe University Frankfurt, offers a Service Oriented Architecture (SOA) for functionality in the area of the digital humanities. It supports the modelling, management, retrieval, and editing of multimedia text and image data, as well as the efficient allocation of resources, which at times involves large volumes of data. It integrates the Linguistic Networks system [?] for the management, search, and visualization of linguistic networks. Currently, the eHumanities Desktop integrates five application modules and makes them accessible both via the web and via an API. These are:

  1. the Neo4J-based [?] foundation data model eHuBase [?]
  2. the integrated document and lexicon module TEILex [?], also based on a Neo4J database
  3. the Neo4Wikipedia module for the management of wiki-based present-day language resources
  4. the OWL-based annotation module OWLnotator [?] for the annotation of potentially any number of multimedia resources in the eHumanities Desktop
  5. the Linguistic Networks module [?], which is likewise based on Neo4J.

The eHumanities Desktop offers the eLexicon Browser as the interface for displaying, managing, and searching lexica. All these modules were developed according to the architectural model shown in Figure 1. Figure 2 illustrates the SOA of the eHumanities Desktop.

Figure 2: The SOA of the eHumanities Desktop.


Figure 3: eHuBase.

The core of the eHumanities Desktop is its fundamental data management system, eHuBase. Here we can view and manage information about users, group memberships, and rights (read, write, delete, etc.), about resources, and about base application functionality. eHuBase handles the core data of all resources processed in the eHumanities Desktop, including documents, repositories, discrete program functions, and annotations. The eHumanities Desktop currently has 312 registered users who, organized in 27 project groups, have access to approximately 800,000 documents. It handles documents in all common formats, so that users can upload, share, manage, or process text, image, video, or audio files. Its architecture abstracts document handling from the way documents are stored: binary data (such as images or video), for instance, are stored redundantly in a clustered file system, for which we use the Apache Hadoop framework [?].
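The user, group, and rights information that eHuBase manages can be illustrated with a minimal sketch. It assumes that rights are granted per resource either to individual users or to whole project groups, and that a user's effective rights are the union of both. These semantics, and all names in the code (`Right`, `User`, `Resource`, `effective_rights`, `can`), are hypothetical and do not describe eHuBase's actual Neo4J schema.

```python
from dataclasses import dataclass, field
from enum import Flag
from typing import Dict, Set


class Right(Flag):
    """Combinable access rights, e.g. Right.READ | Right.WRITE (hypothetical encoding)."""
    NONE = 0
    READ = 1
    WRITE = 2
    DELETE = 4


@dataclass
class User:
    name: str
    groups: Set[str] = field(default_factory=set)  # project groups the user belongs to


@dataclass
class Resource:
    """A managed item such as a document, repository, function, or annotation."""
    resource_id: str
    kind: str
    # Grants keyed by principal, e.g. "user:alice" or "group:lexicon-project".
    grants: Dict[str, Right] = field(default_factory=dict)

    def grant(self, principal: str, rights: Right) -> None:
        """Add rights for a principal, keeping any rights granted earlier."""
        self.grants[principal] = self.grants.get(principal, Right.NONE) | rights


def effective_rights(user: User, resource: Resource) -> Right:
    """Union of the rights granted to the user directly and via group membership."""
    rights = resource.grants.get(f"user:{user.name}", Right.NONE)
    for group in user.groups:
        rights |= resource.grants.get(f"group:{group}", Right.NONE)
    return rights


def can(user: User, resource: Resource, needed: Right) -> bool:
    """True if every right in `needed` is among the user's effective rights."""
    return needed in effective_rights(user, resource)
```

For instance, if the group `lexicon-project` is granted `Right.READ` on a document and its member `alice` is additionally granted `Right.WRITE`, then `can(alice, doc, Right.READ | Right.WRITE)` is true while `can(alice, doc, Right.DELETE)` is false. Because rights are combined by union, this sketch has no way to deny an individual a right that one of their groups holds.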

TEILex and eLexicon Browser

Figure 4: The eLexicon Browser.

A central feature of the eHumanities Desktop is the integration of lexica and corpora: corpora can be annotated with the help of the lexica, either automatically or manually [?]. Lexica can be extended, corrected, and revised automatically using a linked TEI document as a source, without changing annotations or reindexing the affected corpora. In this way, annotators should be able to edit a lemmatisation without errors or gaps, and the corrections appear immediately when browsing or searching the corpus. The TEILex module was developed to supply this functionality (see Figure 5). TEILex integrates the data models for TEI P5-conforming documents and for lexica in the same graph database. The Lexical Markup Framework (LMF) [?] serves as an alternative format for TEILex. The main innovation of TEILex, however, lies in integrating documents and lexica so that they are processed together. Every occurrence of a word in a text is logically linked to the corresponding syntactic word in the lexicon, so changes to the lexicon require no subsequent data synchronisation. This makes it much easier to correct and extend the lexica directly from the linked document. Documents annotated in this way can be downloaded at any time as TEI P5 documents. The functionality of TEILex is also available within the eLexicon Browser.

Figure 5: The workflow of document annotation without (above) and with (below) TEILex.
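Why linking occurrences to lexicon entries removes the need for synchronisation can be sketched in a few lines of Python. This is an in-memory illustration under assumed names (`SyntacticWord`, `Token`, `to_tei`), not TEILex's graph-database implementation: each token holds a reference to its syntactic word instead of a copy of the annotation, so one correction in the lexicon is visible in every linked occurrence. The export uses the `lemma` and `pos` attributes that TEI P5 provides on `<w>`; the TEI namespace and header are omitted, and the STTS-style tag values are only illustrative.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class SyntacticWord:
    """Hypothetical lexicon entry for a syntactic word."""
    form: str
    lemma: str
    pos: str


@dataclass
class Token:
    """One occurrence of a word in a text: it refers to the entry, it does not copy it."""
    text: str
    entry: SyntacticWord


def to_tei(tokens: List[Token]) -> str:
    """Serialise a sentence as <s>/<w> markup, reading lemma and pos from the lexicon at export time."""
    s = ET.Element("s")
    for tok in tokens:
        w = ET.SubElement(s, "w", lemma=tok.entry.lemma, pos=tok.entry.pos)
        w.text = tok.text
    return ET.tostring(s, encoding="unicode")


if __name__ == "__main__":
    ging = SyntacticWord("ging", lemma="ging", pos="VVFIN")  # lemma entered wrongly
    s1 = [Token("Er", SyntacticWord("Er", "er", "PPER")), Token("ging", ging)]
    s2 = [Token("Sie", SyntacticWord("Sie", "sie", "PPER")), Token("ging", ging)]
    ging.lemma = "gehen"  # a single correction in the lexicon ...
    print(to_tei(s1))  # ... shows up in every sentence that links to the entry
    print(to_tei(s2))
```

Running the example prints `<s><w lemma="er" pos="PPER">Er</w><w lemma="gehen" pos="VVFIN">ging</w></s>` for the first sentence, and the second sentence likewise carries `lemma="gehen"`, although neither sentence was edited.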


The ImageDB module was developed to support the creation of multimedia corpora. It allows users to segment images recursively and to link images and image segments to each other. Segments can be rectangles, circles, ellipses, or arbitrary user-defined polygons. Segmenting an image creates an intra-aggregate relation between the parent image and the child segment. The ImageDB makes it possible to create, manage, and search these inter- and intra-aggregate relations [?]. Consequently, the eHumanities Desktop can display the entire range of text-text, image-image, and text-image relationships. The OWLnotator module provides the means for this linking.
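The segment and link structure described above can be illustrated with a minimal in-memory sketch. It assumes that intra-aggregate relations can be modelled as a map from each segment to its parent, and inter-aggregate relations as undirected pairs of identifiers that may also name texts. All class, method, and identifier names are hypothetical; in the eHumanities Desktop itself the linking is done via OWLnotator, not via a method like `link`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple, Union


@dataclass(frozen=True)
class Rectangle:
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class Ellipse:
    cx: float
    cy: float
    rx: float
    ry: float  # a circle is the special case rx == ry


@dataclass(frozen=True)
class Polygon:
    points: Tuple[Tuple[float, float], ...]  # user-defined outline, in order


Shape = Union[Rectangle, Ellipse, Polygon]


class ImageDB:
    """Hypothetical in-memory model of images, segments, and their relations."""

    def __init__(self) -> None:
        self.shapes: Dict[str, Optional[Shape]] = {}  # None marks a whole image
        self.parent: Dict[str, str] = {}  # intra-aggregate: segment -> parent
        self.links: Set[Tuple[str, str]] = set()  # inter-aggregate pairs

    def add_image(self, image_id: str) -> None:
        self.shapes[image_id] = None

    def segment(self, parent_id: str, segment_id: str, shape: Shape) -> None:
        """Cut a segment out of an image or, recursively, out of another segment."""
        if parent_id not in self.shapes:
            raise KeyError(parent_id)
        self.shapes[segment_id] = shape
        self.parent[segment_id] = parent_id

    def children(self, region_id: str) -> List[str]:
        return [seg for seg, par in self.parent.items() if par == region_id]

    def ancestors(self, region_id: str) -> List[str]:
        """Follow intra-aggregate relations up to the whole image."""
        chain = []
        while region_id in self.parent:
            region_id = self.parent[region_id]
            chain.append(region_id)
        return chain

    def link(self, a: str, b: str) -> None:
        """Record an inter-aggregate relation; ids may also name texts."""
        self.links.add((a, b))

    def linked_to(self, item_id: str) -> List[str]:
        """Search links in both directions, since the relation is treated as undirected."""
        return sorted({b for a, b in self.links if a == item_id}
                      | {a for a, b in self.links if b == item_id})
```

For example, after `db.segment("folio-12r", "initial", Rectangle(40, 60, 120, 150))` and `db.segment("initial", "face", Ellipse(90, 110, 20, 25))`, the call `db.ancestors("face")` returns `["initial", "folio-12r"]`; linking `"face"` to another image and to a text identifier such as `"text:commentary-3"` then covers both the image-image and the text-image case.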