BIOfid – Text Technology Lab

BIOfid (FID Biodiversity Research) aims to make both historical and contemporary biodiversity literature available to researchers across Germany in modern, machine-readable formats.

The Johann Christian Senckenberg University Library leads the project in collaboration with the Senckenberg Society for Nature Research and the Text Technology Working Group at Goethe University Frankfurt.

Scope and Objectives

Thematically, BIOfid covers a subfield of the former special collection area for biology. Biodiversity research, as defined within the project, includes:

Taxonomy
Systematics
Evolutionary biology
Ecology

The initial three-year development phase is dedicated to building an innovative specialised information service (FID), based on close and continuous exchange with the biodiversity research community.

More information about the service can be found at: https://www.biofid.de/en/

Phases of the BIOfid Project

Phase I: 2017-2020
Phase II: 2020-2023
Phase III: 2023-2026

Project Structure

The BIOfid project is structured into four complementary modules, each addressing a key aspect of making biodiversity literature accessible, usable, and sustainable for research.

Module 1: Text Mining for Biodiversity Literature

Objective:
- To mobilise structured biodiversity data from existing scientific literature using text-mining techniques.
Description:
- Module 1 focuses on extracting relevant scientific information from biodiversity literature and transforming it into machine-readable data. Based on a DFG roundtable discussion involving representatives from numerous German research institutions, the project initially concentrates on:
  - Publications originating from Germany
  - Three organism groups: birds, butterflies, and vascular plants
- The module develops reusable text-mining tools that can later be applied to additional organism groups and geographical regions. This ensures long-term scalability beyond the initial project scope.
Outcome:
- The extracted and structured data form an extensive, reusable data pool that will support future biodiversity research and enable data-driven analyses.

Module 2: Digitisation of Biodiversity Literature

Objective:
- To create high-quality digital text corpora from print biodiversity literature.
Description:
- Module 2 focuses on the digitisation of 20th-century biodiversity literature. This digitised material serves a dual purpose:
  1. It provides the textual foundation required for text-mining activities in Module 1.
  2. It forms the content basis for the planned open access journal platform in Module 3.
- An additional goal of this module is to ensure that digitised materials are freely available on the web, thereby improving accessibility for researchers.
Outcome:
- A curated, digitised corpus of biodiversity literature that supports both automated processing and open scholarly access.

Module 3: Open Access Journal Platform

Objective:
- To establish a sustainable publishing infrastructure for biodiversity research.
Description:
- Module 3 involves the development of a platform for open access biodiversity journals, designed as a long-term service for publishers such as professional associations.
- The platform supports:
  - The publication of new open access journals
  - The digital transfer of journals previously available only in print form
- This module ensures the long-term preservation, visibility, and accessibility of biodiversity literature.
Outcome:
- A stable and sustainable open access platform that enables continuous dissemination of biodiversity research outputs.

Module 4: Literature Procurement and Supraregional Provision

Objective:
- To ensure comprehensive access to biodiversity literature, including print-only resources.
Description:
- Module 4 focuses on the procurement and supraregional provision of specialist biodiversity literature. It ensures that printed literature remains accessible for research purposes and covers the entire spectrum of organismic biodiversity.
- In addition, this module provides supraregional access to specialised databases, including the Global Plants database, for eligible research institutions.
Outcome:
- Reliable, nationwide access to essential biodiversity literature and databases, complementing the digital services provided by other modules.

Overall Goal

All four modules work together to make biodiversity literature easier to use for research. The project supports digitisation, text mining, publishing, and access to literature so that the content can be analysed with computers as well as read by humans.

Current Work

As part of the ongoing work in BIOfid, we are focusing on the annotation of biodiversity literature, especially historical German texts. These texts are difficult to process automatically because they contain old spelling forms, changing terminology, and long, complex sentences.
To handle this, we are using large language models (LLMs) to annotate important information in the texts, such as location and time expressions. The models are guided with clear instructions and structured input formats so that annotations are produced in a consistent way.
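As a minimal sketch of what "clear instructions and structured input formats" could look like in practice, the following Python snippet builds a prompt from fixed instructions plus a JSON input record and then validates a JSON answer. The label names, the JSON keys (`start`, `end`, `label`), and the function names are assumptions made for illustration; they are not taken from the BIOfid pipeline, and no particular LLM or API is implied.

```python
import json

# Hypothetical label set; the project's actual tag set is not specified here.
LABELS = ("LOCATION", "TIME")

# Fixed instructions so that every request asks for the same output shape.
INSTRUCTIONS = (
    "Annotate the German text below. Return only a JSON list of objects with "
    'the keys "start", "end" and "label". Offsets are character positions, '
    "end is exclusive, and label is one of: " + ", ".join(LABELS) + "."
)


def build_prompt(doc_id: str, text: str) -> str:
    """Combine the fixed instructions with a structured (JSON) input record."""
    record = json.dumps({"id": doc_id, "text": text}, ensure_ascii=False)
    return INSTRUCTIONS + "\n\nINPUT:\n" + record


def parse_annotations(raw: str, text: str) -> list[dict]:
    """Keep only well-formed annotations whose offsets fit inside the text."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        start, end, label = item.get("start"), item.get("end"), item.get("label")
        if (
            isinstance(start, int)
            and isinstance(end, int)
            and 0 <= start < end <= len(text)
            and label in LABELS
        ):
            valid.append(
                {"start": start, "end": end, "label": label, "surface": text[start:end]}
            )
    return valid
```

Validating the model output in this way discards malformed or out-of-range annotations before they reach any downstream comparison, which is one simple way to keep machine annotations in a consistent format.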
At the same time, human annotations play an important role. A large set of manually annotated data is used as a reference to evaluate and verify the annotations produced by LLMs. This allows us to compare machine-generated annotations with human judgments and to better understand the strengths and limitations of LLMs on historical biodiversity texts.
This work is still in progress. It helps improve the quality of machine-readable data and supports future text mining and biodiversity research within BIOfid.
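The comparison of machine-generated annotations with the human reference described above can be illustrated with a small evaluation sketch. It assumes exact span matching (same start, end, and label) and reports precision, recall, and F1; this metric choice, the `Span` type, and the `evaluate` function are illustrative assumptions, not the evaluation protocol actually used in the project.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """One annotation: character offsets (end exclusive) plus a label."""
    start: int
    end: int
    label: str


def evaluate(gold: set[Span], predicted: set[Span]) -> dict[str, float]:
    """Exact-match precision, recall and F1 of predicted spans against gold spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: human reference vs. LLM output for one sentence.
gold = {Span(3, 14, "TIME"), Span(19, 28, "LOCATION")}
predicted = {Span(19, 28, "LOCATION"), Span(0, 2, "LOCATION")}
scores = evaluate(gold, predicted)
```

Exact matching is deliberately strict; a relaxed variant that credits overlapping spans could be more informative for historical texts where span boundaries are often debatable.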

Publications

Thiemo Dahmann, Julian Schneider, Philipp Stephan, Giuseppe Abrami and Alexander Mehler. 2026. Towards the Generation and Application of Dynamic Web-Based Visualization of UIMA-based Annotations for Big-Data Corpora with the Help of Unified Dynamic Annotation Visualizer. Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026). accepted.

BibTeX

@inproceedings{Dahmann:et:al:2026,
  title     = {Towards the Generation and Application of Dynamic Web-Based Visualization
               of UIMA-based Annotations for Big-Data Corpora with the Help of
               Unified Dynamic Annotation Visualizer},
  booktitle = {Proceedings of the 15th International Conference on Language Resources
               and Evaluation (LREC 2026)},
  year      = {2026},
  author    = {Dahmann, Thiemo and Schneider, Julian and Stephan, Philipp and Abrami, Giuseppe
               and Mehler, Alexander},
  keywords  = {NLP, UIMA, Annotations, dynamic visualization, uce},
  abstract  = {The automatic and manual annotation of unstructured corpora is
               a daily task in various scientific fields, which is supported
               by a variety of existing software solutions. Despite this variety,
               there are currently only limited solutions for visualizing annotations,
               especially with regard to dynamic generation and interaction.
               To bridge this gap and to visualize and provide annotated corpora
               based on user-, project- or corpus-specific aspects, Unified Dynamic
               Annotation Visualizer (UDAV) was developed. UDAV is designed as
               a web-based solution that implements a number of essential features
               which comparable tools do not support to enable a customizable
               and extensible toolbox for interacting with annotations, allowing
               the integration into existing big data frameworks.},
  note      = {accepted}
}

Kevin Bönisch, Giuseppe Abrami and Alexander Mehler. 2025. Towards Unified, Dynamic and Annotation-based Visualisations and Exploration of Annotated Big Data Corpora with the Help of Unified Corpus Explorer. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations), 522–534. Best Demo Award.

BibTeX

@inproceedings{Boenisch:et:al:2025,
  title     = {Towards Unified, Dynamic and Annotation-based Visualisations and
               Exploration of Annotated Big Data Corpora with the Help of Unified
               Corpus Explorer},
  author    = {B{\"o}nisch, Kevin and Abrami, Giuseppe and Mehler, Alexander},
  editor    = {Dziri, Nouha and Ren, Sean (Xiang) and Diao, Shizhe},
  booktitle = {Proceedings of the 2025 Conference of the Nations of the Americas
               Chapter of the Association for Computational Linguistics: Human
               Language Technologies (System Demonstrations)},
  year      = {2025},
  address   = {Albuquerque, New Mexico},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.naacl-demo.42/},
  pages     = {522--534},
  isbn      = {979-8-89176-191-9},
  abstract  = {The annotation and exploration of large text corpora, both automatic
               and manual, presents significant challenges across multiple disciplines,
               including linguistics, digital humanities, biology, and legal
               science. These challenges are exacerbated by the heterogeneity
               of processing methods, which complicates corpus visualization,
               interaction, and integration. To address these issues, we introduce
               the Unified Corpus Explorer (UCE), a standardized, dockerized,
               open-source and dynamic Natural Language Processing (NLP) application
               designed for flexible and scalable corpus navigation. Herein,
               UCE utilizes the UIMA format for NLP annotations as a standardized
               input, constructing interfaces and features around those annotations
               while dynamically adapting to the corpora and their extracted
               annotations. We evaluate UCE based on a user study and demonstrate
               its versatility as a corpus explorer based on generative AI.},
  note      = {Best Demo Award},
  pdf       = {https://aclanthology.org/2025.naacl-demo.42.pdf},
  keywords  = {uce,new-data-spaces,circlet,core,core_c08}
}

Giuseppe Abrami, Markos Genios, Filip Fitzermann, Daniel Baumartz and Alexander Mehler. 2025. Docker Unified UIMA Interface: New perspectives for NLP on big data. SoftwareX, 29:102033.

BibTeX

@article{Abrami:et:al:2025:a,
  title     = {Docker Unified UIMA Interface: New perspectives for NLP on big data},
  journal   = {SoftwareX},
  volume    = {29},
  pages     = {102033},
  year      = {2025},
  issn      = {2352-7110},
  doi       = {10.1016/j.softx.2024.102033},
  url       = {https://www.sciencedirect.com/science/article/pii/S2352711024004047},
  author    = {Giuseppe Abrami and Markos Genios and Filip Fitzermann and Daniel Baumartz
               and Alexander Mehler},
  keywords  = {Docker, Kubernetes, UIMA, Distributed NLP, duui, biofid, neglab, new-data-spaces, circlet, core, core_c08},
  abstract  = {Processing large amounts of natural language text using machine
               learning-based models is becoming important in many disciplines.
               This demand is being met by a variety of approaches, resulting
               in the heterogeneous deployment of separate, partly incompatible,
               not natively scalable applications. To overcome the technological
               bottleneck involved, we have developed Docker Unified UIMA Interface,
               a system for the standardized, parallel, platform-independent,
               distributed and microservices-based solution for processing large
               and extensive text corpora with any NLP method. We present DUUI
               as a framework that enables automated orchestration of GPU-based
               NLP processes beyond the existing Docker Swarm cluster variant,
               and in addition to the adaptation to new runtime environments
               such as Kubernetes. Therefore, a new driver for DUUI is introduced,
               which enables the lightweight orchestration of DUUI processes
               within a Kubernetes environment in a scalable setup. In this way,
               the paper opens up novel text-technological perspectives for existing
               practices in disciplines that deal with the scientific analysis
               of large amounts of data based on NLP.}
}

Andy Lücking, Christine Driller, Manuel Stoeckel, Giuseppe Abrami, Adrian Pachzelt and Alexander Mehler. 2021. Multiple Annotation for Biodiversity: Developing an annotation framework among biology, linguistics and text technology. Language Resources and Evaluation.

BibTeX

@article{Luecking:et:al:2021,
  author    = {Andy Lücking and Christine Driller and Manuel Stoeckel and Giuseppe Abrami
               and Adrian Pachzelt and Alexander Mehler},
  year      = {2021},
  journal   = {Language Resources and Evaluation},
  title     = {Multiple Annotation for Biodiversity: Developing an annotation
               framework among biology, linguistics and text technology},
  editor    = {Nancy Ide and Nicoletta Calzolari},
  doi       = {10.1007/s10579-021-09553-5},
  pdf       = {https://link.springer.com/content/pdf/10.1007/s10579-021-09553-5.pdf},
  keywords  = {biofid}
}

Giuseppe Abrami, Alexander Henlein, Andy Lücking, Attila Kett, Pascal Adeberg and Alexander Mehler. June, 2021. Unleashing annotations with TextAnnotator: Multimedia, multi-perspective document views for ubiquitous annotation. Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable Semantic Annotation, 65–75.

BibTeX

@inproceedings{Abrami:et:al:2021,
  author    = {Abrami, Giuseppe and Henlein, Alexander and Lücking, Andy and Kett, Attila
               and Adeberg, Pascal and Mehler, Alexander},
  title     = {Unleashing annotations with {TextAnnotator}: Multimedia, multi-perspective
               document views for ubiquitous annotation},
  booktitle = {Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable
               Semantic Annotation},
  series    = {ISA-17},
  publisher = {Association for Computational Linguistics},
  address   = {Groningen, The Netherlands (online)},
  month     = {June},
  editor    = {Bunt, Harry},
  year      = {2021},
  url       = {https://aclanthology.org/2021.isa-1.7},
  pages     = {65--75},
  keywords  = {textannotator, biofid},
  pdf       = {https://iwcs2021.github.io/proceedings/isa/pdf/2021.isa-1.7.pdf},
  abstract  = {We argue that mainly due to technical innovation in the landscape
               of annotation tools, a conceptual change in annotation models
               and processes is also on the horizon. It is diagnosed that these
               changes are bound up with multi-media and multi-perspective facilities
               of annotation tools, in particular when considering virtual reality
               (VR) and augmented reality (AR) applications, their potential
               ubiquitous use, and the exploitation of externally trained natural
               language pre-processing methods. Such developments potentially
               lead to a dynamic and exploratory heuristic construction of the
               annotation process. With TextAnnotator an annotation suite is
               introduced which focuses on multi-mediality and multi-perspectivity
               with an interoperable set of task-specific annotation modules
               (e.g., for word classification, rhetorical structures, dependency
               trees, semantic roles, and more) and their linkage to VR and mobile
               implementations. The basic architecture and usage of TextAnnotator
               is described and related to the above mentioned shifts in the
               field.}
}

Christine Driller, Markus Koch, Giuseppe Abrami, Wahed Hemati, Andy Lücking, Alexander Mehler, Adrian Pachzelt and Gerwin Kasperek. 2020. Fast and Easy Access to Central European Biodiversity Data with BIOfid. Biodiversity Information Science and Standards, 4:e59157.

BibTeX

@article{Driller:et:al:2020,
  author    = {Christine Driller and Markus Koch and Giuseppe Abrami and Wahed Hemati
               and Andy Lücking and Alexander Mehler and Adrian Pachzelt and Gerwin Kasperek},
  title     = {Fast and Easy Access to Central European Biodiversity Data with BIOfid},
  volume    = {4},
  year      = {2020},
  doi       = {10.3897/biss.4.59157},
  publisher = {Pensoft Publishers},
  abstract  = {The storage of data in public repositories such as the Global
               Biodiversity Information Facility (GBIF) or the National Center
               for Biotechnology Information (NCBI) is nowadays stipulated in
               the policies of many publishers in order to facilitate data replication
               or proliferation. Species occurrence records contained in legacy
               printed literature are no exception to this. The extent of their
               digital and machine-readable availability, however, is still far
               from matching the existing data volume (Thessen and Parr 2014).
               But precisely these data are becoming more and more relevant to
               the investigation of ongoing loss of biodiversity. In order to
               extract species occurrence records at a larger scale from available
               publications, one has to apply specialised text mining tools.
               However, such tools are in short supply especially for scientific
               literature in the German language. The Specialised Information
               Service Biodiversity Research BIOfid (Koch et al. 2017) aims
               at reducing this desideratum, inter alia, by preparing a searchable
               text corpus semantically enriched by a new kind of multi-label
               annotation. For this purpose, we feed manual annotations into
               automatic, machine-learning annotators. This mixture of automatic
               and manual methods is needed, because BIOfid approaches a new
               application area with respect to language (mainly German of the
               19th century), text type (biological reports), and linguistic
               focus (technical and everyday language). We will present current
               results of the performance of BIOfid’s semantic search engine
               and the application of independent natural language processing
               (NLP) tools. Most of these are freely available online, such as
               TextImager (Hemati et al. 2016). We will show how TextImager is
               tied into the BIOfid pipeline and how it is made scalable (e.g.
               extendible by further modules) and usable on different systems
               (docker containers). Further, we will provide a short introduction
               to generating machine-learning training data using TextAnnotator
               (Abrami et al. 2019) for multi-label annotation. Annotation reproducibility
               can be assessed by the implementation of inter-annotator agreement
               methods (Abrami et al. 2020). Beyond taxon recognition and entity
               linking, we place particular emphasis on location and time information.
               For this purpose, our annotation tag-set combines general categories
               and biology-specific categories (including taxonomic names) with
               location and time ontologies. The application of the annotation
               categories is regimented by annotation guidelines (Lücking et
               al. 2020). Within the next years, our work deliverable will be
               a semantically accessible and data-extractable text corpus of
               around two million pages. In this way, BIOfid is creating a new
               valuable resource that expands our knowledge of biodiversity and
               its determinants.},
  pages     = {e59157},
  url       = {https://doi.org/10.3897/biss.4.59157},
  eprint    = {https://doi.org/10.3897/biss.4.59157},
  journal   = {Biodiversity Information Science and Standards},
  keywords  = {biofid}
}

Giuseppe Abrami, Alexander Mehler and Manuel Stoeckel. 2020. TextAnnotator: A web-based annotation suite for texts. Proceedings of the Digital Humanities 2020.

BibTeX

@inproceedings{Abrami:Mehler:Stoeckel:2020,
  author    = {Abrami, Giuseppe and Mehler, Alexander and Stoeckel, Manuel},
  title     = {{TextAnnotator}: A web-based annotation suite for texts},
  booktitle = {Proceedings of the Digital Humanities 2020},
  series    = {DH 2020},
  location  = {Ottawa, Canada},
  year      = {2020},
  url       = {https://dh2020.adho.org/wp-content/uploads/2020/07/547_TextAnnotatorAwebbasedannotationsuitefortexts.html},
  doi       = {10.17613/tenm-4907},
  abstract  = {The TextAnnotator is a tool for simultaneous and collaborative
               annotation of texts with visual annotation support, integration
               of knowledge bases and, by pipelining the TextImager, a rich variety
               of pre-processing and automatic annotation tools. It includes
               a variety of modules for the annotation of texts, which contains
               the annotation of argumentative, rhetorical, propositional and
               temporal structures as well as a module for named entity linking
               and rapid annotation of named entities. Especially the modules
               for annotation of temporal, argumentative and propositional structures
               are currently unique in web-based annotation tools. The TextAnnotator,
               which allows the annotation of texts as a platform, is divided
               into a front- and a backend component. The backend is a web service
               based on WebSockets, which integrates the UIMA Database Interface
               to manage and use texts. Texts are made accessible by using the
               ResourceManager and the AuthorityManager, based on user and group
               access permissions. Different views of a document can be created
               and used depending on the scenario. Once a document has been opened,
               access is gained to the annotations stored within annotation views
               in which these are organized. Any annotation view can be assigned
               with access permissions and by default, each user obtains his
               or her own user view for every annotated document. In addition,
               with sufficient access permissions, all annotation views can also
               be used and curated. This allows the possibility to calculate
               an Inter-Annotator-Agreement for a document, which shows an agreement
               between the annotators. Annotators without sufficient rights cannot
               display this value so that the annotators do not influence each
               other. This contribution is intended to reflect the current state
               of development of TextAnnotator, demonstrate the possibilities
               of an instantaneous Inter-Annotator-Agreement and trigger a discussion
               about further functions for the community.},
  keywords  = {textannotator, biofid},
  poster    = {https://hcommons.org/deposits/download/hc:31816/CONTENT/dh2020_textannotator_poster.pdf}
}

Giuseppe Abrami, Manuel Stoeckel and Alexander Mehler. 2020. TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative Annotation of Texts. Proceedings of The 12th Language Resources and Evaluation Conference, 891–900.

BibTeX

@inproceedings{Abrami:Stoeckel:Mehler:2020,
  author    = {Abrami, Giuseppe and Stoeckel, Manuel and Mehler, Alexander},
  title     = {TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative
               Annotation of Texts},
  booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
  year      = {2020},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {891--900},
  isbn      = {979-10-95546-34-4},
  abstract  = {The annotation of texts and other material in the field of digital
               humanities and Natural Language Processing (NLP) is a common task
               of research projects. At the same time, the annotation of corpora
               is certainly the most time- and cost-intensive component in research
               projects and often requires a high level of expertise according
               to the research interest. However, for the annotation of texts,
               a wide range of tools is available, both for automatic and manual
               annotation. Since the automatic pre-processing methods are not
               error-free and there is an increasing demand for the generation
               of training data, also with regard to machine learning, suitable
               annotation tools are required. This paper defines criteria of
               flexibility and efficiency of complex annotations for the assessment
               of existing annotation tools. To extend this list of tools, the
               paper describes TextAnnotator, a browser-based, multi-annotation
               system, which has been developed to perform platform-independent
               multimodal annotations and annotate complex textual structures.
               The paper illustrates the current state of development of TextAnnotator
               and demonstrates its ability to evaluate annotation quality (inter-annotator
               agreement) at runtime. In addition, it will be shown how annotations
               of different users can be performed simultaneously and collaboratively
               on the same document from different platforms using UIMA as the
               basis for annotation.},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.112},
  keywords  = {textannotator, biofid},
  pdf       = {http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.112.pdf}
}

Sajawel Ahmed, Manuel Stoeckel, Christine Driller, Adrian Pachzelt and Alexander Mehler. 2019. BIOfid Dataset: Publishing a German Gold Standard for Named Entity Recognition in Historical Biodiversity Literature. Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), 871–880.

BibTeX

@inproceedings{Ahmed:Stoeckel:Driller:Pachzelt:Mehler:2019,
  author    = {Sajawel Ahmed and Manuel Stoeckel and Christine Driller and Adrian Pachzelt
               and Alexander Mehler},
  title     = {{BIOfid Dataset: Publishing a German Gold Standard for Named Entity
               Recognition in Historical Biodiversity Literature}},
  publisher = {Association for Computational Linguistics},
  year      = {2019},
  booktitle = {Proceedings of the 23rd Conference on Computational Natural Language
               Learning (CoNLL)},
  address   = {Hong Kong, China},
  url       = {https://www.aclweb.org/anthology/K19-1081},
  doi       = {10.18653/v1/K19-1081},
  pages     = {871--880},
  abstract  = {The Specialized Information Service Biodiversity Research (BIOfid)
               has been launched to mobilize valuable biological data from printed
               literature hidden in German libraries for over the past 250 years.
               In this project, we annotate German texts converted by OCR from
               historical scientific literature on the biodiversity of plants,
               birds, moths and butterflies. Our work enables the automatic extraction
               of biological information previously buried in the mass of papers
               and volumes. For this purpose, we generated training data for
               the tasks of Named Entity Recognition (NER) and Taxa Recognition
               (TR) in biological documents. We use this data to train a number
               of leading machine learning tools and create a gold standard for
               TR in biodiversity literature. More specifically, we perform a
               practical analysis of our newly generated BIOfid dataset through
               various downstream-task evaluations and establish a new state
               of the art for TR with 80.23{\%} F-score. In this sense, our paper
               lays the foundations for future work in the field of information
               extraction in biology texts.},
  keywords  = {biofid}
}

Giuseppe Abrami, Alexander Mehler, Andy Lücking, Elias Rieb and Philipp Helfrich. May, 2019. TextAnnotator: A flexible framework for semantic annotations. Proceedings of the Fifteenth Joint ACL - ISO Workshop on Interoperable Semantic Annotation, (ISA-15).

BibTeX

@inproceedings{Abrami:et:al:2019,
  author    = {Abrami, Giuseppe and Mehler, Alexander and Lücking, Andy and Rieb, Elias
               and Helfrich, Philipp},
  title     = {{TextAnnotator}: A flexible framework for semantic annotations},
  booktitle = {Proceedings of the Fifteenth Joint ACL - ISO Workshop on Interoperable
               Semantic Annotation, (ISA-15)},
  series    = {ISA-15},
  location  = {Gothenburg, Sweden},
  month     = {May},
  pdf       = {https://www.texttechnologylab.org/wp-content/uploads/2019/04/TextAnnotator_IWCS_Göteborg.pdf},
  year      = {2019},
  keywords  = {textannotator, biofid},
  abstract  = {Modern annotation tools should meet at least the following general
               requirements: they can handle diverse data and annotation levels
               within one tool, and they support the annotation process with
               automatic (pre-)processing outcomes as much as possible. We developed
               a framework that meets these general requirements and that enables
               versatile and browser-based annotations of texts, the TextAnnotator.
               It combines NLP methods of pre-processing with methods of flexible
               post-processing. In fact, machine learning (ML) requires a lot
               of training and test data, but is usually far from achieving perfect
               results. Producing high-level annotations for ML and post-correcting
               its results are therefore necessary. This is the purpose of TextAnnotator,
               which is entirely implemented in ExtJS and provides a range of
               interactive visualizations of annotations. In addition, it allows
               for flexibly integrating knowledge resources, e.g. in the course
               of post-processing named entity recognition. The paper describes
               TextAnnotator’s architecture together with three use cases: annotating
               temporal structures, argument structures and named entity linking.}
}

Christine Driller, Markus Koch, Marco Schmidt, Claus Weiland, Thomas Hörnschemeyer, Thomas Hickler, Giuseppe Abrami, Sajawel Ahmed, Rüdiger Gleim, Wahed Hemati, Tolga Uslu, Alexander Mehler, Adrian Pachzelt, Jashar Rexhepi, Thomas Risse, Janina Schuster, Gerwin Kasperek and Angela Hausinger. 2018. Workflow and Current Achievements of BIOfid, an Information Service Mobilizing Biodiversity Data from Literature Sources. Biodiversity Information Science and Standards, 2:e25876.

BibTeX

@article{Driller:et:al:2018,
  author    = {Christine Driller and Markus Koch and Marco Schmidt and Claus Weiland
               and Thomas Hörnschemeyer and Thomas Hickler and Giuseppe Abrami and Sajawel Ahmed
               and Rüdiger Gleim and Wahed Hemati and Tolga Uslu and Alexander Mehler
               and Adrian Pachzelt and Jashar Rexhepi and Thomas Risse and Janina Schuster
               and Gerwin Kasperek and Angela Hausinger},
  title     = {Workflow and Current Achievements of BIOfid, an Information Service
               Mobilizing Biodiversity Data from Literature Sources},
  volume    = {2},
  year      = {2018},
  doi       = {10.3897/biss.2.25876},
  publisher = {Pensoft Publishers},
  abstract  = {BIOfid is a specialized information service currently being developed
               to mobilize biodiversity data dormant in printed historical and
               modern literature and to offer a platform for open access journals
               on the science of biodiversity. Our team of librarians, computer
               scientists and biologists produce high-quality text digitizations,
               develop new text-mining tools and generate detailed ontologies
               enabling semantic text analysis and semantic search by means of
               user-specific queries. In a pilot project we focus on German publications
               on the distribution and ecology of vascular plants, birds, moths
               and butterflies extending back to the Linnaeus period about 250
               years ago. The three organism groups have been selected according
               to current demands of the relevant research community in Germany.
               The text corpus defined for this purpose comprises over 400 volumes
               with more than 100,000 pages to be digitized and will be complemented
               by journals from other digitization projects, copyright-free and
               project-related literature. With TextImager (Natural Language
               Processing & Text Visualization) and TextAnnotator (Discourse
               Semantic Annotation) we have already extended and launched tools
               that focus on the text-analytical section of our project. Furthermore,
               taxonomic and anatomical ontologies elaborated by us for the taxa
               prioritized by the project’s target group - German institutions
               and scientists active in biodiversity research - are constantly
               improved and expanded to maximize scientific data output. Our
               poster describes the general workflow of our project ranging from
               literature acquisition via software development, to data availability
               on the BIOfid web portal (http://biofid.de/), and the implementation
               into existing platforms which serve to promote global accessibility
               of biodiversity data.},
  pages     = {e25876},
  url       = {https://doi.org/10.3897/biss.2.25876},
  eprint    = {https://doi.org/10.3897/biss.2.25876},
  journal   = {Biodiversity Information Science and Standards},
  keywords  = {biofid}
}