Workshop: Accessing knowledge from legacy biodiversity literature
Wednesday, 23 October 2019, 13:30 – 15:00 h
Stadsgehoorzaal, Breestraat 60, Room: Waalse Kerk, Leiden, Netherlands
Christine Driller, Senckenberg Gesellschaft für Naturforschung
Giuseppe Abrami, Text Technology Lab, Goethe University Frankfurt
Gerwin Kasperek, University Library J.C. Senckenberg
Alexander Mehler, Text Technology Lab, Goethe University Frankfurt
The workshop presents the technological developments of the Specialised Information Service for Biodiversity Research (BIOfid). The BIOfid team will show participants how to access and exploit, quickly and easily, the data trapped within legacy biodiversity literature. We also want to foster dialogue between the participants and our team, so that researchers' demands and requirements feed back into the further development of the BIOfid tools.
The workshop is aimed at scientists working on all data-intensive aspects of biodiversity research and comprises three sections:
- An introduction to the BIOfid web portal, which enables fast and easy access to literature, facts, and concepts extracted from historical texts through a visual interface.
- Participants will use state-of-the-art, easy-to-use Natural Language Processing (NLP) tools, e.g. tools based on deep learning of text content. We will automatically analyse large text corpora to extract knowledge and link it to established ontologies and knowledge bases. Participants are invited to bring a selection of their own texts to explore with our methods.
- The BIOfid team will support participants in establishing custom workflows that cover all stages from source materials to processable texts, so as to achieve the best results with the BIOfid methods.
In all sections, participants will learn how the BIOfid team overcame diverse challenges in data quality, text recognition, information extraction, and linking.
The main goal of BIOfid is to make knowledge and data from legacy biodiversity literature available. To this end, we bring together the expertise of biologists and computer scientists to give biodiversity researchers a gateway into the data of historical biodiversity literature and to supply them with high-quality tools for text mining. The project currently focuses on Central European literature about three taxonomic groups: vascular plants, birds, and moths and butterflies.