Workshop: Tracing back biodiversity to the 19th century via text mining
Tuesday, 25 February 2020, 10.30 – 12.30 h
Congress Centre Davos, Davos, Switzerland
Markus Koch, Senckenberg Gesellschaft für Naturforschung
Christine Driller, Senckenberg Gesellschaft für Naturforschung
Giuseppe Abrami, Text Technology Lab, Goethe University Frankfurt
Manuel Stoeckel, Text Technology Lab, Goethe University Frankfurt
Gerwin Kasperek, University Library J.C. Senckenberg
Climate change and biodiversity loss are among the major challenges of our time. Investigating their causes and causal relationships is therefore a focus of current research and policy making. In this context, extracting biodiversity information from historical data sources is becoming increasingly relevant for capturing the baseline of the pre-agroindustrial era. The Specialised Information Service for Biodiversity Research (BIOfid) responds to this growing demand by improving the accessibility of legacy literature relevant to biodiversity research and by developing reusable text mining tools for data extraction. BIOfid initially targets the Central European literature on the distribution and ecology of vascular plants, birds, moths and butterflies, but the tools and software developed in the project are in principle applicable to literature of any geographic area and taxonomic focus. In this workshop we want to share our technological developments and discuss researchers' requirements and expectations for this type of information service.
The training focuses on the following topics:
- Introduction to the BIOfid web portal, i.e. accessing literature and extracting data from historical texts through a visual interface.
- Use of state-of-the-art and easy-to-use Natural Language Processing (NLP) tools, e.g. deep learning of text content.
- Developing custom workflows from source materials to processable texts and data output.
We particularly aim to improve skills in handling challenges related to data quality, natural language processing, and the extraction of information and semantic relations. For this purpose, participants will analyse samples of our text corpus to extract data linked to established ontologies and knowledge bases. Participants are also invited to submit text samples so that the analysis procedure can be demonstrated on their own target literature.