née Stoeckel

Scientific Assistant
Goethe-Universität Frankfurt am Main
Robert-Mayer-Straße 10
Room 401b
D-60325 Frankfurt am Main
D-60054 Frankfurt am Main (use for package delivery)
Postfach / P.O. Box: 154
Office Hour: Wednesday, 10 AM–12 PM
Projects

The specialised information service BIOfid (www.biofid.de) is geared to the specific needs of scientists researching biodiversity at research institutions and in natural history collections. Since 2017, BIOfid has been building an infrastructure that provides and mobilises research-relevant data in a variety of ways, in line with current developments in biodiversity research.
BIOfid Publications
2021.
Multiple Annotation for Biodiversity: Developing an annotation
framework among biology, linguistics and text technology. Language Resources and Evaluation.
2019.
BIOfid Dataset: Publishing a German Gold Standard for Named Entity
Recognition in Historical Biodiversity Literature. Proceedings of the 23rd Conference on Computational Natural Language
Learning (CoNLL), 871–880.
(Full BibTeX entries for these papers appear under Publications below.)
Teaching
Courses
None, 2025
Master Thesis: Can Adversarial Text Snippets Achieve Refusal Dimension Deletion?
Description
The threat of abuse by determined adversaries makes the safety of public-facing
LLMs a key priority for developers and researchers alike.
Despite intensive efforts, recent research shows that "refusal in language models [may be] mediated by a [one-dimensional subspace in the model's weights]" (Arditi et al., 2024) and that adversarial algorithms can generate text snippets that circumvent harmful-response prevention in open- and closed-source LLMs (Zou et al., 2023). This raises the question of whether these two methods of "jailbreaking" LLMs align, i.e. whether adversarially generated text segments can shift a model's hidden states in a way that effectively approximates deletion of the refusal direction (see the sketch below the related work).
Related Work
- Arditi et al., 2024, Refusal in Language Models Is Mediated by a Single Direction
- Mazeika et al., 2024, HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
- Zou et al., 2023, Universal and Transferable Adversarial Attacks on Aligned Language Models
- Chao et al., 2023, Jailbreaking Black Box Large Language Models in Twenty Queries
Corresponding Lab Member:
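
For orientation, the directional ablation at the heart of this topic can be written in a few lines. The following PyTorch sketch is purely illustrative (function and variable names are our own, not code from Arditi et al., 2024): it removes the component of each hidden state along a unit-norm refusal direction.

import torch

def ablate_refusal_direction(hidden: torch.Tensor, r_hat: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a batch of hidden states.

    hidden: (..., d_model) residual-stream activations (hypothetical shape).
    r_hat:  (d_model,) unit-norm refusal direction, e.g. extracted from
            activation differences on harmful vs. harmless prompts
            (Arditi et al., 2024).
    """
    # Component of each hidden state along r_hat: shape (..., 1).
    coeff = (hidden @ r_hat).unsqueeze(-1)
    # Subtract that component; the result is orthogonal to r_hat.
    return hidden - coeff * r_hat

if __name__ == "__main__":
    d = 16
    r_hat = torch.randn(d)
    r_hat = r_hat / r_hat.norm()
    h = torch.randn(4, d)
    h_abl = ablate_refusal_direction(h, r_hat)
    print((h_abl @ r_hat).abs().max())  # ~0: no remaining component along r_hat

The thesis question, restated in these terms: if adversarial suffixes (Zou et al., 2023) succeed, do the hidden states they induce already resemble the output of this projection?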
Winter Semester, 2024
Summer Semester, 2024
Winter Semester, 2023
Summer Semester, 2023
Student Research Topics
Keywords: deep learning, ethics, jailbreak, large language models, llms, safety
Master Thesis: Can Adversarial Text Snippets Achieve Refusal Dimension Deletion? (see the description under Courses above)
Publications
Total: 9
2024
2024.
Towards New Data Spaces for the Study of Multiple Documents with
Va.Si.Li-Lab: A Conceptual Analysis. In: Students', Graduates' and Young Professionals' Critical Use of
Online Information: Digital Performance Assessment and Training
within and across Domains, 259–303.
Springer Nature Switzerland.
BibTeX
@inbook{Mehler:et:al:2024:a,
author = {Mehler, Alexander and Bagci, Mevl{\"u}t and Schrottenbacher, Patrick
and Henlein, Alexander and Konca, Maxim and Abrami, Giuseppe and B{\"o}nisch, Kevin
and Stoeckel, Manuel and Spiekermann, Christian and Engel, Juliane},
editor = {Zlatkin-Troitschanskaia, Olga and Nagel, Marie-Theres and Klose, Verena
and Mehler, Alexander},
title = {Towards New Data Spaces for the Study of Multiple Documents with
Va.Si.Li-Lab: A Conceptual Analysis},
booktitle = {Students', Graduates' and Young Professionals' Critical Use of
Online Information: Digital Performance Assessment and Training
within and across Domains},
year = {2024},
publisher = {Springer Nature Switzerland},
address = {Cham},
pages = {259--303},
abstract = {The constitution of multiple documents has so far been studied
essentially as a process in which a single learner consults a
number (of segments) of different documents in the context of
the task at hand in order to construct a mental model for the
purpose of completing the task. As a result of this research focus,
the constitution of multiple documents appears predominantly as
a monomodal, non-interactive process in which mainly textual units
are studied, supplemented by images, text-image relations and
comparable artifacts. This approach is reflected in the contextual
fixity of the research design, in which the learners under study
search for information using suitably equipped computers. If,
on the other hand, we consider the openness of multi-agent learning
situations, this scenario lacks the aspects of interactivity,
contextual openness and, above all, the multimodality of information
objects, information processing and information exchange. This
is where the chapter comes in. It describes Va.Si.Li-Lab as an
instrument for multimodal measurement for studying and modeling
multiple documents in the context of interactive learning in a
multi-agent environment. To this end, the chapter places Va.Si.Li-Lab
in the spectrum of evolutionary approaches that vary the combination
of human and machine innovation and selection. It also combines
the requirements of multimodal representational learning with
various aspects of contextual plasticity to prepare Va.Si.Li-Lab
as a system that can be used for experimental research. The chapter
is conceptual in nature, designing a system of requirements using
the example of Va.Si.Li-Lab to outline an experimental environment
in which the study of Critical Online Reasoning (COR) as a group
process becomes possible. Although the chapter illustrates some
of these requirements with realistic data from the field of simulation-based
learning, the focus is still conceptual rather than experimental,
hypothesis-driven. That is, the chapter is concerned with the
design of a technology for future research into COR processes.},
isbn = {978-3-031-69510-0},
doi = {10.1007/978-3-031-69510-0_12},
url = {https://doi.org/10.1007/978-3-031-69510-0_12}
}
2024.
HyperCausal: Visualizing Causal Inference in 3D Hypertext. Proceedings of the 35th ACM Conference on Hypertext and Social Media, 330–336.
BibTeX
@inproceedings{Boenisch:et:al:2024,
author = {B\"{o}nisch, Kevin and Stoeckel, Manuel and Mehler, Alexander},
title = {HyperCausal: Visualizing Causal Inference in 3D Hypertext},
year = {2024},
isbn = {9798400705953},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3648188.3677049},
doi = {10.1145/3648188.3677049},
abstract = {We present HyperCausal, a 3D hypertext visualization framework
for exploring causal inference in generative Large Language Models
(LLMs). HyperCausal maps the generative processes of LLMs into
spatial hypertexts, where tokens are represented as nodes connected
by probability-weighted edges. The edges are weighted by the prediction
scores of next tokens, depending on the underlying language model.
HyperCausal facilitates navigation through the causal space of
the underlying LLM, allowing users to explore predicted word sequences
and their branching. Through comparative analysis of LLM parameters
such as token probabilities and search algorithms, HyperCausal
provides insight into model behavior and performance. Implemented
using the Hugging Face transformers library and Three.js, HyperCausal
ensures cross-platform accessibility to advance research in natural
language processing using concepts from hypertext research. We
demonstrate several use cases of HyperCausal and highlight the
potential for detecting hallucinations generated by LLMs using
this framework. The connection with hypertext research arises
from the fact that HyperCausal relies on user interaction to unfold
graphs with hierarchically appearing branching alternatives in
3D space. This approach refers to spatial hypertexts and early
concepts of hierarchical hypertext structures. A third connection
concerns hypertext fiction, since the branching alternatives mediated
by HyperCausal manifest non-linearly organized reading threads
along artificially generated texts that the user decides to follow
optionally depending on the reading context.},
booktitle = {Proceedings of the 35th ACM Conference on Hypertext and Social Media},
pages = {330--336},
numpages = {7},
keywords = {3D hypertext, large language models, visualization},
location = {Poznan, Poland},
series = {HT '24},
video = {https://www.youtube.com/watch?v=ANHFTupnKhI}
}
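
For readers unfamiliar with the underlying data, the kind of probability-weighted branching that HyperCausal renders in 3D can be produced with the Hugging Face transformers library roughly as follows. This is a simplified sketch, not the project's code; the model name is an example.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # example; HyperCausal works with Hugging Face causal LMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def next_token_branches(prompt: str, k: int = 5):
    """Return the top-k next tokens and their probabilities, i.e. the
    probability-weighted edges leaving one node of the generation tree."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    return [(tok.decode(int(i)), p.item()) for i, p in zip(top.indices, top.values)]

print(next_token_branches("The weather today is"))

Expanding a node's children on user interaction, as above, is what unfolds the hierarchical 3D graph described in the abstract.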
2022
2022.
I still have Time(s): Extending HeidelTime for German Texts. Proceedings of the 13th Language Resources and Evaluation Conference.
BibTeX
@inproceedings{Luecking:Stoeckel:Abrami:Mehler:2022,
author = {L{\"u}cking, Andy and Stoeckel, Manuel and Abrami, Giuseppe and Mehler, Alexander},
title = {I still have Time(s): Extending {HeidelTime} for {German} Texts},
booktitle = {Proceedings of the 13th Language Resources and Evaluation Conference},
series = {LREC 2022},
location = {Marseille, France},
year = {2022},
url = {https://aclanthology.org/2022.lrec-1.505},
pdf = {https://aclanthology.org/2022.lrec-1.505.pdf}
}
2021
2021.
Multiple Annotation for Biodiversity: Developing an annotation
framework among biology, linguistics and text technology. Language Resources and Evaluation.
BibTeX
@article{Luecking:et:al:2021,
author = {Andy Lücking and Christine Driller and Manuel Stoeckel and Giuseppe Abrami
and Adrian Pachzelt and Alexander Mehler},
year = {2021},
journal = {Language Resources and Evaluation},
title = {Multiple Annotation for Biodiversity: Developing an annotation
framework among biology, linguistics and text technology},
editor = {Nancy Ide and Nicoletta Calzolari},
doi = {10.1007/s10579-021-09553-5},
pdf = {https://link.springer.com/content/pdf/10.1007/s10579-021-09553-5.pdf},
keywords = {biofid}
}
2020
2020.
TextAnnotator: A web-based annotation suite for texts. Proceedings of the Digital Humanities 2020.
BibTeX
@inproceedings{Abrami:Mehler:Stoeckel:2020,
author = {Abrami, Giuseppe and Mehler, Alexander and Stoeckel, Manuel},
title = {{TextAnnotator}: A web-based annotation suite for texts},
booktitle = {Proceedings of the Digital Humanities 2020},
series = {DH 2020},
location = {Ottawa, Canada},
year = {2020},
url = {https://dh2020.adho.org/wp-content/uploads/2020/07/547_TextAnnotatorAwebbasedannotationsuitefortexts.html},
doi = {10.17613/tenm-4907},
abstract = {The TextAnnotator is a tool for simultaneous and collaborative
annotation of texts with visual annotation support, integration
of knowledge bases and, by pipelining the TextImager, a rich variety
of pre-processing and automatic annotation tools. It includes
a variety of modules for the annotation of texts, which contains
the annotation of argumentative, rhetorical, propositional and
temporal structures as well as a module for named entity linking
and rapid annotation of named entities. Especially the modules
for annotation of temporal, argumentative and propositional structures
are currently unique in web-based annotation tools. The TextAnnotator,
which allows the annotation of texts as a platform, is divided
into a front- and a backend component. The backend is a web service
based on WebSockets, which integrates the UIMA Database Interface
to manage and use texts. Texts are made accessible by using the
ResourceManager and the AuthorityManager, based on user and group
access permissions. Different views of a document can be created
and used depending on the scenario. Once a document has been opened,
access is gained to the annotations stored within annotation views
in which these are organized. Any annotation view can be assigned
with access permissions and by default, each user obtains his
or her own user view for every annotated document. In addition,
with sufficient access permissions, all annotation views can also
be used and curated. This allows the possibility to calculate
an Inter-Annotator-Agreement for a document, which shows an agreement
between the annotators. Annotators without sufficient rights cannot
display this value so that the annotators do not influence each
other. This contribution is intended to reflect the current state
of development of TextAnnotator, demonstrate the possibilities
of an instantaneous Inter-Annotator-Agreement and trigger a discussion
about further functions for the community.},
keywords = {textannotator},
poster = {https://hcommons.org/deposits/download/hc:31816/CONTENT/dh2020_textannotator_poster.pdf}
}
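
As a generic illustration of the inter-annotator agreement mentioned in the abstract (not TextAnnotator's own implementation, which handles multiple annotators and annotation views), Cohen's kappa for two annotators can be computed as follows:

from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    assert len(ann_a) == len(ann_b)
    n = len(ann_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Expected agreement under chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(ann_a), Counter(ann_b)
    expected = sum(freq_a[l] * freq_b[l] for l in set(ann_a) | set(ann_b)) / n**2
    return (observed - expected) / (1 - expected)

print(cohens_kappa(["PER", "LOC", "O", "O"], ["PER", "O", "O", "O"]))  # ≈ 0.556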
2020.
TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative
Annotation of Texts. Proceedings of The 12th Language Resources and Evaluation Conference, 891–900.
BibTeX
@inproceedings{Abrami:Stoeckel:Mehler:2020,
author = {Abrami, Giuseppe and Stoeckel, Manuel and Mehler, Alexander},
title = {TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative
Annotation of Texts},
booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
year = {2020},
address = {Marseille, France},
publisher = {European Language Resources Association},
pages = {891--900},
isbn = {979-10-95546-34-4},
abstract = {The annotation of texts and other material in the field of digital
humanities and Natural Language Processing (NLP) is a common task
of research projects. At the same time, the annotation of corpora
is certainly the most time- and cost-intensive component in research
projects and often requires a high level of expertise according
to the research interest. However, for the annotation of texts,
a wide range of tools is available, both for automatic and manual
annotation. Since the automatic pre-processing methods are not
error-free and there is an increasing demand for the generation
of training data, also with regard to machine learning, suitable
annotation tools are required. This paper defines criteria of
flexibility and efficiency of complex annotations for the assessment
of existing annotation tools. To extend this list of tools, the
paper describes TextAnnotator, a browser-based, multi-annotation
system, which has been developed to perform platform-independent
multimodal annotations and annotate complex textual structures.
The paper illustrates the current state of development of TextAnnotator
and demonstrates its ability to evaluate annotation quality (inter-annotator
agreement) at runtime. In addition, it will be shown how annotations
of different users can be performed simultaneously and collaboratively
on the same document from different platforms using UIMA as the
basis for annotation.},
url = {https://www.aclweb.org/anthology/2020.lrec-1.112},
keywords = {textannotator},
pdf = {http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.112.pdf}
}
May, 2020.
Voting for POS tagging of Latin texts: Using the flair of FLAIR
to better Ensemble Classifiers by Example of Latin. Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies
for Historical and Ancient Languages, 130–135.
BibTeX
@inproceedings{Stoeckel:et:al:2020,
author = {Stoeckel, Manuel and Henlein, Alexander and Hemati, Wahed and Mehler, Alexander},
title = {{Voting for POS tagging of Latin texts: Using the flair of FLAIR
to better Ensemble Classifiers by Example of Latin}},
booktitle = {Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies
for Historical and Ancient Languages},
month = {May},
year = {2020},
address = {Marseille, France},
publisher = {European Language Resources Association (ELRA)},
pages = {130--135},
abstract = {Despite the great importance of the Latin language in the past,
there are relatively few resources available today to develop
modern NLP tools for this language. Therefore, the EvaLatin Shared
Task for Lemmatization and Part-of-Speech (POS) tagging was published
in the LT4HALA workshop. In our work, we dealt with the second
EvaLatin task, that is, POS tagging. Since most of the available
Latin word embeddings were trained on either few or inaccurate
data, we trained several embeddings on better data in the first
step. Based on these embeddings, we trained several state-of-the-art
taggers and used them as input for an ensemble classifier called
LSTMVoter. We were able to achieve the best results for both the
cross-genre and the cross-time task (90.64\% and 87.00\%) without
using additional annotated data (closed modality). In the meantime,
we further improved the system and achieved even better results
(96.91\% on classical, 90.87\% on cross-genre and 87.35\% on cross-time).},
url = {https://www.aclweb.org/anthology/2020.lt4hala-1.21},
pdf = {http://www.lrec-conf.org/proceedings/lrec2020/workshops/LT4HALA/pdf/2020.lt4hala-1.21.pdf}
}
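
The ensemble idea behind LSTMVoter can be illustrated with a much simpler baseline: plain per-token majority voting over several taggers' outputs. This is a toy sketch; LSTMVoter itself learns to weight the base taggers with an LSTM rather than counting votes.

from collections import Counter

def majority_vote(tagger_outputs):
    """tagger_outputs: list of tag sequences, one per base tagger,
    all aligned to the same tokens. Returns the per-token majority tag."""
    return [Counter(tags).most_common(1)[0][0] for tags in zip(*tagger_outputs)]

preds = [
    ["NOUN", "VERB", "NOUN"],   # tagger 1
    ["NOUN", "VERB", "ADJ"],    # tagger 2
    ["PROPN", "VERB", "NOUN"],  # tagger 3
]
print(majority_vote(preds))  # ['NOUN', 'VERB', 'NOUN']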
2019
November, 2019.
When Specialization Helps: Using Pooled Contextualized Embeddings
to Detect Chemical and Biomedical Entities in Spanish. Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, 11–15.
BibTeX
@inproceedings{Stoeckel:Hemati:Mehler:2019,
title = {When Specialization Helps: Using Pooled Contextualized Embeddings
to Detect Chemical and Biomedical Entities in {S}panish},
author = {Stoeckel, Manuel and Hemati, Wahed and Mehler, Alexander},
booktitle = {Proceedings of The 5th Workshop on BioNLP Open Shared Tasks},
month = {nov},
year = {2019},
address = {Hong Kong, China},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/D19-5702},
doi = {10.18653/v1/D19-5702},
pages = {11--15},
abstract = {The recognition of pharmacological substances, compounds and proteins
is an essential preliminary work for the recognition of relations
between chemicals and other biomedically relevant units. In this
paper, we describe an approach to Task 1 of the PharmaCoNER Challenge,
which involves the recognition of mentions of chemicals and drugs
in Spanish medical texts. We train a state-of-the-art BiLSTM-CRF
sequence tagger with stacked Pooled Contextualized Embeddings,
word and sub-word embeddings using the open-source framework FLAIR.
We present a new corpus composed of articles and papers from Spanish
health science journals, termed the Spanish Health Corpus, and
use it to train domain-specific embeddings which we incorporate
in our model training. We achieve a result of 89.76{\%} F1-score
using pre-trained embeddings and are able to improve these results
to 90.52{\%} F1-score using specialized embeddings.}
}
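
The FLAIR setup described in the abstract looks roughly like the following sketch. The corpus path, column format, and embedding identifiers are illustrative assumptions, and the paper additionally stacks its own domain-specific Spanish Health Corpus embeddings.

from flair.datasets import ColumnCorpus
from flair.embeddings import StackedEmbeddings, WordEmbeddings, PooledFlairEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Hypothetical CoNLL-style corpus directory with token and NER columns.
corpus = ColumnCorpus("data/pharmaconer", {0: "text", 1: "ner"})
tag_dictionary = corpus.make_label_dictionary(label_type="ner")

# Stack word and pooled contextualized embeddings (identifiers are examples).
embeddings = StackedEmbeddings([
    WordEmbeddings("es"),                 # Spanish fastText word embeddings
    PooledFlairEmbeddings("es-forward"),  # pooled contextualized embeddings
    PooledFlairEmbeddings("es-backward"),
])

# BiLSTM-CRF sequence tagger, as described in the abstract.
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type="ner",
    use_crf=True,
)
ModelTrainer(tagger, corpus).train("models/pharmaconer", max_epochs=100)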
2019.
BIOfid Dataset: Publishing a German Gold Standard for Named Entity
Recognition in Historical Biodiversity Literature. Proceedings of the 23rd Conference on Computational Natural Language
Learning (CoNLL), 871–880.
BibTeX
@inproceedings{Ahmed:Stoeckel:Driller:Pachzelt:Mehler:2019,
author = {Sajawel Ahmed and Manuel Stoeckel and Christine Driller and Adrian Pachzelt
and Alexander Mehler},
title = {{BIOfid Dataset: Publishing a German Gold Standard for Named Entity
Recognition in Historical Biodiversity Literature}},
publisher = {Association for Computational Linguistics},
year = {2019},
booktitle = {Proceedings of the 23rd Conference on Computational Natural Language
Learning (CoNLL)},
address = {Hong Kong, China},
url = {https://www.aclweb.org/anthology/K19-1081},
doi = {10.18653/v1/K19-1081},
pages = {871--880},
abstract = {The Specialized Information Service Biodiversity Research (BIOfid)
has been launched to mobilize valuable biological data from printed
literature hidden in German libraries for over the past 250 years.
In this project, we annotate German texts converted by OCR from
historical scientific literature on the biodiversity of plants,
birds, moths and butterflies. Our work enables the automatic extraction
of biological information previously buried in the mass of papers
and volumes. For this purpose, we generated training data for
the tasks of Named Entity Recognition (NER) and Taxa Recognition
(TR) in biological documents. We use this data to train a number
of leading machine learning tools and create a gold standard for
TR in biodiversity literature. More specifically, we perform a
practical analysis of our newly generated BIOfid dataset through
various downstream-task evaluations and establish a new state
of the art for TR with 80.23{\%} F-score. In this sense, our paper
lays the foundations for future work in the field of information
extraction in biology texts.},
keywords = {biofid}
}