Manuel Stoeckel
M.Sc. Computer Science

Goethe-Universität Frankfurt am Main
Robert-Mayer-Straße 10
Room 401a
D-60325 Frankfurt am Main
D-60054 Frankfurt am Main (use for package delivery)
Postfach / P.O. Box: 154
Phone: +49 69-798-24661
Fax: +49 69-798-28931
Office Hour: Tuesday, 10 AM to 12 noon

Publications

Total: 7

2022 (1)

  • [PDF] A. Lücking, M. Stoeckel, G. Abrami, and A. Mehler, “I still have Time(s): Extending HeidelTime for German Texts,” in Proceedings of the Language Resources and Evaluation Conference, Marseille, France, 2022, pp. 4723-4728.

    HeidelTime is one of the most widespread and successful tools for detecting temporal expressions in texts. Since HeidelTime’s pattern matching system is based on regular expressions, it can be extended in a convenient way. We present such an extension for the German resources of HeidelTime: HeidelTimeExt. The extension has been brought about by means of observing false negatives within real world texts and various time banks. The gain in coverage is 2.7% or 8.5%, depending on the admitted degree of potential overgeneralization. We describe the development of HeidelTimeExt, its evaluation on text samples from various genres, and share some linguistic observations. HeidelTimeExt can be obtained from https://github.com/texttechnologylab/heideltime.
    @InProceedings{Luecking:Stoeckel:Abrami:Mehler:2022,
      Author         = {L\"{u}cking, Andy and Stoeckel, Manuel and Abrami, Giuseppe and Mehler, Alexander},
      title     = {I still have Time(s): Extending HeidelTime for German Texts},
      booktitle      = {Proceedings of the Language Resources and Evaluation Conference},
      month          = {June},
      year           = {2022},
      address        = {Marseille, France},
      publisher      = {European Language Resources Association},
      pages     = {4723--4728},
      abstract  = {HeidelTime is one of the most widespread and successful tools for detecting temporal expressions in texts. Since HeidelTime’s pattern matching system is based on regular expression, it can be extended in a convenient way. We present such an extension for the German resources of HeidelTime: HeidelTimeExt. The extension has been brought about by means of observing false negatives within real world texts and various time banks. The gain in coverage is 2.7 \% or 8.5 \%, depending on the admitted degree of potential overgeneralization. We describe the development of HeidelTimeExt, its evaluation on text samples from various genres, and share some linguistic observations. HeidelTimeExt can be obtained from https://github.com/texttechnologylab/heideltime.},
      poster   = {https://www.texttechnologylab.org/wp-content/uploads/2022/06/HeidelTimeExt_LREC_2022.pdf},
      pdf    = {http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.505.pdf}
    }
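
A note on the extension mechanism: HeidelTime spots temporal expressions with hand-written regular-expression rules kept in per-language resource files, which is what makes an extension like HeidelTimeExt feasible without touching the engine itself. The toy Python sketch below only illustrates that general idea with a few invented German patterns; it is not HeidelTime's rule engine, and none of the patterns are taken from HeidelTime or HeidelTimeExt.

    import re

    # Toy, regex-based spotting of a few German temporal expressions, in the
    # spirit of rule-based systems like HeidelTime. All patterns below are
    # invented for illustration; they are not HeidelTime(Ext) rules.
    MONTHS = r"(Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember)"

    PATTERNS = [
        (re.compile(rf"\b\d{{1,2}}\.\s*{MONTHS}\s+\d{{4}}\b"), "DATE"),  # "3. Mai 1995"
        (re.compile(rf"\b{MONTHS}\s+\d{{4}}\b"), "DATE"),                # "Mai 1995"
        (re.compile(r"\b(heute|gestern|morgen)\b"), "DATE"),             # lowercase deictic adverbs
        (re.compile(r"\b\d{1,2}:\d{2}\s*Uhr\b"), "TIME"),                # "14:30 Uhr"
    ]

    def spot_timex(text):
        """Return (span, surface, type) for every pattern match.

        Overlapping hits (e.g. the bare month-year inside a full date) are
        left unresolved in this toy spotter.
        """
        hits = []
        for pattern, timex_type in PATTERNS:
            for match in pattern.finditer(text):
                hits.append((match.span(), match.group(0), timex_type))
        return sorted(hits)

    if __name__ == "__main__":
        sample = ("Das Treffen fand am 3. Mai 1995 statt; "
                  "das nächste beginnt morgen um 14:30 Uhr.")
        for span, surface, timex_type in spot_timex(sample):
            print(span, surface, timex_type)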

2021 (1)

  • [PDF] [DOI] A. Lücking, C. Driller, M. Stoeckel, G. Abrami, A. Pachzelt, and A. Mehler, “Multiple Annotation for Biodiversity: Developing an annotation framework among biology, linguistics and text technology,” Language Resources and Evaluation, 2021.

    @Article{Luecking:et:al:2021,
      author    = {Andy Lücking and Christine Driller and Manuel Stoeckel and Giuseppe Abrami and Adrian Pachzelt and Alexander Mehler},
      year      = {2021},
      journal   = {Language Resources and Evaluation},
      title     = {Multiple Annotation for Biodiversity: Developing an annotation framework among biology, linguistics and text technology},
      editor    = {Nancy Ide and Nicoletta Calzolari},
      doi       = {10.1007/s10579-021-09553-5},
      pdf       = {https://link.springer.com/content/pdf/10.1007/s10579-021-09553-5.pdf}
    }

2020 (3)

  • [https://dh2020.adho.org/wp-content/uploads/2020/07/547_TextAnnotatorAwebbasedannotationsuitefortexts.html] [DOI] G. Abrami, A. Mehler, and M. Stoeckel, “TextAnnotator: A web-based annotation suite for texts,” in Proceedings of the Digital Humanities 2020, 2020.

    The TextAnnotator is a tool for simultaneous and collaborative annotation of texts with visual annotation support, integration of knowledge bases and, by pipelining the TextImager, a rich variety of pre-processing and automatic annotation tools. It includes a variety of modules for the annotation of texts, which contains the annotation of argumentative, rhetorical, propositional and temporal structures as well as a module for named entity linking and rapid annotation of named entities. Especially the modules for annotation of temporal, argumentative and propositional structures are currently unique in web-based annotation tools. The TextAnnotator, which allows the annotation of texts as a platform, is divided into a front- and a backend component. The backend is a web service based on WebSockets, which integrates the UIMA Database Interface to manage and use texts. Texts are made accessible by using the ResourceManager and the AuthorityManager, based on user and group access permissions. Different views of a document can be created and used depending on the scenario. Once a document has been opened, access is gained to the annotations stored within annotation views in which these are organized. Any annotation view can be assigned with access permissions and by default, each user obtains his or her own user view for every annotated document. In addition, with sufficient access permissions, all annotation views can also be used and curated. This allows the possibility to calculate an Inter-Annotator-Agreement for a document, which shows an agreement between the annotators. Annotators without sufficient rights cannot display this value so that the annotators do not influence each other. This contribution is intended to reflect the current state of development of TextAnnotator, demonstrate the possibilities of an instantaneous Inter-Annotator-Agreement and trigger a discussion about further functions for the community.
    @InProceedings{Abrami:Mehler:Stoeckel:2020,
      author         = {Abrami, Giuseppe and Mehler, Alexander and Stoeckel, Manuel},
      title          = {{TextAnnotator}: A web-based annotation suite for texts},
      booktitle      = {Proceedings of the Digital Humanities 2020},
      series         = {DH 2020},
      location       = {Ottawa, Canada},
      year           = {2020},
      url            = {https://dh2020.adho.org/wp-content/uploads/2020/07/547_TextAnnotatorAwebbasedannotationsuitefortexts.html},
      doi            = {10.17613/tenm-4907},
      abstract    = {The TextAnnotator is a tool for simultaneous and collaborative annotation of texts with visual annotation support, integration of knowledge bases and, by pipelining the TextImager, a rich variety of pre-processing and automatic annotation tools. It includes a variety of modules for the annotation of texts, which contains the annotation of argumentative, rhetorical, propositional and temporal structures as well as a module for named entity linking and rapid annotation of named entities. Especially the modules for annotation of temporal, argumentative and propositional structures are currently unique in web-based annotation tools. The TextAnnotator, which allows the annotation of texts as a platform, is divided into a front- and a backend component. The backend is a web service based on WebSockets, which integrates the UIMA Database Interface to manage and use texts. Texts are made accessible by using the ResourceManager and the AuthorityManager, based on user and group access permissions. Different views of a document can be created and used depending on the scenario. Once a document has been opened, access is gained to the annotations stored within annotation views in which these are organized. Any annotation view can be assigned with access permissions and by default, each user obtains his or her own user view for every annotated document. In addition, with sufficient access permissions, all annotation views can also be used and curated. This allows the possibility to calculate an Inter-Annotator-Agreement for a document, which shows an agreement between the annotators. Annotators without sufficient rights cannot display this value so that the annotators do not influence each other. This contribution is intended to reflect the current state of development of TextAnnotator, demonstrate the possibilities of an instantaneous Inter-Annotator-Agreement and trigger a discussion about further functions for the community.},
     poster     = {https://hcommons.org/deposits/download/hc:31816/CONTENT/dh2020_textannotator_poster.pdf}
    }
  • [PDF] [https://www.aclweb.org/anthology/2020.lrec-1.112] G. Abrami, M. Stoeckel, and A. Mehler, “TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative Annotation of Texts,” in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille, France, 2020, pp. 891-900.

    The annotation of texts and other material in the field of digital humanities and Natural Language Processing (NLP) is a common task of research projects. At the same time, the annotation of corpora is certainly the most time- and cost-intensive component in research projects and often requires a high level of expertise according to the research interest. However, for the annotation of texts, a wide range of tools is available, both for automatic and manual annotation. Since the automatic pre-processing methods are not error-free and there is an increasing demand for the generation of training data, also with regard to machine learning, suitable annotation tools are required. This paper defines criteria of flexibility and efficiency of complex annotations for the assessment of existing annotation tools. To extend this list of tools, the paper describes TextAnnotator, a browser-based, multi-annotation system, which has been developed to perform platform-independent multimodal annotations and annotate complex textual structures. The paper illustrates the current state of development of TextAnnotator and demonstrates its ability to evaluate annotation quality (inter-annotator agreement) at runtime. In addition, it will be shown how annotations of different users can be performed simultaneously and collaboratively on the same document from different platforms using UIMA as the basis for annotation.
    @InProceedings{Abrami:Stoeckel:Mehler:2020,
      author    = {Abrami, Giuseppe  and  Stoeckel, Manuel  and  Mehler, Alexander},
      title     = {TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative Annotation of Texts},
      booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
      month     = {May},
      year      = {2020},
      address   = {Marseille, France},
      publisher = {European Language Resources Association},
      pages     = {891--900},
      isbn      = {979-10-95546-34-4},
      abstract  = {The annotation of texts and other material in the field of digital humanities and Natural Language Processing (NLP) is a common task of research projects. At the same time, the annotation of corpora is certainly the most time- and cost-intensive component in research projects and often requires a high level of expertise according to the research interest. However, for the annotation of texts, a wide range of tools is available, both for automatic and manual annotation. Since the automatic pre-processing methods are not error-free and there is an increasing demand for the generation of training data, also with regard to machine learning, suitable annotation tools are required. This paper defines criteria of flexibility and efficiency of complex annotations for the assessment of existing annotation tools. To extend this list of tools, the paper describes TextAnnotator, a browser-based, multi-annotation system, which has been developed to perform platform-independent multimodal annotations and annotate complex textual structures. The paper illustrates the current state of development of TextAnnotator and demonstrates its ability to evaluate annotation quality (inter-annotator agreement) at runtime. In addition, it will be shown how annotations of different users can be performed simultaneously and collaboratively on the same document from different platforms using UIMA as the basis for annotation.},
      url       = {https://www.aclweb.org/anthology/2020.lrec-1.112},
      pdf       = {http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.112.pdf}
    }
  • [PDF] [https://www.aclweb.org/anthology/2020.lt4hala-1.21] M. Stoeckel, A. Henlein, W. Hemati, and A. Mehler, “Voting for POS tagging of Latin texts: Using the flair of FLAIR to better Ensemble Classifiers by Example of Latin,” in Proceedings of LT4HALA 2020 – 1st Workshop on Language Technologies for Historical and Ancient Languages, Marseille, France, 2020, pp. 130-135.

    Despite the great importance of the Latin language in the past, there are relatively few resources available today to develop modern NLP tools for this language. Therefore, the EvaLatin Shared Task for Lemmatization and Part-of-Speech (POS) tagging was published in the LT4HALA workshop. In our work, we dealt with the second EvaLatin task, that is, POS tagging. Since most of the available Latin word embeddings were trained on either few or inaccurate data, we trained several embeddings on better data in the first step. Based on these embeddings, we trained several state-of-the-art taggers and used them as input for an ensemble classifier called LSTMVoter. We were able to achieve the best results for both the cross-genre and the cross-time task (90.64% and 87.00%) without using additional annotated data (closed modality). In the meantime, we further improved the system and achieved even better results (96.91% on classical, 90.87% on cross-genre and 87.35% on cross-time).
    @InProceedings{Stoeckel:et:al:2020,
      author    = {Stoeckel, Manuel and Henlein, Alexander and Hemati, Wahed and Mehler, Alexander},
      title     = {{Voting for POS tagging of Latin texts: Using the flair of FLAIR to better Ensemble Classifiers by Example of Latin}},
      booktitle      = {Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages},
      month          = {May},
      year           = {2020},
      address        = {Marseille, France},
      publisher      = {European Language Resources Association (ELRA)},
      pages     = {130--135},
      abstract  = {Despite the great importance of the Latin language in the past, there are relatively few resources available today to develop modern NLP tools for this language. Therefore, the EvaLatin Shared Task for Lemmatization and Part-of-Speech (POS) tagging was published in the LT4HALA workshop. In our work, we dealt with the second EvaLatin task, that is, POS tagging. Since most of the available Latin word embeddings were trained on either few or inaccurate data, we trained several embeddings on better data in the first step. Based on these embeddings, we trained several state-of-the-art taggers and used them as input for an ensemble classifier called LSTMVoter. We were able to achieve the best results for both the cross-genre and the cross-time task (90.64\% and 87.00\%) without using additional annotated data (closed modality). In the meantime, we further improved the system and achieved even better results (96.91\% on classical, 90.87\% on cross-genre and 87.35\% on cross-time).},
      url       = {https://www.aclweb.org/anthology/2020.lt4hala-1.21},
      pdf       = {http://www.lrec-conf.org/proceedings/lrec2020/workshops/LT4HALA/pdf/2020.lt4hala-1.21.pdf}
    }
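
The EvaLatin entry directly above combines several taggers through LSTMVoter, a trained ensemble classifier. The snippet below is a much simpler stand-in meant only to illustrate the ensemble idea: plain per-token majority voting over aligned tagger outputs. The tagger outputs, tag names and tie-breaking policy are invented for this example and are not part of the paper's system.

    from collections import Counter

    def majority_vote(predictions, tie_breaker_index=0):
        """
        Combine per-token POS predictions from several taggers by majority vote.

        `predictions` is a list of tag sequences, one per tagger, all aligned to
        the same token sequence. Ties fall back to the tagger at
        `tie_breaker_index` (e.g. the strongest single model on dev data).
        """
        if not predictions:
            raise ValueError("need at least one tagger output")
        length = len(predictions[0])
        if any(len(seq) != length for seq in predictions):
            raise ValueError("all tag sequences must be aligned to the same tokens")

        voted = []
        for position in range(length):
            tags = [seq[position] for seq in predictions]
            counts = Counter(tags).most_common()
            top_tag, top_count = counts[0]
            # On a tie, trust the designated tie-breaking tagger.
            if sum(1 for _, count in counts if count == top_count) > 1:
                top_tag = predictions[tie_breaker_index][position]
            voted.append(top_tag)
        return voted

    if __name__ == "__main__":
        # Hypothetical outputs of three taggers for "arma virumque cano".
        tagger_a = ["NOUN", "NOUN", "VERB"]
        tagger_b = ["NOUN", "PRON", "VERB"]
        tagger_c = ["ADJ",  "NOUN", "VERB"]
        print(majority_vote([tagger_a, tagger_b, tagger_c]))  # ['NOUN', 'NOUN', 'VERB']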
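
Both TextAnnotator entries above highlight computing an inter-annotator agreement at runtime from the annotation views of a document. As a rough, self-contained illustration of what such a computation involves, here is Cohen's kappa for two annotators who labelled the same units. This is not TextAnnotator's implementation, the abstracts do not say which agreement measure the tool uses, and the label data below is invented.

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        """
        Cohen's kappa for two annotators who labelled the same items:
        kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
        and p_e the agreement expected by chance from each annotator's label
        distribution.
        """
        if len(labels_a) != len(labels_b) or not labels_a:
            raise ValueError("both annotators must label the same non-empty item set")
        n = len(labels_a)
        observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        counts_a, counts_b = Counter(labels_a), Counter(labels_b)
        expected = sum(
            (counts_a[label] / n) * (counts_b[label] / n)
            for label in set(counts_a) | set(counts_b)
        )
        if expected == 1.0:  # both annotators used a single, identical label
            return 1.0
        return (observed - expected) / (1.0 - expected)

    if __name__ == "__main__":
        # Invented example: two annotators tag five tokens as entity (ENT) or not (O).
        annotator_1 = ["ENT", "O", "O", "ENT", "O"]
        annotator_2 = ["ENT", "O", "ENT", "ENT", "O"]
        print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.3f}")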

2019 (2)

  • [https://www.aclweb.org/anthology/D19-5702] [DOI] M. Stoeckel, W. Hemati, and A. Mehler, “When Specialization Helps: Using Pooled Contextualized Embeddings to Detect Chemical and Biomedical Entities in Spanish,” in Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, Hong Kong, China, 2019, pp. 11-15.

    The recognition of pharmacological substances, compounds and proteins is an essential preliminary work for the recognition of relations between chemicals and other biomedically relevant units. In this paper, we describe an approach to Task 1 of the PharmaCoNER Challenge, which involves the recognition of mentions of chemicals and drugs in Spanish medical texts. We train a state-of-the-art BiLSTM-CRF sequence tagger with stacked Pooled Contextualized Embeddings, word and sub-word embeddings using the open-source framework FLAIR. We present a new corpus composed of articles and papers from Spanish health science journals, termed the Spanish Health Corpus, and use it to train domain-specific embeddings which we incorporate in our model training. We achieve a result of 89.76% F1-score using pre-trained embeddings and are able to improve these results to 90.52% F1-score using specialized embeddings.
    @inproceedings{Stoeckel:Hemati:Mehler:2019,
        title = "When Specialization Helps: Using Pooled Contextualized Embeddings to Detect Chemical and Biomedical Entities in {S}panish",
        author = "Stoeckel, Manuel and Hemati, Wahed and Mehler, Alexander",
        booktitle = "Proceedings of The 5th Workshop on BioNLP Open Shared Tasks",
        month = nov,
        year = "2019",
        address = "Hong Kong, China",
        publisher = "Association for Computational Linguistics",
        url = "https://www.aclweb.org/anthology/D19-5702",
        doi = "10.18653/v1/D19-5702",
        pages = "11--15",
        abstract = "The recognition of pharmacological substances, compounds and proteins is an essential preliminary work for the recognition of relations between chemicals and other biomedically relevant units. In this paper, we describe an approach to Task 1 of the PharmaCoNER Challenge, which involves the recognition of mentions of chemicals and drugs in Spanish medical texts. We train a state-of-the-art BiLSTM-CRF sequence tagger with stacked Pooled Contextualized Embeddings, word and sub-word embeddings using the open-source framework FLAIR. We present a new corpus composed of articles and papers from Spanish health science journals, termed the Spanish Health Corpus, and use it to train domain-specific embeddings which we incorporate in our model training. We achieve a result of 89.76{\%} F1-score using pre-trained embeddings and are able to improve these results to 90.52{\%} F1-score using specialized embeddings.",
    }
  • [https://www.aclweb.org/anthology/K19-1081] [DOI] S. Ahmed, M. Stoeckel, C. Driller, A. Pachzelt, and A. Mehler, “BIOfid Dataset: Publishing a German Gold Standard for Named Entity Recognition in Historical Biodiversity Literature,” in Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), Hong Kong, China, 2019, pp. 871-880.

    The Specialized Information Service Biodiversity Research (BIOfid) has been launched to mobilize valuable biological data from printed literature hidden in German libraries for over the past 250 years. In this project, we annotate German texts converted by OCR from historical scientific literature on the biodiversity of plants, birds, moths and butterflies. Our work enables the automatic extraction of biological information previously buried in the mass of papers and volumes. For this purpose, we generated training data for the tasks of Named Entity Recognition (NER) and Taxa Recognition (TR) in biological documents. We use this data to train a number of leading machine learning tools and create a gold standard for TR in biodiversity literature. More specifically, we perform a practical analysis of our newly generated BIOfid dataset through various downstream-task evaluations and establish a new state of the art for TR with 80.23% F-score. In this sense, our paper lays the foundations for future work in the field of information extraction in biology texts.
    @InProceedings{Ahmed:Stoeckel:Driller:Pachzelt:Mehler:2019,
      author    = {Sajawel Ahmed and Manuel Stoeckel and Christine Driller and Adrian Pachzelt and Alexander Mehler},
      title     = {{BIOfid Dataset: Publishing a German Gold Standard for Named Entity Recognition in Historical Biodiversity Literature}},
      booktitle = {Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)},
      publisher = {Association for Computational Linguistics},
      year      = {2019},
      address   = {Hong Kong, China},
      url       = {https://www.aclweb.org/anthology/K19-1081},
      doi       = {10.18653/v1/K19-1081},
      pages     = {871--880},
      abstract  = {The Specialized Information Service Biodiversity Research (BIOfid) has been launched to mobilize valuable biological data from printed literature hidden in German libraries for over the past 250 years. In this project, we annotate German texts converted by OCR from historical scientific literature on the biodiversity of plants, birds, moths and butterflies. Our work enables the automatic extraction of biological information previously buried in the mass of papers and volumes. For this purpose, we generated training data for the tasks of Named Entity Recognition (NER) and Taxa Recognition (TR) in biological documents. We use this data to train a number of leading machine learning tools and create a gold standard for TR in biodiversity literature. More specifically, we perform a practical analysis of our newly generated BIOfid dataset through various downstream-task evaluations and establish a new state of the art for TR with 80.23{\%} F-score. In this sense, our paper lays the foundations for future work in the field of information extraction in biology texts.}
    }
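
The PharmaCoNER entry above describes training a BiLSTM-CRF sequence tagger over stacked Pooled Contextualized Embeddings with the open-source framework FLAIR. The sketch below outlines such a training run under explicit assumptions: the corpus directory, column layout, output path and hyperparameters are placeholders, the Spanish embedding identifiers ("es", "es-forward", "es-backward") are assumptions that may differ between FLAIR releases, and the domain-specific embeddings the paper trains on the Spanish Health Corpus are not reproduced here.

    from flair.datasets import ColumnCorpus
    from flair.embeddings import WordEmbeddings, PooledFlairEmbeddings, StackedEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Placeholder corpus in two-column CoNLL format (token, BIO tag); the actual
    # PharmaCoNER data layout and paths are not reproduced here.
    corpus = ColumnCorpus(
        "data/pharmaconer",              # hypothetical directory
        {0: "text", 1: "ner"},
        train_file="train.txt",
        dev_file="dev.txt",
        test_file="test.txt",
    )
    tag_dictionary = corpus.make_tag_dictionary(tag_type="ner")

    # Stacked word + pooled contextualized string embeddings; the identifiers
    # for the Spanish models are assumptions and may vary by FLAIR version.
    embeddings = StackedEmbeddings([
        WordEmbeddings("es"),
        PooledFlairEmbeddings("es-forward"),
        PooledFlairEmbeddings("es-backward"),
    ])

    # BiLSTM-CRF sequence tagger as provided by FLAIR.
    tagger = SequenceTagger(
        hidden_size=256,
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="ner",
        use_crf=True,
    )

    ModelTrainer(tagger, corpus).train(
        "models/pharmaconer-ner",        # output directory (placeholder)
        learning_rate=0.1,
        mini_batch_size=32,
        max_epochs=100,
    )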