Alexander Henlein

Staff member

Publications

Total: 8

2020 (5)

  • G. Abrami, A. Henlein, A. Kett, and A. Mehler, “Text2SceneVR: Generating Hypertexts with VAnnotatoR as a Pre-processing Step for Text2Scene Systems,” in Proceedings of the 31st ACM Conference on Hypertext and Social Media, 2020. accepted
    [BibTeX]

    @InProceedings{Abrami:Henlein:Kett:Mehler:2020,
        author = {Abrami, Giuseppe and Henlein, Alexander and Kett, Attila and Mehler, Alexander},
        title = {{Text2SceneVR}: Generating Hypertexts with VAnnotatoR as a Pre-processing Step for Text2Scene Systems},
        booktitle = {Proceedings of the 31st ACM Conference on Hypertext and Social Media},
        series = {Proceedings of the 31st ACM Conference on Hypertext and Social Media (HT '20)},
        year = {2020},
        location = {Florida, USA / Online},
        publisher = {ACM},
        note = {accepted}
    }
  • [PDF] [https://www.aclweb.org/anthology/2020.lt4hala-1.21] M. Stoeckel, A. Henlein, W. Hemati, and A. Mehler, “Voting for POS tagging of Latin texts: Using the flair of FLAIR to better Ensemble Classifiers by Example of Latin,” in Proceedings of LT4HALA 2020 – 1st Workshop on Language Technologies for Historical and Ancient Languages, Marseille, France, 2020, pp. 130-135.
    [Abstract] [BibTeX]

    Despite the great importance of the Latin language in the past, there are relatively few resources available today to develop modern NLP tools for this language. Therefore, the EvaLatin Shared Task for Lemmatization and Part-of-Speech (POS) tagging was published in the LT4HALA workshop. In our work, we dealt with the second EvaLatin task, that is, POS tagging. Since most of the available Latin word embeddings were trained on either little or inaccurate data, we trained several embeddings on better data in the first step. Based on these embeddings, we trained several state-of-the-art taggers and used them as input for an ensemble classifier called LSTMVoter. We were able to achieve the best results for both the cross-genre and the cross-time task (90.64% and 87.00%) without using additional annotated data (closed modality). In the meantime, we further improved the system and achieved even better results (96.91% on classical, 90.87% on cross-genre and 87.35% on cross-time).
    @InProceedings{Stoeckel:et:al:2020,
      author    = {Stoeckel, Manuel and Henlein, Alexander and Hemati, Wahed and Mehler, Alexander},
      title     = {{Voting for POS tagging of Latin texts: Using the flair of FLAIR to better Ensemble Classifiers by Example of Latin}},
      booktitle      = {Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages},
      month          = {May},
      year           = {2020},
      address        = {Marseille, France},
      publisher      = {European Language Resources Association (ELRA)},
      pages     = {130--135},
      abstract  = {Despite the great importance of the Latin language in the past, there are relatively few resources available today to develop modern NLP tools for this language. Therefore, the EvaLatin Shared Task for Lemmatization and Part-of-Speech (POS) tagging was published in the LT4HALA workshop. In our work, we dealt with the second EvaLatin task, that is, POS tagging. Since most of the available Latin word embeddings were trained on either little or inaccurate data, we trained several embeddings on better data in the first step. Based on these embeddings, we trained several state-of-the-art taggers and used them as input for an ensemble classifier called LSTMVoter. We were able to achieve the best results for both the cross-genre and the cross-time task (90.64\% and 87.00\%) without using additional annotated data (closed modality). In the meantime, we further improved the system and achieved even better results (96.91\% on classical, 90.87\% on cross-genre and 87.35\% on cross-time).},
      url       = {https://www.aclweb.org/anthology/2020.lt4hala-1.21},
      pdf       = {http://www.lrec-conf.org/proceedings/lrec2020/workshops/LT4HALA/pdf/2020.lt4hala-1.21.pdf}
    }
  • [PDF] A. Mehler, B. Jussen, T. Geelhaar, A. Henlein, G. Abrami, D. Baumartz, T. Uslu, and W. Hemati, “The Frankfurt Latin Lexicon. From Morphological Expansion and Word Embeddings to SemioGraphs,” Studi e Saggi Linguistici, 2020. in press
    [Abstract] [BibTeX]

    In this article we present the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin that is used both for the lemmatization of Latin texts and for the post-editing of lemmatizations. We describe recent advances in the development of lemmatizers and test them against the Capitularies corpus (comprising Frankish royal edicts, mid-6th to mid-9th century), a corpus created as a reference for processing Medieval Latin. We also consider the post-correction of lemmatizations using a limited crowdsourcing process aimed at continuous review and updating of the FLL. Starting from the texts resulting from this lemmatization process, we describe the extension of the FLL by means of word embeddings, whose interactive traversing by means of SemioGraphs completes the digitally enhanced hermeneutic circle. In this way, the article argues for a more comprehensive understanding of lemmatization, encompassing classical machine learning as well as intellectual post-corrections and, in particular, human computation in the form of interpretation processes based on graph representations of the underlying lexical resources.
    @article{Mehler:et:al:2020b,
        author={Mehler, Alexander and Jussen, Bernhard and Geelhaar, Tim and Henlein, Alexander and Abrami, Giuseppe and Baumartz, Daniel and Uslu, Tolga and Hemati, Wahed},
        title={{The Frankfurt Latin Lexicon. From Morphological Expansion and Word Embeddings to SemioGraphs}},
        journal={Studi e Saggi Linguistici},
        year={2020},
        note={in press},
        abstract={In this article we present the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin that is used both for the lemmatization of Latin texts and for the post-editing of lemmatizations. We describe recent advances in the development of lemmatizers and test them against the Capitularies corpus (comprising Frankish royal edicts, mid-6th to mid-9th century), a corpus created as a reference for processing Medieval Latin. We also consider the post-correction of lemmatizations using a limited crowdsourcing process aimed at continuous review and updating of the FLL. Starting from the texts resulting from this lemmatization process, we describe the extension of the FLL by means of word embeddings, whose interactive traversing by means of SemioGraphs completes the digitally enhanced hermeneutic circle. In this way, the article argues for a more comprehensive understanding of lemmatization, encompassing classical machine learning as well as intellectual post-corrections and, in particular, human computation in the form of interpretation processes based on graph representations of the underlying lexical resources.},
        pdf={https://arxiv.org/pdf/2005.10790.pdf}
    }
  • [PDF] [https://www.aclweb.org/anthology/2020.isa-1.4] A. Henlein, G. Abrami, A. Kett, and A. Mehler, “Transfer of ISOSpace into a 3D Environment for Annotations and Applications,” in 16th Joint ACL – ISO Workshop on Interoperable Semantic Annotation PROCEEDINGS, Marseille, 2020, pp. 32-35.
    [Abstract] [BibTeX]

    People's visual perception is highly developed, and it is therefore usually no problem for them to describe the space around them in words. Conversely, people also have no difficulty imagining a described space. In recent years, many efforts have been made to develop linguistic models for spatial and spatio-temporal relations. However, these systems have not really caught on so far, which in our opinion is due to the complex models on which they are based and the lack of available training data and automated taggers. In this paper we describe a project to support spatial annotation, which could facilitate annotation through its many functions, but also enrich it with much more information. This is to be achieved by an extension in the form of a VR environment, in which spatial relations can be better visualized and connected with real objects. We also want to use the available data to develop a new state-of-the-art tagger and thus lay the foundation for future systems such as improved text understanding for Text2Scene.
    @InProceedings{Henlein:et:al:2020,
      author         = {Henlein, Alexander and Abrami, Giuseppe and Kett, Attila and Mehler, Alexander},
      title          = {{Transfer of ISOSpace into a 3D Environment for Annotations and Applications}},
      booktitle      = {16th Joint ACL - ISO Workshop on Interoperable Semantic Annotation PROCEEDINGS},
      month          = {May},
      year           = {2020},
      address        = {Marseille},
      publisher      = {European Language Resources Association},
      pages          = {32--35},
      abstract       = {People's visual perception is highly developed, and it is therefore usually no problem for them to describe the space around them in words. Conversely, people also have no difficulty imagining a described space. In recent years, many efforts have been made to develop linguistic models for spatial and spatio-temporal relations. However, these systems have not really caught on so far, which in our opinion is due to the complex models on which they are based and the lack of available training data and automated taggers. In this paper we describe a project to support spatial annotation, which could facilitate annotation through its many functions, but also enrich it with much more information. This is to be achieved by an extension in the form of a VR environment, in which spatial relations can be better visualized and connected with real objects. We also want to use the available data to develop a new state-of-the-art tagger and thus lay the foundation for future systems such as improved text understanding for Text2Scene.},
      url            = {https://www.aclweb.org/anthology/2020.isa-1.4},
      pdf            = {http://www.lrec-conf.org/proceedings/lrec2020/workshops/ISA16/pdf/2020.isa-1.4.pdf}
    }
  • [PDF] [https://www.aclweb.org/anthology/2020.lrec-1.4] A. Henlein and A. Mehler, “On the Influence of Coreference Resolution on Word Embeddings in Lexical-semantic Evaluation Tasks,” in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille, France, 2020, pp. 27-33.
    [Abstract] [BibTeX]

    Coreference resolution (CR) aims to find all spans of a text that refer to the same entity. F1 scores on this task have been greatly improved by newly developed end-to-end approaches and transformer networks. The inclusion of CR as a pre-processing step is expected to lead to improvements in downstream tasks. This paper examines this effect with respect to word embeddings. That is, we analyze the effects of CR on six different embedding methods and evaluate them in the context of seven lexical-semantic evaluation tasks and instantiation/hypernymy detection. Especially for the latter tasks we hoped for a significant increase in performance. We show that none of the word embedding approaches benefits significantly from pronoun substitution. The measurable improvements are only marginal (around 0.5% in most test cases). We explain this result by the loss of contextual information, the reduced relative frequency of rare words, and the small number of pronouns to be replaced.
    @InProceedings{Henlein:Mehler:2020,
      author         = {Henlein, Alexander and Mehler, Alexander},
      title          = {{On the Influence of Coreference Resolution on Word Embeddings in Lexical-semantic Evaluation Tasks}},
      booktitle      = {Proceedings of The 12th Language Resources and Evaluation Conference},
      month          = {May},
      year           = {2020},
      address        = {Marseille, France},
      publisher      = {European Language Resources Association},
      pages          = {27--33},
      abstract       = {Coreference resolution (CR) aims to find all spans of a text that refer to the same entity. F1 scores on this task have been greatly improved by newly developed end-to-end approaches and transformer networks. The inclusion of CR as a pre-processing step is expected to lead to improvements in downstream tasks. This paper examines this effect with respect to word embeddings. That is, we analyze the effects of CR on six different embedding methods and evaluate them in the context of seven lexical-semantic evaluation tasks and instantiation/hypernymy detection. Especially for the latter tasks we hoped for a significant increase in performance. We show that none of the word embedding approaches benefits significantly from pronoun substitution. The measurable improvements are only marginal (around 0.5\% in most test cases). We explain this result by the loss of contextual information, the reduced relative frequency of rare words, and the small number of pronouns to be replaced.},
      url            = {https://www.aclweb.org/anthology/2020.lrec-1.4},
      pdf            = {http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.4.pdf}
    }

2019 (1)

  • [PDF] [http://jlm.ipipan.waw.pl/index.php/JLM/article/view/205] [DOI] R. Gleim, S. Eger, A. Mehler, T. Uslu, W. Hemati, A. Lücking, A. Henlein, S. Kahlsdorf, and A. Hoenen, “A practitioner’s view: a survey and comparison of lemmatization and morphological tagging in German and Latin,” Journal of Language Modeling, 2019.
    [BibTeX]

    @article{Gleim:Eger:Mehler:2019,
      author    = {Gleim, R\"{u}diger and Eger, Steffen and Mehler, Alexander and Uslu, Tolga and Hemati, Wahed and L\"{u}cking, Andy and Henlein, Alexander and Kahlsdorf, Sven and Hoenen, Armin},
      title     = {A practitioner's view: a survey and comparison of lemmatization and morphological tagging in German and Latin},
      journal   = {Journal of Language Modeling},
      year      = {2019},
      pdf = {https://www.texttechnologylab.org/wp-content/uploads/2019/07/jlm-tagging.pdf},
      doi = {10.15398/jlm.v7i1.205},
      url = {http://jlm.ipipan.waw.pl/index.php/JLM/article/view/205} 
    }

2018 (2)

  • T. Uslu, L. Miebach, S. Wolfsgruber, M. Wagner, K. Fließbach, R. Gleim, W. Hemati, A. Henlein, and A. Mehler, “Automatic Classification in Memory Clinic Patients and in Depressive Patients,” in Proceedings of Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric impairments (RaPID-2), 2018.
    [BibTeX]

    @InProceedings{Uslu:et:al:2018:a,
      author         = {Tolga Uslu and Lisa Miebach and Steffen Wolfsgruber and Michael Wagner and Klaus Flie{\ss}bach and R\"{u}diger Gleim and Wahed Hemati and Alexander Henlein and Alexander Mehler},
      title          = {{Automatic Classification in Memory Clinic Patients and in Depressive Patients}},
      booktitle      = {Proceedings of Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric impairments (RaPID-2)},
      series         = {RaPID},
      location       = {Miyazaki, Japan},
      year           = {2018}
    }
  • [PDF] T. Uslu, A. Mehler, D. Baumartz, A. Henlein, and W. Hemati, “fastSense: An Efficient Word Sense Disambiguation Classifier,” in Proceedings of the 11th edition of the Language Resources and Evaluation Conference, May 7 – 12, Miyazaki, Japan, 2018.
    [BibTeX]

    @InProceedings{Uslu:et:al:2018,
      author         = {Tolga Uslu and Alexander Mehler and Daniel Baumartz and Alexander Henlein and Wahed Hemati},
      title          = {{fastSense: An Efficient Word Sense Disambiguation Classifier}},
      booktitle      = {Proceedings of the 11th edition of the Language Resources and Evaluation Conference, May 7 - 12},
      series         = {LREC 2018},
      address        = {Miyazaki, Japan},
      pdf            = {https://www.texttechnologylab.org/wp-content/uploads/2018/03/fastSense.pdf},
      year           = {2018}
    }