New submissions accepted at LREC 2020, Marseille, France

The following new publications were accepted at the 12th International Conference on Language Resources and Evaluation (LREC) 2020 in Marseille, France:

  • [1] TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative Annotation of Texts
  • [2] On the Influence of Coreference Resolution on Word Embeddings in Lexical-semantic Evaluation Tasks
  • [3] Recognizing Sentence-level Logical Document Structures with the Help of Context-free Grammars

[1] G. Abrami, M. Stoeckel, and A. Mehler, “TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative Annotation of Texts,” in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille, France, 2020, pp. 891-900.
@InProceedings{Abrami:Stoeckel:Mehler:2020,
  author    = {Abrami, Giuseppe and Stoeckel, Manuel and Mehler, Alexander},
  title     = {TextAnnotator: A UIMA Based Tool for the Simultaneous and Collaborative Annotation of Texts},
  booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
  month     = {May},
  year      = {2020},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {891--900},
  isbn      = {979-10-95546-34-4},
  abstract  = {The annotation of texts and other material in the field of digital humanities and Natural Language Processing (NLP) is a common task of research projects. At the same time, the annotation of corpora is certainly the most time- and cost-intensive component in research projects and often requires a high level of expertise according to the research interest. However, for the annotation of texts, a wide range of tools is available, both for automatic and manual annotation. Since the automatic pre-processing methods are not error-free and there is an increasing demand for the generation of training data, also with regard to machine learning, suitable annotation tools are required. This paper defines criteria of flexibility and efficiency of complex annotations for the assessment of existing annotation tools. To extend this list of tools, the paper describes TextAnnotator, a browser-based, multi-annotation system, which has been developed to perform platform-independent multimodal annotations and annotate complex textual structures. The paper illustrates the current state of development of TextAnnotator and demonstrates its ability to evaluate annotation quality (inter-annotator agreement) at runtime. In addition, it will be shown how annotations of different users can be performed simultaneously and collaboratively on the same document from different platforms using UIMA as the basis for annotation.},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.112},
  pdf       = {http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.112.pdf}
}
[2] A. Henlein and A. Mehler, “On the Influence of Coreference Resolution on Word Embeddings in Lexical-semantic Evaluation Tasks,” in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille, France, 2020, pp. 27-33.
@InProceedings{Henlein:Mehler:2020,
  author    = {Henlein, Alexander and Mehler, Alexander},
  title     = {{On the Influence of Coreference Resolution on Word Embeddings in Lexical-semantic Evaluation Tasks}},
  booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
  month     = {May},
  year      = {2020},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {27--33},
  abstract  = {Coreference resolution (CR) aims to find all spans of a text that refer to the same entity. The F1 scores on this task have been greatly improved by newly developed End2End approaches and transformer networks. The inclusion of CR as a pre-processing step is expected to lead to improvements in downstream tasks. The paper examines this effect with respect to word embeddings. That is, we analyze the effects of CR on six different embedding methods and evaluate them in the context of seven lexical-semantic evaluation tasks and instantiation/hypernymy detection. Especially for the latter tasks, we hoped for a significant increase in performance. We show that none of the word embedding approaches benefits significantly from pronoun substitution. The measurable improvements are only marginal (around 0.5\% in most test cases). We explain this result with the loss of contextual information, the reduction of the relative occurrence of rare words and the lack of pronouns to be replaced.},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.4},
  pdf       = {http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.4.pdf}
}
[3] J. Hildebrand, W. Hemati, and A. Mehler, “Recognizing Sentence-level Logical Document Structures with the Help of Context-free Grammars,” in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille, France, 2020, pp. 5282-5290.
@InProceedings{Hildebrand:Hemati:Mehler:2020,
  author    = {Hildebrand, Jonathan and Hemati, Wahed and Mehler, Alexander},
  title     = {Recognizing Sentence-level Logical Document Structures with the Help of Context-free Grammars},
  booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
  month     = {May},
  year      = {2020},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {5282--5290},
  abstract  = {Current sentence boundary detectors split documents into sequentially ordered sentences by detecting their beginnings and ends. Sentences, however, are more deeply structured even on this side of constituent and dependency structure: they can consist of a main sentence and several subordinate clauses as well as further segments (e.g. inserts in parentheses); they can even recursively embed whole sentences and then contain multiple sentence beginnings and ends. In this paper, we introduce a tool that segments sentences into tree structures to detect this type of recursive structure. To this end, we retrain different constituency parsers with the help of modified training data to transform them into sentence segmenters. With these segmenters, documents are mapped to sequences of sentence-related “logical document structures”. The resulting segmenters aim to improve downstream tasks by providing additional structural information. In this context, we experiment with German dependency parsing. We show that for certain sentence categories, which can be determined automatically, improvements in German dependency parsing can be achieved using our segmenter for preprocessing. This suggests that improvements can also be achieved for other languages and tasks.},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.650},
  pdf       = {http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.650.pdf}
}
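Entry [1] mentions that TextAnnotator evaluates annotation quality through inter-annotator agreement at runtime. The abstract does not say which agreement measure is used, so the following Python sketch shows only one common choice, Cohen's kappa for two annotators labelling the same items. The function `cohens_kappa` is hypothetical and is not part of TextAnnotator or UIMA.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability that both pick the same label independently,
    # estimated from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label for every item; kappa is
        # undefined here, and this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    annotator_1 = ["PER", "PER", "LOC", "ORG"]
    annotator_2 = ["PER", "LOC", "LOC", "ORG"]
    print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.636
```

In this example the two annotators agree on three of four labels, yet kappa is only about 0.64 rather than 0.75, because kappa discounts the agreement that would be expected by chance.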
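Entry [2] studies coreference resolution as a preprocessing step that substitutes pronouns before word embeddings are trained. The following is a minimal Python sketch of such a substitution step, assuming the coreference clusters have already been computed as token spans. The span format, the small pronoun list and the rule of using the first non-pronoun mention as the replacement are illustrative assumptions, not the procedure from the paper.

```python
# Illustrative pronoun list (assumption); a real pipeline would rely on POS tags.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


def is_pronoun_mention(tokens, span):
    """True if the span is a single token that appears in PRONOUNS."""
    start, end = span
    return end - start == 1 and tokens[start].lower() in PRONOUNS


def substitute_pronouns(tokens, clusters):
    """Replace single-token pronoun mentions by a representative mention of their cluster.

    tokens   -- list of token strings
    clusters -- list of coreference clusters, each a list of (start, end) token
                spans with an exclusive end, e.g. from an external resolver
    """
    replacements = {}  # start index of pronoun -> (end index, replacement tokens)
    for cluster in clusters:
        # Representative: first mention that is not a lone pronoun (assumption).
        rep = next((tokens[s:e] for s, e in cluster
                    if not is_pronoun_mention(tokens, (s, e))), None)
        if rep is None:
            continue  # pronoun-only cluster: nothing to substitute
        for s, e in cluster:
            if is_pronoun_mention(tokens, (s, e)):
                replacements[s] = (e, rep)

    out, i = [], 0
    while i < len(tokens):
        if i in replacements:
            end, rep = replacements[i]
            out.extend(rep)  # insert the representative mention
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    tokens = "Marie Curie won the prize because she was brilliant".split()
    print(" ".join(substitute_pronouns(tokens, [[(0, 2), (6, 7)]])))
    # Marie Curie won the prize because Marie Curie was brilliant
```

The substituted token sequence would then be handed to an embedding trainer in place of the original text. As the abstract notes, such substitution also shifts word frequencies, which is one of the reasons the paper gives for the marginal gains.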
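Entry [3] maps sentences to tree structures in which segments such as subordinate clauses or parenthetical inserts can be nested. To illustrate only the data structure, the Python sketch below reads a hypothetical bracketed notation (`[LABEL token ... ]`) into a tree and lists every segment with its nesting depth. The notation, the labels and the `Segment` class are assumptions and do not reflect the output format of the tool described in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A (possibly nested) sentence segment; children are tokens or Segments."""
    label: str
    children: list = field(default_factory=list)


def parse_segments(text):
    """Parse the hypothetical notation '[LABEL tok ... [LABEL ...] ... ]' into a tree."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def parse():
        nonlocal pos
        if pos + 1 >= len(tokens) or tokens[pos] != "[":
            raise ValueError(f"expected '[LABEL' at token {pos}")
        node = Segment(tokens[pos + 1])
        pos += 2
        while pos < len(tokens) and tokens[pos] != "]":
            if tokens[pos] == "[":
                node.children.append(parse())  # nested segment
            else:
                node.children.append(tokens[pos])  # plain token
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        pos += 1  # consume the closing ']'
        return node

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the root segment")
    return tree


def surface(seg):
    """All tokens covered by a segment, in their original order."""
    out = []
    for child in seg.children:
        out.extend([child] if isinstance(child, str) else surface(child))
    return out


def walk(seg, depth=0):
    """Yield (depth, label, text) for a segment and all segments nested in it."""
    yield depth, seg.label, " ".join(surface(seg))
    for child in seg.children:
        if isinstance(child, Segment):
            yield from walk(child, depth + 1)


if __name__ == "__main__":
    tree = parse_segments("[S Anna said [SUB that she [INS as planned ] would come ] . ]")
    for depth, label, text in walk(tree):
        print("  " * depth + f"{label}: {text}")
```

Running the example prints the main sentence, the embedded clause one level deeper and the insert two levels deeper. That nesting is exactly what a flat list of sentence boundaries cannot express.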