Leon Hammerla

PhD Student

Goethe-Universität Frankfurt am Main
Robert-Mayer-Straße 10
Room 401b
D-60325 Frankfurt am Main
D-60054 Frankfurt am Main (use for package delivery)
Postfach / P.O. Box: 154
Phone:
Mail:

Office Hour: TBA

Publications

2024

Andy Lücking, Giuseppe Abrami, Leon Hammerla, Marc Rahn, Daniel Baumartz, Steffen Eger and Alexander Mehler. May, 2024. Dependencies over Times and Tools (DoTT). Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 4641–4653.
BibTeX
@inproceedings{Luecking:et:al:2024,
  abstract  = {Purpose: Based on the examples of English and German, we investigate
               to what extent parsers trained on modern variants of these languages
               can be transferred to older language levels without loss. Methods:
               We developed a treebank called DoTT (https://github.com/texttechnologylab/DoTT)
               which covers, roughly, the time period from 1800 until today,
               in conjunction with the further development of the annotation
               tool DependencyAnnotator. DoTT consists of a collection of diachronic
               corpora enriched with dependency annotations using 3 parsers,
               6 pre-trained language models, 5 newly trained models for German,
               and two tag sets (TIGER and Universal Dependencies). To assess
               how the different parsers perform on texts from different time
               periods, we created a gold standard sample as a benchmark. Results:
               We found that the parsers/models perform quite well on modern
               texts (document-level LAS ranging from 82.89 to 88.54) and slightly
               worse on older texts, as expected (average document-level LAS
               84.60 vs. 86.14), but not significantly. For German texts, the
               (German) TIGER scheme achieved slightly better results than UD.
               Conclusion: Overall, this result speaks for the transferability
               of parsers to past language levels, at least dating back until
               around 1800. This very transferability, it is however argued,
               means that studies of language change in the field of dependency
               syntax can draw on dependency distance but miss out on some grammatical
               phenomena.},
  address   = {Torino, Italy},
  author    = {L{\"u}cking, Andy and Abrami, Giuseppe and Hammerla, Leon and Rahn, Marc
               and Baumartz, Daniel and Eger, Steffen and Mehler, Alexander},
  booktitle = {Proceedings of the 2024 Joint International Conference on Computational
               Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
  editor    = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro
               and Sakti, Sakriani and Xue, Nianwen},
  month     = may,
  pages     = {4641--4653},
  publisher = {ELRA and ICCL},
  title     = {Dependencies over Times and Tools ({D}o{TT})},
  url       = {https://aclanthology.org/2024.lrec-main.415},
  poster    = {https://www.texttechnologylab.org/wp-content/uploads/2024/05/LREC_2024_Poster_DoTT.pdf},
  year      = {2024}
}

2022

Giuseppe Abrami, Mevlüt Bagci, Leon Hammerla and Alexander Mehler. 2022. German Parliamentary Corpus (GerParCor). Proceedings of the Language Resources and Evaluation Conference, 1900–1906.
BibTeX
@inproceedings{Abrami:Bagci:Hammerla:Mehler:2022,
  author    = {Abrami, Giuseppe and Bagci, Mevl{\"u}t and Hammerla, Leon and Mehler, Alexander},
  editor    = {Calzolari, Nicoletta and B{\'e}chet, Fr{\'e}d{\'e}ric and Blache, Philippe
               and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara
               and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne
               and Odijk, Jan and Piperidis, Stelios},
  title     = {German Parliamentary Corpus ({G}er{P}ar{C}or)},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {1900--1906},
  abstract  = {Parliamentary debates represent a large and partly unexploited
               treasure trove of publicly accessible texts. In the German-speaking
               area, there is a certain deficit of uniformly accessible and annotated
               corpora covering all German-speaking parliaments at the national
               and federal level. To address this gap, we introduce the German
               Parliamentary Corpus (GerParCor). GerParCor is a genre-specific
               corpus of (predominantly historical) German-language parliamentary
               protocols from three centuries and four countries, including state
               and federal level data. In addition, GerParCor contains conversions
               of scanned protocols and, in particular, of protocols in Fraktur
               converted via an OCR process based on Tesseract. All protocols
               were preprocessed by means of the NLP pipeline of spaCy3 and automatically
               annotated with metadata regarding their session date. GerParCor
               is made available in the XMI format of the UIMA project. In this
               way, GerParCor can be used as a large corpus of historical texts
               in the field of political communication for various tasks in NLP.},
  url       = {https://aclanthology.org/2022.lrec-1.202},
  poster    = {https://www.texttechnologylab.org/wp-content/uploads/2022/06/GerParCor_LREC_2022.pdf},
  keywords  = {gerparcor},
  pdf       = {http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.202.pdf}
}