
PhD Student
Goethe-Universität Frankfurt am Main
Robert-Mayer-Straße 10
Room 401b
D-60325 Frankfurt am Main
D-60054 Frankfurt am Main (use for package delivery)
P.O. Box: 154
Phone:
Mail:
Office Hour: TBA
Thesis topic proposals
2025
Master's Thesis: Negation and LLM Reasoning
Description
Because lexical and logical negation appear to play a crucial role in human reasoning and inquiry, we are interested both in analyzing negation patterns in the reasoning traces produced by large language models (LLMs) and in LLM reasoning frameworks that explicitly incorporate negation, with the goal of mimicking human reasoning more closely. Possible directions for this thesis include: (1) developing LLM reasoning frameworks centered on the phenomenon of negation and evaluating them against existing frameworks such as Chain-of-Thought (CoT) or Tree-of-Thought (ToT); (2) negation-centered fine-tuning of LLMs for reasoning; (3) qualitative and quantitative analysis of LLM reasoning traces with a focus on negation patterns.
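As one illustration of what a quantitative measure for direction (3) might look like, the following minimal Python sketch counts lexical negation cues in a reasoning trace and normalizes them per 1,000 tokens. The cue inventory, the simple regex tokenizer, and the function names `tokenize` and `negation_profile` are hypothetical assumptions made for this example, not part of the proposal; affixal negation (e.g. "un-", "-less") and negation scope are deliberately ignored.

```python
import re
from collections import Counter

# Illustrative (hypothetical) inventory of English lexical negation cues.
# Affixal negation ("un-", "-less") and multi-word cues are not covered.
NEGATION_CUES = {"not", "n't", "no", "never", "none", "nothing",
                 "nobody", "neither", "nor", "cannot", "without"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split off the clitic "n't", and keep alphabetic tokens.

    Digits and punctuation are dropped; only the ASCII apostrophe is handled.
    """
    text = re.sub(r"n't\b", " n't", text.lower())
    return re.findall(r"n't|[a-z]+", text)


def negation_profile(trace: str) -> dict:
    """Count negation cues in one reasoning trace and normalize per 1,000 tokens."""
    tokens = tokenize(trace)
    cues = Counter(tok for tok in tokens if tok in NEGATION_CUES)
    total = sum(cues.values())
    rate = 1000 * total / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "cues": dict(cues), "per_1000_tokens": rate}


if __name__ == "__main__":
    trace = "The answer is not 4. It can't be odd, so it is 6."
    print(negation_profile(trace))
```

Normalizing by trace length makes traces of different lengths (or from different reasoning frameworks) roughly comparable; a real study would likely replace the word list with a trained cue detector.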
Corresponding Lab Member:
Corresponding Lab Member:
Bachelor's Thesis: Detecting the Negated Event / Detecting the Focus of Negation
Description
Classic negation annotation in computational linguistics involves four steps: identifying the negation cue, determining the scope of the negation, detecting the negated event, and detecting the most prominent part of the scope that is negated (the focus). While reliable systems already exist for detecting negation cues and scopes, current frameworks still need to be extended to identify the negated event and/or the focus. For a bachelor's thesis, addressing one of these two aspects is sufficient; for a master's thesis, both should be tackled. A Python-based pipeline for cue and scope detection is already available, and the newly developed detection modules can be integrated into it.
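The four annotation layers can be illustrated on a single sentence by storing each layer as a set of token indices. The following minimal Python sketch assumes a hypothetical `NegationAnnotation` class; it is not the interface of the existing pipeline. The chosen focus ("the cake") reflects only one possible reading (John ate something, just not the cake), and which tokens belong to the scope differs between annotation guidelines.

```python
from dataclasses import dataclass, field


@dataclass
class NegationAnnotation:
    """One negation instance over a tokenized sentence, stored as token indices.

    The layers follow the description above (cue, scope, negated event, focus).
    The class and its field names are hypothetical, not the existing pipeline's API.
    """
    tokens: list[str]
    cue: list[int]
    scope: list[int]
    event: list[int] = field(default_factory=list)
    focus: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        n = len(self.tokens)
        for layer in ("cue", "scope", "event", "focus"):
            if any(not 0 <= i < n for i in getattr(self, layer)):
                raise ValueError(f"{layer}: token index out of range")
        # The focus is the negated part of the scope, so it must lie inside it;
        # requiring the same of the event is an assumption of this sketch.
        for layer in ("event", "focus"):
            if not set(getattr(self, layer)) <= set(self.scope):
                raise ValueError(f"{layer} must lie inside the scope")

    def span(self, layer: str) -> str:
        """Return the surface string of one layer, in sentence order."""
        return " ".join(self.tokens[i] for i in sorted(getattr(self, layer)))


if __name__ == "__main__":
    tokens = ["John", "did", "not", "eat", "the", "cake", "."]
    ann = NegationAnnotation(
        tokens,
        cue=[2],                # "not"
        scope=[0, 1, 3, 4, 5],  # scope excludes the cue here (guideline-dependent)
        event=[3],              # "eat"
        focus=[4, 5],           # one possible reading: John ate something, not the cake
    )
    for layer in ("cue", "scope", "event", "focus"):
        print(layer, "->", ann.span(layer))
```

Keeping every layer as token indices rather than strings makes it straightforward to compare predicted and gold spans token by token, and a new event or focus detector would only need to fill in the corresponding index list.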
Corresponding Lab Member:
Corresponding Lab Member:
If you have any suggestions of your own relating to this or any of our other proposed topics, please do not hesitate to contact us.
In addition, we offer a free mailing list that we use to keep subscribers regularly informed about new qualification and research projects, as well as other news related to text technology.
Publications
2025
2025.
D-Neg: Syntax-Aware Graph Reasoning for Negation Detection. Proceedings of the 2025 International Joint Conference on Natural Language Processing & Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL-Findings).
accepted.
BibTeX
@inproceedings{Hammerla:et:al:2025b,
author = {Hammerla, Leon and Lücking, Andy and Reinert, Carolin and Mehler, Alexander},
title = {{D-Neg}: Syntax-Aware Graph Reasoning for Negation Detection},
booktitle = {Proceedings of the 2025 International Joint Conference on Natural
Language Processing \& Asia-Pacific Chapter of the Association
for Computational Linguistics (IJCNLP-AACL-Findings)},
year = {2025},
note = {accepted}
}
2025.
Standardizing Heterogeneous Corpora with DUUR: A Dual Data- and Process-Oriented Approach to Enhancing NLP Pipeline Integration. Proceedings of the 2025 International Joint Conference on Natural Language Processing & Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL-Findings).
accepted.
BibTeX
@inproceedings{Hammerla:et:al:2025a,
author = {Hammerla, Leon and Mehler, Alexander and Abrami, Giuseppe},
title = {Standardizing Heterogeneous Corpora with {DUUR}: A Dual Data- and
Process-Oriented Approach to Enhancing {NLP} Pipeline Integration},
booktitle = {Proceedings of the 2025 International Joint Conference on Natural
Language Processing \& Asia-Pacific Chapter of the Association
for Computational Linguistics (IJCNLP-AACL-Findings)},
year = {2025},
note = {accepted}
}
2025.
Constructed Responses beyond NLP – Auswertungsansätze für graphische Antworten. Proceedings of the 12. Jahrestagung der Gesellschaft für empirische Bildungsforschung (GEBF 2025).
BibTeX
@inproceedings{Hahn:et:al:2025,
author = {Sonja Hahn and Leon Hammerla and Corinna Hankeln and Sebastian Groß
and Christina Röpers and Ulf Kröhne},
title = {Constructed Responses beyond {NLP} -- Auswertungsansätze für graphische Antworten},
booktitle = {Proceedings of the 12. Jahrestagung der Gesellschaft für empirische
Bildungsforschung (GEBF 2025)},
address = {Mannheim, Germany},
year = {2025}
}
2024
2024.
How much training data are required? Automatic scoring using prompting compared to text classification tasks as fine-tuning large-language models. Proceedings of the 53. Kongress der Deutschen Gesellschaft für Psychologie / 15. ÖGP Conference.
BibTeX
@inproceedings{Kroehne:et:al:2024,
author = {Ulf Kröhne and Leon Hammerla and Corinna Hankeln and Marc Müller and Sonja Hahn},
title = {How much training data are required? Automatic scoring using prompting
compared to text classification tasks as fine-tuning large-language
models},
booktitle = {Proceedings of the 53. Kongress der Deutschen Gesellschaft für Psychologie
/ 15. ÖGP Conference},
address = {Vienna, Austria},
year = {2024}
}
May, 2024.
Dependencies over Times and Tools (DoTT). Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 4641–4653.
BibTeX
@inproceedings{Luecking:et:al:2024,
abstract = {Purpose: Based on the examples of English and German, we investigate
to what extent parsers trained on modern variants of these languages
can be transferred to older language levels without loss. Methods:
We developed a treebank called DoTT (https://github.com/texttechnologylab/DoTT)
which covers, roughly, the time period from 1800 until today,
in conjunction with the further development of the annotation
tool DependencyAnnotator. DoTT consists of a collection of diachronic
corpora enriched with dependency annotations using 3 parsers,
6 pre-trained language models, 5 newly trained models for German,
and two tag sets (TIGER and Universal Dependencies). To assess
how the different parsers perform on texts from different time
periods, we created a gold standard sample as a benchmark. Results:
We found that the parsers/models perform quite well on modern
texts (document-level LAS ranging from 82.89 to 88.54) and slightly
worse on older texts, as expected (average document-level LAS
84.60 vs. 86.14), but not significantly. For German texts, the
(German) TIGER scheme achieved slightly better results than UD.
Conclusion: Overall, this result speaks for the transferability
of parsers to past language levels, at least dating back until
around 1800. This very transferability, it is however argued,
means that studies of language change in the field of dependency
syntax can draw on dependency distance but miss out on some grammatical
phenomena.},
address = {Torino, Italy},
author = {L{\"u}cking, Andy and Abrami, Giuseppe and Hammerla, Leon and Rahn, Marc
and Baumartz, Daniel and Eger, Steffen and Mehler, Alexander},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational
Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
editor = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro
and Sakti, Sakriani and Xue, Nianwen},
month = may,
pages = {4641--4653},
publisher = {ELRA and ICCL},
title = {Dependencies over Times and Tools ({D}o{TT})},
url = {https://aclanthology.org/2024.lrec-main.415},
poster = {https://www.texttechnologylab.org/wp-content/uploads/2024/05/LREC_2024_Poster_DoTT.pdf},
year = {2024}
}
2022
2022.
German Parliamentary Corpus (GerParCor). Proceedings of the Language Resources and Evaluation Conference, 1900–1906.
BibTeX
@inproceedings{Abrami:Bagci:Hammerla:Mehler:2022,
author = {Abrami, Giuseppe and Bagci, Mevlüt and Hammerla, Leon and Mehler, Alexander},
editor = {Calzolari, Nicoletta and B\'echet, Fr\'ed\'eric and Blache, Philippe
and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara
and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, H\'el\`ene
and Odijk, Jan and Piperidis, Stelios},
title = {German Parliamentary Corpus ({GerParCor})},
booktitle = {Proceedings of the Language Resources and Evaluation Conference},
year = {2022},
address = {Marseille, France},
publisher = {European Language Resources Association},
pages = {1900--1906},
abstract = {Parliamentary debates represent a large and partly unexploited
treasure trove of publicly accessible texts. In the German-speaking
area, there is a certain deficit of uniformly accessible and annotated
corpora covering all German-speaking parliaments at the national
and federal level. To address this gap, we introduce the German
Parliamentary Corpus (GerParCor). GerParCor is a genre-specific
corpus of (predominantly historical) German-language parliamentary
protocols from three centuries and four countries, including state
and federal level data. In addition, GerParCor contains conversions
of scanned protocols and, in particular, of protocols in Fraktur
converted via an OCR process based on Tesseract. All protocols
were preprocessed by means of the NLP pipeline of spaCy3 and automatically
annotated with metadata regarding their session date. GerParCor
is made available in the XMI format of the UIMA project. In this
way, GerParCor can be used as a large corpus of historical texts
in the field of political communication for various tasks in NLP.},
url = {https://aclanthology.org/2022.lrec-1.202},
poster = {https://www.texttechnologylab.org/wp-content/uploads/2022/06/GerParCor_LREC_2022.pdf},
keywords = {gerparcor},
pdf = {http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.202.pdf}
}
