Publication

New EMNLP 2025 publication accepted

The publication MedLinkDE – MedDRA Entity Linking for German with Guided Chain of Thought Reasoning was accepted at EMNLP 2025.

Roman Christof, Farnaz Zeidi, Manuela Messelhäußer, Dirk Mentzer, Renate Koenig, Liam Childs and Alexander Mehler. November, 2025. MedLinkDE – MedDRA Entity Linking for German with Guided Chain of Thought Reasoning. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 31569–31581.
BibTeX
@inproceedings{Christof:et:al:2025,
  author    = {Christof, Roman and Zeidi, Farnaz and Messelhäußer, Manuela and Mentzer, Dirk
               and Koenig, Renate and Childs, Liam and Mehler, Alexander},
  title     = {{M}ed{L}ink{DE} {--} {M}ed{DRA} Entity Linking for {G}erman with
               Guided Chain of Thought Reasoning},
  editor    = {Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn
               and Peng, Violet},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural
               Language Processing},
  month     = nov,
  year      = {2025},
  address   = {Suzhou, China},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.emnlp-main.1609/},
  doi       = {10.18653/v1/2025.emnlp-main.1609},
  pages     = {31569--31581},
  isbn      = {979-8-89176-332-6},
  pdf       = {https://aclanthology.org/2025.emnlp-main.1609.pdf},
  abstract  = {In pharmacovigilance, effective automation of medical data structuring,
               especially linking entities to standardized terminologies such
               as MedDRA, is critical. This challenge is rarely addressed for
               German data. With MedLinkDE we address German MedDRA entity linking
               for adverse drug reactions in a two-step approach: (1) retrieval
               of medical terms with fine-tuned embedding models, followed (2)
               by guided chain-of-thought re-ranking using LLMs. To this end,
               we introduce RENOde, a German real-world MedDRA dataset consisting
               of reportings from patients and healthcare professionals. To overcome
               the challenges posed by the linguistic diversity of these reports,
               we generate synthetic data mapping the two reporting styles of
               patients and healthcare professionals. Our embedding models, fine-tuned
               on these synthetic, quasi-personalized datasets, show competitive
               performance with real datasets in terms of accuracy at high top-
               recall, providing a robust basis for re-ranking. Our subsequent
               guided Chain of Thought (CoT) re-ranking, informed by MedDRA coding
               guidelines, improves entity linking accuracy by approximately
               15{\%} (Acc@1) compared to embedding-only strategies. In this
               way, our approach demonstrates the feasibility of entity linking
               in medical reports under the constraints of data scarcity by relying
               on synthetic data reflecting different informant roles of reporting
               persons.}
}

New SemDial publication

TTLab publishes its VR-based, avatar-mediated corpus of human–human direction-giving dialogues.

Andy Lücking, Felix Voll, Daniel Rott, Alexander Henlein and Alexander Mehler. 2025. Head and Hand Movements During Turn Transitions: Data-Based Multimodal Analysis Using the Frankfurt VR Gesture–Speech Alignment Corpus (FraGA). Proceedings of the 29th Workshop on The Semantics and Pragmatics of Dialogue – Full Papers, 146–156.
BibTeX
@inproceedings{Luecking:Voll:Rott:Henlein:Mehler:2025-fraga,
  title     = {Head and Hand Movements During Turn Transitions: Data-Based Multimodal
               Analysis Using the {Frankfurt VR Gesture--Speech Alignment Corpus}
               ({FraGA})},
  author    = {Lücking, Andy and Voll, Felix and Rott, Daniel and Henlein, Alexander
               and Mehler, Alexander},
  year      = {2025},
  booktitle = {Proceedings of the 29th Workshop on The Semantics and Pragmatics
               of Dialogue -- Full Papers},
  series    = {SemDial'25 -- Bialogue},
  publisher = {SEMDIAL},
  url       = {http://semdial.org/anthology/Z25-Luecking_semdial_3316.pdf},
  pages     = {146--156},
  keywords  = {gemdis}
}

New publications accepted

The following publications were accepted at their respective conferences:

ACM Hypertext 2025 (36th ACM Conference on Hypertext and Social Media)

Giuseppe Abrami, Daniel Bundan, Chrisowaladis Manolis and Alexander Mehler. 2025. VR-ParlExplorer: A Hypertext System for the Collaborative Interaction in Parliamentary Debate Spaces. Proceedings of the 36th ACM Conference on Hypertext and Social Media, 177–183.
BibTeX
@inproceedings{Abrami:et:al:2025:c,
  author    = {Abrami, Giuseppe and Bundan, Daniel and Manolis, Chrisowaladis
               and Mehler, Alexander},
  title     = {VR-ParlExplorer: A Hypertext System for the Collaborative Interaction
               in Parliamentary Debate Spaces},
  year      = {2025},
  isbn      = {9798400715341},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3720553.3746672},
  doi       = {10.1145/3720553.3746672},
  abstract  = {The enhanced visualization and interaction with information in
               collaborative VR environments enabled by chatbots is currently
               rather limited. To fill this gap and create a concrete application
               that combines spatial and virtual concepts of hypertext systems
               based on the use of LLMs, we present VR-ParlExplorer as a system
               for virtualizing plenary debates that allows users to interact
               with virtual members of parliament through chatbots. VR-ParlExplorer
               is implemented as a Plugin for Va.Si.Li-Lab to enable immersion
               in the dynamics of communication in parliamentary debates. The
               paper describes the functionality of VR-ParlExplorer and discusses
               specifics of the use case it addresses.},
  booktitle = {Proceedings of the 36th ACM Conference on Hypertext and Social Media},
  pages     = {177--183},
  numpages  = {7},
  location  = {Chicago, USA},
  series    = {HT '25},
  pdf       = {https://dl.acm.org/doi/pdf/10.1145/3720553.3746672}
}


KONVENS 2025 (21st Conference on Natural Language Processing)

Daniel Bundan, Giuseppe Abrami and Alexander Mehler. 2025. Multimodal Docker Unified UIMA Interface: New Horizons for Distributed Microservice-Oriented Processing of Corpora using UIMA. Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Long and Short Papers, 257–268.
BibTeX
@inproceedings{Bundan:Abrami:Mehler:2025,
  author    = {Bundan, Daniel and Abrami, Giuseppe and Mehler, Alexander},
  title     = {Multimodal Docker Unified {UIMA} Interface: New Horizons for Distributed
               Microservice-Oriented Processing of Corpora using {UIMA}},
  booktitle = {Proceedings of the 21st Conference on Natural Language Processing
               (KONVENS 2025): Long and Short Papers},
  year      = {2025},
  editor    = {Wartena, Christian and Heid, Ulrich},
  location  = {Hildesheim, Germany},
  address   = {Hannover, Germany},
  publisher = {HsH Applied Academics},
  pages     = {257--268},
  series    = {KONVENS '25},
  url       = {https://aclanthology.org/2025.konvens-1.22/},
  pdf       = {https://aclanthology.org/2025.konvens-1.22.pdf},
  poster    = {https://www.texttechnologylab.org/wp-content/uploads/2025/09/Poster_Multimodal_DUUI_KONVENS_2025.pdf},
  keywords  = {duui,neglab,new-data-spaces,circlet}
}

New publication accepted in ACL Findings 2025

Our paper, Filling the Temporal Void: Recovering Missing Publication Years in the Project Gutenberg Corpus Using LLMs, has been accepted to the Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025).

Omar Momen, Manuel Schaaf and Alexander Mehler. July, 2025. Filling the Temporal Void: Recovering Missing Publication Years in the Project Gutenberg Corpus Using LLMs. Findings of the Association for Computational Linguistics: ACL 2025, 17318–17334.
BibTeX
@inproceedings{Momen:Schaaf:Mehler:2025,
  title     = {Filling the Temporal Void: Recovering Missing Publication Years
               in the Project Gutenberg Corpus Using {LLM}s},
  author    = {Momen, Omar and Schaaf, Manuel and Mehler, Alexander},
  editor    = {Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
  month     = jul,
  year      = {2025},
  address   = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.findings-acl.890/},
  pages     = {17318--17334},
  isbn      = {979-8-89176-256-5},
  abstract  = {Analysing texts spanning long periods of time is critical for
               researchers in historical linguistics and related disciplines.
               However, publicly available corpora suitable for such analyses
               are scarce. The Project Gutenberg (PG) corpus presents a significant
               yet underutilized opportunity in this context, due to the absence
               of accurate temporal metadata. We take advantage of language models
               and information retrieval to explore four sources of information
               {--} Open Web, Wikipedia, Open Library API, and PG books texts
               {--} to add missing temporal metadata to the PG corpus. Through
               20 experiments employing state-of-the-art Large Language Models
               (LLMs) and Retrieval-Augmented Generation (RAG) methods, we estimate
               the production years of all PG books. We curate an enriched metadata
               repository for the PG corpus and propose a refined version for
               it, which includes 53,774 books with a total of 3.8 billion tokens
               in 11 languages, produced between 1600 and 2000. This work provides
               a new resource for computational linguistics and humanities studies
               focusing on diachronic analyses. The final dataset and all experiments
               data are publicly available (https://github.com/OmarMomen14/pg-dates).},
  pdf       = {https://aclanthology.org/2025.findings-acl.890.pdf}
}