New EMNLP 2025 publication accepted

The publication MedLinkDE – MedDRA Entity Linking for German with Guided Chain of Thought Reasoning was accepted at EMNLP 2025.

Roman Christof, Farnaz Zeidi, Manuela Messelhäußer, Dirk Mentzer, Renate Koenig, Liam Childs and Alexander Mehler. November, 2025. MedLinkDE – MedDRA Entity Linking for German with Guided Chain of Thought Reasoning. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 31569–31581.

BibTeX

@inproceedings{Christof:et:al:2025,
  author    = {Christof, Roman and Zeidi, Farnaz and Messelhäußer, Manuela and Mentzer, Dirk
               and Koenig, Renate and Childs, Liam and Mehler, Alexander},
  title     = {{M}ed{L}ink{DE} {--} {M}ed{DRA} Entity Linking for {G}erman with
               Guided Chain of Thought Reasoning},
  editor    = {Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn
               and Peng, Violet},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural
               Language Processing},
  month     = nov,
  year      = {2025},
  address   = {Suzhou, China},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.emnlp-main.1609/},
  doi       = {10.18653/v1/2025.emnlp-main.1609},
  pages     = {31569--31581},
  isbn      = {979-8-89176-332-6},
  pdf       = {https://aclanthology.org/2025.emnlp-main.1609.pdf},
  abstract  = {In pharmacovigilance, effective automation of medical data structuring,
               especially linking entities to standardized terminologies such
               as MedDRA, is critical. This challenge is rarely addressed for
               German data. With MedLinkDE we address German MedDRA entity linking
               for adverse drug reactions in a two-step approach: (1) retrieval
               of medical terms with fine-tuned embedding models, followed (2)
               by guided chain-of-thought re-ranking using LLMs. To this end,
               we introduce RENOde, a German real-world MedDRA dataset consisting
               of reportings from patients and healthcare professionals. To overcome
               the challenges posed by the linguistic diversity of these reports,
               we generate synthetic data mapping the two reporting styles of
               patients and healthcare professionals. Our embedding models, fine-tuned
               on these synthetic, quasi-personalized datasets, show competitive
               performance with real datasets in terms of accuracy at high top-recall,
               providing a robust basis for re-ranking. Our subsequent
               guided Chain of Thought (CoT) re-ranking, informed by MedDRA coding
               guidelines, improves entity linking accuracy by approximately
               15{\%} (Acc@1) compared to embedding-only strategies. In this
               way, our approach demonstrates the feasibility of entity linking
               in medical reports under the constraints of data scarcity by relying
               on synthetic data reflecting different informant roles of reporting
               persons.}
}
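The abstract describes a two-step pipeline: embedding-based retrieval of candidate MedDRA terms, followed by re-ranking of that shortlist, evaluated with Acc@k. The sketch below illustrates only the general retrieve-then-re-rank pattern and the Acc@k metric on toy data. It is not the authors' implementation. The function names (`retrieve`, `rerank`, `accuracy_at_k`), the 3-dimensional "embeddings", and the lookup-table scorer `toy_score` are all hypothetical. The paper uses fine-tuned embedding models for retrieval and an LLM with guided chain-of-thought prompting for re-ranking, and the `score` callable here merely stands in for that step.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Vector, term_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: shortlist the k terms whose embeddings are most similar to the query."""
    ranked = sorted(term_vecs, key=lambda t: cosine(query, term_vecs[t]), reverse=True)
    return ranked[:k]


def rerank(mention: str, candidates: List[str],
           score: Callable[[str, str], float]) -> List[str]:
    """Stage 2: reorder the shortlist by a re-ranking score.

    Hypothetical placeholder: `score` stands in for an LLM-based re-ranker.
    sorted() is stable, so tied candidates keep their retrieval order.
    """
    return sorted(candidates, key=lambda c: score(mention, c), reverse=True)


def accuracy_at_k(predictions: List[List[str]], gold: List[str], k: int) -> float:
    """Acc@k: share of mentions whose gold term is among the top-k predictions."""
    hits = sum(1 for preds, g in zip(predictions, gold) if g in preds[:k])
    return hits / len(gold)


# --- Toy data (hypothetical; not taken from RENOde or any MedDRA release) ---
TERM_VECS = {
    "Headache": [0.9, 0.1, 0.0],
    "Migraine": [0.8, 0.3, 0.0],
    "Nausea": [0.0, 0.9, 0.2],
    "Dizziness": [0.2, 0.1, 0.9],
}
MENTIONS = [("Kopfschmerzen", [0.85, 0.25, 0.0]), ("Übelkeit", [0.1, 0.8, 0.3])]
GOLD = ["Headache", "Nausea"]

# Hypothetical re-ranker: a lookup table standing in for the LLM's judgement.
LOOKUP = {"Kopfschmerzen": "Headache", "Übelkeit": "Nausea"}


def toy_score(mention: str, candidate: str) -> float:
    return 1.0 if LOOKUP.get(mention) == candidate else 0.0


retrieved = [retrieve(vec, TERM_VECS, k=2) for _, vec in MENTIONS]
reranked = [rerank(m, cands, toy_score) for (m, _), cands in zip(MENTIONS, retrieved)]

print("embedding-only Acc@1:", accuracy_at_k(retrieved, GOLD, 1))
print("re-ranked Acc@1:", accuracy_at_k(reranked, GOLD, 1))
print("shortlist Acc@2:", accuracy_at_k(retrieved, GOLD, 2))
```

On this toy input, the embedding-only ranking puts "Migraine" ahead of "Headache" for the first mention, which gives an Acc@1 of 0.5. Re-ranking the top-2 shortlist raises it to 1.0. These numbers are purely illustrative and say nothing about the paper's reported results. The design point they illustrate is that a re-ranker can only reorder the shortlist. The retrieval stage's Acc@k therefore caps the Acc@1 reachable after re-ranking, which is why high top-k recall is a prerequisite for the second step.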