
Research assistant
Goethe-Universität Frankfurt am Main
Robert-Mayer-Straße 10
Room 401c
D-60325 Frankfurt am Main
D-60054 Frankfurt am Main (use for package delivery)
Postfach / P.O. Box: 154
Phone:
Mail:
Thesis topic proposals
2025
Bachelor Thesis: Full-text Scientific Argument Mining using Large Language Models.
Description
Scientific articles contain a mix of argumentative and non-argumentative content, yet only
argumentative sentences, particularly claims, contribute to the scientific discourse and are
therefore central to argument mining. A key challenge is not only to identify whether a sentence
expresses a claim, but also to distinguish between own claims (novel contributions by the
author), background claims (statements grounded in prior work, often signaled by citations),
data or evidence (empirical results that support claims), and non-argumentative content
(methodological or descriptive text). This project proposes to address the task of claim detection
and classification in full-text scientific articles by leveraging large language models, beginning
with binary classification of claim versus non-claim sentences and extending to multi-class
classification across the four categories. The approach will explore prompt-based classification
and domain-specific fine-tuning, with the potential integration of citation-aware heuristics, aiming
to establish a robust baseline for scientific claim detection as a foundation for downstream
argument mining tasks.
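To make the proposed setup more concrete, a minimal Python sketch of the classification pipeline follows. The four labels are taken from the description above. Everything else is a hypothetical illustration and not a prescribed method: the prompt wording, the citation regex, the rule that turns a cited "own claim" into a background claim, the mapping of own and background claims to "claim" in the binary stage, and the `complete` callable that stands in for whatever LLM is eventually used.

```python
import re
from enum import Enum


class ClaimLabel(Enum):
    # The four categories named in the topic description.
    OWN_CLAIM = "own_claim"
    BACKGROUND_CLAIM = "background_claim"
    DATA = "data"
    NON_ARGUMENTATIVE = "non_argumentative"


# Hypothetical citation patterns (an assumption): numeric brackets such as
# "[12]" or "[3, 7]", and author-year forms such as "(Smith et al., 2020)".
CITATION_RE = re.compile(
    r"\[\d+(?:\s*[,\u2013-]\s*\d+)*\]"
    r"|\([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}[a-z]?\)"
)


def has_citation(sentence: str) -> bool:
    """Return True if the sentence contains a citation marker."""
    return CITATION_RE.search(sentence) is not None


def build_prompt(sentence: str) -> str:
    """Build a zero-shot multi-class prompt (illustrative wording only)."""
    options = ", ".join(label.value for label in ClaimLabel)
    hint = " The sentence contains a citation." if has_citation(sentence) else ""
    return (
        "Classify the following sentence from a scientific article into exactly "
        f"one of: {options}.{hint}\nSentence: {sentence}\nLabel:"
    )


def parse_label(completion: str) -> ClaimLabel:
    """Map a raw model completion to a label; unknown output falls back to non-argumentative."""
    text = completion.strip().lower()
    for label in ClaimLabel:
        if text.startswith(label.value):
            return label
    return ClaimLabel.NON_ARGUMENTATIVE


def apply_citation_heuristic(sentence: str, label: ClaimLabel) -> ClaimLabel:
    """Hypothetical rule: an 'own claim' that cites prior work becomes a background claim."""
    if label is ClaimLabel.OWN_CLAIM and has_citation(sentence):
        return ClaimLabel.BACKGROUND_CLAIM
    return label


def to_binary(label: ClaimLabel) -> bool:
    """Collapse to claim vs. non-claim for the first, binary stage (assumed mapping)."""
    return label in (ClaimLabel.OWN_CLAIM, ClaimLabel.BACKGROUND_CLAIM)


def classify(sentence: str, complete) -> ClaimLabel:
    """`complete` is any callable mapping a prompt string to a completion string."""
    label = parse_label(complete(build_prompt(sentence)))
    return apply_citation_heuristic(sentence, label)
```

For example, `classify("We show that X improves Y [5].", lambda prompt: "own_claim")` yields `ClaimLabel.BACKGROUND_CLAIM`, because the citation rule overrides the model's label. Keeping the heuristic separate from the prompt is deliberate. It lets a thesis compare prompt-only, fine-tuned, and heuristic-augmented variants against the same label set.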
See also:
- Full-Text Argumentation Mining on Scientific Publications
- Leveraging Small LLMs for Argument Mining in Education: Argument Component Identification, Classification, and Assessment
Corresponding Lab Member:
Bachelor Thesis: Can we use scientific mentions to reconstruct or identify scientific argumentative text?
Description
Scientific articles contain numerous mentions of datasets, methods, tasks, and metrics, which
capture essential elements of the scientific discourse. A key question is whether these scientific
mentions and their interrelations can be leveraged to reconstruct or identify argumentative text,
such as claims and supporting evidence. Existing resources like SciER and SciREX provide
annotations for such mentions and their relations, which can be used to detect how claims are
formulated or to identify sentences that express claims in context. Beyond leveraging existing
mentions, identifying additional scientific entities and their relations could further enrich the
representation of scientific arguments. Given the current lack of full-text scientific argument
mining datasets, this task has the potential to support the creation of a large-scale corpus of
argumentative sentences and their relational structure, providing a foundation for downstream
tasks in scientific argument mining and automated knowledge extraction.
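The following minimal Python sketch shows one way existing mention annotations could serve as a first filter for argumentative sentences. It assumes that each sentence already carries typed mentions, such as the Dataset, Method, Task, and Metric types in SciREX-style annotations. The data classes, the evaluative cue list, and the "at least two distinct mention types plus a cue word" rule are hypothetical choices made for illustration. They are not part of SciER or SciREX.

```python
from dataclasses import dataclass, field


@dataclass
class Mention:
    text: str
    type: str  # e.g. "Dataset", "Method", "Task", "Metric"


@dataclass
class Sentence:
    text: str
    mentions: list[Mention] = field(default_factory=list)


# Hypothetical cue words; an illustrative assumption, not a validated lexicon.
EVALUATIVE_CUES = (
    "outperform", "improve", "achieve", "state-of-the-art",
    "better than", "significant",
)


def is_candidate_claim(sentence: Sentence, min_types: int = 2) -> bool:
    """Flag a sentence that links at least `min_types` distinct mention types
    and contains an evaluative cue word."""
    types = {m.type for m in sentence.mentions}
    lowered = sentence.text.lower()
    has_cue = any(cue in lowered for cue in EVALUATIVE_CUES)
    return len(types) >= min_types and has_cue


def candidate_claims(sentences: list[Sentence]) -> list[Sentence]:
    """Keep only the sentences flagged as candidate claims."""
    return [s for s in sentences if is_candidate_claim(s)]
```

Under these assumptions, a sentence such as "Our model outperforms BERT on SQuAD in terms of F1." (with Method, Dataset, and Metric mentions) is flagged, while "We use the SQuAD dataset for training." (a single Dataset mention, no cue) is not. Such silver labels would still need manual validation before being used as a corpus.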
See also:
- An Argument-Annotated Corpus of Scientific Publications
- SCIREX: A Challenge Dataset for Document-Level Information Extraction
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents
Corresponding Lab Member:
If you have any suggestions of your own relating to this or our other proposed topics, please do not hesitate to contact us.
In addition, we offer a free mailing list that we use to share regular updates on new qualification and research work, as well as other news relating to Texttechnology.
Publications
2024
2024.
Socio-Semantic X-Ray of Multi-Actor Constellations using Topics
and Interstitial Authors: A Toolkit for Augmenting Computational
Literature Reviews. Available at SSRN 4713155.
BibTeX
@article{Owoyele:et:al:2020,
title = {Socio-Semantic X-Ray of Multi-Actor Constellations using Topics
and Interstitial Authors: A Toolkit for Augmenting Computational
Literature Reviews},
author = {Owoyele, Babajide and Verma, Bhuvanesh and Omolaoye, Victor and Edelman, Jonathan Antonio
and Loorbach, Derk and de Melo, Gerard},
journal = {Available at SSRN 4713155},
doi = {10.2139/ssrn.4713155},
url = {https://dx.doi.org/10.2139/ssrn.4713155},
year = {2024}
}
2024.
MaskAnyone Toolkit: Offering Strategies for Minimizing Privacy
Risks and Maximizing Utility in Audio-Visual Data Archiving.
BibTeX
@misc{Owoyele:et:al:2024,
title = {MaskAnyone Toolkit: Offering Strategies for Minimizing Privacy
Risks and Maximizing Utility in Audio-Visual Data Archiving},
author = {Babajide Alamu Owoyele and Martin Schilling and Rohan Sawahn and Niklas Kaemer
and Pavel Zherebenkov and Bhuvanesh Verma and Wim Pouw and Gerard de Melo},
year = {2024},
eprint = {2408.03185},
archiveprefix = {arXiv},
primaryclass = {cs.CR},
url = {https://arxiv.org/abs/2408.03185}
}
2024.
DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data
Perturbations and MinMax Training.
BibTeX
@misc{Verma:Raithel:2024,
title = {DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data
Perturbations and MinMax Training},
author = {Bhuvanesh Verma and Lisa Raithel},
year = {2024},
eprint = {2405.00321},
archiveprefix = {arXiv},
primaryclass = {cs.CL},
url = {https://arxiv.org/abs/2405.00321}
}
August, 2024.
Overview of #SMM4H 2024 – Task 2: Cross-Lingual Few-Shot
Relation Extraction for Pharmacovigilance in French, German,
and Japanese. Proceedings of The 9th Social Media Mining for Health Research
and Applications (SMM4H 2024) Workshop and Shared Tasks, 170–182.
BibTeX
@inproceedings{Raithel:et:al:2024,
title = {Overview of {\#}{SMM}4{H} 2024 {--} Task 2: Cross-Lingual Few-Shot
Relation Extraction for Pharmacovigilance in {F}rench, {G}erman,
and {J}apanese},
author = {Raithel, Lisa and Thomas, Philippe and Verma, Bhuvanesh and Roller, Roland
and Yeh, Hui-Syuan and Yada, Shuntaro and Grouin, Cyril and Wakamiya, Shoko
and Aramaki, Eiji and M{\"o}ller, Sebastian and Zweigenbaum, Pierre},
editor = {Xu, Dongfang and Gonzalez-Hernandez, Graciela},
booktitle = {Proceedings of The 9th Social Media Mining for Health Research
and Applications (SMM4H 2024) Workshop and Shared Tasks},
month = {aug},
year = {2024},
address = {Bangkok, Thailand},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2024.smm4h-1.39/},
pages = {170--182},
abstract = {This paper provides an overview of Task 2 from the Social Media
Mining for Health 2024 shared task ({\#}SMM4H 2024), which focused
on Named Entity Recognition (NER, Subtask 2a) and the joint task
of NER and Relation Extraction (RE, Subtask 2b) for detecting
adverse drug reactions (ADRs) in German, Japanese, and French
texts written by patients. Participants were challenged with a
few-shot learning scenario, necessitating models that can effectively
generalize from limited annotated examples. Despite the diverse
strategies employed by the participants, the overall performance
across submissions from three teams highlighted significant challenges.
The results underscored the complexity of extracting entities
and relations in multi-lingual contexts, especially from the noisy
and informal nature of user-generated content. Further research
is required to develop robust systems capable of accurately identifying
and associating ADR-related information in low-resource and multilingual
settings.}
}
2022
2020
2020.
Estimating electrification using multi-temporal DMSP/OLS night
imagery as proxy measure of human well-being in India. Spatial Information Research, 28:469–473.
BibTeX
@article{Paul:et:al:2020,
title = {Estimating electrification using multi-temporal DMSP/OLS night
imagery as proxy measure of human well-being in India},
author = {Paul, Arati and Verma, Bhuvanesh and Chakraborty, Debasish},
journal = {Spatial Information Research},
volume = {28},
issn = {2366-3294},
pages = {469--473},
year = {2020},
url = {http://dx.doi.org/10.1007/s41324-019-00307-8},
doi = {10.1007/s41324-019-00307-8},
publisher = {Springer}
}
