Bhuvanesh Verma – Text Technology Lab

Research assistant

Goethe-Universität Frankfurt am Main
Robert-Mayer-Straße 10
Room 401c
D-60325 Frankfurt am Main
D-60054 Frankfurt am Main (use for package delivery)
Postfach / P.O. Box: 154

Current Projects

Thesis topic proposals

2025

Bachelor Thesis: Full-text Scientific Argument Mining using Large Language Models.

Description

Scientific articles contain a mix of argumentative and non-argumentative content, yet only argumentative sentences, particularly claims, contribute to the scientific discourse and are therefore central to argument mining. A key challenge is not only to identify whether a sentence expresses a claim, but also to distinguish between own claims (novel contributions by the author), background claims (statements grounded in prior work, often signaled by citations), data or evidence (empirical results that support claims), and non-argumentative content (methodological or descriptive text). This project proposes to address the task of claim detection and classification in full-text scientific articles by leveraging large language models, beginning with binary classification of claim versus non-claim sentences and extending to multi-class classification across the four categories. The approach will explore prompt-based classification and domain-specific fine-tuning, with the potential integration of citation-aware heuristics, aiming to establish a robust baseline for scientific claim detection as a foundation for downstream argument mining tasks.
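As a rough illustration of the citation-aware heuristics mentioned above, the following Python sketch assigns one of the four categories to a sentence using simple surface cues. It is not part of the proposed system: the regular expressions, the cue phrases, the label names, and the order in which the rules fire are all assumptions made for this example, and a real baseline would need to be tuned and evaluated on annotated data.

```python
import re

# Hypothetical surface patterns; real citation styles and cue phrases vary widely.
CITATION = re.compile(
    r"\[\d+(?:,\s*\d+)*\]"                          # numeric style, e.g. [3] or [3, 7]
    r"|\([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}\)"    # author-year style, e.g. (Smith et al., 2020)
)
DATA_CUES = re.compile(
    r"\d+(?:\.\d+)?\s*%|\b(?:accuracy|F1|precision|recall)\b", re.I
)
OWN_CUES = re.compile(
    r"\b(?:we (?:propose|show|argue|find|demonstrate)|our (?:results|approach|method))\b",
    re.I,
)


def label_sentence(sentence: str) -> str:
    """Assign one of four illustrative labels using rule order as priority."""
    if CITATION.search(sentence):
        return "background_claim"   # a citation signals grounding in prior work
    if DATA_CUES.search(sentence):
        return "data"               # numbers or metric names suggest evidence
    if OWN_CUES.search(sentence):
        return "own_claim"          # first-person cues suggest the authors' contribution
    return "non_argumentative"


def is_claim(sentence: str) -> bool:
    """Binary view: claim versus non-claim, derived from the four-way label."""
    return label_sentence(sentence) in {"own_claim", "background_claim"}


if __name__ == "__main__":
    examples = [
        "Prior work has shown that transformers generalize well (Smith et al., 2020).",
        "Our model reaches 91.2% accuracy on the test set.",
        "We propose a new method for claim detection.",
        "The corpus was tokenized with a standard tool.",
    ]
    for s in examples:
        print(label_sentence(s), "|", s)
```

A rule-based labeler of this kind could serve as a weak baseline, or as an additional feature, next to prompt-based or fine-tuned language model classifiers.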

Corresponding Lab Members: Bhuvanesh Verma and Alexander Mehler.

Bachelor Thesis: Can we use scientific mentions to reconstruct or identify scientific argumentative text?

Description

Scientific articles contain numerous mentions of datasets, methods, tasks, and metrics, which capture essential elements of the scientific discourse. A key question is whether these scientific mentions and their interrelations can be leveraged to reconstruct or identify argumentative text, such as claims and supporting evidence. Existing resources like SciER and SciREX provide annotations for such mentions and their relations, which can be used to detect how claims are formulated or to identify sentences that express claims in context. Beyond leveraging existing mentions, identifying additional scientific entities and their relations could further enrich the representation of scientific arguments. Given the current lack of full-text scientific argument mining datasets, this task has the potential to support the creation of a large-scale corpus of argumentative sentences and their relational structure, providing a foundation for downstream tasks in scientific argument mining and automated knowledge extraction.

Corresponding Lab Members: Bhuvanesh Verma and Alexander Mehler.

If you have any suggestions of your own relating to this or our other proposed topics, please do not hesitate to contact us.

In addition, we offer a free mailing list, which we use to send regular updates on new qualification and research work as well as other information relating to text technology.

Publications

2026

Ali Abusaleh, Bhuvanesh Verma and Alexander Mehler. 2026. TTLab at AraSentEval: SARF (صرف) Sentiment Analysis via Root-based Fusion for Multi-Dialectal Arabic. Proceedings of the 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7), co-located with the Language Resources and Evaluation Conference (LREC 2026). accepted.

BibTeX

@inproceedings{Abusaleh:et:al:2026:sarf,
  title     = {TTLab at AraSentEval: SARF (صرف) Sentiment Analysis via Root-based
               Fusion for Multi-Dialectal Arabic},
  author    = {Abusaleh, Ali and Verma, Bhuvanesh and Mehler, Alexander},
  booktitle = {Proceedings of the 7th Workshop on Open-Source Arabic Corpora
               and Processing Tools (OSACT7), co-located with the Language Resources
               and Evaluation Conference (LREC 2026)},
  eventdate = {2026-05},
  location  = {Palma, Mallorca, Spain},
  year      = {2026},
  keywords  = {NLP, Sentiment Analysis, Arabic analysis, new-data-spaces, circlet, satek},
  abstract  = {Arabic sentiment analysis is challenged by morphological complexity
               and lexical variation across Arabic dialects, compounded by subjectivity
               in how speakers and writers express sentiment. In this paper,
               we present our submission for the AraSentEval 2026 Shared Task
               on Arabic Dialect Sentiment Analysis. We propose SARF (صرف), a
               multi-view architectural framework that integrates surface-level
               context with stemmed and rooted morphological perspectives using
               a shared MARBERTv2 encoder. Our system employs a hybrid BERT-CNN-BiLSTM-Attention
               architecture to capture both local sentiment n-grams and global
               sequential dependencies. Experimental results show that while
               individual morphological normalization strategies (stemming or
               rooting) may degrade performance, their joint integration via
               cross-morphological attention provides robust features across
               diverse dialects. Our final system achieved a competitive macro-F1-score
               of 0.9263, ranking 2nd out of 15 participating teams.},
  note      = {accepted}
}

Bhuvanesh Verma and Alexander Mehler. 2026. Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus. Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026). accepted.

BibTeX

@inproceedings{Verma:Mehler:2026,
  title     = {Predicting Topic (Co-)Occurrence Using Topic Networks Built from
               the Project Gutenberg Corpus},
  booktitle = {Proceedings of the 15th International Conference on Language Resources
               and Evaluation (LREC 2026)},
  year      = {2026},
  author    = {Verma, Bhuvanesh and Mehler, Alexander},
  keywords  = {Topic Evolution, Topic Network, Time-aware Networks, Temporal Autocorrelation, Project Gutenberg, satek},
  abstract  = {Although temporal topic modeling has been widely applied to scientific
               and legal texts, literary corpora have largely been overlooked
               in this regard. To address this issue, we analyze topic evolution
               in a subset of the Project Gutenberg (PG) corpus. We model this
               subset as a sequence of topic networks that capture the emergence,
               persistence, and interaction of thematic structures over decades.
               Using supervised topic representations, we predict nodes (topics)
               and edges (topic pairings) to forecast future topics and their
               co-occurrence. Our experiments demonstrate moderate to strong
               temporal persistence in topic connectivity patterns across three
               topic systems, with ROC-AUC and AP values consistently above 0.85.
               We find that the temporal span of topic networks significantly
               impacts predictive performance: longer spans improve the stability
               and recall of topic presence, while shorter spans better capture
               evolving topic relationships. Overall, our findings demonstrate
               the predictability of topics in literary texts over time.},
  note      = {accepted}
}

Bhuvanesh Verma, Mounika Marreddy and Alexander Mehler. 2026. Predicting Convincingness in Political Speech: How Emotional Tone Shapes Persuasive Strength. Proceedings of the 15th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis. accepted.

BibTeX

@inproceedings{Verma:et:al:2026,
  title     = {Predicting Convincingness in Political Speech: How Emotional Tone
               Shapes Persuasive Strength},
  booktitle = {Proceedings of the 15th Workshop on Computational Approaches to
               Subjectivity, Sentiment, \& Social Media Analysis},
  year      = {2026},
  author    = {Verma, Bhuvanesh and Marreddy, Mounika and Mehler, Alexander},
  keywords  = {Argument Detection, Argument Quality Assessment, Topic Modelling, Persuasiveness, Convincingness, Emotion Analysis, Argument Mining, satek},
  abstract  = {Emotional tone plays a central role in persuasion, yet its impact
               on computational assessments of political argument quality in
               real world election campaign speeches remains understudied. In
               this work, we investigate whether positive emotional framing correlates
               with higher perceived convincingness in political arguments. We
               fine-tune language models on argument quality datasets and test
               their ability to transfer convincingness predictions to real-world
               campaign speeches. Using a corpus of U.S. presidential campaign
               speeches, we analyze emotional polarity in relation to predicted
               persuasive strength to test whether positively framed arguments
               are judged more convincing than neutral or negative ones. Our
               empirical analysis shows that political parties rely heavily on
               argumentation during their election campaigns. We also found
               evidence that politicians strategically employ emotional cues
               within their arguments during these campaign speeches, with positive
               emotions being more strongly associated with persuasive strength,
               for example in topics such as USMCA’s Effect on American Jobs
               and Agriculture, Border Control Policies, Progressive Tax Reforms.
               At the same time, we find that negative emotions have a weaker
               yet still non-negligible influence on voter persuasion in topics
               such as City Crime and Civil Unrest and White Supremacist Violence
               (Charlottesville Incident).},
  note      = {accepted}
}

2024

Babajide Owoyele, Bhuvanesh Verma, Victor Omolaoye, Jonathan Antonio Edelman, Derk Loorbach and Gerard de Melo. 2024. Socio-Semantic X-Ray of Multi-Actor Constellations using Topics and Interstitial Authors: A Toolkit for Augmenting Computational Literature Reviews. Available at SSRN 4713155.

BibTeX

@article{Owoyele:et:al:2020,
  title     = {Socio-Semantic X-Ray of Multi-Actor Constellations using Topics
               and Interstitial Authors: A Toolkit for Augmenting Computational
               Literature Reviews},
  author    = {Owoyele, Babajide and Verma, Bhuvanesh and Omolaoye, Victor and Edelman, Jonathan Antonio
               and Loorbach, Derk and de Melo, Gerard},
  journal   = {Available at SSRN 4713155},
  doi       = {10.2139/ssrn.4713155},
  url       = {https://dx.doi.org/10.2139/ssrn.4713155},
  year      = {2024}
}

Babajide Alamu Owoyele, Martin Schilling, Rohan Sawahn, Niklas Kaemer, Pavel Zherebenkov, Bhuvanesh Verma, Wim Pouw and Gerard de Melo. 2024. MaskAnyone Toolkit: Offering Strategies for Minimizing Privacy Risks and Maximizing Utility in Audio-Visual Data Archiving.

BibTeX

@misc{Owoyele:et:al:2024,
  title     = {MaskAnyone Toolkit: Offering Strategies for Minimizing Privacy
               Risks and Maximizing Utility in Audio-Visual Data Archiving},
  author    = {Babajide Alamu Owoyele and Martin Schilling and Rohan Sawahn and Niklas Kaemer
               and Pavel Zherebenkov and Bhuvanesh Verma and Wim Pouw and Gerard de Melo},
  year      = {2024},
  eprint    = {2408.03185},
  archiveprefix = {arXiv},
  primaryclass = {cs.CR},
  url       = {https://arxiv.org/abs/2408.03185}
}

Bhuvanesh Verma and Lisa Raithel. 2024. DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data Perturbations and MinMax Training.

BibTeX

@misc{Verma:Raithel:2024,
  title     = {DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data
               Perturbations and MinMax Training},
  author    = {Bhuvanesh Verma and Lisa Raithel},
  year      = {2024},
  eprint    = {2405.00321},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url       = {https://arxiv.org/abs/2405.00321}
}

Lisa Raithel, Philippe Thomas, Bhuvanesh Verma, Roland Roller, Hui-Syuan Yeh, Shuntaro Yada, Cyril Grouin, Shoko Wakamiya, Eiji Aramaki, Sebastian Möller and Pierre Zweigenbaum. August, 2024. Overview of #SMM4H 2024 – Task 2: Cross-Lingual Few-Shot Relation Extraction for Pharmacovigilance in French, German, and Japanese. Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks, 170–182.

BibTeX

@inproceedings{Raithel:et:al:2024,
  title     = {Overview of {\#}{SMM}4{H} 2024 {--} Task 2: Cross-Lingual Few-Shot
               Relation Extraction for Pharmacovigilance in {F}rench, {G}erman,
               and {J}apanese},
  author    = {Raithel, Lisa and Thomas, Philippe and Verma, Bhuvanesh and Roller, Roland
               and Yeh, Hui-Syuan and Yada, Shuntaro and Grouin, Cyril and Wakamiya, Shoko
               and Aramaki, Eiji and M{\"o}ller, Sebastian and Zweigenbaum, Pierre},
  editor    = {Xu, Dongfang and Gonzalez-Hernandez, Graciela},
  booktitle = {Proceedings of The 9th Social Media Mining for Health Research
               and Applications (SMM4H 2024) Workshop and Shared Tasks},
  month     = {aug},
  year      = {2024},
  address   = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2024.smm4h-1.39/},
  pages     = {170--182},
  abstract  = {This paper provides an overview of Task 2 from the Social Media
               Mining for Health 2024 shared task ({\#}SMM4H 2024), which focused
               on Named Entity Recognition (NER, Subtask 2a) and the joint task
               of NER and Relation Extraction (RE, Subtask 2b) for detecting
               adverse drug reactions (ADRs) in German, Japanese, and French
               texts written by patients. Participants were challenged with a
               few-shot learning scenario, necessitating models that can effectively
               generalize from limited annotated examples. Despite the diverse
               strategies employed by the participants, the overall performance
               across submissions from three teams highlighted significant challenges.
               The results underscored the complexity of extracting entities
               and relations in multi-lingual contexts, especially from the noisy
               and informal nature of user-generated content. Further research
               is required to develop robust systems capable of accurately identifying
               and associating ADR-related information in low-resource and multilingual
               settings.}
}

2022

Arne Binder, Bhuvanesh Verma and Leonhard Hennig. 2022. Full-Text Argumentation Mining on Scientific Publications.

BibTeX

@misc{Binder:et:al:2022,
  title     = {Full-Text Argumentation Mining on Scientific Publications},
  author    = {Arne Binder and Bhuvanesh Verma and Leonhard Hennig},
  year      = {2022},
  eprint    = {2210.13084},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url       = {https://arxiv.org/abs/2210.13084}
}

2020

Arati Paul, Bhuvanesh Verma and Debasish Chakraborty. 2020. Estimating electrification using multi-temporal DMSP/OLS night imagery as proxy measure of human well-being in India. Spatial Information Research, 28:469–473.

BibTeX

@article{Paul:et:al:2020,
  title     = {Estimating electrification using multi-temporal DMSP/OLS night
               imagery as proxy measure of human well-being in India},
  author    = {Paul, Arati and Verma, Bhuvanesh and Chakraborty, Debasish},
  journal   = {Spatial Information Research},
  volume    = {28},
  issn      = {2366-3294},
  pages     = {469--473},
  year      = {2020},
  url       = {http://dx.doi.org/10.1007/s41324-019-00307-8},
  doi       = {10.1007/s41324-019-00307-8},
  publisher = {Springer}
}