
Postdoctoral Researcher
Goethe-Universität Frankfurt am Main
Robert-Mayer-Straße 10
Room 401e
D-60325 Frankfurt am Main
D-60054 Frankfurt am Main (use for package delivery)
Postfach / P.O. Box: 154
Phone:
Mail:
Thesis topic proposals
2025
Bachelor Thesis: Modelling Task Effects: Human–AI Judgements.
Description
Motivation
Given both abstract and concrete words, does a word (e.g., "apartment") always have the same representation? If the word appears in different views (with a picture, in a sentence, and in a word cloud), does the task under which it is presented (e.g., "Can you rest in it?") alter the word's representation? The idea is to test whether the task effects associated with words, as labelled by humans, align with the labels generated by LLMs.
Tasks
- Create questions related to daily-life activities and determine how relevant these questions are to abstract and concrete words.
- The dataset is the Pereira dataset (see References), consisting of 180 words (120 concrete and 60 abstract). Every word is presented in three views.
- Using human annotators, annotate how relevant a word is with respect to a task in a given view.
- Do a similar analysis with LLMs, annotating the relevance of a word with respect to a task in a given view (a minimal annotation sketch follows this list).
- Evaluate the human judgements against the LLM-based annotations and describe the unique capabilities and drawbacks of these LLMs (GPT-3.5, GPT-4, and Mistral models).
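A minimal sketch of the LLM annotation step in Python, assuming a 1–7 relevance scale; the prompt wording and the query_llm helper are illustrative placeholders rather than a fixed design:

# Sketch: collect LLM task-relevance ratings per (word, view, task).
# query_llm is a hypothetical stand-in for any chat-model API call
# (e.g., GPT-3.5/4 or a Mistral model); wire it to the client you use.
from itertools import product

VIEWS = ["sentence", "picture", "word cloud"]

PROMPT = (
    "The concept '{word}' is presented as a {view}. "
    "Task: '{task}'. On a scale from 1 (irrelevant) to 7 (highly relevant), "
    "how relevant is the concept to this task? Answer with a single number."
)

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with an actual chat-completion API."""
    raise NotImplementedError

def rate_relevance(words, tasks):
    ratings = {}
    for word, view, task in product(words, VIEWS, tasks):
        reply = query_llm(PROMPT.format(word=word, view=view, task=task))
        ratings[(word, view, task)] = int(reply.strip())  # assumes a clean "1".."7" reply
    return ratings

Running the same loop with several models, and storing the human ratings in the same format, makes the later comparison a straightforward join over (word, view, task) keys.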
Main Research Questions
To what extent do large language model (LLM) annotations of task relevance for concepts align with human annotations across different representational views and concept types?
View-Based Alignment: (a) How does the similarity between human and LLM annotations of concept relevance vary across different presentation views (e.g., sentences, pictures, word clouds)? (b) In which representational view is the alignment between human and LLM relevance ratings most pronounced?
Concept Type Sensitivity: (a) Do LLMs align more closely with human judgments for concrete concepts than for abstract concepts, or vice versa? (b) How does the type of concept (abstract vs. concrete) affect the degree of alignment between human and LLM annotations?
Interaction Effects: (a) Is there an interaction between concept type and representational view that influences the alignment of task-relevance judgments between humans and LLMs? (b) Are certain combinations of view and concept type particularly conducive to high alignment between LLM and human annotations?
Cognitive Modeling Potential: Can LLMs effectively model human-like concept relevance judgments across varying contexts of presentation and abstraction?
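A minimal analysis sketch for these questions, assuming the human and LLM ratings have been merged into one long-format table; all column names are illustrative:

# Sketch: human-LLM alignment per view and concept type.
# Assumes a long-format table with illustrative columns:
# word, view, concept_type ("abstract"/"concrete"), human_rating, llm_rating.
import pandas as pd
from scipy.stats import spearmanr

def alignment_by_condition(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for (view, ctype), group in df.groupby(["view", "concept_type"]):
        rho, p = spearmanr(group["human_rating"], group["llm_rating"])
        rows.append({"view": view, "concept_type": ctype,
                     "spearman_rho": rho, "p_value": p, "n": len(group)})
    return pd.DataFrame(rows)

The view with the highest correlation speaks to View-Based Alignment, comparing abstract against concrete rows within each view speaks to Concept Type Sensitivity, and the full view x concept-type grid speaks to Interaction Effects.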
Goal
The thesis should work towards publishing the results at a conference or workshop.
References
- Pereira et al. (2018). Toward a universal decoder of linguistic meaning from brain activation. Nature Communications. https://www.nature.com/articles/s41467-018-03068-4.pdf
- https://proceedings.neurips.cc/paper/2020/file/38a8e18d75e95ca619af8df0da1417f2-Paper.pdf
Corresponding Lab Member:
Bachelor Thesis: Human–AI Annotator Alignment across Multiple NLP Tasks.
Description
Motivation
Recent large language models perform strongly on many complex NLP benchmarks, such as reading comprehension, grade-school math, coding, and Boolean algebra, and have reached human-level benchmark performance on reasoning tasks. However, the extent to which labels annotated by humans align with those generated by LLMs remains unclear. Do large language models that are better at annotating long user conversations also learn more human-like conceptual representations? Does scaling the number of parameters in LLMs help alignment with human judgements?
Tasks
- Explore existing LLM-based approaches to pre-label or pre-tag data for various NLP tasks, e.g., text classification, natural language inference, sentiment analysis, and subjective tasks such as stance detection (a minimal pre-labeling sketch follows this list). Gold-standard datasets can be considered as well.
- Investigate whether LLMs, used as annotators, process tasks in the same way that humans do.
- Develop a user-friendly interface for entering annotation guidelines, forming prompts, reviewing and modifying model-generated annotations, and providing feedback.
- Explore evaluation methods for LLM-assisted annotations (see the agreement sketch after the questions below).
- Characterise the shared and unique capabilities of LLMs and their alignment with human judgments.
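A minimal sketch of the pre-labeling idea from the first task; the label set, prompt template, and query_llm helper are illustrative placeholders:

# Sketch: LLM pre-labeling for a classification task (e.g., stance detection).
LABELS = ["favor", "against", "neutral"]  # illustrative label set

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with an actual chat-completion API."""
    raise NotImplementedError

def pre_label(texts, task_instruction):
    template = ("{instruction}\nText: {text}\n"
                "Answer with exactly one label from {labels}.")
    labels = []
    for text in texts:
        reply = query_llm(template.format(instruction=task_instruction,
                                          text=text, labels=LABELS))
        label = reply.strip().lower()
        labels.append(label if label in LABELS else None)  # None flags items for human review
    return labels

Unparseable replies are kept as None rather than guessed, so they can be routed to the review interface described above.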
Main Research Questions
- Does LLM parameter scaling significantly improve alignment with human labels in subjective NLP tasks?
- How do annotation prompt designs and task instructions influence the degree of alignment between LLM-generated labels and human labels across different NLP tasks?
- How do different demographic backgrounds of annotators (e.g., race, gender, cultural context) affect the alignment between LLM-generated annotations and human labels in subjective or socially sensitive NLP tasks?
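A minimal agreement sketch for the evaluation item above, assuming parallel human and LLM label lists over the same items; scikit-learn's chance-corrected Cohen's kappa is one option:

# Sketch: LLM-human label agreement on the same items.
from sklearn.metrics import accuracy_score, cohen_kappa_score

def agreement(human_labels, llm_labels):
    return {
        "cohen_kappa": cohen_kappa_score(human_labels, llm_labels),
        "accuracy": accuracy_score(human_labels, llm_labels),
    }

Computing this per model size addresses the parameter-scaling question, and computing it per annotator demographic group addresses the last question above.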
Goal
The thesis should work towards publishing the results at a conference or workshop.
References
- Tan, Zhen, et al. (2024). Large Language Models for Data Annotation: A Survey. https://arxiv.org/pdf/2402.13446.pdf
Corresponding Lab Member:
Publications
2025
2025.
Large language models are human-like annotators. European Conference on Information Retrieval, 291–299.
BibTeX
@inproceedings{marreddy:et:al:2025-ecir,
title = {Large language models are human-like annotators},
author = {Marreddy, Mounika and Oota, Subba Reddy and Gupta, Manish},
booktitle = {European Conference on Information Retrieval},
pages = {291--299},
year = {2025},
organization = {Springer}
}
2025.
USDC: A Dataset of User Stance and Dogmatism in Long Conversations. Findings of ACL.
BibTeX
@article{marreddy:et:al:2025,
title = {USDC: A Dataset of User Stance and Dogmatism in Long Conversations},
author = {Marreddy, Mounika and Oota, Subba Reddy and Chinni, Venkata Charan
and Gupta, Manish and Flek, Lucie},
journal = {Findings of ACL},
year = {2025}
}
2023
2023.
On robustness of finetuned transformer-based NLP models. arXiv preprint arXiv:2305.14453.
BibTeX
@article{Marreddy:et:al:2023emnlp,
title = {On robustness of finetuned transformer-based NLP models},
author = {Neerudu, Pavan Kalyan Reddy and Oota, Subba Reddy and Marreddy, Mounika
and Kagita, Venkateswara Rao and Gupta, Manish},
journal = {arXiv preprint arXiv:2305.14453},
year = {2023}
}
2023.
How does the brain process syntactic structure while listening? Findings of the Association for Computational Linguistics: ACL 2023, 6624–6647.
BibTeX
@inproceedings{Marreddy:et:al:2023acl,
title = {How does the brain process syntactic structure while listening?},
author = {Oota, Subba Reddy and Marreddy, Mounika and Gupta, Manish and Bapi, Raju},
booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
pages = {6624--6647},
year = {2023}
}
2023.
Neural architecture of speech. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), 1–5.
BibTeX
@inproceedings{Marreddy:et:al:2023icassp,
title = {Neural architecture of speech},
author = {Oota, Subba Reddy and Pahwa, Khushbu and Marreddy, Mounika and Gupta, Manish
and Raju, Bapi S},
booktitle = {ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP)},
pages = {1--5},
year = {2023},
organization = {IEEE}
}
2022
2022.
Multi-task text classification using graph convolutional networks
for large-scale low resource language. 2022 international joint conference on neural networks (IJCNN), 1–8.
BibTeX
@inproceedings{marreddy:et:al:2022multi,
title = {Multi-task text classification using graph convolutional networks
for large-scale low resource language},
author = {Marreddy, Mounika and Oota, Subba Reddy and Vakada, Lakshmi Sireesha
and Chinni, Venkata Charan and Mamidi, Radhika},
booktitle = {2022 international joint conference on neural networks (IJCNN)},
pages = {1--8},
year = {2022},
organization = {IEEE}
}
2022.
Neural language taskonomy: Which NLP tasks are the most predictive
of fMRI brain activity? arXiv preprint arXiv:2205.01404.
BibTeX
@article{Oota:et:al:2022,
title = {Neural language taskonomy: Which NLP tasks are the most predictive
of fMRI brain activity?},
author = {Oota, Subba Reddy and Arora, Jashn and Agarwal, Veeral and Marreddy, Mounika
and Gupta, Manish and Surampudi, Bapi Raju},
journal = {arXiv preprint arXiv:2205.01404},
url = {https://arxiv.org/pdf/2205.01404},
year = {2022},
abstract = {Several popular Transformer based language models have been found
to be successful for text-driven brain encoding. However, existing
literature leverages only pretrained text Transformer models and
has not explored the efficacy of task-specific learned Transformer
representations. In this work, we explore transfer learning from
representations learned for ten popular natural language processing
tasks (two syntactic and eight semantic) for predicting brain
responses from two diverse datasets: Pereira (subjects reading
sentences from paragraphs) and Narratives (subjects listening
to the spoken stories). Encoding models based on task features
are used to predict activity in different regions across the whole
brain. Features from coreference resolution, NER, and shallow
syntax parsing explain greater variance for the reading activity.
On the other hand, for the listening activity, tasks such as paraphrase
generation, summarization, and natural language inference show
better encoding performance. Experiments across all 10 task representations
provide the following cognitive insights: (i) language left hemisphere
has higher predictive brain activity versus language right hemisphere,
(ii) posterior medial cortex, temporoparieto-occipital junction,
dorsal frontal lobe have higher correlation versus early auditory
and auditory association cortex, (iii) syntactic and semantic
tasks display a good predictive performance across brain regions
for reading and listening stimuli, respectively.},
pdf = {https://arxiv.org/pdf/2205.01404}
}
2022.
Am I a resource-poor language? Data sets, embeddings, models and
analysis for four different NLP tasks in Telugu language. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(1).
BibTeX
@article{Marreddy:et:al:2022,
title = {Am I a resource-poor language? Data sets, embeddings, models and
analysis for four different NLP tasks in Telugu language},
author = {Marreddy, Mounika and Oota, Subba Reddy and Vakada, Lakshmi Sireesha
and Chinni, Venkata Charan and Mamidi, Radhika},
journal = {ACM Transactions on Asian and Low-Resource Language Information Processing},
volume = {22},
number = {1},
numpages = {34},
articleno = {18},
year = {2022},
issn = {2375-4699},
url = {https://doi.org/10.1145/3531535},
doi = {10.1145/3531535},
publisher = {Association for Computing Machinery},
abstract = {Due to the lack of a large annotated corpus, many resource-poor
Indian languages struggle to reap the benefits of recent deep
feature representations in Natural Language Processing (NLP).
Moreover, adopting existing language models trained on large English
corpora for Indian languages is often limited by data availability,
rich morphological variation, syntax, and semantic differences.
In this paper, we explore the traditional to recent efficient
representations to overcome the challenges of a low resource language,
Telugu. In particular, our main objective is to mitigate the low-resource
problem for Telugu. Overall, we present several contributions
to a resource-poor language viz. Telugu. (i) a large annotated
data (35,142 sentences in each task) for multiple NLP tasks such
as sentiment analysis, emotion identification, hate-speech detection,
and sarcasm detection, (ii) we create different lexicons for sentiment,
emotion, and hate-speech for improving the efficiency of the models,
(iii) pretrained word and sentence embeddings, and (iv) different
pretrained language models for Telugu such as ELMo-Te, BERT-Te,
RoBERTa-Te, ALBERT-Te, and DistilBERT-Te on a large Telugu corpus
consisting of 8,015,588 sentences (1,637,408 sentences from Telugu
Wikipedia and 6,378,180 sentences crawled from different Telugu
websites). Further, we show that these representations significantly
improve the performance of four NLP tasks and present the benchmark
results for Telugu. We argue that our pretrained embeddings are
competitive or better than the existing multilingual pretrained
models: mBERT, XLM-R, and IndicBERT. Lastly, the fine-tuning of
pretrained models show higher performance than linear probing
results on four NLP tasks with the following F1-scores: Sentiment
(68.72), Emotion (58.04), Hate-Speech (64.27), and Sarcasm (77.93).
We also experiment on publicly available Telugu datasets (Named
Entity Recognition, Article Genre Classification, and Sentiment
Analysis) and find that our Telugu pretrained language models
(BERT-Te and RoBERTa-Te) outperform the state-of-the-art system
except for the sentiment task. We open-source our corpus, four
different datasets, lexicons, embeddings, and code https://github.com/Cha14ran/DREAM-T.
The pretrained Transformer models for Telugu are available at
https://huggingface.co/ltrctelugu.},
pdf = {https://dl.acm.org/doi/pdf/10.1145/3531535}
}
2021
2021.
Clickbait detection in Telugu: Overcoming NLP challenges in resource-poor
languages using benchmarked techniques. 2021 International Joint Conference on Neural Networks (IJCNN), 1–8.
BibTeX
@inproceedings{Marreddy:et:al:2021,
title = {Clickbait detection in Telugu: Overcoming NLP challenges in resource-poor
languages using benchmarked techniques},
author = {Marreddy, Mounika and Oota, Subba Reddy and Vakada, Lakshmi Sireesha
and Chinni, Venkata Charan and Mamidi, Radhika},
booktitle = {2021 International Joint Conference on Neural Networks (IJCNN)},
pages = {1--8},
year = {2021},
organization = {IEEE},
doi = {10.1109/IJCNN52387.2021.9534382},
url = {https://ieeexplore.ieee.org/document/9534382},
abstract = {Clickbait headlines have become a nudge in social media and news
websites. The methods to identify clickbaits are largely being
developed for English. There is a need for the same in other
languages as well with the increase in the usage of social media
platforms in different languages. In this work, we present
an annotated clickbait dataset of 112,657 headlines that can be
used for building an automated clickbait detection system for
Telugu, a resource-poor language. Our contribution in this paper
includes (i) generation of the latest pre-trained language models,
including RoBERTa, ALBERT, and ELECTRA trained on a large Telugu
corpora of 8,015,588 sentences that we had collected, (ii) data
analysis and benchmarking the performance of different approaches
ranging from hand-crafted features to state-of-the-art models.
We show that the pre-trained language models trained on Telugu
outperform the existing pre-trained models viz. BERT-Multilingual-Cased,
XLM-MLM, and XLM-R on clickbait task. On a large Telugu clickbait
dataset of 112,657 samples, the Light Gradient Boosted Machines
(LGBM) model achieves an F1-score of 0.94 for clickbait headlines.
For Non-Clickbait headlines, F1-score of 0.93 is obtained which
is similar to that of the Clickbait class. We open-source our dataset,
pre-trained models, and code.}
}
