Dr. Mounika Marreddy

Postdoctoral Researcher

Goethe-Universität Frankfurt am Main
Robert-Mayer-Straße 10
Room 401e
D-60325 Frankfurt am Main
D-60054 Frankfurt am Main (use for package delivery)
Postfach / P.O. Box: 154
Phone:
Mail:

Thesis topic proposals

2025

Bachelor Thesis: Modelling Task Effects: Human-AI Judgements.
Description
Motivation
Given both abstract and concrete words, does a word (e.g., "apartment") always have the same representation? If a word appears in different views (with a picture, in a sentence, and in a word cloud), does the task under which the word is presented (e.g., "Can you rest in it?") alter the word's representation? The idea is to test whether the task effects associated with words as labelled by humans align with labels generated by LLMs.

Tasks
  1. Create questions about daily-life activities and assess how relevant these questions are to abstract and concrete words.
  2. Use the Pereira dataset of 180 words (120 concrete and 60 abstract), in which every word is presented in three views.
  3. With human annotators, rate how relevant each word is to a task in a given view.
  4. Run the same analysis with LLMs, having them rate the relevance of each word to a task in a given view.
  5. Compare the human judgements with the LLM-based annotations and characterise the unique capabilities and drawbacks of these LLMs (GPT-3.5, GPT-4, and Mistral models).
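
Steps 3–5 come down to comparing two label sequences per view. A minimal sketch of one standard comparison, chance-corrected agreement (Cohen's kappa) between human and LLM relevance ratings — the labels below are invented placeholders, not data from the Pereira dataset:

```python
# Sketch: chance-corrected agreement between two annotators' labels.
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Agreement expected by chance, given each annotator's
    # own marginal label distribution.
    expected = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical relevance ratings (1 = relevant to the task, 0 = not)
# for the same words shown in one view, e.g. the picture view.
human = [1, 1, 0, 1, 0, 0, 1, 1]
llm   = [1, 1, 0, 0, 0, 1, 1, 1]
print(round(cohens_kappa(human, llm), 3))  # → 0.467
```

Computing kappa per view and per concept type (abstract vs. concrete) would directly address the view-based and concept-type research questions below.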

Main Research Questions
To what extent do large language model (LLM) annotations of task relevance for concepts align with human annotations across different representational views and concept types?

View-Based Alignment: a. How does the similarity between human and LLM annotations of concept relevance vary across different presentation views (e.g., sentences, pictures, word clouds)? b. In which representational view is the alignment between human and LLM relevance ratings most pronounced?

Concept Type Sensitivity: a. Do LLMs align more closely with human judgments for concrete concepts than for abstract concepts, or vice versa? b. How does the type of concept (abstract vs. concrete) affect the degree of alignment between human and LLM annotations?

Interaction Effects: a. Is there an interaction between concept type and representational view that influences the alignment of task-relevance judgments between humans and LLMs? b. Are certain combinations of view and concept type particularly conducive to high alignment between LLM and human annotations?

Cognitive Modeling Potential: Can LLMs effectively model human-like concept relevance judgments across varying contexts of presentation and abstraction?

Goal
The student should work towards publishing the results at a conference or workshop.

Corresponding Lab Members: Mounika Marreddy and Alexander Mehler.
Bachelor Thesis: Human–AI Annotator Alignment across Multiple NLP Tasks.
Description
Motivation
Recent large language models now perform strongly on many complex benchmark NLP tasks such as reading comprehension, grade-school maths, coding, and Boolean algebra. This shift has been fruitful and yields human-level benchmark performance on reasoning tasks. However, the extent to which labels annotated by humans align with those produced by LLMs remains unclear. Are large language models that are better at annotating longer user conversations also the ones that learn more human-like conceptual representations? Does scaling the number of parameters in LLMs help alignment with human judgements?

Tasks
  1. Explore pre-existing LLM-based approaches to pre-label or pre-tag parts of data for various NLP tasks, e.g., text classification, natural language inference, sentiment analysis, and subjective tasks such as stance detection. Gold datasets can be considered here as well.
  2. Investigate whether LLMs as annotators process tasks in the same way that humans do.
  3. Develop a user-friendly interface for entering annotation guidelines, forming prompts, reviewing and modifying model-generated annotations, and providing feedback.
  4. Explore evaluation methods for LLM-assisted annotations.
  5. Characterise the shared and unique capabilities of LLMs and their alignment with human judgments.
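
The prompt-formation and evaluation steps can be sketched as two small helpers. The guideline text, label set, and example inputs below are illustrative assumptions, not a fixed design; the actual LLM call is deliberately left out, since the interface to a concrete model is part of the thesis work:

```python
# Sketch: assembling an annotation prompt from task guidelines, and
# scoring raw (uncorrected) agreement between two label sources.

def build_prompt(guidelines, labels, text):
    """Assemble a zero-shot annotation prompt from the task guidelines."""
    return (
        f"Annotation guidelines:\n{guidelines}\n\n"
        f"Allowed labels: {', '.join(labels)}\n\n"
        f"Text: {text}\n"
        "Answer with exactly one label."
    )

def raw_agreement(human_labels, llm_labels):
    """Fraction of items on which the two label sources coincide."""
    pairs = list(zip(human_labels, llm_labels))
    return sum(h == l for h, l in pairs) / len(pairs)

prompt = build_prompt(
    "Label the stance of the comment towards the topic.",
    ["favor", "against", "neutral"],
    "I think this policy will do more harm than good.",
)
print(prompt.splitlines()[0])                              # Annotation guidelines:
print(raw_agreement(["favor", "against"], ["favor", "neutral"]))  # 0.5
```

Raw agreement is a starting point; chance-corrected measures (e.g. Cohen's kappa) and per-demographic breakdowns would be needed for the research questions below.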
Main Research Questions
  1. Does LLM parameter scaling significantly improve alignment with human labels in subjective NLP tasks?
  2. How do annotation prompt designs and task instructions influence the degree of alignment between LLM-generated labels and human labels across different NLP tasks?
  3. How do different demographic backgrounds of annotators (e.g., race, gender, cultural context) affect the alignment between LLM-generated annotations and human labels in subjective or socially sensitive NLP tasks?
Goal
The student should work towards publishing the results at a conference or workshop.

References
Tan, Zhen, et al. 2024. Large Language Models for Data Annotation: A Survey. https://arxiv.org/pdf/2402.13446.pdf
Corresponding Lab Members: Mounika Marreddy and Alexander Mehler.

Publications

2025

Mounika Marreddy, Subba Reddy Oota and Manish Gupta. 2025. Large language models are human-like annotators. European Conference on Information Retrieval, 291–299.
BibTeX
@inproceedings{marreddy:et:al:2025-ecir,
  title     = {Large language models are human-like annotators},
  author    = {Marreddy, Mounika and Oota, Subba Reddy and Gupta, Manish},
  booktitle = {European Conference on Information Retrieval},
  pages     = {291--299},
  year      = {2025},
  organization = {Springer}
}
Mounika Marreddy, Subba Reddy Oota, Venkata Charan Chinni, Manish Gupta and Lucie Flek. 2025. USDC: A Dataset of User Stance and Dogmatism in Long Conversations. Findings of ACL.
BibTeX
@article{marreddy:et:al:2025,
  title     = {USDC: A Dataset of User Stance and Dogmatism in Long Conversations},
  author    = {Marreddy, Mounika and Oota, Subba Reddy and Chinni, Venkata Charan
               and Gupta, Manish and Flek, Lucie},
  journal   = {Findings of ACL},
  year      = {2025}
}

2023

Pavan Kalyan Reddy Neerudu, Subba Reddy Oota, Mounika Marreddy, Venkateswara Rao Kagita and Manish Gupta. 2023. On robustness of finetuned transformer-based nlp models. arXiv preprint arXiv:2305.14453.
BibTeX
@article{Marreddy:et:al:2023emnlp,
  title     = {On robustness of finetuned transformer-based nlp models},
  author    = {Neerudu, Pavan Kalyan Reddy and Oota, Subba Reddy and Marreddy, Mounika
               and Kagita, Venkateswara Rao and Gupta, Manish},
  journal   = {arXiv preprint arXiv:2305.14453},
  year      = {2023}
}
Subba Reddy Oota, Mounika Marreddy, Manish Gupta and Raju Bapi. 2023. How does the brain process syntactic structure while listening?. Findings of the Association for Computational Linguistics: ACL 2023, 6624–6647.
BibTeX
@inproceedings{Marreddy:et:al:2023acl,
  title     = {How does the brain process syntactic structure while listening?},
  author    = {Oota, Subba Reddy and Marreddy, Mounika and Gupta, Manish and Bapi, Raju},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
  pages     = {6624--6647},
  year      = {2023}
}
Subba Reddy Oota, Khushbu Pahwa, Mounika Marreddy, Manish Gupta and Bapi S Raju. 2023. Neural architecture of speech. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5.
BibTeX
@inproceedings{Marreddy:et:al:2023icassp,
  title     = {Neural architecture of speech},
  author    = {Oota, Subba Reddy and Pahwa, Khushbu and Marreddy, Mounika and Gupta, Manish
               and Raju, Bapi S},
  booktitle = {ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech
               and Signal Processing (ICASSP)},
  pages     = {1--5},
  year      = {2023},
  organization = {IEEE}
}

2022

Mounika Marreddy, Subba Reddy Oota, Lakshmi Sireesha Vakada, Venkata Charan Chinni and Radhika Mamidi. 2022. Multi-task text classification using graph convolutional networks for large-scale low resource language. 2022 international joint conference on neural networks (IJCNN), 1–8.
BibTeX
@inproceedings{marreddy:et:al:2022multi,
  title     = {Multi-task text classification using graph convolutional networks
               for large-scale low resource language},
  author    = {Marreddy, Mounika and Oota, Subba Reddy and Vakada, Lakshmi Sireesha
               and Chinni, Venkata Charan and Mamidi, Radhika},
  booktitle = {2022 international joint conference on neural networks (IJCNN)},
  pages     = {1--8},
  year      = {2022},
  organization = {IEEE}
}
Subba Reddy Oota, Jashn Arora, Veeral Agarwal, Mounika Marreddy, Manish Gupta and Bapi Raju Surampudi. 2022. Neural language taskonomy: Which NLP tasks are the most predictive of fMRI brain activity?. arXiv preprint arXiv:2205.01404.
BibTeX
@article{Oota:et:al:2022,
  title     = {Neural language taskonomy: Which NLP tasks are the most predictive
               of fMRI brain activity?},
  author    = {Oota, Subba Reddy and Arora, Jashn and Agarwal, Veeral and Marreddy, Mounika
               and Gupta, Manish and Surampudi, Bapi Raju},
  journal   = {arXiv preprint arXiv:2205.01404},
  url       = {https://arxiv.org/pdf/2205.01404},
  year      = {2022},
  abstract  = {Several popular Transformer based language models have been found
               to be successful for text-driven brain encoding. However, existing
               literature leverages only pretrained text Transformer models and
               has not explored the efficacy of task-specific learned Transformer
               representations. In this work, we explore transfer learning from
               representations learned for ten popular natural language processing
               tasks (two syntactic and eight semantic) for predicting brain
               responses from two diverse datasets: Pereira (subjects reading
               sentences from paragraphs) and Narratives (subjects listening
               to the spoken stories). Encoding models based on task features
               are used to predict activity in different regions across the whole
               brain. Features from coreference resolution, NER, and shallow
               syntax parsing explain greater variance for the reading activity.
               On the other hand, for the listening activity, tasks such as paraphrase
               generation, summarization, and natural language inference show
               better encoding performance. Experiments across all 10 task representations
               provide the following cognitive insights: (i) language left hemisphere
               has higher predictive brain activity versus language right hemisphere,
               (ii) posterior medial cortex, temporoparieto-occipital junction,
               dorsal frontal lobe have higher correlation versus early auditory
               and auditory association cortex, (iii) syntactic and semantic
               tasks display a good predictive performance across brain regions
               for reading and listening stimuli, respectively},
  pdf       = {https://arxiv.org/pdf/2205.01404}
}
Mounika Marreddy, Subba Reddy Oota, Lakshmi Sireesha Vakada, Venkata Charan Chinni and Radhika Mamidi. 2022. Am I a resource-poor language? Data sets, embeddings, models and analysis for four different NLP tasks in telugu language. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(1).
BibTeX
@article{Marreddy:et:al:2022,
  title     = {Am I a resource-poor language? Data sets, embeddings, models and
               analysis for four different NLP tasks in telugu language},
  author    = {Marreddy, Mounika and Oota, Subba Reddy and Vakada, Lakshmi Sireesha
               and Chinni, Venkata Charan and Mamidi, Radhika},
  journal   = {ACM Transactions on Asian and Low-Resource Language Information Processing},
  volume    = {22},
  number    = {1},
  numpages  = {34},
  articleno = {18},
  year      = {2022},
  issn      = {2375-4699},
  url       = {https://doi.org/10.1145/3531535},
  doi       = {10.1145/3531535},
  publisher = {Association for Computing Machinery},
  abstract  = {Due to the lack of a large annotated corpus, many resource-poor
               Indian languages struggle to reap the benefits of recent deep
               feature representations in Natural Language Processing (NLP).
               Moreover, adopting existing language models trained on large English
               corpora for Indian languages is often limited by data availability,
               rich morphological variation, syntax, and semantic differences.
               In this paper, we explore the traditional to recent efficient
               representations to overcome the challenges of a low resource language,
               Telugu. In particular, our main objective is to mitigate the low-resource
               problem for Telugu. Overall, we present several contributions
               to a resource-poor language viz. Telugu. (i) a large annotated
               data (35,142 sentences in each task) for multiple NLP tasks such
               as sentiment analysis, emotion identification, hate-speech detection,
               and sarcasm detection, (ii) we create different lexicons for sentiment,
               emotion, and hate-speech for improving the efficiency of the models,
               (iii) pretrained word and sentence embeddings, and (iv) different
               pretrained language models for Telugu such as ELMo-Te, BERT-Te,
               RoBERTa-Te, ALBERT-Te, and DistilBERT-Te on a large Telugu corpus
               consisting of 8,015,588 sentences (1,637,408 sentences from Telugu
               Wikipedia and 6,378,180 sentences crawled from different Telugu
               websites). Further, we show that these representations significantly
               improve the performance of four NLP tasks and present the benchmark
               results for Telugu. We argue that our pretrained embeddings are
               competitive or better than the existing multilingual pretrained
               models: mBERT, XLM-R, and IndicBERT. Lastly, the fine-tuning of
               pretrained models show higher performance than linear probing
               results on four NLP tasks with the following F1-scores: Sentiment
               (68.72), Emotion (58.04), Hate-Speech (64.27), and Sarcasm (77.93).
               We also experiment on publicly available Telugu datasets (Named
               Entity Recognition, Article Genre Classification, and Sentiment
               Analysis) and find that our Telugu pretrained language models
               (BERT-Te and RoBERTa-Te) outperform the state-of-the-art system
               except for the sentiment task. We open-source our corpus, four
               different datasets, lexicons, embeddings, and code  https://github.com/Cha14ran/DREAM-T.
               The pretrained Transformer models for Telugu are available at
                https://huggingface.co/ltrctelugu.},
  pdf       = {https://dl.acm.org/doi/pdf/10.1145/3531535}
}

2021

Mounika Marreddy, Subba Reddy Oota, Lakshmi Sireesha Vakada, Venkata Charan Chinni and Radhika Mamidi. 2021. Clickbait detection in telugu: Overcoming nlp challenges in resource-poor languages using benchmarked techniques. 2021 International Joint Conference on Neural Networks (IJCNN), 1–8.
BibTeX
@inproceedings{Marreddy:et:al:2021,
  title     = {Clickbait detection in telugu: Overcoming nlp challenges in resource-poor
               languages using benchmarked techniques},
  author    = {Marreddy, Mounika and Oota, Subba Reddy and Vakada, Lakshmi Sireesha
               and Chinni, Venkata Charan and Mamidi, Radhika},
  booktitle = {2021 International Joint Conference on Neural Networks (IJCNN)},
  pages     = {1--8},
  year      = {2021},
  organization = {IEEE},
  doi       = {10.1109/IJCNN52387.2021.9534382},
  url       = {https://ieeexplore.ieee.org/document/9534382},
  abstract  = {Clickbait headlines have become a nudge in social media and news
               websites. The methods to identify clickbaits are largely being
               developed for English. There is a need for the same in other
               languages as well with the increase in the usage of social media
               platforms in different languages. In this work, we present
               an annotated clickbait dataset of 112,657 headlines that can be
               used for building an automated clickbait detection system for
               Telugu, a resource-poor language. Our contribution in this paper
               includes (i) generation of the latest pre-trained language models,
               including RoBERTa, ALBERT, and ELECTRA trained on a large Telugu
               corpora of 8,015,588 sentences that we had collected, (ii) data
               analysis and benchmarking the performance of different approaches
               ranging from hand-crafted features to state-of-the-art models.
               We show that the pre-trained language models trained on Telugu
               outperform the existing pre-trained models viz. BERT-Multilingual-Cased,
               XLM-MLM, and XLM-R on clickbait task. On a large Telugu clickbait
               dataset of 112,657 samples, the Light Gradient Boosted Machines
               (LGBM) model achieves an F1-score of 0.94 for clickbait headlines.
               For Non-Clickbait headlines, F1-score of 0.93 is obtained which
               is similar to that of Clickbait class. We open-source our dataset,
               pre-trained models, and code}
}