Student Research Topics

On this page you can find a selection of potential topics for qualification theses (Bachelor / Master) and research papers, as well as opportunities to participate in research projects.

Please note that these topics are only a selection and can be supplemented by your own suggestions. If you are interested and require further information, please contact the responsible staff member or arrange an appointment directly with Prof. Dr. Alexander Mehler.

In addition, we offer a free mailing list, which we use to regularly announce new qualification and research work as well as other information relating to Texttechnology.

2025

Bachelor Thesis: Full-text Scientific Argument Mining using Large Language Models.
Description
Scientific articles contain a mix of argumentative and non-argumentative content, yet only argumentative sentences, particularly claims, contribute to the scientific discourse and are therefore central to argument mining. A key challenge is not only to identify whether a sentence expresses a claim, but also to distinguish between own claims (novel contributions by the author), background claims (statements grounded in prior work, often signaled by citations), data or evidence (empirical results that support claims), and non-argumentative content (methodological or descriptive text). This project proposes to address the task of claim detection and classification in full-text scientific articles by leveraging large language models, beginning with binary classification of claim versus non-claim sentences and extending to multi-class classification across the four categories. The approach will explore prompt-based classification and domain-specific fine-tuning, with the potential integration of citation-aware heuristics, aiming to establish a robust baseline for scientific claim detection as a foundation for downstream argument mining tasks.
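As an illustration of the citation-aware heuristics mentioned above, the following minimal sketch assigns one of the four categories from surface cues alone. The cue patterns, the label names, and the function names `heuristic_label` and `is_claim` are assumptions made for this example, and the test sentences are invented. In the thesis, an LLM-based classifier would replace or refine such rules; a rule set like this could at most serve as a weak baseline.

```python
import re

# Assumed surface cues; none of these patterns come from the project itself.
CITATION = re.compile(r"\([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}\)|\[\d+(?:,\s*\d+)*\]")
DATA_CUES = re.compile(r"\b\d+(?:\.\d+)?\s*%|\b(?:accuracy|precision|recall|f1)\b", re.I)
OWN_CUES = re.compile(
    r"\b(?:we (?:propose|show|argue|find|demonstrate)|our (?:results|approach|method))\b",
    re.I,
)

def heuristic_label(sentence: str) -> str:
    """Return one of: background_claim, data, own_claim, non_argumentative."""
    if CITATION.search(sentence):   # a citation marker signals a background claim
        return "background_claim"
    if DATA_CUES.search(sentence):  # numbers and metric names signal evidence
        return "data"
    if OWN_CUES.search(sentence):   # first-person contribution verbs
        return "own_claim"
    return "non_argumentative"

def is_claim(sentence: str) -> bool:
    """Binary view of the task: claim versus non-claim."""
    return heuristic_label(sentence) in ("own_claim", "background_claim")

# Invented example sentences, one per category.
examples = [
    "Prior work treats this as sequence labeling (Author et al., 2020).",
    "The model reaches 87.5% F1 on the test set.",
    "We propose a new method for claim detection.",
    "Section 3 describes the corpus.",
]
for sentence in examples:
    print(heuristic_label(sentence), "|", sentence)
```

The rule order encodes one possible priority (citation before evidence before own claim); other orders are equally defensible and would need to be evaluated on annotated data.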

Corresponding Lab Member: Bhuvanesh Verma and Alexander Mehler.
Bachelor/Master Thesis: Constructing and Evaluating Human Digital Twins from Browsing Data.
Description
The topic of digital twins, specifically Human Digital Twins (HDTs), is growing fast, with many works surveying architectures, applications, and ethical issues. One research problem concerns the creation of these HDTs. Browsing histories and search behavior provide a rich source of data detailing what people read and search for. Using this data, an HDT might be able to predict a user's behavior or responses, analogous to how advertisers profile users for targeted recommendations. The task of this thesis is to design a system for creating such HDTs from browsing data and to evaluate its effectiveness. For this purpose, an existing dataset of browsing data and user responses can be used, although extending it is encouraged. The system can be further extended into the area of survey studies to validate the accuracy and perceived relevance of the Human Digital Twin by comparing its predictions with self-reported user responses. This work lies at the intersection of human-computer interaction (HCI), generative modeling, and computational social science.
Corresponding Lab Member: Patrick Schrottenbacher and Alexander Mehler.
Bachelor/Master Thesis: Holodecks: Real-Time Interactive 3D Scene Generation with Large Language Models.
Description
The creation of immersive digital environments lies at the core of every immersive interaction. Designing such spaces, however, is time-consuming, and most systems do not support environment creation in real time or at runtime. Various studies have explored the generation of environments using Large Language Models (LLMs) and pre-defined object databases, enabling a user to describe a scene which the LLM then reconstructs in 3D space. This remains an unsolved challenge, as the adjustable dimensions and flexibility of generated spaces are still limited. Furthermore, as object generation based on LLM prompts grows ever more capable, integrating such capabilities with scene generation becomes increasingly relevant. Real-time deployment of these systems could enable the creation of holodeck-like experiences, in which anything a user describes can be dynamically generated and interacted with in 3D space. The task of this thesis is to design an LLM-based framework that allows users to request the generation of 3D scenes through natural language input. This includes both partial descriptions of individual elements and complete, coherent scene specifications. The elements within the scene may optionally be derived from 3D models generated by a separate generative model. In addition, these environments should not be confined to a single user but should be usable by and shareable among multiple users. This work lies at the intersection of computational design, generative modeling, and human-computer interaction (HCI). It aims to contribute toward more intuitive, language-driven methods for creating and manipulating virtual environments.
Corresponding Lab Member: Patrick Schrottenbacher and Alexander Mehler.
Master Thesis: Multi-Modal AI Agents for Immersive Virtual Reality.
Description
As virtual reality (VR) systems become ever more capable and immersive, the need for highly interactive environments continues to grow. One area of active research is that of AI agents, software entities that dynamically interact with their surroundings to perform tasks aligned with predefined goals. The incorporation of such agents into virtual environments has attracted similarly high interest, even in enterprise settings, e.g. with NVIDIA's Autonomous Game Characters. These systems can seamlessly converse with a user through natural language and even execute predefined tasks. However, in the multi-modal landscape enabled by VR technology, these predominantly auditory interactions remain limited compared to the modalities such a system could leverage. Furthermore, the rise of more modular and capable LLM systems, enabled by approaches like the Model Context Protocol, allows for highly adaptable and extensible agents. The task of this thesis is to design and implement an AI agent that is not solely mono-modal. Beyond understanding natural spoken language, this could include facial expressions, the environment surrounding the agent, and physical interactions. These expanded input and output modalities should be reflected in the agent’s range of actions and behaviors. This work lies at the intersection of virtual reality, embodied artificial intelligence, and human-computer interaction (HCI). It aims to contribute to the development of more immersive, expressive, and responsive AI agents that bridge the gap between language-driven interaction and embodied presence in virtual environments.
Corresponding Lab Member: Patrick Schrottenbacher and Alexander Mehler.
Bachelor Thesis: Language Forensics in the Age of AI: Retrospective Watermarking for Text Authenticity.
Description
Recent advances in large language models (LLMs) have made it increasingly difficult to distinguish between human-written and AI-generated text. While proactive watermarking techniques can embed detectable patterns during text generation, they rely on control over the generation process, a condition often unmet in real-world scenarios where AI-generated texts circulate freely online without prior tagging. This thesis explores retrospective watermarking and tagging, i.e. the post-hoc identification and labeling of AI-generated text after its creation and distribution. It investigates methods that combine linguistic stylometry, statistical signal analysis, and semantic fingerprinting to identify traces of machine generation. Furthermore, it examines how artificial "watermarks" can be retroactively embedded or inferred to improve downstream detection models and content source attribution. This work lies at the intersection of computational linguistics, machine learning, and digital forensics, and aims to address pressing societal concerns regarding misinformation and authorship transparency in the age of generative AI.
Corresponding Lab Member: Kevin Bönisch and Alexander Mehler.
Master Thesis: Streaming Multimodal Understanding with Incremental Prediction and Uncertainty Modeling.
Description
Multimodal signals, such as video and audio, are ordered chronologically. Processing these signals requires systems capable of performing incremental predictions and dynamic uncertainty assessments. This master thesis is about implementing and testing a framework that tracks events or states in real time, predicts them, interprets intermodal interactions, and provides confidence estimates. The proposed topic focuses on multimodal stream learning, wherein models update their internal representations continuously as new data arrives without access to the complete sequence. Unlike offline approaches, streaming systems process asynchronous, context-dependent signals where the same signals can have different meanings depending on accompanying visual or acoustic cues. For instance, the greeting "hello" (audio) accompanied by a smile (image) may imply friendliness, whereas the same greeting accompanied by a clenched fist (image) may imply a threat. This project has many potential applications, one of which is emotion recognition in video streams. Tasks include specifying a streaming-compatible application, acquiring relevant literature, implementing a basic system, and testing extensions such as online cross-modal fusion, temporal attention, and uncertainty-aware prediction layers.
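To make the idea of incremental prediction with a confidence estimate concrete, here is a minimal sketch that mirrors the greeting example above. It keeps one exponentially decayed evidence score per label, updates it as each modality chunk arrives, and reports the normalized entropy of the resulting distribution as an uncertainty value. The class name `StreamingPredictor`, the decay factor, and the evidence scores are all assumptions for illustration; a thesis system would obtain evidence from trained audio and vision models.

```python
import math

class StreamingPredictor:
    """Toy incremental classifier: decayed evidence scores plus entropy-based uncertainty."""

    def __init__(self, labels, decay=0.8):
        if len(labels) < 2:
            raise ValueError("need at least two labels to normalize the entropy")
        self.labels = list(labels)
        self.decay = decay  # how quickly older evidence fades (assumed value)
        self.scores = {label: 0.0 for label in self.labels}

    def update(self, evidence):
        """Fold in one chunk of evidence (dict: label -> score) and return the new prediction."""
        for label in self.labels:
            self.scores[label] = self.decay * self.scores[label] + evidence.get(label, 0.0)
        return self.predict()

    def predict(self):
        """Return (best label, its probability, uncertainty in [0, 1])."""
        top = max(self.scores.values())
        exps = {label: math.exp(score - top) for label, score in self.scores.items()}
        total = sum(exps.values())
        probs = {label: value / total for label, value in exps.items()}
        entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
        uncertainty = entropy / math.log(len(self.labels))
        best = max(probs, key=probs.get)
        return best, probs[best], uncertainty

predictor = StreamingPredictor(["friendly", "threat"])
# Audio chunk: the word "hello" alone is ambiguous (invented scores).
first = predictor.update({"friendly": 0.5, "threat": 0.5})
# Image chunk: a clenched fist shifts the evidence (invented scores).
second = predictor.update({"threat": 2.0})
print(first)
print(second)
```

The model never sees the complete sequence; each call to `update` uses only the decayed state and the newest chunk, which is the property the streaming setting requires.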
Corresponding Lab Member: Ali Abusaleh and Alexander Mehler.
Bachelor Thesis: Multimodal Sentiment Analysis via Cross-Attention Fusion in Latent Space.
Description
Accurately estimating sentiments in video data is a complex challenge that requires integrating visual, audio, and textual cues. This proposal involves developing a multimodal sentiment analysis pipeline that uses state-of-the-art models to extract embeddings from different modalities: Qwen2.5 for text, JEPA 2 for visuals, and WhisperX (pyannote) for audio. The embeddings then undergo projection into a shared latent space and fusion via cross-attention mechanisms to capture intermodal dependencies. A probabilistic output layer will then estimate sentiment distributions over fine-grained emotions. To accelerate the analysis process for real-time applications, the pipeline should use a stream-like vector projection method that updates latent space representations incrementally instead of reprocessing entire sequences. The pipeline will be built using DUUI to ensure modularity, scalability, and reproducibility. The goal of this thesis is to tackle limitations of unimodal sentiment analysis and traditional fusion methods to achieve efficient, accurate, and scalable multimodal sentiment analysis.
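The projection and fusion steps can be sketched in a few lines. The example below projects two modalities into a shared two-dimensional latent space and lets the text tokens attend to the audio frames with single-head scaled dot-product attention. All embeddings and projection matrices are small invented placeholders; in the thesis they would come from the models named above and from learned parameters, and the pure-Python helpers `matmul`, `softmax`, and `cross_attention` are assumptions standing in for a tensor library.

```python
import math

def matmul(a, b):
    """Multiply an n x d matrix by a d x m matrix (lists of lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    top = max(row)
    exps = [math.exp(value - top) for value in row]
    total = sum(exps)
    return [value / total for value in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention; queries attend to another modality."""
    dim = len(queries[0])
    keys_t = [list(col) for col in zip(*keys)]          # transpose K
    scores = matmul(queries, keys_t)                    # Q K^T
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, values), weights

# Invented toy embeddings: 2 text tokens (3 dims) and 3 audio frames (4 dims).
text_emb = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
audio_emb = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
# Invented projection matrices into a shared 2-dimensional latent space.
W_text = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]
W_audio = [[0.5, 0.0], [0.0, 0.5], [0.0, 0.5], [0.5, 0.0]]

text_latent = matmul(text_emb, W_text)
audio_latent = matmul(audio_emb, W_audio)
fused, weights = cross_attention(text_latent, audio_latent, audio_latent)
print(fused)    # one fused vector per text token
print(weights)  # attention over audio frames, one row per text token
```

Each row of `weights` shows how strongly a text token draws on each audio frame, which is the intermodal dependency the fusion step is meant to capture.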
Corresponding Lab Member: Ali Abusaleh and Alexander Mehler.
Master Thesis: Negation and LLM Reasoning.
Description
As lexical and logical negation appears to play a crucial role in human reasoning and inquiry, we are interested in analyzing negation patterns in reasoning traces produced by large language models (LLMs), as well as in LLM reasoning frameworks that explicitly incorporate negation, with the goal of better mimicking human reasoning. Possible directions for this thesis include: (1) The development of LLM reasoning frameworks centered around the phenomenon of negation and their evaluation against existing frameworks such as Chain-of-Thought (CoT) or Tree-of-Thought (ToT). (2) Negation-centered fine-tuning of LLM reasoning. (3) Qualitative and quantitative analysis of reasoning traces produced by LLMs, focusing on negation patterns.
Corresponding Lab Member: Leon Hammerla and Alexander Mehler.
Bachelor Thesis: Detecting the Negated Event/Detecting the Focus of Negation.
Description
Classical negation annotation in computational linguistics involves identifying the negation cue, determining the scope of the negation, and detecting both the negated event and the most prominent part of the scope that is negated (the focus). While reliable systems already exist for detecting negation cues and scopes, current frameworks need to be extended to identify the negated event and/or the focus. For a bachelor thesis, addressing one of these two aspects is sufficient; for a master thesis, both should be tackled. A Python-based pipeline for cue and scope detection is already available, and the newly developed detection modules can be integrated into this existing framework.
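The four annotation layers described above can be represented as token-index sets over a sentence. The sketch below is one possible data structure, not the format of the existing pipeline: the class `NegationAnnotation`, its fields, and the consistency check are assumptions, and the example annotation (in particular the choice of focus, which depends on context) is invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class NegationAnnotation:
    tokens: list
    cue: set                                   # token indices of the negation cue
    scope: set                                 # token indices inside the scope
    event: set = field(default_factory=set)    # negated event (to be detected)
    focus: set = field(default_factory=set)    # focus of negation (to be detected)

    def is_consistent(self) -> bool:
        """Check indices are valid and that event and focus lie inside the scope."""
        valid = range(len(self.tokens))
        in_range = all(i in valid for i in self.cue | self.scope | self.event | self.focus)
        return in_range and self.event <= self.scope and self.focus <= self.scope

    def text(self, indices) -> str:
        """Render a set of token indices as a string in sentence order."""
        return " ".join(self.tokens[i] for i in sorted(indices))

annotation = NegationAnnotation(
    tokens=["He", "did", "not", "eat", "the", "cake"],
    cue={2},
    scope={0, 1, 3, 4, 5},
    event={3},
    focus={4, 5},
)
print(annotation.text(annotation.cue))    # not
print(annotation.text(annotation.event))  # eat
print(annotation.text(annotation.focus))  # the cake
```

A new event or focus detector would fill the `event` and `focus` fields for annotations whose `cue` and `scope` were produced by the existing modules.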
Corresponding Lab Member: Leon Hammerla and Alexander Mehler.
Bachelor Thesis: Can we use scientific mentions to reconstruct or identify scientific argumentative text?
Description
Scientific articles contain numerous mentions of datasets, methods, tasks, and metrics, which capture essential elements of the scientific discourse. A key question is whether these scientific mentions and their interrelations can be leveraged to reconstruct or identify argumentative text, such as claims and supporting evidence. Existing resources like SciER and SciREX provide annotations for such mentions and their relations, which can be used to detect how claims are formulated or to identify sentences that express claims in context. Beyond leveraging existing mentions, identifying additional scientific entities and their relations could further enrich the representation of scientific arguments. Given the current lack of full-text scientific argument mining datasets, this task has the potential to support the creation of a large-scale corpus of argumentative sentences and their relational structure, providing a foundation for downstream tasks in scientific argument mining and automated knowledge extraction.
Corresponding Lab Member: Bhuvanesh Verma and Alexander Mehler.
Bachelor Thesis: Modelling Task Effects: Human-AI Judgements.
Description
Motivation
Given both abstract and concrete words, does a word (e.g., "apartment") always have the same representation? If the word appears in different views (with a picture, in a sentence, or in a word cloud), does the task under which it is presented (e.g., "Can you rest in it?") alter its representation? The idea is to examine whether the task effects that humans associate with words align with the labels generated by LLMs.

Tasks
  1. Create questions related to daily-life activities and assess how relevant these questions are to abstract and concrete words.
  2. The dataset is the Pereira dataset, consisting of 180 words (120 concrete and 60 abstract). Every word is presented in three views.
  3. Using human annotators, annotate how relevant a word is with respect to a task in a given view.
  4. Perform a similar analysis with LLMs, annotating the relevance of a word with respect to a task in a given view.
  5. Compare the human judgements with those of LLM-based annotators and identify the unique capabilities and drawbacks of these LLMs (GPT-3.5, GPT-4, and Mistral models).

Main Research Questions
To what extent do large language model (LLM) annotations of task relevance for concepts align with human annotations across different representational views and concept types?

View-Based Alignment:
  a. How does the similarity between human and LLM annotations of concept relevance vary across different presentation views (e.g., sentences, pictures, word clouds)?
  b. In which representational view is the alignment between human and LLM relevance ratings most pronounced?

Concept Type Sensitivity:
  a. Do LLMs align more closely with human judgments for concrete concepts than for abstract concepts, or vice versa?
  b. How does the type of concept (abstract vs. concrete) affect the degree of alignment between human and LLM annotations?

Interaction Effects:
  a. Is there an interaction between concept type and representational view that influences the alignment of task-relevance judgments between humans and LLMs?
  b. Are certain combinations of view and concept type particularly conducive to high alignment between LLM and human annotations?

Cognitive Modeling Potential: Can LLMs effectively model human-like concept relevance judgments across varying contexts of presentation and abstraction?
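One way to quantify the alignment asked about in these questions is a rank correlation between human and LLM relevance ratings, computed separately for each combination of view and concept type. The sketch below uses Spearman's rank correlation with average ranks for ties. The choice of measure, the record layout `(view, concept_type, human_rating, llm_rating)`, and the ratings themselves are assumptions; the numbers are invented and carry no empirical meaning.

```python
from collections import defaultdict
from statistics import mean

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks (undefined for constant input)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def alignment_by_group(records):
    """Map (view, concept_type) -> Spearman correlation of human vs. LLM ratings."""
    groups = defaultdict(lambda: ([], []))
    for view, concept_type, human, llm in records:
        groups[(view, concept_type)][0].append(human)
        groups[(view, concept_type)][1].append(llm)
    return {key: spearman(h, l) for key, (h, l) in groups.items()}

# Invented ratings on a 1-5 relevance scale.
records = [
    ("sentence", "concrete", 5, 5), ("sentence", "concrete", 3, 4), ("sentence", "concrete", 1, 2),
    ("picture", "abstract", 4, 1), ("picture", "abstract", 2, 3), ("picture", "abstract", 1, 5),
]
alignment = alignment_by_group(records)
for key, rho in alignment.items():
    print(key, round(rho, 3))
```

Comparing the per-group values addresses the view and concept-type questions directly; testing for an interaction effect would additionally require a statistical model over all groups.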

Goal
The thesis should work towards a publication at a conference or workshop.

Corresponding Lab Member: Mounika Marreddy and Alexander Mehler.
Bachelor Thesis: Human–AI Annotator Alignment across Multiple NLP Tasks.
Description
Motivation
Recent large language models now perform strongly on many complex NLP benchmark tasks such as reading comprehension, grade-school math, coding, and Boolean algebra. This shift has been fruitful and has yielded human-level benchmark performance on reasoning tasks. However, the extent to which the labels annotated by humans align with those produced by LLMs remains unclear. Do large language models that are better at annotating longer user conversations also naturally learn more human-like conceptual representations? Does scaling the number of parameters in LLMs improve alignment with human judgements?

Tasks
  1. Exploration of existing LLM-based approaches to pre-label or pre-tag parts of data for various NLP tasks, e.g. text classification, natural language inference, sentiment analysis, and subjective tasks such as stance detection. Gold datasets can be considered here as well.
  2. Investigation of whether LLMs as annotators process tasks in the same way that humans do.
  3. Development of a user-friendly interface for entering annotation guidelines, formulating prompts, reviewing and modifying model-generated annotations, and providing feedback.
  4. Exploration of evaluation methods for LLM-assisted annotations.
  5. Analysis of shared and unique capabilities of LLMs and their alignment with human judgments.
Main Research Questions
  1. Does LLM parameter scaling significantly improve alignment with human labels in subjective NLP tasks?
  2. How do annotation prompt designs and task instructions influence the degree of alignment between LLM-generated labels and human labels across different NLP tasks?
  3. How do different demographic backgrounds of annotators (e.g., race, gender, cultural context) affect the alignment between LLM-generated annotations and human labels in subjective or socially sensitive NLP tasks?
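For categorical labels, alignment between a human annotator and an LLM is often summarized with a chance-corrected agreement coefficient. The sketch below computes Cohen's kappa, which is one standard option and not necessarily the measure the thesis will adopt. The function name `cohen_kappa` and the example label sequences are assumptions; the labels are invented.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)

# Invented stance labels for four items.
human = ["pos", "pos", "neg", "neg"]
llm = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(human, llm))    # 0.5
print(cohen_kappa(human, human))  # 1.0
```

Computing the coefficient per model size, per prompt design, or per annotator group yields directly comparable numbers for the three research questions above.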
Goal
The thesis should work towards a publication at a conference or workshop.

References
Tan, Zhen, et al. Large Language Models for Data Annotation: A Survey. https://arxiv.org/pdf/2402.13446.pdf
Corresponding Lab Member: Mounika Marreddy and Alexander Mehler.
Bachelor Thesis: Exploring Pretrained Retrievers and Embedding-Based Search for Accurate Book Metadata Retrieval in RAG Pipelines.
Description
Retrieving accurate book metadata is essential for enhancing the performance of Retrieval-Augmented Generation (RAG) pipelines. This project explores modern, non-heuristic approaches to metadata retrieval, focusing on the use of pretrained retrievers and embedding-based similarity search. Instead of relying on manually crafted heuristics, these methods leverage embeddings generated by state-of-the-art models to identify the most relevant metadata and associated texts. The experiment will utilize large indexed corpora, such as Wikipedia and online library databases, to evaluate the efficacy of pretrained retrievers and embedding similarity for matching input metadata with incomplete or ambiguous information. The project will involve indexing metadata and textual content from publicly available sources (e.g., Open Library, Google Books, Wikipedia) using vector-based search frameworks. Pretrained models, such as dense retrievers (e.g., DPR, SentenceTransformers), will be used to generate embeddings for both input metadata and indexed corpora. The results will be compared to traditional heuristic-based methods to evaluate retrieval accuracy, scalability, and adaptability to incomplete metadata scenarios. This research addresses a significant bottleneck in RAG pipelines, where retrieval systems must efficiently integrate external knowledge to improve language model performance in answering specific queries. While this study focuses on bibliographic data, the proposed methods are generalizable and applicable to other domains requiring accurate and scalable metadata retrieval. The outcomes will provide insights into the trade-offs between heuristic and non-heuristic approaches and contribute to advancing metadata retrieval techniques for knowledge-intensive NLP tasks.
Corresponding Lab Member: Alexander Mehler.
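The embedding-based search described in this topic reduces to three steps: embed the query, embed the indexed records, and rank by similarity. The sketch below shows that loop with cosine similarity. Note the loud assumption: `embed` is a stand-in that counts character trigrams so that the example runs without any model; in the thesis it would be replaced by a pretrained dense retriever. The function names and the three catalog strings are illustrative.

```python
import math
from collections import Counter

def embed(text, n=3):
    """Stand-in embedding: character n-gram counts (NOT a pretrained dense retriever)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def search(query, records, top_k=3):
    """Return the top_k (score, record) pairs, best match first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(record)), record) for record in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]

# Small illustrative index of "title | author | year" strings.
catalog = [
    "Moby-Dick; or, The Whale | Herman Melville | 1851",
    "Pride and Prejudice | Jane Austen | 1813",
    "Bartleby, the Scrivener | Herman Melville | 1853",
]
# Incomplete, misspelled input metadata.
results = search("moby dick melvile", catalog, top_k=2)
for score, record in results:
    print(round(score, 3), record)
```

With a real retriever, the index vectors would be computed once and stored in a vector search framework instead of being recomputed for each query as done here.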
Bachelor Thesis: Developing a Heuristic for Retrieving Specific Book Metadata in Retrieval-Augmented Generation (RAG) Pipelines.
Description
Accurate retrieval of book metadata is a critical challenge in the development of Retrieval-Augmented Generation (RAG) pipelines. This project aims to develop a heuristic-based procedure for retrieving the most valid metadata, and potentially the text, of books from various online library databases using publicly available APIs. These databases contain large collections of book records, often with incomplete or inconsistent metadata. This makes querying and matching a specific publication a complex task, especially when dealing with incomplete input metadata. The procedure will address cases where multiple books share similar metadata, such as the same title and author, but belong to different editions or publications. The proposed heuristic will analyze and rank the results of API queries to identify the best match for the input data. The approach involves a detailed study of metadata patterns in online libraries and the development of robust matching criteria that account for variations and gaps in the data. This work contributes to an emerging area in natural language processing where RAG pipelines rely on external knowledge sources to augment large language models (LLMs) with domain-specific information. By addressing the challenge of metadata retrieval, this project will improve the accuracy and reliability of downstream tasks, such as answering questions about specific books. Although the focus of this work is on bibliographic data, the developed heuristic has the potential to be generalized for metadata retrieval in other domains. The outcome of this project will be a validated methodology that can be seamlessly integrated into RAG pipelines, representing a significant step forward in leveraging external databases for high-quality contextual information retrieval.
Corresponding Lab Member: Alexander Mehler.
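A ranking heuristic of the kind described in this topic could combine fuzzy string similarity on textual fields with exact matching on the year, while skipping fields that the incomplete input does not provide. The sketch below is one such scoring scheme. The field names, the weights in `WEIGHTS`, and the candidate records are assumptions for illustration; they do not reflect the response format of any particular library API.

```python
from difflib import SequenceMatcher

# Assumed field weights; a thesis would derive these from a study of metadata patterns.
WEIGHTS = {"title": 0.5, "author": 0.3, "year": 0.1, "publisher": 0.1}

def similarity(a, b):
    """Case-insensitive fuzzy similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def score(query, candidate):
    """Weighted match score, normalized over the fields present in the query."""
    total, weight_sum = 0.0, 0.0
    for name, weight in WEIGHTS.items():
        wanted = query.get(name)
        if wanted is None:        # input metadata is incomplete: ignore this field
            continue
        weight_sum += weight
        found = candidate.get(name)
        if found is None:         # gap in the candidate record: no credit
            continue
        if name == "year":
            total += weight * (1.0 if wanted == found else 0.0)
        else:
            total += weight * similarity(wanted, found)
    return total / weight_sum if weight_sum else 0.0

def rank(query, candidates):
    """Sort candidate records from best to worst match."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)

# Toy candidates: two records share title and author but differ in year.
candidates = [
    {"title": "Moby Dick", "author": "Herman Melville", "year": 1851},
    {"title": "Moby Dick", "author": "Herman Melville", "year": 1992},
    {"title": "Pride and Prejudice", "author": "Jane Austen", "year": 1813},
]
query = {"title": "moby dick", "author": "melville", "year": 1992}
ranked = rank(query, candidates)
for record in ranked:
    print(round(score(query, record), 3), record)
```

Here the year is the only field that separates the two otherwise identical records, which is the edition-disambiguation case the description highlights.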
Bachelor Thesis: How does Language Bias Affect Pretrained Language Models?
Description
Does language bias exist in pretrained large language models, such as those trained using a masked language modeling objective? What are the core components of these models that tend to produce this bias? Language bias refers to the tendency of multilingual models to prefer answering or selecting responses (e.g., in question-answering or information retrieval tasks) in the same language as the query, even when more likely candidate answers are available in other languages. What are the primary causes of this behavior? Are they linguistic, embedded in the training objective, or influenced by the loss function? These questions remain unresolved. Bachelor's and Master's theses are invited to explore these or related questions.
Corresponding Lab Member: Alexander Mehler.
Bachelor Thesis: A comparative study of methodologies used to distinguish human-written from automatically generated text.
Description
With the advent of large language models such as ChatGPT, growing ethical concerns have emerged, highlighting the need for models that automatically recognize machine-generated text. Such recognition models are becoming increasingly popular but remain underexplored and not well established. A study is needed to provide an overview of existing work in this area and evaluate its usefulness. Bachelor's and Master's theses are invited to explore this field through a comparative approach by reimplementing and testing a range of established methods.
Corresponding Lab Member: Alexander Mehler.
Master Thesis: Can Adversarial Text Snippets Achieve Refusal Dimension Deletion?
Description
The threat of abuse by determined adversaries makes the safety of public-facing LLMs a key priority for developers and researchers alike. Despite intensive efforts, recent research shows that "refusal in language models [may be] mediated by a [one-dimensional subspace in the model's weights]" (Arditi et al., 2024) and that it is possible to create text snippets that circumvent harmful response prevention in open- and closed-source LLMs using adversarial algorithms (Zou et al., 2023). This raises the question of whether these two methods of "jailbreaking" LLMs align, i.e. whether adversarially generated text segments can shift a model's hidden states into a position that effectively approaches refusal dimension deletion.
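The comparison asked for here can be phrased geometrically. One common way to estimate a refusal direction is the normalized difference between mean hidden states on harmful and harmless prompts; deleting the direction then means subtracting each hidden state's projection onto it. The sketch below implements these two operations on plain lists and measures how much of the refusal component remains in a hidden state. All vectors are invented two-dimensional placeholders, the function names are assumptions, and whether this estimation procedure matches the setup used in the thesis is itself an assumption.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equally long vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def unit(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def refusal_direction(harmful_states, harmless_states):
    """Unit vector along the difference of mean hidden states."""
    mu_harmful = mean_vector(harmful_states)
    mu_harmless = mean_vector(harmless_states)
    return unit([a - b for a, b in zip(mu_harmful, mu_harmless)])

def projection(state, direction):
    """Scalar component of a hidden state along the (unit) direction."""
    return sum(a * b for a, b in zip(state, direction))

def ablate(state, direction):
    """Remove the component along the direction: h - (h . r) r."""
    component = projection(state, direction)
    return [a - component * d for a, d in zip(state, direction)]

# Invented hidden states (2 dimensions for readability).
harmful = [[2.0, 0.0], [4.0, 0.0]]
harmless = [[0.0, 0.0], [0.0, 2.0]]
direction = refusal_direction(harmful, harmless)

plain_prompt_state = [3.0, 0.5]        # harmful prompt without suffix (invented)
adversarial_prompt_state = [0.5, 0.4]  # same prompt with adversarial suffix (invented)

print(projection(plain_prompt_state, direction))
print(projection(adversarial_prompt_state, direction))
print(projection(ablate(plain_prompt_state, direction), direction))  # approximately 0
```

If adversarial snippets act like direction deletion, the projection of the suffixed prompt's hidden state should move toward the near-zero value obtained by explicit ablation.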

Corresponding Lab Member: Manuel Schaaf and Alexander Mehler.
Master Thesis: Unlocking Wikipedia for Research: A Modular Toolkit for Structured NLP Applications.
Description
Wikipedia serves as a vast and diverse resource that is widely used in research domains to address a variety of tasks and questions. However, its size, semi-structured form, inconsistent formatting, and noisy elements (e.g., infoboxes) pose significant challenges to its accessibility and usability in structured research applications. This thesis aims to develop a comprehensive framework to overcome these challenges and enable researchers to effectively use Wikipedia's content for NLP and other structured research purposes. The proposed work focuses on the design of a modular, database-driven toolkit that supports the local use of Wikipedia for NLP processing. Key objectives include exploring existing tools and databases, integrating Wikidata, and leveraging different database solutions to address different use cases. Specific tasks include selecting and evaluating databases, designing database schemas, processing Wikipedia dump files as source data, and implementing robust mechanisms for data extraction, parsing (e.g., Wikitext), and updating. Additional challenges such as constructing category and social graphs, managing interlanguage links, handling revisions, and integrating DUUI (Docker Unified UIMA Interface) will also be addressed. The goal of this thesis is to provide a practical toolkit for researchers that facilitates the effective and flexible use of Wikipedia's content for a wide range of applications.
Corresponding Lab Member: Daniel Baumartz and Alexander Mehler.
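As a small illustration of the Wikitext parsing task mentioned in this topic, the following sketch extracts internal links and category assignments from a Wikitext snippet with a regular expression. It covers only the simple `[[Target]]`, `[[Target|label]]`, and `[[Category:Name]]` forms; templates, nested links, and file captions are ignored, so a dedicated Wikitext parser would be needed in practice. The pattern and the function name `extract_links` are assumptions for this example.

```python
import re

# Matches [[Target]], [[Target#Section]], and [[Target|label]]; no nesting support.
LINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")

def extract_links(wikitext):
    """Return (links, categories): links are (target, label) pairs."""
    links, categories = [], []
    for match in LINK.finditer(wikitext):
        target = match.group(1).strip()
        if target.lower().startswith("category:"):
            categories.append(target.split(":", 1)[1])
        else:
            label = (match.group(2) or target).strip()
            links.append((target, label))
    return links, categories

sample = (
    "'''Frankfurt''' is a city in [[Hesse]], "
    "[[Germany|the Federal Republic]]. [[Category:Cities in Hesse]]"
)
links, categories = extract_links(sample)
print(links)
print(categories)
```

The extracted link pairs are the raw material for a page graph, and the category names for a category graph, both of which could then be stored in the database backend chosen for the toolkit.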
Bachelor Thesis: Multimodal data integration and processing in DUUI.
Description
The Docker Unified UIMA Interface (DUUI) is a tool designed for the automated analysis of large corpora using a variety of NLP tools. Currently, DUUI supports the processing of text, audio, and video data. To extend its capabilities, additional support for multimodal data, such as that provided by Va.Si.Li-Lab – which includes motion data, object interaction data, and more – should be integrated into DUUI. All integrated data will need to be linked through a new type system tailored to each modality. Furthermore, processes such as motion detection must be incorporated to effectively process and analyze these new data types within DUUI. Bachelor's and Master's theses are invited to explore this multimodal model extension and integration.
Corresponding Lab Member: Mevlüt Bagci and Alexander Mehler.
Bachelor Thesis: Affiliation of Speech and Gesture through LLMs.
Description
Most "referential" gestures have a docking point in accompanying speech, known as the lexical affiliate. This bachelor’s thesis leverages this empirical fact to utilize large language models (LLMs) for gesture annotation. Each occurrence of a referential gesture in a multimodal dataset is presented to an LLM, which is tasked with identifying the corresponding affiliate expression in speech. Through this process, a gesture interpretation is derived. Additionally, the approach aims to detect gestures that lack an overt affiliate. Building on the strong performance of LLMs in handling bridging relations, the thesis proposes a frame-based interpretation for such gestures. This work makes a central topic of multimodal communication accessible to modern computational techniques, provides quantitative insights into speech-gesture affiliation, and lays the foundation for further gesture classifications.
Corresponding Lab Member: Andy Lücking and Alexander Mehler.
Master Thesis: Aristotelian Modification of Nominals.
Description
The standard semantics of noun-modifying adjectives is typically explained in terms of set membership in one way or another. Modern theories often incorporate scales, particularly for measure adjectives. This master's thesis will generalize such approaches by employing more general property spaces, which can be conceptualized as accidental qualities, a notion derived from Aristotle’s linguistic work. The accidental qualities of nominals will be determined by clustering adjectives from large corpora, thereby enriching lexical entries. This thesis complements computational linguistic research on the generative lexicon, has relevance for multimodal speech-gesture integration, and offers a novel perspective on the metaphoric use of adjectives.
Corresponding Lab Member: Andy Lücking and Alexander Mehler.
Bachelor/Master Thesis: Multimodal VR Data Meets DUUI.
Description
The processing of large and extensive unstructured corpora is a constant challenge for various scientific disciplines. For this purpose, the Docker Unified UIMA Interface (DUUI) was developed, which provides NLP analysis methods based on container services to perform horizontally and vertically distributed big data analyses in a unified, standardized, reusable, and schema-based process. The first steps towards multimodality have also already been taken. The task of this thesis is to adapt DUUI processing so that it can also be used to process multimodal data collected through VR experiments. The main difficulty lies in the alignment of speech, transcription, and movements.
Master Thesis: Natural Human Interactions with LLMs via Audio.
Description
Natural conversations between people are the norm, and they are also possible with large language models (LLMs). Human speech can be converted to text, which can then be used as input for the LLM. The output of the LLM is then converted back to audio. However, due to latency and the nature of audio output, it is still a major challenge to integrate a chatbot that can communicate naturally in both text and audio without human interlocutors noticing this latency, especially in multilingual environments. Therefore, Bachelor's or Master's theses that address these latency issues are invited.
Corresponding Lab Member: Mevlüt Bagci and Alexander Mehler.
Bachelor Thesis: Diversification of the container landscape for DUUI.
Description
The processing of large and extensive unstructured corpora remains a significant challenge for various scientific disciplines. To address this, the Docker Unified UIMA Interface (DUUI) was developed. DUUI provides NLP methods through container services to perform horizontally and vertically distributed big data analysis in a unified, standardized, reusable, and schema-based process. In the medium to long term, DUUI can leverage a variety of container services to implement optimal processing solutions tailored to specific scenarios and environmental parameters. This involves the creation, implementation, and evaluation of container services for DUUI that have not yet been integrated. Bachelor's or Master's theses are invited to address this task of service integration.
Corresponding Lab Member: Giuseppe Abrami and Alexander Mehler.
Bachelor Thesis: Retrieval-Augmented Generation (RAG): Synthesizing Knowledge from Large Corpora.
Description
The increase of textual data in scientific and other domains has created an urgent need for tools that can efficiently retrieve accurate information from large corpora. Can large language models help researchers identify critical information - metaphorically, "needles in a haystack"? This research explores Retrieval-Augmented Generation (RAG) as a framework for proposing pipelines and models capable of locating specific units of information in response to user queries. Crucially, this approach avoids the need for explicit fine-tuning of large language models on domain-specific data. Instead, it emphasizes techniques such as prompt engineering, advanced data retrieval mechanisms, and innovative query formulation. Possible methodologies include the use of embedding spaces, graph databases, or hybrid architectures to improve retrieval accuracy and synthesis capabilities. Bachelor's or Master's theses are invited to contribute novel solutions to this interdisciplinary challenge. See also: OpenScholar: Synthesizing Scientific Literature with Retrieval-Augmented LMs; CCC-BERT | Kaggle
Corresponding Lab Member: Kevin Boenisch and Alexander Mehler.
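The RAG setup described in this topic, retrieval plus prompt construction without any fine-tuning, can be sketched as follows. Retrieval here uses plain token overlap (Jaccard similarity) purely so that the example is self-contained; embedding spaces or graph databases, as named above, would take its place in a thesis. The function names, the prompt wording, and the three passages are assumptions for illustration, and the call to an actual language model is deliberately left out.

```python
import re

def tokens(text):
    """Lowercased word set of a text."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, passages, top_k=2):
    """Rank passages by Jaccard overlap with the query and keep the best top_k."""
    query_tokens = tokens(query)

    def jaccard(passage):
        passage_tokens = tokens(passage)
        union = query_tokens | passage_tokens
        return len(query_tokens & passage_tokens) / len(union) if union else 0.0

    return sorted(passages, key=jaccard, reverse=True)[:top_k]

def build_prompt(query, passages, top_k=2):
    """Assemble a prompt that restricts the model to the retrieved, numbered passages."""
    context = retrieve(query, passages, top_k)
    numbered = "\n".join(f"[{i}] {passage}" for i, passage in enumerate(context, start=1))
    return (
        "Answer using only the numbered passages and cite them.\n\n"
        f"{numbered}\n\nQuestion: {query}\nAnswer:"
    )

# Toy passages standing in for a large corpus.
corpus = [
    "DUUI runs NLP tools as containers.",
    "Wikipedia dumps are published regularly.",
    "Containers make NLP pipelines reproducible.",
]
prompt = build_prompt("Which NLP tools run as containers?", corpus)
print(prompt)
```

The resulting string would be sent to a language model of choice; swapping the `retrieve` function is the main point of variation between the methodologies listed in the description.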