Kevin Bönisch

Staff Member

Goethe-Universität Frankfurt am Main
Robert-Mayer-Straße 10
Room 210
D-60325 Frankfurt am Main
D-60054 Frankfurt am Main (use for package delivery)
Postfach / P.O. Box: 154

Office Hours: Thursday, 10 AM to 12 PM

You must neither be discontented with yourself—and that were poor-spirited—nor self-satisfied—and that is folly.

– Baltasar Gracián

Hi there,

I worked for five years as a professional Software Developer (C# .NET Full-Stack) before transitioning to a Researcher role in AI and Natural Language Processing at the Text Technology Lab while pursuing my Master’s in Computer Science. One of my main tasks at the lab was the development of the Unified Corpus Explorer. I have also worked as a tutor twice. After nearly two years at the lab and completing my Master’s studies, I returned to the industry as a Data Scientist.

My published research spans several fields, including visualization (2D & 3D) through easy-to-use user interfaces as well as more complex spatial systems, virtual reality adaptations, and NLP techniques such as information retrieval, topic extraction, text classification, and fine-tuning or training of (Large) Language Models. I have also worked with traditional machine learning and deep learning methods for regression and classification tasks. Whatever research I do, I try to combine the practical, hands-on craft of classical Software Engineering with the more scientific and theoretical side of research. You can find my publications and projects below.

Finally, I like to participate in software development and data science competitions on platforms like Kaggle, with results published on my GitHub and other outlets. Out of those competitions, my biggest accomplishments are:

Whatever project I do, I approach it with a strong sense of purpose, genuine enjoyment and the desire to become better, which is why I take pride in the things I create.
If any of this resonates with you, feel free to contact me. I’d like to hear from you as well.


Projects

As governments worldwide continue to release vast amounts of textual information, the need for efficient and insightful tools to extract, interpret and present this data has become increasingly critical. Towards solving this issue, we present the Bundestags-Mine: an environment that periodically retrieves pertinent data from the German parliament, parses and analyzes it using pipelines for natural language processing, and then displays the results in a web application that is publicly accessible. Bundestags-Mine helps to extract key information from parliamentary documents in a visually appealing manner for many use cases. For instance, the tool can be leveraged by journalists for news detection, lawyers for compliance checking, linguists for discourse analysis, and the broad public to inform themselves about the positions of political party members on a topic.

In order to more precisely research the major societal challenges of the coming decades, including digitization, climate change, and war- and pandemic-related societal changes, and to be able to identify the need for political action on this basis, the social sciences need innovative research data and methods. The DFG has established the long-term infrastructure priority programme “New Data Spaces” (SPP 2431) to open up and develop such new data spaces.

It is managed by the programme committee, consisting of Prof. Dr. Cordula Artelt (spokesperson) and Prof. Dr. Corinna Kleinert (both LIfBi), Prof. Dr. Reinhard Pollak (GESIS), Prof. Dr. Stefan Liebig (FU Berlin) and Prof. Dr. Alexander Mehler (Goethe University Frankfurt).

Viki LibraRy is a first implementation for generating and exploring online information based on hypertext systems in a three-dimensional environment using virtual reality. It creates a virtual library based on Wikipedia, in which Rooms are dynamically generated from data provided via a RESTful backend. In these Rooms, the user can browse through all kinds of articles of the corresponding category in the form of Books. In addition, users can access different Rooms through virtual portals. Beyond that, the explorations can be done alone or collaboratively, using Ubiq.

The specialised information service BIOfid (www.biofid.de) is oriented towards the special needs of scientists researching biodiversity topics at research institutions and in natural history collections. Since 2017, BIOfid has been building an infrastructure that contributes to the provision and mobilisation of research-relevant data in a variety of ways in the context of current developments in biodiversity research.

Publications

Total: 8

2025

Kevin Bönisch, Giuseppe Abrami and Alexander Mehler. 2025. Towards Unified, Dynamic and Annotation-based Visualisations and Exploration of Annotated Big Data Corpora with the Help of Unified Corpus Explorer. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations), 522–534. Best Demo Award.
BibTeX
@inproceedings{Boenisch:et:al:2025,
  title     = {Towards Unified, Dynamic and Annotation-based Visualisations and
               Exploration of Annotated Big Data Corpora with the Help of Unified
               Corpus Explorer},
  author    = {B{\"o}nisch, Kevin and Abrami, Giuseppe and Mehler, Alexander},
  editor    = {Dziri, Nouha and Ren, Sean (Xiang) and Diao, Shizhe},
  booktitle = {Proceedings of the 2025 Conference of the Nations of the Americas
               Chapter of the Association for Computational Linguistics: Human
               Language Technologies (System Demonstrations)},
  year      = {2025},
  address   = {Albuquerque, New Mexico},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.naacl-demo.42/},
  pages     = {522--534},
  isbn      = {979-8-89176-191-9},
  abstract  = {The annotation and exploration of large text corpora, both automatic
               and manual, presents significant challenges across multiple disciplines,
               including linguistics, digital humanities, biology, and legal
               science. These challenges are exacerbated by the heterogeneity
               of processing methods, which complicates corpus visualization,
               interaction, and integration. To address these issues, we introduce
               the Unified Corpus Explorer (UCE), a standardized, dockerized,
               open-source and dynamic Natural Language Processing (NLP) application
               designed for flexible and scalable corpus navigation. Herein,
               UCE utilizes the UIMA format for NLP annotations as a standardized
               input, constructing interfaces and features around those annotations
               while dynamically adapting to the corpora and their extracted
               annotations. We evaluate UCE based on a user study and demonstrate
               its versatility as a corpus explorer based on generative AI.},
  note      = {Best Demo Award},
  pdf       = {https://aclanthology.org/2025.naacl-demo.42.pdf}
}
Kevin Bönisch, Alexander Mehler, Shaduan Babbili, Yannick Heinrich, Philipp Stephan and Giuseppe Abrami. 2025. Viki LibraRy: Collaborative Hypertext Browsing and Navigation in Virtual Reality. New Review of Hypermedia and Multimedia, 31(1-2):45–75.
BibTeX
@article{Boenisch:et:al:2025:b,
  author    = {B\"{o}nisch, Kevin and Mehler, Alexander and Babbili, Shaduan
               and Heinrich, Yannick and Stephan, Philipp and Abrami, Giuseppe},
  abstract  = {We present Viki LibraRy, a dynamically built library in virtual
               reality (VR) designed to visualize hypertext systems, with an
               emphasis on collaborative interaction and spatial immersion. Viki
               LibraRy goes beyond traditional methods of text distribution by
               providing a platform where users can share, process, and engage
               with textual information. It operates at the interface of VR,
               collaborative learning and spatial data processing to make reading
               tangible and memorable in a spatially mediated way. The article
               describes the building blocks of Viki LibraRy, its underlying
               architecture, and several use cases. It evaluates Viki LibraRy
               in comparison to a conventional web interface for text retrieval
               and reading. The article shows that Viki LibraRy provides users
               with spatial references for structuring their recall, so that
               they can better remember consulted texts and their meta-information
               (e.g. in terms of subject areas and content categories)},
  title     = {{Viki LibraRy: Collaborative Hypertext Browsing and Navigation
               in Virtual Reality}},
  journal   = {New Review of Hypermedia and Multimedia},
  volume    = {31},
  number    = {1-2},
  pages     = {45--75},
  year      = {2025},
  publisher = {Taylor \& Francis},
  doi       = {10.1080/13614568.2024.2383581},
  url       = {https://doi.org/10.1080/13614568.2024.2383581},
  eprint    = {https://doi.org/10.1080/13614568.2024.2383581}
}

2024

Alexander Mehler, Mevlüt Bagci, Patrick Schrottenbacher, Alexander Henlein, Maxim Konca, Giuseppe Abrami, Kevin Bönisch, Manuel Stoeckel, Christian Spiekermann and Juliane Engel. 2024. Towards New Data Spaces for the Study of Multiple Documents with Va.Si.Li-Lab: A Conceptual Analysis. In: Students', Graduates' and Young Professionals' Critical Use of Online Information: Digital Performance Assessment and Training within and across Domains, 259–303. Ed. by Olga Zlatkin-Troitschanskaia, Marie-Theres Nagel, Verena Klose and Alexander Mehler. Springer Nature Switzerland.
BibTeX
@inbook{Mehler:et:al:2024:a,
  author    = {Mehler, Alexander and Bagci, Mevl{\"u}t and Schrottenbacher, Patrick
               and Henlein, Alexander and Konca, Maxim and Abrami, Giuseppe and B{\"o}nisch, Kevin
               and Stoeckel, Manuel and Spiekermann, Christian and Engel, Juliane},
  editor    = {Zlatkin-Troitschanskaia, Olga and Nagel, Marie-Theres and Klose, Verena
               and Mehler, Alexander},
  title     = {Towards New Data Spaces for the Study of Multiple Documents with
               Va.Si.Li-Lab: A Conceptual Analysis},
  booktitle = {Students', Graduates' and Young Professionals' Critical Use of
               Online Information: Digital Performance Assessment and Training
               within and across Domains},
  year      = {2024},
  publisher = {Springer Nature Switzerland},
  address   = {Cham},
  pages     = {259--303},
  abstract  = {The constitution of multiple documents has so far been studied
               essentially as a process in which a single learner consults a
               number (of segments) of different documents in the context of
               the task at hand in order to construct a mental model for the
               purpose of completing the task. As a result of this research focus,
               the constitution of multiple documents appears predominantly as
               a monomodal, non-interactive process in which mainly textual units
               are studied, supplemented by images, text-image relations and
               comparable artifacts. This approach is reflected in the contextual
               fixity of the research design, in which the learners under study
               search for information using suitably equipped computers. If,
               on the other hand, we consider the openness of multi-agent learning
               situations, this scenario lacks the aspects of interactivity,
               contextual openness and, above all, the multimodality of information
               objects, information processing and information exchange. This
               is where the chapter comes in. It describes Va.Si.Li-Lab as an
               instrument for multimodal measurement for studying and modeling
               multiple documents in the context of interactive learning in a
               multi-agent environment. To this end, the chapter places Va.Si.Li-Lab
               in the spectrum of evolutionary approaches that vary the combination
               of human and machine innovation and selection. It also combines
               the requirements of multimodal representational learning with
               various aspects of contextual plasticity to prepare Va.Si.Li-Lab
               as a system that can be used for experimental research. The chapter
               is conceptual in nature, designing a system of requirements using
               the example of Va.Si.Li-Lab to outline an experimental environment
               in which the study of Critical Online Reasoning (COR) as a group
               process becomes possible. Although the chapter illustrates some
               of these requirements with realistic data from the field of simulation-based
               learning, the focus is still conceptual rather than experimental,
               hypothesis-driven. That is, the chapter is concerned with the
               design of a technology for future research into COR processes.},
  isbn      = {978-3-031-69510-0},
  doi       = {10.1007/978-3-031-69510-0_12},
  url       = {https://doi.org/10.1007/978-3-031-69510-0_12}
}
Kevin Bönisch and Alexander Mehler. 2024. Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles. Proceedings of the 2nd Legal Information Retrieval meets Artificial Intelligence Workshop LIRAI 2024.
BibTeX
@inproceedings{Boenisch:Mehler:2024,
  title     = {Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval
               via Bagging and SVR Ensembles},
  author    = {B\"{o}nisch, Kevin and Mehler, Alexander},
  year      = {2024},
  booktitle = {Proceedings of the 2nd Legal Information Retrieval meets Artificial
               Intelligence Workshop LIRAI 2024},
  location  = {Poznan, Poland},
  publisher = {CEUR-WS.org},
  address   = {Aachen, Germany},
  series    = {CEUR Workshop Proceedings},
  abstract  = {We introduce a retrieval approach leveraging Support Vector Regression
               (SVR) ensembles, bootstrap aggregation (bagging), and embedding
               spaces on the German Dataset for Legal Information Retrieval (GerDaLIR).
               By conceptualizing the retrieval task in terms of multiple binary
               needle-in-a-haystack subtasks, we show improved recall over the
               baselines (0.849 > 0.803 | 0.829) using our voting ensemble, suggesting
               promising initial results, without training or fine-tuning any
               deep learning models. Our approach holds potential for further
               enhancement, particularly through refining the encoding models
               and optimizing hyperparameters.},
  archiveprefix = {arXiv},
  eprint    = {2501.05018},
  url       = {https://arxiv.org/pdf/2501.05018},
  keywords  = {legal information retrieval, support vector regression, word embeddings, bagging ensemble}
}
Kevin Bönisch, Manuel Stoeckel and Alexander Mehler. 2024. HyperCausal: Visualizing Causal Inference in 3D Hypertext. Proceedings of the 35th ACM Conference on Hypertext and Social Media, 330–336.
BibTeX
@inproceedings{Boenisch:et:al:2024,
  author    = {B\"{o}nisch, Kevin and Stoeckel, Manuel and Mehler, Alexander},
  title     = {HyperCausal: Visualizing Causal Inference in 3D Hypertext},
  year      = {2024},
  isbn      = {9798400705953},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3648188.3677049},
  doi       = {10.1145/3648188.3677049},
  abstract  = {We present HyperCausal, a 3D hypertext visualization framework
               for exploring causal inference in generative Large Language Models
               (LLMs). HyperCausal maps the generative processes of LLMs into
               spatial hypertexts, where tokens are represented as nodes connected
               by probability-weighted edges. The edges are weighted by the prediction
               scores of next tokens, depending on the underlying language model.
               HyperCausal facilitates navigation through the causal space of
               the underlying LLM, allowing users to explore predicted word sequences
               and their branching. Through comparative analysis of LLM parameters
               such as token probabilities and search algorithms, HyperCausal
               provides insight into model behavior and performance. Implemented
               using the Hugging Face transformers library and Three.js, HyperCausal
               ensures cross-platform accessibility to advance research in natural
               language processing using concepts from hypertext research. We
               demonstrate several use cases of HyperCausal and highlight the
               potential for detecting hallucinations generated by LLMs using
               this framework. The connection with hypertext research arises
               from the fact that HyperCausal relies on user interaction to unfold
               graphs with hierarchically appearing branching alternatives in
               3D space. This approach refers to spatial hypertexts and early
               concepts of hierarchical hypertext structures. A third connection
               concerns hypertext fiction, since the branching alternatives mediated
               by HyperCausal manifest non-linearly organized reading threads
               along artificially generated texts that the user decides to follow
               optionally depending on the reading context.},
  booktitle = {Proceedings of the 35th ACM Conference on Hypertext and Social Media},
  pages     = {330--336},
  numpages  = {7},
  keywords  = {3D hypertext, large language models, visualization},
  location  = {Poznan, Poland},
  series    = {HT '24},
  video     = {https://www.youtube.com/watch?v=ANHFTupnKhI}
}

2023

Kevin Bönisch. 2023. BA Thesis: Dialog generation using language models. Goethe University.
BibTeX
@bathesis{boenisch:2023,
  author    = {Kevin B{\"o}nisch},
  title     = {Dialog generation using language models},
  institution = {Goethe University},
  pages     = {28},
  year      = {2023},
  url       = {https://publikationen.ub.uni-frankfurt.de/opus4/frontdoor/index/index/docId/79165},
  repository = {https://github.com/texttechnologylab/ROBERT}
}
Kevin Bönisch, Giuseppe Abrami, Sabine Wehnert and Alexander Mehler. 2023. Bundestags-Mine: Natural Language Processing for Extracting Key Information from Government Documents. Legal Knowledge and Information Systems.
BibTeX
@inproceedings{Boenisch:et:al:2023,
  title     = {{Bundestags-Mine}: Natural Language Processing for Extracting
               Key Information from Government Documents},
  isbn      = {9781643684734},
  issn      = {1879-8314},
  url       = {http://dx.doi.org/10.3233/FAIA230996},
  doi       = {10.3233/faia230996},
  booktitle = {Legal Knowledge and Information Systems},
  publisher = {IOS Press},
  author    = {B\"{o}nisch, Kevin and Abrami, Giuseppe and Wehnert, Sabine and Mehler, Alexander},
  year      = {2023}
}
Shaduan Babbili, Kevin Bönisch, Yannick Heinrich, Philipp Stephan, Giuseppe Abrami and Alexander Mehler. 2023. Viki LibraRy: A Virtual Reality Library for Collaborative Browsing and Navigation through Hypertext. Proceedings of the 34th ACM Conference on Hypertext and Social Media.
BibTeX
@inproceedings{Babbili:et:al:2023,
  author    = {Babbili, Shaduan and B\"{o}nisch, Kevin and Heinrich, Yannick
               and Stephan, Philipp and Abrami, Giuseppe and Mehler, Alexander},
  title     = {Viki LibraRy: A Virtual Reality Library for Collaborative Browsing
               and Navigation through Hypertext},
  year      = {2023},
  isbn      = {9798400702327},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3603163.3609079},
  doi       = {10.1145/3603163.3609079},
  abstract  = {We present Viki LibraRy, a virtual-reality-based system for generating
               and exploring online information as a spatial hypertext. It creates
               a virtual library based on Wikipedia in which Rooms are used to
               make data available via a RESTful backend. In these Rooms, users
               can browse through all articles of the corresponding Wikipedia
               category in the form of Books. In addition, users can access different
               Rooms, through virtual portals. Beyond that, the explorations
               can be done alone or collaboratively, using Ubiq.},
  booktitle = {Proceedings of the 34th ACM Conference on Hypertext and Social Media},
  articleno = {6},
  numpages  = {3},
  keywords  = {virtual reality simulation, virtual reality, virtual hypertext, virtual museum},
  location  = {Rome, Italy},
  series    = {HT '23},
  pdf       = {https://dl.acm.org/doi/pdf/10.1145/3603163.3609079}
}

Available Thesis Proposals for Students

If any of the following thesis proposals interest you, please feel free to contact me.

2025

Bachelor Thesis: Retrieval-Augmented Generation (RAG): Synthesizing Knowledge from Large Corpora.
Description
The increase of textual data in scientific and other domains has created an urgent need for tools that can efficiently retrieve accurate information from large corpora. Can large language models help researchers identify critical information - metaphorically, "needles in a haystack"? This research explores Retrieval-Augmented Generation (RAG) as a framework for proposing pipelines and models capable of locating specific units of information in response to user queries. Crucially, this approach avoids the need for explicit fine-tuning of large language models on domain-specific data. Instead, it emphasizes techniques such as prompt engineering, advanced data retrieval mechanisms, and innovative query formulation. Possible methodologies include the use of embedding spaces, graph databases, or hybrid architectures to improve retrieval accuracy and synthesis capabilities. Bachelor's or Master's theses are invited to contribute novel solutions to this interdisciplinary challenge. See also: OPEN SCHOLAR: SYNTHESIZING SCIENTIFIC LITERATURE WITH RETRIEVAL-AUGMENTED LMS; CCC-BERT | Kaggle
Corresponding Lab Member: Kevin Bönisch and Alexander Mehler.
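
To give prospective students a feel for the retrieval step of a RAG pipeline, here is a minimal sketch, not a reference implementation for this thesis. It assumes a toy bag-of-words "embedding" in place of a real encoder model, and the function names (`embed`, `cosine`, `retrieve`, `build_prompt`) and the three-sentence corpus are hypothetical and chosen only for illustration. The resulting prompt would be handed to a language model of your choice, which is deliberately left out here.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an encoder: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=2):
    """Assemble a prompt that grounds the answer in the retrieved passages."""
    hits = retrieve(query, passages, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(hits))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical mini-corpus, for illustration only.
corpus = [
    "The Bundestag is the German federal parliament.",
    "Support vector regression fits a function within an epsilon margin.",
    "Wikipedia categories group articles by topic.",
]

question = "What is the German parliament called?"
top = retrieve(question, corpus, k=1)
prompt = build_prompt(question, corpus, k=1)
```

In a thesis setting, `embed` would typically be swapped for a learned sentence encoder and the linear scan for a vector or graph database. The overall shape (retrieve, then prompt, with no fine-tuning of the language model) stays the same.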

2025

Bachelor Thesis: Language Forensics in the Age of AI: Retrospective Watermarking for Text Authenticity.
Description
Recent advances in large language models (LLMs) have made it increasingly difficult to distinguish between human-written and AI-generated text. While proactive watermarking techniques can embed detectable patterns during text generation, they rely on control over the generation process, a condition often unmet in real-world scenarios where AI-generated texts circulate freely online without prior tagging. This thesis explores retrospective watermarking and tagging, that is, the post-hoc identification and labeling of AI-generated text after its creation and distribution. It investigates methods that combine linguistic stylometry, statistical signal analysis, and semantic fingerprinting to identify traces of machine generation. Furthermore, it examines how artificial "watermarks" can be retroactively embedded or inferred to improve downstream detection models and content source attribution. This work lies at the intersection of computational linguistics, machine learning, and digital forensics, and aims to address pressing societal concerns regarding misinformation and authorship transparency in the age of generative AI.
Corresponding Lab Member: Kevin Bönisch and Alexander Mehler.