The TTLab (Text Technology Lab), headed by Prof. Alexander Mehler, is part of the Department of Computer Science and Mathematics (Fachbereich Informatik und Mathematik) at the Goethe Universität in Frankfurt. We investigate formal, algorithmic models to deepen our understanding of information processing in the humanities. We examine both diachronic (time-dependent) and synchronic aspects of processing linguistic and non-linguistic, multimodal signs. The Lab works across several disciplines to bridge computer science on the one hand and corpus-based research in the humanities on the other. To this end, we develop information models and algorithms for the analysis of texts, images, and other objects relevant to research in the humanities.
News
-
New workshop publications at LREC 2026
We are pleased to announce that papers have been accepted at the Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) and at the Workshop on Structured Linguistic Data and Evaluation (SLiDE), both co-located with the Language Resources and Evaluation Conference (LREC 2026).
TTLab at AraSentEval: SARF (صرف) Sentiment Analysis via Root-based Fusion for Multi-Dialectal Arabic
Ali Abusaleh, Bhuvanesh Verma and Alexander Mehler. 2026. TTLab at AraSentEval: SARF (صرف) Sentiment Analysis via Root-based Fusion for Multi-Dialectal Arabic. Proceedings of the 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7), co-located with the Language Resources and Evaluation Conference (LREC 2026). accepted.
BibTeX
@inproceedings{Abusaleh:et:al:2026:sarf,
  title = {TTLab at AraSentEval: SARF (صرف) Sentiment Analysis via Root-based Fusion for Multi-Dialectal Arabic},
  author = {Abusaleh, Ali and Verma, Bhuvanesh and Mehler, Alexander},
  booktitle = {Proceedings of the 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7), co-located with the Language Resources and Evaluation Conference (LREC 2026)},
  eventdate = {May, 2026},
  location = {Palma, Mallorca, Spain},
  year = {2026},
  keywords = {NLP, Sentiment Analysis, Arabic analysis, new-data-spaces, circlet, satek},
  abstract = {Arabic sentiment analysis is challenged by morphological complexity and lexical variation across Arabic dialects, compounded by subjectivity in how speakers and writers express sentiment. In this paper, we present our submission for the AraSentEval 2026 Shared Task on Arabic Dialect Sentiment Analysis. We propose SARF (صرف) a multi-view architectural framework that integrates surface-level context with stemmed and rooted morphological perspectives using a shared MARBERTv2 encoder. Our system employs a hybrid BERT-CNN-BiLSTM-Attention architecture to capture both local sentiment n-grams and global sequential dependencies. Experimental results show that while individual morphological normalization strategies (stemming or rooting) may degrade performance, their joint integration via cross-morphological attention provides robust features across diverse dialects. Our final system achieved a competitive macro-F1-score of 0.9263, ranking 2nd out of 15 participating teams.},
  note = {accepted}
}
Gutenberg+: A More Temporally Faithful Corpus for Diachronic NLP
Leon Hammerla and Alexander Mehler. 2026. Gutenberg+: A More Temporally Faithful Corpus for Diachronic NLP. Proceedings Workshop on Structured Linguistic Data and Evaluation (SLiDE 2026), co-located with the Language Resources and Evaluation Conference (LREC 2026). accepted.
BibTeX
@inproceedings{Hammerla:Mehler:2026:a,
  title = {{Gutenberg+}: A More Temporally Faithful Corpus for Diachronic {NLP}},
  author = {Leon Hammerla and Alexander Mehler},
  booktitle = {Proceedings Workshop on Structured Linguistic Data and Evaluation (SLiDE 2026), co-located with the Language Resources and Evaluation Conference (LREC 2026)},
  address = {Palma de Mallorca (Spain)},
  year = {2026},
  keywords = {neglab},
  note = {accepted}
}
-
Second phase for proposals: “New Data Spaces for the Social Sciences” (SPP 2431)
We are pleased to announce that the call for proposals for the second funding period of the DFG Infrastructure Priority Programme “New Data Spaces for the Social Sciences” (SPP 2431) is now open. SPP 2431 promotes methodological innovations at the intersection of the social and computer sciences to future-proof panel studies and survey research through new methodological approaches.
Key Deadlines:
- April 10, 2026: Submit a one-page project sketch to the program management via email.
- September 16, 2026: Final deadline for full proposals.
Further Information: Details regarding funding priorities, the submission process, and an upcoming preparation workshop for applicants can be found here:
We look forward to your contributions.
-
Three publications accepted at LREC 2026
The following papers have been accepted for publication in the proceedings of the Language Resources and Evaluation Conference 2026.
GhostWriter: Hidden AI-Generated Texts Over Multiple Languages, Domains and Generators
Manuel Schaaf, Kevin Bönisch and Alexander Mehler. 2026. GhostWriter: Hidden AI-Generated Texts Over Multiple Languages, Domains and Generators. Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026). accepted.
BibTeX
@inproceedings{Schaaf:et:al:2026,
  title = {{GhostWriter}: Hidden {AI}-Generated Texts Over Multiple Languages, Domains and Generators},
  booktitle = {Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026)},
  year = {2026},
  author = {Schaaf, Manuel and Bönisch, Kevin and Mehler, Alexander},
  keywords = {Corpus, Natural Language Generation; Validation of LRs, AI-generated Text Detection, core, core_b05},
  abstract = {The advent of Transformer-based Large Language Models (LLMs) has led to an unprecedented surge of AI-generated text (AIGT) across online platforms and academic domains. While these models exhibit near-human fluency and stylistic coherence, their widespread adoption has raised concerns about authorship integrity, research quality, and the recursive contamination of training corpora with synthetic data. These developments underscore the need for reliable AIGT detection methods and benchmark datasets, particularly for malicious or deceptive ghostwriting scenarios where AIGT is intentionally crafted to evade detection. To address this, we present GhostWriter, a large-scale, bilingual (German and English), multi-generator, and multi-domain dataset for AIGT detection. The dataset comprises human- and AI-authored texts produced under domain-specific ghostwriting conditions, including examples intentionally embedded within otherwise human-written texts to obscure their AI origin.
  With GhostWriter, we (i) aim to expand the resources available for German AIGT datasets, (ii) emphasize mixed or fused synthesizations---since most existing corpora are limited to the document level---and (iii) introduce specifically crafted malicious ghostwriting scenarios across multiple domains and generators.},
  note = {accepted}
}
Towards the Generation and Application of Dynamic Web-Based Visualization of UIMA-based Annotations for Big-Data Corpora with the Help of Unified Dynamic Annotation Visualizer
Thiemo Dahmann, Julian Schneider, Philipp Stephan, Giuseppe Abrami and Alexander Mehler. 2026. Towards the Generation and Application of Dynamic Web-Based Visualization of UIMA-based Annotations for Big-Data Corpora with the Help of Unified Dynamic Annotation Visualizer. Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026). accepted.
BibTeX
@inproceedings{Dahmann:et:al:2026,
  title = {Towards the Generation and Application of Dynamic Web-Based Visualization of UIMA-based Annotations for Big-Data Corpora with the Help of Unified Dynamic Annotation Visualizer},
  booktitle = {Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026)},
  year = {2026},
  author = {Dahmann, Thiemo and Schneider, Julian and Stephan, Philipp and Abrami, Giuseppe and Mehler, Alexander},
  keywords = {NLP, UIMA, Annotations, dynamic visualization, uce},
  abstract = {The automatic and manual annotation of unstructured corpora is a daily task in various scientific fields, which is supported by a variety of existing software solutions. Despite this variety, there are currently only limited solutions for visualizing annotations, especially with regard to dynamic generation and interaction. To bridge this gap and to visualize and provide annotated corpora based on user-, project- or corpus-specific aspects, Unified Dynamic Annotation Visualizer (UDAV) was developed. UDAV is designed as a web-based solution that implements a number of essential features which comparable tools do not support to enable a customizable and extensible toolbox for interacting with annotations, allowing the integration into existing big data frameworks.},
  note = {accepted}
}
Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus
Bhuvanesh Verma and Alexander Mehler. 2026. Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus. Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026). accepted.
BibTeX
@inproceedings{Verma:Mehler:2026,
  title = {Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus},
  booktitle = {Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026)},
  year = {2026},
  author = {Verma, Bhuvanesh and Mehler, Alexander},
  keywords = {Topic Evolution, Topic Network, Time-aware Networks, Temporal Autocorrelation, Project Gutenberg, satek},
  abstract = {Although temporal topic modeling has been widely applied to scientific and legal texts, literary corpora have largely been overlooked in this regard. To address this issue, we analyze topic evolution in a subset of the Project Gutenberg (PG) corpus. We model this subset as a sequence of topic networks that capture the emergence, persistence, and interaction of thematic structures over decades. Using supervised topic representations, we predict nodes (topics) and edges (topic pairings) to forecast future topics and their co-occurrence. Our experiments demonstrate moderate to strong temporal persistence in topic connectivity patterns across three topic systems, with ROC-AUC and AP values consistently above 0.85. We find that the temporal span of topic networks significantly impacts predictive performance: longer spans improve the stability and recall of topic presence, while shorter spans better capture evolving topic relationships. Overall, our findings demonstrate the predictability of topics in literary texts over time.},
  note = {accepted}
}
