General

Three publications accepted at LREC 2026

The following papers have been accepted for publication in the proceedings of the Language Resources and Evaluation Conference 2026.

GhostWriter: Hidden AI-Generated Texts over Multiple Languages, Domains and Generators

Manuel Schaaf, Kevin Bönisch and Alexander Mehler. May, 2026. GhostWriter: Hidden AI-Generated Texts over Multiple Languages, Domains and Generators. Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), 10497–10516.
BibTeX
@inproceedings{Schaaf:et:al:2026,
  title     = {GhostWriter: Hidden AI-Generated Texts over Multiple Languages,
               Domains and Generators},
  author    = {Schaaf, Manuel and Bönisch, Kevin and Mehler, Alexander},
  booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation
               Conference (LREC 2026)},
  month     = {May},
  year      = {2026},
  pages     = {10497--10516},
  keywords  = {Corpus, Natural Language Generation, Validation of LRs, AI-generated Text Detection, core, core_b05},
  address   = {Palma, Mallorca, Spain},
  publisher = {European Language Resources Association (ELRA)},
  editor    = {Piperidis, Stelios and Bel, Núria and van den Heuvel, Henk and Ide, Nancy
               and Krek, Simon and Toral, Antonio},
  doi       = {10.63317/57fd7juh5zek},
  abstract  = {The advent of Transformer-based Large Language Models (LLMs) has
               led to an unprecedented surge of AI-generated text (AIGT) across
               online platforms and academic domains. While these models exhibit
               near-human fluency and stylistic coherence, their widespread adoption
               has raised concerns about authorship integrity, research quality,
               and the recursive contamination of training corpora with synthetic
               data. These developments underscore the need for reliable AIGT
               detection methods and benchmark datasets, particularly for malicious
               or deceptive *ghostwriting* scenarios where AIGT is intentionally
               crafted to evade detection. To address this, we present **GhostWriter**,
               a large-scale, bilingual (German and English), multi-generator,
               and multi-domain dataset for AIGT detection. The dataset comprises
               human- and AI-authored texts produced under domain-specific *ghostwriting*
               conditions, including examples intentionally embedded within otherwise
               human-written texts to obscure their AI origin. With **GhostWriter**,
               we (i) aim to expand the resources available for German AIGT datasets,
               (ii) emphasize mixed or fused synthesizations—since most existing
               corpora are limited to the document level—and (iii) introduce
               specifically crafted malicious ghostwriting scenarios across multiple
               domains and generators.}
}

Towards the Generation and Application of Dynamic Web-Based Visualization of UIMA-based Annotations for Big-Data Corpora with the Help of Unified Dynamic Annotation Visualizer

Thiemo Dahmann, Julian Schneider, Philipp Stephan, Giuseppe Abrami and Alexander Mehler. 2026. Towards the Generation and Application of Dynamic Web-Based Visualization of UIMA-based Annotations for Big-Data Corpora with the Help of Unified Dynamic Annotation Visualizer. Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), 6695–6705.
BibTeX
@inproceedings{Dahmann:et:al:2026,
  title     = {Towards the Generation and Application of Dynamic Web-Based Visualization
               of UIMA-based Annotations for Big-Data Corpora with the Help of
               Unified Dynamic Annotation Visualizer},
  booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation
               Conference (LREC 2026)},
  year      = {2026},
  pages     = {6695--6705},
  author    = {Dahmann, Thiemo and Schneider, Julian and Stephan, Philipp and Abrami, Giuseppe
               and Mehler, Alexander},
  address   = {Palma, Mallorca, Spain},
  publisher = {European Language Resources Association (ELRA)},
  editor    = {Piperidis, Stelios and Bel, Núria and van den Heuvel, Henk and Ide, Nancy
               and Krek, Simon and Toral, Antonio},
  doi       = {10.63317/5ce2aaity4yz},
  keywords  = {NLP, UIMA, Annotations, dynamic visualization, uce},
  abstract  = {The automatic and manual annotation of unstructured corpora is
               a routine task in many scientific fields and is supported by a
               variety of existing software solutions. Despite this variety,
               few solutions currently support annotation visualization, especially
               for dynamic generation and interaction. To bridge this gap and
               visualize annotated corpora based on user-, project-, or corpus-specific
               aspects, we developed Unified Dynamic Annotation Visualizer (UDAV).
               UDAV is a web-based solution that implements features not supported
               by comparable tools, enabling a customizable and extensible toolbox
               for interacting with annotations and allowing integration into
               existing big-data frameworks. We exemplify UDAV through a range
               of visualizations and also provide an evaluation of corpus import
               and processing performance.},
  pdf       = {http://www.lrec-conf.org/proceedings/lrec2026/pdf/2026.lrec2026-1.533.pdf},
  video     = {https://www.youtube.com/watch?v=LFBiGlmEDog}
}

Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus

Bhuvanesh Verma and Alexander Mehler. 2026. Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus. Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), 860–869.
BibTeX
@inproceedings{Verma:Mehler:2026,
  title     = {Predicting Topic (Co-)Occurrence Using Topic Networks Built from
               the Project Gutenberg Corpus},
  booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation
               Conference (LREC 2026)},
  pages     = {860--869},
  address   = {Palma, Mallorca, Spain},
  publisher = {European Language Resources Association (ELRA)},
  editor    = {Piperidis, Stelios and Bel, Núria and van den Heuvel, Henk and Ide, Nancy
               and Krek, Simon and Toral, Antonio},
  year      = {2026},
  author    = {Verma, Bhuvanesh and Mehler, Alexander},
  doi       = {10.63317/58x3h7gjbpb4},
  keywords  = {Topic Evolution, Topic Network, Time-aware Networks, Temporal Autocorrelation, Project Gutenberg, satek},
  abstract  = {Although temporal topic modeling has been widely applied to scientific
               and legal texts, literary corpora have largely been overlooked
               in this regard. To address this issue, we analyze topic evolution
               in a subset of the Project Gutenberg (PG) corpus. We model this
               subset as a sequence of topic networks that capture the emergence,
               persistence, and interaction of thematic structures over decades.
               Using supervised topic representations, we predict nodes (topics)
               and edges (topic pairings) to forecast future topics and their
               co-occurrence. Our experiments demonstrate moderate to strong
               temporal persistence in topic connectivity patterns across three
               topic systems, with ROC-AUC and AP values consistently above 0.85.
               We find that the temporal span of topic networks significantly
               impacts predictive performance: longer spans improve the stability
               and recall of topic presence, while shorter spans better capture
               evolving topic relationships. Overall, our findings demonstrate
               the predictability of topics in literary texts over time.},
  pdf       = {http://www.lrec-conf.org/proceedings/lrec2026/pdf/2026.lrec2026-1.65.pdf}
}

New publication accepted at IEEE ICNLP 2026

We are pleased to announce that the following paper has been accepted at IEEE’s 2026 8th International Conference on Natural Language Processing (ICNLP):

Learning to Detect Cross-Modal Negation: An Analysis of Latent Representations and an Attention-Based Solution

Ali Abusaleh, Leon Hammerla and Alexander Mehler. 2026. Learning to Detect Cross-Modal Negation: An Analysis of Latent Representations and an Attention-Based Solution. 2026 8th International Conference on Natural Language Processing (ICNLP). Accepted.
BibTeX
@inproceedings{Abusaleh:et:al:2026,
  title     = {Learning to Detect Cross-Modal Negation: An Analysis of Latent
               Representations and an Attention-Based Solution},
  author    = {Abusaleh, Ali and Hammerla, Leon and Mehler, Alexander},
  booktitle = {2026 8th International Conference on Natural Language Processing (ICNLP)},
  eventdate = {2026-03-20/2026-03-22},
  location  = {Xi'an, China},
  year      = {2026},
  keywords  = {Vision language model, Natural language processing, Cross-modal retrieval, negation detection, video analysis, Multimodal analysis, Political Communication, neglab, new-data-spaces, circlet},
  abstract  = {Detecting high-level semantic concepts like negation across modalities
               remains a challenge for current multimodal systems. We analyze
               this as a fundamental representation learning problem, providing
               the first evidence that negation does not form a linearly or non-linearly
               separable class in the latent spaces of standard vision-language
               models (VLMs). We demonstrate that pretrained embeddings primarily
               encode modality-specific features, lacking a generalizable negation
               signal. To overcome this, we propose a novel cross-modal attention
               architecture that explicitly models inter-modal dependencies,
               achieving performance gains of up to +7.03\% F1 over unimodal baselines.
               Our analysis reveals a key asymmetry: while textual negation often
               appears independently, visual negation is semantically dependent
               on linguistic context, a finding validated through our statistical
               analysis of 3,222 political video-text pairs automatically annotated
               via Qwen2.5-VL. By combining this analysis with self-supervised
               video representations (JEPA2), we advance the modeling of temporal
               negation. This work provides new methods and insights for learning
               robust, semantically-aligned representations in multimodal systems.},
  note      = {accepted}
}

New article published at SoftwareX

The following article has been published in the journal SoftwareX:

DUUIgateway: A Web Service for Platform-independent, Ubiquitous Big Data NLP

Cedric Borkowski, Giuseppe Abrami, Dawit Terefe, Daniel Baumartz and Alexander Mehler. 2026. DUUIgateway: A Web Service for Platform-independent, Ubiquitous Big Data NLP. SoftwareX, 34:102549.
BibTeX
@article{Borkowski:et:al:2026,
  title     = {{DUUIgateway}: A Web Service for Platform-independent, Ubiquitous Big Data NLP},
  journal   = {SoftwareX},
  volume    = {34},
  pages     = {102549},
  year      = {2026},
  issn      = {2352-7110},
  doi       = {10.1016/j.softx.2026.102549},
  url       = {https://www.sciencedirect.com/science/article/pii/S2352711026000439},
  author    = {Borkowski, Cedric and Abrami, Giuseppe and Terefe, Dawit and Baumartz, Daniel
               and Mehler, Alexander},
  keywords  = {duui, neglab, core, core_b05, core_c08, new-data-spaces, circlet},
  abstract  = {Distributed processing of unstructured text data is a challenge
               in the rapidly changing and evolving natural language processing
               (NLP) landscape. This landscape is characterized by heterogeneous
               systems, models, and formats, and especially by the increasing
               influence of AI systems. While many of these systems handle text
               data, there are also unified systems that process multiple input
               and output formats, while allowing for distributed corpus processing.
               However, there are hardly any user-friendly interfaces that allow
               existing NLP frameworks to be used flexibly and extended in a
               user-controlled manner. Due to this gap and the increasing importance
               of NLP for various scientific disciplines, there has been a demand
               for a web and API based flexible software solution for deploying,
               managing and monitoring NLP systems. Such a solution is provided
               by Docker Unified UIMA-gateway. We introduce DUUIgateway and evaluate
               its API and user-driven approach to encapsulation. We also describe
               how these features improve the usability and accessibility of
               the NLP framework DUUI. We illustrate DUUIgateway in the field
               of process modeling in higher education and show how it closes
               the latter gap in NLP by making a variety of systems for processing
               text and multimodal data accessible to non-experts.}
}

Invited talk at DaFWEBKON26

Andy Lücking and Alexander Mehler have been invited to give a talk at the Web Conference for German Teachers 2026. The title of the talk is: “Language-accompanying gestures, AI and virtual reality – multimodal communication research at the intersection of linguistics and computer science”.

Andy Lücking and Alexander Mehler. 2026-01-28/2026-01-30. Sprachbegleitende Gesten, KI und Virtuelle Realität. Invited talk.
BibTeX
@misc{Luecking:Mehler:2026,
  author    = {Lücking, Andy and Mehler, Alexander},
  title     = {{Sprachbegleitende Gesten, KI und Virtuelle Realität}},
  subtitle  = {{Multimodale Kommunikationsforschung im Schnittfeld von Linguistik und Computerwissenschaft}},
  howpublished = {Invited talk at DaFWEBKON26, Webkonferenz für
                  Deutschlehrende},
  date      = {2026-01-28/2026-01-30},
  url       = {https://dafwebkon.com/events/sprachbegleitende-gesten/},
  keywords  = {talk, cosgrin-vr},
  note      = {Invited talk},
  abstract  = {Everyday communication is usually multimodal (i.e., it uses
               more than one information channel). Spoken language, for example,
               is accompanied by manual gestures. These gestures, in turn, can
               contribute information that goes beyond the linguistic meaning.
               They are therefore semantically interesting. The talk outlines
               a spatial gesture semantics and introduces AI-supported gesture
               classification. Virtual reality (VR) methods are increasingly
               being used to capture and analyze multimodal behavioral data.
               The Frankfurt Va.Si.Li-Lab combines AI and VR for multimodality
               research. In this way, multimodal, avatar-based VR interactions,
               for example, can be studied and compared with face-to-face interactions.
               The talk presents first results.}
}