- Three publications accepted at LREC 2026 by Leon Hammerla
The following papers have been accepted for publication in the proceedings of the Language Resources and Evaluation Conference 2026.
GhostWriter: Hidden AI-Generated Texts Over Multiple Languages, Domains and Generators
Manuel Schaaf, Kevin Bönisch and Alexander Mehler. 2026. GhostWriter: Hidden AI-Generated Texts Over Multiple Languages, Domains and Generators. Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026). accepted.
BibTeX
@inproceedings{Schaaf:et:al:2026,
  title = {{GhostWriter}: Hidden {AI}-Generated Texts Over Multiple Languages, Domains and Generators},
  booktitle = {Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026)},
  year = {2026},
  author = {Schaaf, Manuel and Bönisch, Kevin and Mehler, Alexander},
  keywords = {Corpus, Natural Language Generation; Validation of LRs, AI-generated Text Detection, core, core_b05},
  abstract = {The advent of Transformer-based Large Language Models (LLMs) has led to an unprecedented surge of AI-generated text (AIGT) across online platforms and academic domains. While these models exhibit near-human fluency and stylistic coherence, their widespread adoption has raised concerns about authorship integrity, research quality, and the recursive contamination of training corpora with synthetic data. These developments underscore the need for reliable AIGT detection methods and benchmark datasets, particularly for malicious or deceptive ghostwriting scenarios where AIGT is intentionally crafted to evade detection. To address this, we present GhostWriter, a large-scale, bilingual (German and English), multi-generator, and multi-domain dataset for AIGT detection. The dataset comprises human- and AI-authored texts produced under domain-specific ghostwriting conditions, including examples intentionally embedded within otherwise human-written texts to obscure their AI origin. With GhostWriter, we (i) aim to expand the resources available for German AIGT datasets, (ii) emphasize mixed or fused synthesizations---since most existing corpora are limited to the document level---and (iii) introduce specifically crafted malicious ghostwriting scenarios across multiple domains and generators.},
  note = {accepted}
}
Towards the Generation and Application of Dynamic Web-Based Visualization of UIMA-based Annotations for Big-Data Corpora with the Help of Unified Dynamic Annotation Visualizer
Thiemo Dahmann, Julian Schneider, Philipp Stephan, Giuseppe Abrami and Alexander Mehler. 2026. Towards the Generation and Application of Dynamic Web-Based Visualization of UIMA-based Annotations for Big-Data Corpora with the Help of Unified Dynamic Annotation Visualizer. Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026). accepted.
BibTeX
@inproceedings{Dahmann:et:al:2026,
  title = {Towards the Generation and Application of Dynamic Web-Based Visualization of UIMA-based Annotations for Big-Data Corpora with the Help of Unified Dynamic Annotation Visualizer},
  booktitle = {Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026)},
  year = {2026},
  author = {Dahmann, Thiemo and Schneider, Julian and Stephan, Philipp and Abrami, Giuseppe and Mehler, Alexander},
  keywords = {NLP, UIMA, Annotations, dynamic visualization, uce},
  abstract = {The automatic and manual annotation of unstructured corpora is a daily task in various scientific fields, which is supported by a variety of existing software solutions. Despite this variety, there are currently only limited solutions for visualizing annotations, especially with regard to dynamic generation and interaction. To bridge this gap and to visualize and provide annotated corpora based on user-, project- or corpus-specific aspects, Unified Dynamic Annotation Visualizer (UDAV) was developed. UDAV is designed as a web-based solution that implements a number of essential features which comparable tools do not support to enable a customizable and extensible toolbox for interacting with annotations, allowing the integration into existing big data frameworks.},
  note = {accepted}
}
Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus
Bhuvanesh Verma and Alexander Mehler. 2026. Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus. Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026). accepted.
BibTeX
@inproceedings{Verma:Mehler:2026,
  title = {Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus},
  booktitle = {Proceedings of the 15th International Conference on Language Resources and Evaluation (LREC 2026)},
  year = {2026},
  author = {Verma, Bhuvanesh and Mehler, Alexander},
  keywords = {Topic Evolution, Topic Network, Time-aware Networks, Temporal Autocorrelation, Project Gutenberg, satek},
  abstract = {Although temporal topic modeling has been widely applied to scientific and legal texts, literary corpora have largely been overlooked in this regard. To address this issue, we analyze topic evolution in a subset of the Project Gutenberg (PG) corpus. We model this subset as a sequence of topic networks that capture the emergence, persistence, and interaction of thematic structures over decades. Using supervised topic representations, we predict nodes (topics) and edges (topic pairings) to forecast future topics and their co-occurrence. Our experiments demonstrate moderate to strong temporal persistence in topic connectivity patterns across three topic systems, with ROC-AUC and AP values consistently above 0.85. We find that the temporal span of topic networks significantly impacts predictive performance: longer spans improve the stability and recall of topic presence, while shorter spans better capture evolving topic relationships. Overall, our findings demonstrate the predictability of topics in literary texts over time.},
  note = {accepted}
}
- New publication accepted at IEEE ICNLP 2026 by Ali Abusaleh
We are pleased to announce that a new paper has been accepted at the IEEE 2026 8th International Conference on Natural Language Processing (ICNLP):
Learning to Detect Cross-Modal Negation: An Analysis of Latent Representations and an Attention-Based Solution
Ali Abusaleh, Leon Hammerla and Alexander Mehler. 2026. Learning to Detect Cross-Modal Negation: An Analysis of Latent Representations and an Attention-Based Solution. 2026 8th International Conference on Natural Language Processing (ICNLP). accepted.
BibTeX
@inproceedings{Abusaleh:et:al:2026,
  title = {Learning to Detect Cross-Modal Negation: An Analysis of Latent Representations and an Attention-Based Solution},
  author = {Abusaleh, Ali and Hammerla, Leon and Mehler, Alexander},
  booktitle = {2026 8th International Conference on Natural Language Processing (ICNLP)},
  eventdate = {2026-03-20/2026-03-22},
  location = {Xi'an, China},
  year = {2026},
  keywords = {Vision language model, Natural language processing, Cross-modal retrieval, negation detection, video analysis, Multimodal analysis, Political Communication, neglab, new-data-spaces, circlet},
  abstract = {Detecting high-level semantic concepts like negation across modalities remains a challenge for current multimodal systems. We analyze this as a fundamental representation learning problem, providing the first evidence that negation does not form a linearly or non-linearly separable class in the latent spaces of standard vision-language models (VLMs). We demonstrate that pretrained embeddings primarily encode modality-specific features, lacking a generalizable negation signal. To overcome this, we propose a novel cross-modal attention architecture that explicitly models inter-modal dependencies, achieving performance gains of up to +7.03% F1 over unimodal baselines. Our analysis reveals a key asymmetry: while textual negation often appears independently, visual negation is semantically dependent on linguistic context, a finding validated through our statistical analysis of 3,222 political video-text pairs automatically annotated via Qwen2.5-VL. By combining this analysis with self-supervised video representations (JEPA2), we advance the modeling of temporal negation. This work provides new methods and insights for learning robust, semantically-aligned representations in multimodal systems.},
  note = {accepted}
}
