Publication

General, News, Publication

New publication in the journal PLOS ONE

We are pleased to announce that the article Syntactic language change in English and German: Metrics, parsers, and convergences has been published in PLOS ONE.

Yanran Chen, Wei Zhao, Anne Breitbarth, Manuel Stoeckel, Alexander Mehler, Dominik Schlechtweg and Steffen Eger. April, 2026. Syntactic language change in English and German: Metrics, parsers, and convergences. PLOS ONE, 21(4):1–33.

BibTeX

@article{Chen:et:al:2026,
  doi       = {10.1371/journal.pone.0346096},
  author    = {Chen, Yanran and Zhao, Wei and Breitbarth, Anne and Stoeckel, Manuel
               and Mehler, Alexander and Schlechtweg, Dominik and Eger, Steffen},
  journal   = {PLOS ONE},
  publisher = {Public Library of Science},
  title     = {Syntactic language change in English and German: Metrics, parsers,
               and convergences},
  year      = {2026},
  month     = {04},
  volume    = {21},
  url       = {https://doi.org/10.1371/journal.pone.0346096},
  pages     = {1--33},
  abstract  = {Syntactic language change has gained increasing attention in recent
               years. Previous computational work based on dependency relations
               has focused on diachronic trends in dependency distance, which
               measures the linear distance between dependent words, using dependency
               trees automatically predicted by a dependency parser (mostly the
               Stanford CoreNLP parser). In this work, we introduce a set of
               15 syntax metrics that extend the analysis beyond linear distance
               by incorporating both linear and tree graph properties of dependency
               trees, such as tree height and degree. Besides, we propose a multi-parser
               approach to reduce the impact of using specific parsers, thereby
               increasing the robustness of the detected language changes. Through
               a cross-lingual investigation of English and German in parliamentary
               debates from the last 160 years, using 6 different parsers (CoreNLP
               and five newer alternatives), we demonstrate that: (1) Relying
               on one single parser can be problematic, as the agreement on predicted
               trends can be low across parsers. (2) Our set of metrics can capture
               subtle patterns of syntactic changes. Our analysis shows that
               syntactic change over the time period inspected is largely similar
               between English and German, with only 2.2% of cases yielding opposite
               trends in these metrics. (3) We also show that changes in syntactic
               metrics seem to be more frequent at the tails of sentence length
               distributions and often move in opposite directions for short
               and long sentences. To our best knowledge, ours is the most comprehensive
               computational analysis of syntactic language change using modern
               NLP technology in recent corpora of English and German.},
  number    = {4}
}

Giuseppe Abrami
29. April 2026

General, News, Publication

New publications at SemEval-2026

We are pleased to announce that the following papers have been accepted at the International Workshop on Semantic Evaluation (SemEval-2026):

Yahya Missaoui, Solomon Kebede, Mounika Marreddy and Alexander Mehler. 2026. SemEval-2026 Task 3: Dimensional Aspect-Based Sentiment Analysis. Proceedings of the International Workshop on Semantic Evaluation (SemEval-2026). accepted.

BibTeX

@inproceedings{Missaoui:et:al:2026,
  title     = {SemEval-2026 Task 3: Dimensional Aspect-Based Sentiment Analysis},
  author    = {Missaoui, Yahya and Kebede, Solomon and Marreddy, Mounika and Mehler, Alexander},
  booktitle = {Proceedings of the International Workshop on Semantic Evaluation (SemEval-2026)},
  year      = {2026},
  publisher = {Association for Computational Linguistics},
  note      = {accepted}
}

Noah Tratzsch, Asmaa Al-Raian, Mounika Marreddy and Alexander Mehler. 2026. SemEval-2026 Task 11: Reducing Content Effects Using Layered Activation Steering. Proceedings of the International Workshop on Semantic Evaluation (SemEval-2026). accepted.

BibTeX

@inproceedings{Tratzsch:et:al2026,
  title     = {SemEval-2026 Task 11: Reducing Content Effects Using Layered Activation Steering},
  author    = {Tratzsch, Noah and Al-Raian, Asmaa and Marreddy, Mounika and Mehler, Alexander},
  booktitle = {Proceedings of the International Workshop on Semantic Evaluation (SemEval-2026)},
  year      = {2026},
  publisher = {Association for Computational Linguistics},
  note      = {accepted}
}

Samuel Richer, Mounika Marreddy and Alexander Mehler. 2026. TTLab at SemEval-2026 Task 10: Transformer-based Approaches for Psycholinguistic Conspiracy Detection in Social Media Discourse. Proceedings of the International Workshop on Semantic Evaluation (SemEval-2026). accepted.

BibTeX

@inproceedings{Richer:et:al:2026,
  title     = {TTLab at SemEval-2026 Task 10: Transformer-based Approaches for
               Psycholinguistic Conspiracy Detection in Social Media Discourse},
  author    = {Richer, Samuel and Marreddy, Mounika and Mehler, Alexander},
  booktitle = {Proceedings of the International Workshop on Semantic Evaluation (SemEval-2026)},
  year      = {2026},
  publisher = {Association for Computational Linguistics},
  note      = {accepted}
}

Giuseppe Abrami
17. April 2026

General, News, Publication

New workshop publications at LREC 2026

We are pleased to announce that the following papers have been accepted at the Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) and the Workshop on Structured Linguistic Data and Evaluation (SLiDE), co-located with the Language Resources and Evaluation Conference (LREC 2026):

TTLab at AraSentEval: SARF (صرف) Sentiment Analysis via Root-based Fusion for Multi-Dialectal Arabic

Ali Abusaleh, Bhuvanesh Verma and Alexander Mehler. May, 2026. TTLab at AraSentEval: SARF (صرف) Sentiment Analysis via Root-based Fusion for Multi-Dialectal Arabic. The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7) with 5 Shared Tasks, 262–268.

BibTeX

@inproceedings{Abusaleh:et:al:2026:sarf,
  title     = {TTLab at AraSentEval: SARF (صرف) Sentiment Analysis via Root-based
               Fusion for Multi-Dialectal Arabic},
  author    = {Abusaleh, Ali and Verma, Bhuvanesh and Mehler, Alexander},
  booktitle = {The 7th Workshop on Open-Source Arabic Corpora and Processing
               Tools (OSACT7) with 5 Shared Tasks},
  month     = {May},
  year      = {2026},
  pages     = {262--268},
  address   = {Palma, Mallorca, Spain},
  publisher = {European Language Resources Association (ELRA)},
  editor    = {Al-Khalifa, Hend and El-Haj, Mo and Ezzini, Saad},
  doi       = {10.63317/4wj6s3ys5osk},
  keywords  = {NLP, Sentiment Analysis, Arabic analysis, new-data-spaces, circlet, satek},
  abstract  = {Arabic sentiment analysis is challenged by morphological complexity
               and lexical variation across Arabic dialects, compounded by subjectivity
               in how speakers and writers express sentiment. In this paper,
               we present our submission for the AraSentEval 2026 Shared Task
               on Arabic Dialect Sentiment Analysis. We propose SARF (صرف) a
               multi-view architectural framework that integrates surface-level
               context with stemmed and rooted morphological perspectives using
               a shared MARBERTv2 encoder. Our system employs a hybrid BERT-CNN-BiLSTM-Attention
               architecture to capture both local sentiment n-grams and global
               sequential dependencies. Experimental results show that while
               individual morphological normalization strategies (stemming or
               rooting) may degrade performance, their joint integration via
               cross-morphological attention provides robust features across
               diverse dialects. Our final system achieved a competitive macro-F1-score
               of 0.9263, ranking 2nd out of 15 participating teams.}
}

Gutenberg+: A More Temporally Faithful Corpus for Diachronic NLP

Leon Hammerla and Alexander Mehler. 2026. Gutenberg+: A More Temporally Faithful Corpus for Diachronic NLP. Proceedings Workshop on Structured Linguistic Data and Evaluation (SLiDE 2026), co-located with the Language Resources and Evaluation Conference (LREC 2026), 86–92.

BibTeX

@inproceedings{Hammerla:Mehler:2026:a,
  title     = {{Gutenberg+}: A More Temporally Faithful Corpus for Diachronic {NLP}},
  author    = {Leon Hammerla and Alexander Mehler},
  booktitle = {Proceedings Workshop on Structured Linguistic Data and Evaluation
               (SLiDE 2026), co-located with the Language Resources and Evaluation
               Conference (LREC 2026)},
  year      = {2026},
  keywords  = {neglab},
  pages     = {86--92},
  address   = {Palma, Mallorca, Spain},
  publisher = {European Language Resources Association (ELRA)},
  editor    = {Hinrichs, Erhard and Nivre, Joakim and Osenova, Petya and Pustejovsky, James
               and Zinn, Claus},
  doi       = {10.63317/2kjofgrkkbt9},
  abstract  = {We introduce Gutenberg+, a temporally more faithful version of
               the Project Gutenberg (PG) corpus, one of the most widely used
               resources for diachronic text analysis. Despite its popularity,
               the PG corpus contains a major yet overlooked flaw: around 15%
               of its entries are collections (e.g., anthologies of books, letters,
               or poems) rather than atomic works, which distorts temporal analyses
               since such collections may span multiple decades. We present an
               automatic method to detect and split these collections into their
               constituent works, producing a finer-grained and temporally consistent
               corpus. We further re-annotate publication years using LLM-based
               retrieval-augmented generative methods, demonstrating the potential
               of LLMs to enhance structured linguistic resources. To illustrate
               the utility of Gutenberg+, we conduct a small-scale diachronic
               case study on negation, showing that our refined corpus captures
               more nuanced cross-linguistic variation than the original PG data.
               Finally, we release the corpus in UIMA format with full metadata
               and linguistic annotations, providing a standardized resource
               for future research on diachronic language change.}
}

Ali Abusaleh
19. March 2026

General, News, Publication

Three publications accepted at LREC 2026

The following papers have been accepted for publication in the proceedings of the Language Resources and Evaluation Conference 2026.

GhostWriter: Hidden AI-Generated Texts Over Multiple Languages, Domains and Generators

Manuel Schaaf, Kevin Bönisch and Alexander Mehler. May, 2026. GhostWriter: Hidden AI-Generated Texts over Multiple Languages, Domains and Generators. Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), 10497–10516.

BibTeX

@inproceedings{Schaaf:et:al:2026,
  title     = {GhostWriter: Hidden AI-Generated Texts over Multiple Languages,
               Domains and Generators},
  author    = {Schaaf, Manuel and Bönisch, Kevin and Mehler, Alexander},
  booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation
               Conference (LREC 2026)},
  month     = {May},
  year      = {2026},
  pages     = {10497--10516},
  keywords  = {Corpus, Natural Language Generation; Validation of LRs, AI-generated Text Detection, core, core_b05},
  address   = {Palma, Mallorca, Spain},
  publisher = {European Language Resources Association (ELRA)},
  editor    = {Piperidis, Stelios and Bel, Núria and van den Heuvel, Henk and Ide, Nancy
               and Krek, Simon and Toral, Antonio},
  doi       = {10.63317/57fd7juh5zek},
  abstract  = {The advent of Transformer-based Large Language Models (LLMs) has
               led to an unprecedented surge of AI-generated text (AIGT) across
               online platforms and academic domains. While these models exhibit
               near-human fluency and stylistic coherence, their widespread adoption
               has raised concerns about authorship integrity, research quality,
               and the recursive contamination of training corpora with synthetic
               data. These developments underscore the need for reliable AIGT
               detection methods and benchmark datasets, particularly for malicious
               or deceptive \emph{ghostwriting} scenarios where AIGT is intentionally
               crafted to evade detection. To address this, we present \textbf{GhostWriter},
               a large-scale, bilingual (German and English), multi-generator,
               and multi-domain dataset for AIGT detection. The dataset comprises
               human- and AI-authored texts produced under domain-specific \emph{ghostwriting}
               conditions, including examples intentionally embedded within otherwise
               human-written texts to obscure their AI origin. With \textbf{GhostWriter},
               we (i) aim to expand the resources available for German AIGT datasets,
               (ii) emphasize mixed or fused synthesizations—since most existing
               corpora are limited to the document level—and (iii) introduce
               specifically crafted malicious ghostwriting scenarios across multiple
               domains and generators.}
}

Towards the Generation and Application of Dynamic Web-Based Visualization of UIMA-based Annotations for Big-Data Corpora with the Help of Unified Dynamic Annotation Visualizer

Thiemo Dahmann, Julian Schneider, Philipp Stephan, Giuseppe Abrami and Alexander Mehler. 2026. Towards the Generation and Application of Dynamic Web-Based Visualization of UIMA-based Annotations for Big-Data Corpora with the Help of Unified Dynamic Annotation Visualizer. Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), 6695–6705.

BibTeX

@inproceedings{Dahmann:et:al:2026,
  title     = {Towards the Generation and Application of Dynamic Web-Based Visualization
               of UIMA-based Annotations for Big-Data Corpora with the Help of
               Unified Dynamic Annotation Visualizer},
  booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation
               Conference (LREC 2026)},
  year      = {2026},
  pages     = {6695--6705},
  author    = {Dahmann, Thiemo and Schneider, Julian and Stephan, Philipp and Abrami, Giuseppe
               and Mehler, Alexander},
  address   = {Palma, Mallorca, Spain},
  publisher = {European Language Resources Association (ELRA)},
  editor    = {Piperidis, Stelios and Bel, Núria and van den Heuvel, Henk and Ide, Nancy
               and Krek, Simon and Toral, Antonio},
  doi       = {10.63317/5ce2aaity4yz},
  keywords  = {NLP, UIMA, Annotations, dynamic visualization, uce},
  abstract  = {The automatic and manual annotation of unstructured corpora is
               a routine task in many scientific fields and is supported by a
               variety of existing software solutions. Despite this variety,
               few solutions currently support annotation visualization, especially
               for dynamic generation and interaction. To bridge this gap and
               visualize annotated corpora based on user-, project-, or corpus-specific
               aspects, we developed Unified Dynamic Annotation Visualizer (UDAV).
               UDAV is a web-based solution that implements features not supported
               by comparable tools, enabling a customizable and extensible toolbox
               for interacting with annotations and allowing integration into
               existing big-data frameworks. We exemplify UDAV through a range
               of visualizations and also provide an evaluation of corpus import
               and processing performance.},
  pdf       = {http://www.lrec-conf.org/proceedings/lrec2026/pdf/2026.lrec2026-1.533.pdf},
  video     = {https://www.youtube.com/watch?v=LFBiGlmEDog}
}

Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus

Bhuvanesh Verma and Alexander Mehler. 2026. Predicting Topic (Co-)Occurrence Using Topic Networks Built from the Project Gutenberg Corpus. Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), 860–869.

BibTeX

@inproceedings{Verma:Mehler:2026,
  title     = {Predicting Topic (Co-)Occurrence Using Topic Networks Built from
               the Project Gutenberg Corpus},
  booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation
               Conference (LREC 2026)},
  pages     = {860--869},
  address   = {Palma, Mallorca, Spain},
  publisher = {European Language Resources Association (ELRA)},
  editor    = {Piperidis, Stelios and Bel, Núria and van den Heuvel, Henk and Ide, Nancy
               and Krek, Simon and Toral, Antonio},
  year      = {2026},
  author    = {Verma, Bhuvanesh and Mehler, Alexander},
  doi       = {10.63317/58x3h7gjbpb4},
  keywords  = {Topic Evolution, Topic Network, Time-aware Networks, Temporal Autocorrelation, Project Gutenberg, satek},
  abstract  = {Although temporal topic modeling has been widely applied to scientific
               and legal texts, literary corpora have largely been overlooked
               in this regard. To address this issue, we analyze topic evolution
               in a subset of the Project Gutenberg (PG) corpus. We model this
               subset as a sequence of topic networks that capture the emergence,
               persistence, and interaction of thematic structures over decades.
               Using supervised topic representations, we predict nodes (topics)
               and edges (topic pairings) to forecast future topics and their
               co-occurrence. Our experiments demonstrate moderate to strong
               temporal persistence in topic connectivity patterns across three
               topic systems, with ROC-AUC and AP values consistently above 0.85.
               We find that the temporal span of topic networks significantly
               impacts predictive performance: longer spans improve the stability
               and recall of topic presence, while shorter spans better capture
               evolving topic relationships. Overall, our findings demonstrate
               the predictability of topics in literary texts over time.},
  pdf       = {http://www.lrec-conf.org/proceedings/lrec2026/pdf/2026.lrec2026-1.65.pdf}
}

Leon Hammerla
16. February 2026