New Data Spaces – SPP 2431

To research the major societal challenges of the coming decades more precisely, including digitization, climate change, and war- and pandemic-related societal changes, and to identify the need for political action on that basis, the social sciences need innovative research data and methods.

ENTAILab

ENTAILab is the core infrastructural service and research centre of the New Data Spaces programme.

The Research Infrastructure and Innovation Lab (ENTAILab) is dedicated to using and advancing existing research infrastructures and to the demand-driven development of a new research infrastructure that serves the needs of the InfPP projects and the development of new data spaces. ENTAILab aims to create a unique infrastructure for research-based innovations in the field of survey data and beyond.

ENTAILab consists of four infrastructure measures that provide a supportive environment for research within and across the InfPP projects. Together, they will systematically feed results back into panel applications and studies of different kinds and into social science research in general.

CIRCLET

As part of ENTAILab, a research-driven infrastructure for advanced survey-related data (CIRCLET) is implemented, tested and made available. CIRCLET is designed to ensure the reproducibility and interoperability of methods that work with survey data. It does so through a multi-phase strategy that drives, scales and evaluates the development of methods based on new survey data over the course of InfPP. CIRCLET develops, tests and provides generic services that open up new data and methodological horizons in line with the evolving needs of InfPP.

Ideally, all InfPP projects use CIRCLET to share data and methods, test their reproducibility and interoperability, and enrich their methods. Using the Docker Unified UIMA Interface (DUUI), CIRCLET provides a distributed multi-server infrastructure that allows InfPP projects to containerize methods and run them in server clusters, which makes them reusable. This contributes to the coherence of all InfPP projects and makes innovations available for reuse outside the innovating project as quickly and extensively as possible. Using CIRCLET as a common platform is expected to strengthen collaboration between projects considerably.

CIRCLET is research-driven: it focuses on those needs of the InfPP for which provision is currently missing or insufficient, and it goes beyond what is offered by the NFDIs with which the InfPP collaborates in order to maximize synergies. CIRCLET includes several means to model and enhance the survey data research cycle: a multimodal data acquisition system, a machine learning system that leverages large language models and related technologies, and a hub technology for securing reproducibility.

Publications

Kevin Bönisch, Giuseppe Abrami and Alexander Mehler. 2025. Towards Unified, Dynamic and Annotation-based Visualisations and Exploration of Annotated Big Data Corpora with the Help of Unified Corpus Explorer. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations), 522–534. Best Demo Award.
BibTeX
@inproceedings{Boenisch:et:al:2025,
  title     = {Towards Unified, Dynamic and Annotation-based Visualisations and
               Exploration of Annotated Big Data Corpora with the Help of Unified
               Corpus Explorer},
  author    = {B{\"o}nisch, Kevin and Abrami, Giuseppe and Mehler, Alexander},
  editor    = {Dziri, Nouha and Ren, Sean (Xiang) and Diao, Shizhe},
  booktitle = {Proceedings of the 2025 Conference of the Nations of the Americas
               Chapter of the Association for Computational Linguistics: Human
               Language Technologies (System Demonstrations)},
  year      = {2025},
  address   = {Albuquerque, New Mexico},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.naacl-demo.42/},
  pages     = {522--534},
  isbn      = {979-8-89176-191-9},
  abstract  = {The annotation and exploration of large text corpora, both automatic
               and manual, presents significant challenges across multiple disciplines,
               including linguistics, digital humanities, biology, and legal
               science. These challenges are exacerbated by the heterogeneity
               of processing methods, which complicates corpus visualization,
               interaction, and integration. To address these issues, we introduce
               the Unified Corpus Explorer (UCE), a standardized, dockerized,
               open-source and dynamic Natural Language Processing (NLP) application
               designed for flexible and scalable corpus navigation. Herein,
               UCE utilizes the UIMA format for NLP annotations as a standardized
               input, constructing interfaces and features around those annotations
               while dynamically adapting to the corpora and their extracted
               annotations. We evaluate UCE based on a user study and demonstrate
               its versatility as a corpus explorer based on generative AI.},
  note      = {Best Demo Award},
  pdf       = {https://aclanthology.org/2025.naacl-demo.42.pdf},
  keywords  = {uce,new-data-spaces,circlet}
}

News

  • Best Demo Award at NAACL 2025

    We are delighted that our paper “Towards Unified, Dynamic and Annotation-based Visualisations and Exploration of Annotated Big Data Corpora with the Help of Unified Corpus Explorer” has received the Best Demo Award at this year’s annual conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025).
