
The New Data Spaces for the Social Sciences programme aims to drive a surge in innovation by improving, enhancing and combining existing panel data infrastructures and emerging data sources to develop new data spaces for social science research. It integrates and consolidates skills, knowledge and expertise from different fields of empirical social research and computer science and provides the means to test new methods and procedures of data generation and data analytics.
www.new-data-spaces.de
To research the major societal challenges of the coming decades more precisely, including digitization, climate change, and war- and pandemic-related societal changes, and to identify the need for political action on this basis, the social sciences need innovative research data and methods.
ENTAILab
ENTAILab is the core infrastructural service and research centre of the New Data Spaces programme.
The Research Infrastructure and Innovation Lab (ENTAILab) is dedicated to the use of existing research infrastructures, their advancement and the demand-oriented generation of a new research infrastructure for the needs of the InfPP projects and the development of new data spaces. ENTAILab aims to create a unique infrastructure for research-based innovations in the field of survey data and beyond.
ENTAILab consists of a set of four infrastructure measures that provide a successful and supportive environment for research within and across the projects of InfPP. Together, they will systematically feed results back into different kinds of panel applications and studies and social science research in general.
CIRCLET
ENTAILab involves the implementation, testing and provision of a research-driven infrastructure for advanced survey-related data (CIRCLET). CIRCLET will ensure the reproducibility and interoperability of methods that work with survey data. This is done through a multi-phase strategy that drives, scales and evaluates the development of methods based on new survey data over the course of InfPP. CIRCLET develops, tests and provides generic services that open up new data and methodological horizons according to the evolving needs of InfPP.
CIRCLET is intended for use by all InfPP projects to share data and methods, test their reproducibility and interoperability, and enrich their methods. Using the Docker Unified UIMA Interface (DUUI), CIRCLET provides a distributed multi-server infrastructure that allows InfPP to containerize methods and run them in server clusters so that they become reusable. This contributes to the coherence of all InfPP projects and makes innovations available in such a way that they can be reused outside the innovating project as quickly and extensively as possible. Using CIRCLET as a common platform is expected to strengthen collaboration between projects considerably.
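The idea of wrapping methods behind one uniform interface so that they can be chained and run in parallel can be illustrated with a minimal Python sketch. It is not the DUUI API: the names `Component`, `run_pipeline`, `tokenize` and `count_tokens` are hypothetical, the "methods" are toy functions, and a local thread pool stands in for containers running on a server cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

# A document is modelled as a plain dictionary (an assumption for this sketch).
Document = Dict[str, object]


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for a containerized method with a uniform interface."""
    name: str
    process: Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    # Toy method: whitespace tokenization of a free-text survey answer.
    return {**doc, "tokens": str(doc["text"]).split()}


def count_tokens(doc: Document) -> Document:
    # Toy method that builds on the output of the previous component.
    return {**doc, "n_tokens": len(doc["tokens"])}


def run_pipeline(components: List[Component],
                 docs: List[Document],
                 workers: int = 2) -> List[Document]:
    """Apply all components in order to each document, documents in parallel."""

    def run_one(doc: Document) -> Document:
        for component in components:
            doc = component.process(doc)
        return doc

    # The thread pool only simulates distribution across several workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the input documents.
        return list(pool.map(run_one, docs))


if __name__ == "__main__":
    answers = [{"id": 1, "text": "very satisfied overall"},
               {"id": 2, "text": "no opinion"}]
    pipeline = [Component("tokenize", tokenize),
                Component("count", count_tokens)]
    for result in run_pipeline(pipeline, answers):
        print(result["id"], result["n_tokens"])
```

Because every component takes and returns the same document structure, a method contributed by one project can be placed in another project's pipeline without changes, which is the kind of reuse the paragraph above describes.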
CIRCLET is research-driven; it focuses on the needs of the InfPP for which there is currently no or insufficient provision, and goes beyond what is offered by the NFDIs with which the InfPP collaborates in order to maximize synergies. CIRCLET includes several means to model and enhance the survey data research cycle: a multimodal data acquisition system, a machine learning system that leverages large language models and related technologies, and a hub technology for securing reproducibility.
Publications
BibTeX
@inproceedings{Boenisch:et:al:2025,
title = {Towards Unified, Dynamic and Annotation-based Visualisations and
Exploration of Annotated Big Data Corpora with the Help of Unified
Corpus Explorer},
author = {B{\"o}nisch, Kevin and Abrami, Giuseppe and Mehler, Alexander},
editor = {Dziri, Nouha and Ren, Sean (Xiang) and Diao, Shizhe},
booktitle = {Proceedings of the 2025 Conference of the Nations of the Americas
Chapter of the Association for Computational Linguistics: Human
Language Technologies (System Demonstrations)},
year = {2025},
address = {Albuquerque, New Mexico},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2025.naacl-demo.42/},
pages = {522--534},
isbn = {979-8-89176-191-9},
abstract = {The annotation and exploration of large text corpora, both automatic
and manual, presents significant challenges across multiple disciplines,
including linguistics, digital humanities, biology, and legal
science. These challenges are exacerbated by the heterogeneity
of processing methods, which complicates corpus visualization,
interaction, and integration. To address these issues, we introduce
the Unified Corpus Explorer (UCE), a standardized, dockerized,
open-source and dynamic Natural Language Processing (NLP) application
designed for flexible and scalable corpus navigation. Herein,
UCE utilizes the UIMA format for NLP annotations as a standardized
input, constructing interfaces and features around those annotations
while dynamically adapting to the corpora and their extracted
annotations. We evaluate UCE based on a user study and demonstrate
its versatility as a corpus explorer based on generative AI.},
note = {Best Demo Award},
pdf = {https://aclanthology.org/2025.naacl-demo.42.pdf},
keywords = {uce,new-data-spaces,circlet}
}
News
Best Demo Award at NAACL 2025

We are delighted that our paper “Towards Unified, Dynamic and Annotation-based Visualisations and Exploration of Annotated Big Data Corpora with the Help of Unified Corpus Explorer” has received the Best Demo Award at this year’s annual conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025).
Kevin Bönisch, Giuseppe Abrami and Alexander Mehler. 2025. Towards Unified, Dynamic and Annotation-based Visualisations and Exploration of Annotated Big Data Corpora with the Help of Unified Corpus Explorer. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations), 522–534. Best Demo Award.
