Research – Text Technology Lab

Current Projects

BIOfid

BIOfid (FID Biodiversity Research) aims to make both historical and contemporary biodiversity literature available to researchers in modern, machine-readable formats across Germany. The Johann Christian Senckenberg University Library leads the project in collaboration with the Senckenberg Society for Nature Research and the Text Technology Working Group at the Goethe University Frankfurt. Scope and Objectives In…
CORE

https://core.uni-mainz.de About CORE Approximately three million students at more than 420 institutions of higher education in Germany use the Internet daily to obtain information to complete assignments and prepare for exams. Studies reveal that students lack the skills to properly search, filter, evaluate, and integrate information gained from the highly diverse and overabundant online content. The need to understand, evaluate and consequently strengthen the skills…
NegLaB

Negation is a fundamental property of human language that is tightly intertwined with human cognitive capacity. Negation allows speakers and hearers to reason about what is not the case, a unique property of human language. Thus, negation not only expresses a clearly defined and well-circumscribed grammatical function, it also interacts with various aspects of grammar…
New Data Spaces – SPP 2431

The New Data Spaces for the Social Sciences programme aims to drive a surge in innovation by improving, enhancing and combining existing panel data infrastructures and emerging data sources to develop new data spaces for social science research. It integrates and consolidates skills, knowledge and expertise from different fields of empirical social research and computer…
SATEK

This DFG-funded project (project number: 531750631) develops new methods for the thematic classification of very large text corpora in the digital humanities and social sciences. Focusing on the German Reference Corpus (DeReKo), the world’s largest collection of German-language texts, it addresses the lack of reliable topic-based metadata for heterogeneous and rapidly growing corpora. By combining…
ViCom

ViCom investigates the special features and linguistic significance of visual communication. This comprises sign languages as fully developed natural languages which exclusively rely on the visual channel for communication, but also visual means that enhance spoken language such as gestures. It aims at disclosing the specific characteristics of the visual modality as a communication channel…

Semiautomated Thematic Text Classification as a Basis for Corpus Linguistic Value-Added Services. 2024 – . Funded by DFG (531750631).

Description

The DFG-funded project “Semiautomated Thematic Text Classification as a Basis for Corpus Linguistic Value-Added Services” aims to develop a deep-learning-based system for topic classification that uses the Wikipedia category system and data from the Wikidata project. On the basis of this classification system, topic models are developed and tested for the automatic classification of texts, including those from the Leibniz Institute for the German Language (IDS) in Mannheim. The project is being carried out in collaboration with the IDS and the Saxon Academy of Sciences in Leipzig. The research focus is on state-of-the-art AI methods, in particular generative AI methods.

BibTeX

@project{pr-semi-automated_thematic_text_classification,
  name      = {Semiautomated Thematic Text Classification as a Basis for Corpus Linguistic Value-Added Services},
  abstract  = {The DFG-funded project “Semiautomated Thematic Text Classification
               as a Basis for Corpus Linguistic Value-Added Services” aims to
               develop a deep-learning-based system for topic classification
               that uses the Wikipedia category system and data from the Wikidata
               project. On the basis of this classification system, topic models
               are developed and tested for the automatic classification of
               texts, including those from the Leibniz Institute for the German
               Language (IDS) in Mannheim. The project is being carried out in
               collaboration with the IDS and the Saxon Academy of Sciences in
               Leipzig. The research focus is on state-of-the-art AI methods,
               in particular generative AI methods.},
  year      = {2024},
  keywords  = {current_project},
  funded_by = {DFG (531750631)},
  funded_by_url = {https://gepris.dfg.de/gepris/projekt/531750631}
}

ENTAILab - Research Infrastructure and Innovation Lab. 2024 – . Funded by DFG (539634240).

Description

The DFG Infrastructure Priority Program New Data Spaces for the Social Sciences (InfPP) was established in order to meet the challenges of traditional survey research and to make use of newly available data sources. As the central feature of the InfPP, ENTAILab offers infrastructure services and excellent research opportunities and will promote the dissemination of results into existing and emerging research and data provision programs. There are clear upscaling and synergy effects of ENTAILab for the projects within the InfPP, leading to opportunities and successful realizations that - partly due to high costs for individual projects and a lack of infrastructural equipment - would otherwise not exist. By providing the infrastructure for a range of projects at the same time, e.g. the connection to existing data, samples and panels, server capacities, code and tools, expert counseling, and research overviews, we make use of synergies and create added value, while at the same time conducting cross-cutting research on these infrastructures. The measures combined in ENTAILab represent the central and unifying element for the research, development, and transfer activities of the priority program. ENTAILab will unlock and sustainably use emerging opportunities of the new data spaces and meet the requirements of the InfPP in general.

BibTeX

@project{spp-2431-entailab,
  name      = {ENTAILab - Research Infrastructure and Innovation Lab},
  abstract  = {The DFG Infrastructure Priority Program New Data Spaces for the
               Social Sciences (InfPP) was established in order to meet the challenges
               of traditional survey research and to make use of newly available
               data sources. As the central feature of the InfPP, ENTAILab
               offers infrastructure services and excellent research opportunities
               and will promote the dissemination of results into existing and
               emerging research and data provision programs. There are clear
               upscaling and synergy effects of ENTAILab for the projects within
               the InfPP, leading to opportunities and successful realizations
               that - partly due to high costs for individual projects and a
               lack of infrastructural equipment - would otherwise not exist.
               By providing the infrastructure for a range of projects at the
               same time, e.g. the connection to existing data, samples and panels,
               server capacities, code and tools, expert counseling, and research
               overviews, we make use of synergies and create added value, while
               at the same time conducting cross-cutting research on these infrastructures.
               The measures combined in ENTAILab represent the central and unifying
               element for the research and development and transfer activities
               of the priority program. ENTAILab will unlock and sustainably
               use emerging opportunities of the new data spaces and meet
               the requirements of the InfPP in general.},
  year      = {2024},
  keywords  = {current_project},
  funded_by = {DFG (539634240)},
  funded_by_url = {https://gepris.dfg.de/gepris/projekt/539634240},
  url       = {https://www.new-data-spaces.de/de-de/Start/Infrastructure-Priority-Programme/ENTAILab},
  logo      = {/wp-content/uploads/2024/01/logo-NewDataSpaces-long.png}
}

FACES: Feasibility, Acceptance, and Data Quality of New Multimodal Surveys. 2024 – . Funded by DFG (539621548).

Description

The FACES project aims to create a multimodal data space for survey research that can expand and replace face-to-face interviews in the future through the use of virtual reality (VR) and artificial intelligence (AI). This multi-interface system for online surveys is designed to offer a high degree of variability in terms of avatars, situational parameters, interfaces and AI technologies for the automatic processing of speech and behavioural data.

BibTeX

@project{spp-2431-faces,
  name      = {FACES: Feasibility, Acceptance, and Data Quality of New Multimodal Surveys},
  abstract  = {The FACES project aims to create a multimodal data space for survey
               research that can expand and replace face-to-face interviews in
               the future through the use of virtual reality (VR) and artificial
               intelligence (AI). This multi-interface system for online surveys
               is designed to offer a high degree of variability in terms of
               avatars, situational parameters, interfaces and AI technologies
               for the automatic processing of speech and behavioural data.},
  year      = {2024},
  keywords  = {current_project, new-data-spaces, faces},
  funded_by = {DFG (539621548)},
  funded_by_url = {https://gepris.dfg.de/gepris/projekt/539621548},
  url       = {https://www.lifbi.de/en-us/Start/Research/Projects/FACES},
  logo      = {/wp-content/uploads/2025/07/new-data-spaces_Logo_kurz_v3-1.png}
}

Negation in Language and Beyond (NegLaB). 2024 – . Funded by DFG (SFB 1629).

Description

Negation is a fundamental and unique property of human language since it allows us to reason about what is not the case. It not only expresses a clearly defined grammatical function but also interacts with various aspects of grammar and cognition. The acquisition and processing of negation encompass linguistic as well as non-linguistic cognitive procedures. Hence, negation constitutes an ideal testing ground to differentiate cognitive mechanisms that are grammatical in nature from those that are shared with other cognitive domains, such as memory, attention, decision making and cognitive control. We intend to explore how the expression of negation is cross-linguistically associated with grammatical and non-linguistic cognitive operations and also whether the operations observed in negative utterances are part of negation itself or, rather, arise as an effect of the grammatical system and cognitive functions. While the semantics of negation is generally analyzed as a unique propositional operator, its morphosyntactic expression is much more varied and often involves more than one morphological exponent. Hence, there is a tension between a rich morphosyntax and a more straightforward semantics. The semantics of negation leads one to expect negation to be expressed by a single morpheme positioned at the beginning of the clause (Neg-Only Hypothesis). The rich and variable morphosyntax leads us to expect that negation requires a number of conditions in the semantics (Neg-Plus Hypothesis). We aim to solve this puzzle by covering several empirical domains. More grammatical effects than semantics would lead us to expect are visible in the interaction between negative utterances and the cognitive processing and semantic evaluation of alternative propositions. This is reflected in acquisition, since children produce negative utterances relatively early, but all the aspects of negation take a rather long time to be acquired.
Downstream effects of this can be seen in adult processing, as the comprehension of negative sentences is costlier than that of positive sentences. This is supposedly due to the inhibition of the corresponding positive sentence that is necessary for the interpretation of negative statements. Our exploration into the way negation and other grammatical categories or non-linguistic cognitive functions interact will lead us to identify how negation functions in natural language and how it favors or hinders other (extra-)grammatical components or processes. Why do some of them need to occur together with negation (e.g., negative polarity items) and why are others incompatible with it (as some types of imperatives)? Our general aim is to develop a theoretical perspective on the way negation manifests itself in natural language, how it is acquired and processed, and why it varies so much across languages. Thereby, we will gain a better understanding of the connections between linguistic competence and general cognition.

BibTeX

@project{sfb-neglab,
  name      = {Negation in Language and Beyond (NegLaB)},
  abstract  = {Negation is a fundamental and unique property of human language
               since it allows us to reason about what is not the case. It does
               not only express a clearly defined grammatical function, it also
               interacts with various aspects of grammar and cognition. The acquisition
               and processing of negation encompass linguistic as well as non-linguistic
               cognitive procedures. Hence, negation constitutes an ideal testing
               ground to differentiate cognitive mechanisms that are grammatical
               in nature from those that are shared with other cognitive domains,
               such as memory, attention, decision making and cognitive control.
               We intend to explore how the expression of negation is cross-linguistically
               associated with grammatical and non-linguistic cognitive operations
               and also whether the operations observed in negative utterances
               are part of negation itself or, rather, arise as an effect of
               the grammatical system and cognitive functions. While the semantics
               of negation is generally analyzed as a unique propositional operator,
               its morphosyntactic expression is much more varied and often involves
               more than one morphological exponent. Hence, there is a tension
               between a rich morphosyntax and a more straightforward semantics.
               The semantics of negation leads one to expect negation to be expressed
               by a single morpheme positioned at the beginning of the clause
               (Neg-Only Hypothesis). The rich and variable morphosyntax leads
               us to expect that negation requires a number of conditions in
               the semantics (Neg-Plus Hypothesis). We aim to solve this puzzle
               covering several empirical domains. More grammatical effects than
               semantics would lead us to expect are visible in the interaction
               between negative utterances and the cognitive processing and semantic
               evaluation of alternative propositions. This is reflected in acquisition,
               since children produce negative utterances relatively early, but
               all the aspects of negation take a rather long time to be acquired.
               Downstream effects of this can be seen in adult processing as
               the comprehension of negative sentences is costlier than that of positive
               sentences. This is supposedly due to the inhibition of the corresponding
               positive sentence that is necessary for the interpretation of
               negative statements. Our exploration into the way negation and
               other grammatical categories or non-linguistic cognitive functions
               interact will lead us to identify how negation functions in natural
               language and how it favors or hinders other (extra-)grammatical
               components or processes. Why do some of them need to occur together
               with negation (e.g., negative polarity items) and why are others
               incompatible with it (as some types of imperatives)? Our general
               aim is to develop a theoretical perspective on the way negation
               manifests itself in natural language, how it is acquired and processed,
               and why it varies so much across languages. Thereby, we will gain
               a better understanding of the connections between linguistic competence
               and general cognition.},
  year      = {2024},
  keywords  = {current_project, neglab},
  funded_by = {DFG (SFB 1629)},
  funded_by_url = {https://gepris.dfg.de/gepris/projekt/509468465?language=en},
  url       = {https://www.neglab.de/},
  logo      = {/wp-content/uploads/2025/12/logo-NegLaB.png}
}

C08: Integration of Students' Process and Text Data to Measure the Interdependence of Domain-Specific and Generic Critical Online Reasoning (DOM-COR and GEN-COR). 2023 – . Funded by DFG (462702138).

Description

The standard approach to assessing learning outcomes treats assessment as a process in which statements about learners' knowledge and skills are derived from the necessarily limited evidence of their activities. By contrast, the analysis of process and text data that are generated during learning as connected behavioural sequences is considered a more realistic alternative. These multimodal data have the potential to give a more complete picture of students' critical online reasoning (COR) processes and can also be analysed with data science methods. This raises the question of how far data science methods are comparable with standard assessment approaches for examining COR processes more closely. C08 pursues three main goals: (1) providing an authentic digital assessment and learning environment in the AZURE cloud in which students can behave as they do on their own computers; (2) integrating students' activities on the basis of multimodal text and response-process data in a research infrastructure called the Multimodal Learning Data Science System (MLDS), which enables the analysis of process data (e.g., scrolling of web pages, browsing history, time spent) and text data (e.g., web pages used, written text) from students working on generic (GEN) and domain-specific (DOM) COR tasks; (3) analysing the multimodal data to uncover latent relationships between the text data that students process or write and their behavioural data while solving COR tasks. The digital assessment and learning environment of C08 will make it possible to capture learning behaviour in the COR tasks of the A projects in real Internet scenarios and comparable simulations.
C08 will capture text and process data in its MLDS research infrastructure, examine their role and interaction in the completion of COR tasks, and clarify how far they are related to students' domain knowledge and personal characteristics. C08 examines the significance of data science methods in education. It identifies the added value and the limits of data science methods for processing the multimodal text and process data generated in GEN-COR and DOM-COR assessments, in order to contribute new insights and methods to educational research. C08 will collaborate with all projects in the research unit (FOR) to create and evaluate a novel big data dataset for the GEN-COR and DOM-COR studies, using a novel infrastructure to be developed for these analyses. C08 brings data science expertise to the FOR and requires the expertise of the educational scientists to adapt and calibrate its methods.

BibTeX

@project{core-c08-integration,
  name      = {C08: Integration of Students' Process and Text Data to Measure the Interdependence of Domain-Specific and Generic Critical Online Reasoning (DOM-COR and GEN-COR)},
  abstract  = {The standard approach to assessing learning outcomes treats assessment
               as a process in which statements about learners' knowledge and
               skills are derived from the necessarily limited evidence of their
               activities. By contrast, the analysis of process and text data
               that are generated during learning as connected behavioural sequences
               is considered a more realistic alternative. These multimodal data
               have the potential to give a more complete picture of students'
               critical online reasoning (COR) processes and can also be analysed
               with data science methods. This raises the question of how far
               data science methods are comparable with standard assessment
               approaches for examining COR processes more closely. C08 pursues
               three main goals: (1) providing an authentic digital assessment
               and learning environment in the AZURE cloud in which students
               can behave as they do on their own computers; (2) integrating
               students' activities on the basis of multimodal text and response-process
               data in a research infrastructure called the Multimodal Learning
               Data Science System (MLDS), which enables the analysis of process
               data (e.g., scrolling of web pages, browsing history, time spent)
               and text data (e.g., web pages used, written text) from students
               working on generic (GEN) and domain-specific (DOM) COR tasks;
               (3) analysing the multimodal data to uncover latent relationships
               between the text data that students process or write and their
               behavioural data while solving COR tasks. The digital assessment
               and learning environment of C08 will make it possible to capture
               learning behaviour in the COR tasks of the A projects in real
               Internet scenarios and comparable simulations. C08 will capture
               text and process data in its MLDS research infrastructure, examine
               their role and interaction in the completion of COR tasks, and
               clarify how far they are related to students' domain knowledge
               and personal characteristics. C08 examines the significance of
               data science methods in education. It identifies the added value
               and the limits of data science methods for processing the multimodal
               text and process data generated in GEN-COR and DOM-COR assessments,
               in order to contribute new insights and methods to educational
               research. C08 will collaborate with all projects in the research
               unit (FOR) to create and evaluate a novel big data dataset for
               the GEN-COR and DOM-COR studies, using a novel infrastructure
               to be developed for these analyses. C08 brings data science expertise
               to the FOR and requires the expertise of the educational scientists
               to adapt and calibrate its methods.},
  year      = {2023},
  keywords  = {current_project},
  funded_by = {DFG (462702138)},
  funded_by_url = {https://gepris.dfg.de/gepris/projekt/520631675},
  url       = {https://de.core.uni-mainz.de/c08/},
  logo      = {/wp-content/uploads/2024/11/CORE_Logo_neu.png}
}

B05: Modelling the Information Landscape (IL) for the Assessment and Analysis of Domain-Specific and Generic Critical Online Reasoning (DOM-COR and GEN-COR). 2023 – . Funded by DFG (462702138).

Description

The role of linguistic indicators for the readability of texts or the credibility of web sources has already been researched intensively. Using a corpus of short offline texts, the applicants were also able to show that linguistic features allow predictions about students' results in domain-specific knowledge tests. The extent to which such relationships generalise to test tasks in complex, open information landscapes is an important research desideratum. B05 therefore focuses on modelling linguistic features of the online information landscape (IL) in which students move when solving critical online reasoning (COR) tasks. B05 aims to develop a theoretically grounded linguistic feature model based on the texts that students use or produce as components of the online IL when solving COR tasks. The model is intended to allow predictions about COR processes and COR performance. B05 focuses on three research questions: (i) To what extent do the linguistic features differ between generic and domain-specific COR (GEN-COR/DOM-COR) and within the domains of economics, medicine, sociology, and physics? (ii) How do these features differ with respect to the three cognitive COR facets: online information acquisition, critical information evaluation, and evidence-based reasoning and synthesis of information? (iii) At which levels do these features operate: at the level of single texts, multiple texts, domains, genres, the IL, or the underlying language (e.g., German)? B05 starts from the qualitative selection of linguistic features relating to evidence status, information source, and text organisation. The quantitative part operationalises these features by means of an extended machine learning model and tests their predictive power and specificity with respect to the three research questions.
Qualitative and quantitative analyses are linked through a computational hermeneutic circle in which the quantitative part generates statistical evaluations and predictions whose interpretability rests on qualitative linguistic analyses. B05 provides the research unit (FOR) with machine learning models that automatically analyse the linguistic features of multiple texts as part of the IL and that are based on a linguistic theory of COR addressing the level of fine-grained linguistic information units. The A projects provide texts and information about students' COR test results and receive linguistic analyses from B05. As the most detailed information units in the FOR, linguistic features are relevant for the other B projects with regard to media and content properties (B04) and narrative structures (B06). The Multimodal Learning Data Science System of C08 is central to the integration of all text and performance data in B05.

BibTeX

@project{core-b05-modellierung,
  name      = {B05: Modelling the Information Landscape (IL) for the Assessment and Analysis of Domain-Specific and Generic Critical Online Reasoning (DOM-COR and GEN-COR)},
  abstract  = {The role of linguistic indicators for the readability of texts
               or the credibility of web sources has already been researched
               intensively. Using a corpus of short offline texts, the applicants
               were also able to show that linguistic features allow predictions
               about students' results in domain-specific knowledge tests. The
               extent to which such relationships generalise to test tasks in
               complex, open information landscapes is an important research
               desideratum. B05 therefore focuses on modelling linguistic features
               of the online information landscape (IL) in which students move
               when solving critical online reasoning (COR) tasks. B05 aims to
               develop a theoretically grounded linguistic feature model based
               on the texts that students use or produce as components of the
               online IL when solving COR tasks. The model is intended to allow
               predictions about COR processes and COR performance. B05 focuses
               on three research questions: (i) To what extent do the linguistic
               features differ between generic and domain-specific COR (GEN-COR/DOM-COR)
               and within the domains of economics, medicine, sociology, and
               physics? (ii) How do these features differ with respect to the
               three cognitive COR facets: online information acquisition, critical
               information evaluation, and evidence-based reasoning and synthesis
               of information? (iii) At which levels do these features operate:
               at the level of single texts, multiple texts, domains, genres,
               the IL, or the underlying language (e.g., German)? B05 starts
               from the qualitative selection of linguistic features relating
               to evidence status, information source, and text organisation.
               The quantitative part operationalises these features by means
               of an extended machine learning model and tests their predictive
               power and specificity with respect to the three research questions.
               Qualitative and quantitative analyses are linked through a computational
               hermeneutic circle in which the quantitative part generates statistical
               evaluations and predictions whose interpretability rests on qualitative
               linguistic analyses. B05 provides the research unit (FOR) with
               machine learning models that automatically analyse the linguistic
               features of multiple texts as part of the IL and that are based
               on a linguistic theory of COR addressing the level of fine-grained
               linguistic information units. The A projects provide texts and
               information about students' COR test results and receive linguistic
               analyses from B05. As the most detailed information units in the
               FOR, linguistic features are relevant for the other B projects
               with regard to media and content properties (B04) and narrative
               structures (B06). The Multimodal Learning Data Science System
               of C08 is central to the integration of all text and performance
               data in B05.},
  year      = {2023},
  keywords  = {current_project},
  funded_by = {DFG (462702138)},
  funded_by_url = {https://gepris.dfg.de/gepris/projekt/520621868},
  url       = {https://de.core.uni-mainz.de/b05/},
  logo      = {/wp-content/uploads/2024/11/CORE_Logo_neu.png}
}

New Data Spaces for the Social Sciences. 2023 – . Funded by DFG (SPP 2431).

Description

In order to more precisely research the major societal challenges of the coming decades, including digitization, climate change, and war- and pandemic-related societal changes, and to be able to identify the need for political action on this basis, the social sciences need innovative research data and methods.

BibTeX

@project{spp-2431-new-data-spaces,
  name      = {New Data Spaces for the Social Sciences},
  abstract  = {To study the major societal challenges of the coming decades
               more precisely, including digitization, climate change, and
               societal changes caused by war and pandemics, and to identify
               the need for political action on that basis, the social sciences
               need innovative research data and methods.},
  year      = {2023},
  keywords  = {current_project},
  funded_by = {DFG (SPP 2431)},
  funded_by_url = {https://www.dfg.de/de/aktuelles/neuigkeiten-themen/info-wissenschaft/2023/info-wissenschaft-23-20},
  url       = {https://www.new-data-spaces.de/en-us/},
  logo      = {/wp-content/uploads/2025/07/new-data-spaces_Logo_kurz_v3-1.png}
}

Virtual Reality Sustained Multimodal Distributional Semantics for Gestures in Dialogue (GeMDiS). 2021 – . Funded by DFG (SPP 2392).

Description

Both corpus-based linguistics and contemporary computational linguistics rely on linguistic resources that are often large. The expansion of the linguistic subject area to include visual means of communication such as gesticulation has not yet been backed up with corresponding corpora. As a result, “multimodal linguistics” and dialogue theory cannot take part in the established distributional methods of corpus linguistics and computational semantics. The main reason is the difficulty of collecting multimodal data in an appropriate way and at an appropriate scale. Using the latest VR-based recording methods, the GeMDiS project aims to close this data gap. It investigates visual communication with machine-based methods and an innovative use of neural and active learning for small data, drawing on the systematic reference dimensions of associativity and contiguity of the features of visual and non-visual communicative signs.

BibTeX

@project{vicom-gemdis,
  name      = {Virtual Reality Sustained Multimodal Distributional Semantics for Gestures in Dialogue (GeMDiS)},
  abstract  = {Both corpus-based linguistics and contemporary computational linguistics
               rely on linguistic resources that are often large. The expansion
               of the linguistic subject area to include visual means of communication
               such as gesticulation has not yet been backed up with corresponding
               corpora. As a result, “multimodal linguistics” and dialogue
               theory cannot take part in the established distributional methods
               of corpus linguistics and computational semantics. The main reason
               is the difficulty of collecting multimodal data in an appropriate
               way and at an appropriate scale. Using the latest VR-based recording
               methods, the GeMDiS project aims to close this data gap. It investigates
               visual communication with machine-based methods and an innovative
               use of neural and active learning for small data, drawing on
               the systematic reference dimensions of associativity and contiguity
               of the features of visual and non-visual communicative signs.},
  year      = {2021},
  keywords  = {current_project},
  funded_by = {DFG (SPP 2392)},
  funded_by_url = {https://www.dfg.de/en/research_funding/announcements_proposals/2021/info_wissenschaft_21_45/index.html},
  url       = {https://vicom.info/projects/virtual-reality-sustained-multimodal-distributional-semantics-for-gestures-in-dialogue-gemdis/},
  logo      = {/wp-content/uploads/2024/01/ViComGeMDis.png}
}

Specialised Information Service Biodiversity Research (BIOfid). 2017 – . Funded by DFG (FID 326061700).

Description

The specialised information service BIOfid (www.biofid.de) is oriented towards the special needs of scientists researching biodiversity topics at research institutions and in natural history collections. Since 2017, BIOfid has been building an infrastructure that contributes to the provision and mobilisation of research-relevant data in a variety of ways in the context of current developments in biodiversity research.

BibTeX

@project{biofid,
  name      = {Specialised Information Service Biodiversity Research (BIOfid)},
  abstract  = {The specialised information service BIOfid (www.biofid.de) is
               oriented towards the special needs of scientists researching biodiversity
               topics at research institutions and in natural history collections.
               Since 2017, BIOfid has been building an infrastructure that contributes
               to the provision and mobilisation of research-relevant data in
               a variety of ways in the context of current developments in biodiversity
               research.},
  year      = {2017},
  funded_by = {DFG (FID 326061700)},
  funded_by_url = {https://gepris.dfg.de/gepris/projekt/326061700},
  url       = {https://www.biofid.de/en/},
  repository = {https://github.com/FID-Biodiversity},
  logo      = {/wp-content/uploads/2024/01/logo-BIOfid.png},
  keywords  = {current_project,biofid,biodiversity}
}

Past Projects

LOEWE-Schwerpunkt "Minderheitenstudien: Sprache und Identität". 2020 – 2024. Funded by LOEWE.

Description

The LOEWE research cluster "Minderheitenstudien: Sprache und Identität" (Minority Studies: Language and Identity) undertakes an interdisciplinary investigation of identity formation among minorities. To this end, we examine three kinds of relations: the relation between minorities "in their own country" and minorities "abroad"; the relation between minorities' self-perception and their perception by others (both "in their own country" and "abroad"); and the reciprocal relation between the identity-shaping factors of language, religion, culture, and ethnos, as seen from within and from outside, "in their own country" and "abroad".

BibTeX

@project{loewe-minderheitenstudien,
  name      = {LOEWE-Schwerpunkt "Minderheitenstudien: Sprache und Identität"},
  abstract  = {The LOEWE research cluster "Minderheitenstudien: Sprache und Identität"
               (Minority Studies: Language and Identity) undertakes an interdisciplinary
               investigation of identity formation among minorities. To this
               end, we examine three kinds of relations: the relation between
               minorities "in their own country" and minorities "abroad"; the
               relation between minorities' self-perception and their perception
               by others (both "in their own country" and "abroad"); and the
               reciprocal relation between the identity-shaping factors of language,
               religion, culture, and ethnos, as seen from within and from outside,
               "in their own country" and "abroad".},
  year      = {2020},
  until     = {2024},
  keywords  = {completed_project},
  funded_by = {LOEWE},
  funded_by_url = {https://proloewe.de/de/loewe-vorhaben/nach-themen/minderheitenstudien/},
  url       = {https://sprache-identitaet.uni-frankfurt.de/},
  logo      = {/wp-content/uploads/2024/02/logo-loewe-minderheitenstudien-blau.png}
}

Berufspraktische Bildungsprozesse im Recht- und Lehramtsreferendariat sowie der Medizin unter Nutzung digitaler Medien (BRIDGE). 2020 – 2023. Funded by BMBF (01JD1906B).

Description

The use of online media is rising across all areas of education. Learners also turn to online media more and more often when entering a profession. Whether usage behaviour differs between professions has not yet been studied. This is the starting point for the research project of Johannes Gutenberg University Mainz and Goethe University Frankfurt. The researchers investigate how people entering a profession use online media: medical trainees in their practical year (Praktisches Jahr) as well as trainee teachers and trainee lawyers in their Referendariat. The interdisciplinary team compares general with profession-specific use of online media on the basis of a representative longitudinal sample. Using innovative approaches from computational linguistics and learning analytics, the project develops online trainings and examines the participants' learning processes. To interpret the data from multiple perspectives, the team works closely with practitioners. The team at the University of Mainz coordinates the joint project and contributes the perspectives of business education and law as well as expertise in competence development. The representative, methodologically integrative study provides scientific findings on the influence of profession-related media use in professional practice and on how it can be fostered digitally. In addition, results are expected that can inform how formal and non-formal learning opportunities are connected. Practice partners, such as trainers, can apply the adaptive training concepts developed in the project.

BibTeX

@project{bridge,
  name      = {Berufspraktische Bildungsprozesse im Recht- und Lehramtsreferendariat sowie der Medizin unter Nutzung digitaler Medien (BRIDGE)},
  abstract  = {The use of online media is rising across all areas of education.
               Learners also turn to online media more and more often when entering
               a profession. Whether usage behaviour differs between professions
               has not yet been studied. This is the starting point for the
               research project of Johannes Gutenberg University Mainz and Goethe
               University Frankfurt. The researchers investigate how people
               entering a profession use online media: medical trainees in their
               practical year (Praktisches Jahr) as well as trainee teachers
               and trainee lawyers in their Referendariat. The interdisciplinary
               team compares general with profession-specific use of online
               media on the basis of a representative longitudinal sample. Using
               innovative approaches from computational linguistics and learning
               analytics, the project develops online trainings and examines
               the participants' learning processes. To interpret the data from
               multiple perspectives, the team works closely with practitioners.
               The team at the University of Mainz coordinates the joint project
               and contributes the perspectives of business education and law
               as well as expertise in competence development. The representative,
               methodologically integrative study provides scientific findings
               on the influence of profession-related media use in professional
               practice and on how it can be fostered digitally. In addition,
               results are expected that can inform how formal and non-formal
               learning opportunities are connected. Practice partners, such
               as trainers, can apply the adaptive training concepts developed
               in the project.},
  year      = {2020},
  until     = {2023},
  keywords  = {completed_project},
  funded_by = {BMBF (01JD1906B)},
  funded_by_url = {https://www.empirische-bildungsforschung-bmbf.de/de/Themenfinder-1720.html/projekt/01JD1906A},
  url       = {https://bridge.uni-mainz.de/},
  logo      = {/wp-content/uploads/2024/01/logo-BRIDGE.png}
}
