TTLab – Text Technology Lab

The TTLab (Text Technology Lab), headed by Prof. Alexander Mehler, is part of the Department of Computer Science and Mathematics (Fachbereich Informatik und Mathematik) at Goethe University Frankfurt. It investigates formal, algorithmic models to deepen our understanding of information processing in the humanities. We examine both diachronic (time-dependent) and synchronic aspects of the processing of linguistic and non-linguistic, multimodal signs. The Lab works across several disciplines to bridge computer science on the one hand and corpus-based research in the humanities on the other. To this end, we develop information models and algorithms for the analysis of texts, images, and other objects relevant to research in the humanities.

News

  • Open Full-Time Position (E13) for ENTAILab in SPP 2431: New Data Spaces for the Social Sciences

    At the Institute for Computer Science (Prof. Dr. Alexander Mehler), Department of Computer Science and Mathematics at Goethe University Frankfurt, a position for a

    Research Assistant (m/f/d)
    (E 13 TV-G-U, 100% full-time)

    is available at the earliest possible date for a period of three years within the ENTAILab project – research infrastructure and innovation lab. The salary scale is based on the job characteristics of the collective agreement applicable to Goethe University (TV-G-U).

    The project is part of the priority program (SPP) New Data Spaces for the Social Sciences, which is funded by the German Research Foundation (DFG) (see https://www.new-data-spaces.de). The aim of the project is to establish a research-oriented infrastructure for novel data in survey research. To this end, a method-oriented innovation laboratory will be set up to develop and test machine learning and artificial intelligence methods for survey research in cooperation with the projects of the SPP. The methods to be developed target multimodal data, not primarily or exclusively linguistic research data.

    You are expected to collaborate in the project and actively participate in the workshops and events of the SPP. We are looking for a highly qualified individual with a keen interest in cutting-edge research infrastructures and in the team-oriented development and application of innovative, research-oriented methods for survey research and the social sciences. With the SPP New Data Spaces for the Social Sciences and the Text Technology Lab, in which the position will be embedded, we offer two strong, internationally oriented research environments in the areas of computational humanities, multimodal computing, machine learning and artificial intelligence. The position also includes financial resources for conference participation and individual career development.

    Requirements

    • Completed university degree (e.g. Master’s) in a relevant subject with a focus on information science
    • Very good knowledge of English (C1)
    • Proven experience with databases and with machine learning or artificial intelligence methods
    • Extensive programming knowledge in Java, Python or similar
    • Knowledge of virtualization technologies such as Docker, Kubernetes or similar
    • An interest in social science issues is desirable

    Please send your application with the usual documents (cover letter, CV, copies of certificates) electronically as a single PDF document by 8 October 2024 to Prof. Dr. Alexander Mehler: mehler@em.uni-frankfurt.de.

  • Two new papers at SemDial 2024 — TrentoLogue

    The Semantics and Pragmatics of Dialogue, September 11th – 12th, 2024

    On gesture semantics:

    Andy Lücking, Alexander Mehler and Alexander Henlein. 2024. The Linguistic Interpretation of Non-emblematic Gestures Must be agreed in Dialogue: Combining Perceptual Classifiers and Grounding/Clarification Mechanisms. Proceedings of the 28th Workshop on The Semantics and Pragmatics of Dialogue.
    BibTeX
    @inproceedings{Luecking:Mehler:Henlein:2024-classifier,
      title     = {The Linguistic Interpretation of Non-emblematic Gestures Must
                   be agreed in Dialogue: Combining Perceptual Classifiers and Grounding/Clarification
                   Mechanisms},
      author    = {Lücking, Andy and Mehler, Alexander and Henlein, Alexander},
      year      = {2024},
      booktitle = {Proceedings of the 28th Workshop on The Semantics and Pragmatics of Dialogue},
      series    = {SemDial'24 -- TrentoLogue},
      location  = {Università di Trento, Palazzo Piomarta, Rovereto}
    }

    On brain-based semantics:

    Jonathan Ginzburg, Chris Eliasmith and Andy Lücking. 2024. Swann's name: Towards a Dialogical Brain Semantics. Proceedings of the 28th Workshop on The Semantics and Pragmatics of Dialogue.
    BibTeX
    @inproceedings{Ginzburg:Eliasmith:Luecking:2024-swann,
      title     = {Swann's name: {Towards} a Dialogical Brain Semantics},
      author    = {Ginzburg, Jonathan and Eliasmith, Chris and Lücking, Andy},
      year      = {2024},
      booktitle = {Proceedings of the 28th Workshop on The Semantics and Pragmatics of Dialogue},
      series    = {SemDial'24 -- TrentoLogue},
      location  = {Università di Trento, Palazzo Piomarta, Rovereto}
    }

  • New Publication Accepted for the 2nd Workshop on Legal Information Retrieval meets AI (LIRAI24)

    Our paper, “Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles,” has been accepted to the 2nd Workshop on Legal Information Retrieval Meets AI. In this work, we present an approach that leverages embedding spaces, bootstrap aggregation, and SVR ensembles to retrieve legal passages efficiently. Its recall exceeds that of the baselines (0.849 versus 0.803 and 0.829):

    Kevin Bönisch and Alexander Mehler. 2024. Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles. Proceedings of the 2nd Legal Information Retrieval meets Artificial Intelligence Workshop LIRAI 2024. Accepted.
    BibTeX
    @inproceedings{Boenisch:Mehler:2024,
      title     = {Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval
                   via Bagging and SVR Ensembles},
      author    = {B\"{o}nisch, Kevin and Mehler, Alexander},
      year      = {2024},
      booktitle = {Proceedings of the 2nd Legal Information Retrieval meets Artificial
                   Intelligence Workshop LIRAI 2024},
      location  = {Poznan, Poland},
      publisher = {CEUR-WS.org},
      address   = {Aachen, Germany},
      series    = {CEUR Workshop Proceedings},
      note      = {accepted},
      abstract  = {We introduce a retrieval approach leveraging Support Vector Regression
                   (SVR) ensembles, bootstrap aggregation (bagging), and embedding
                   spaces on the German Dataset for Legal Information Retrieval (GerDaLIR).
                   By conceptualizing the retrieval task in terms of multiple binary
                   needle-in-a-haystack subtasks, we show improved recall over the
                   baselines (0.849 > 0.803 | 0.829) using our voting ensemble, suggesting
                   promising initial results, without training or fine-tuning any
                   deep learning models. Our approach holds potential for further
                   enhancement, particularly through refining the encoding models
                   and optimizing hyperparameters.},
      keywords  = {legal information retrieval, support vector regression, word embeddings, bagging ensemble}
    }
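
The general idea behind the LIRAI 2024 item above (embeddings, bootstrap aggregation, and a voting ensemble evaluated by recall) can be illustrated with a small toy sketch. This is not the paper's implementation: the scorer is a simple cosine-based stand-in rather than an SVR, and every name, parameter, and data value here (`bagged_retrieve`, `background_weight`, the four two-dimensional "embeddings") is hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def bagged_retrieve(query, passages, n_bags=25, sample_size=3, top_k=2,
                    background_weight=0.5, seed=0):
    """Rank passage ids by how often they reach the top_k across bootstrap bags.

    passages: dict mapping passage id -> embedding (list of floats).
    Each bag treats the query as the "needle" and a bootstrap sample of
    passages as the "haystack". The scorer is a stand-in, NOT an SVR.
    """
    rng = random.Random(seed)
    ids = sorted(passages)
    votes = Counter()
    for _ in range(n_bags):
        # Bootstrap step: sample haystack passages with replacement.
        haystack = [passages[rng.choice(ids)] for _ in range(sample_size)]
        background = centroid(haystack)
        # Stand-in scorer: closeness to the needle, penalized by closeness
        # to the sampled haystack (assumed weighting, not from the paper).
        scores = {
            pid: cosine(vec, query) - background_weight * cosine(vec, background)
            for pid, vec in passages.items()
        }
        ranked = sorted(ids, key=lambda pid: scores[pid], reverse=True)
        # Voting step: each bag votes for its top_k passages.
        votes.update(ranked[:top_k])
    # Final ranking by vote count; ties are broken by id for determinism.
    return sorted(ids, key=lambda pid: (-votes[pid], pid))


def recall(retrieved, relevant):
    """Fraction of the relevant ids that appear in the retrieved list."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)


# Toy data: four made-up 2-d "embeddings" and one query vector.
toy_passages = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.2],
}
toy_query = [1.0, 0.05]
toy_ranking = bagged_retrieve(toy_query, toy_passages)
print(toy_ranking[:2], recall(toy_ranking[:2], ["a", "b"]))
```

Replacing the stand-in scorer with a trained regressor per bag would bring the sketch closer to an SVR ensemble, but how the paper builds its training samples and combines votes is not specified here, so those details remain assumptions.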

Sign up to our mailing list to receive news updates.