The TTLab (Text Technology Lab), headed by Prof. Alexander Mehler, is part of the Department of Computer Science and Mathematics (Fachbereich Informatik und Mathematik) at Goethe University Frankfurt. It investigates formal, algorithmic models to deepen our understanding of information processing in the humanities. We examine both diachronic (time-dependent) and synchronic aspects of the processing of linguistic and non-linguistic, multimodal signs. The Lab works across several disciplines to bridge computer science on the one hand and corpus-based research in the humanities on the other. To this end, we develop information models and algorithms for the analysis of texts, images, and other objects relevant to research in the humanities.
News
-
Two new papers at SemDial 2024 — TrentoLogue
The Semantics and Pragmatics of Dialogue, September 11th – 12th, 2024
On gesture semantics:
Andy Lücking, Alexander Mehler and Alexander Henlein. 2024. The Linguistic Interpretation of Non-emblematic Gestures Must be agreed in Dialogue: Combining Perceptual Classifiers and Grounding/Clarification Mechanisms. Proceedings of the 28th Workshop on The Semantics and Pragmatics of Dialogue.
BibTeX:
@inproceedings{Luecking:Mehler:Henlein:2024-classifier,
  title = {The Linguistic Interpretation of Non-emblematic Gestures Must be agreed in Dialogue: Combining Perceptual Classifiers and Grounding/Clarification Mechanisms},
  author = {Lücking, Andy and Mehler, Alexander and Henlein, Alexander},
  year = {2024},
  booktitle = {Proceedings of the 28th Workshop on The Semantics and Pragmatics of Dialogue},
  series = {SemDial'24 -- TrentoLogue},
  location = {Università di Trento, Palazzo Piomarta, Rovereto}
}
On brain-based semantics:
Jonathan Ginzburg, Chris Eliasmith and Andy Lücking. 2024. Swann's name: Towards a Dialogical Brain Semantics. Proceedings of the 28th Workshop on The Semantics and Pragmatics of Dialogue.
BibTeX:
@inproceedings{Ginzburg:Eliasmith:Luecking:2024-swann,
  title = {Swann's name: {Towards} a Dialogical Brain Semantics},
  author = {Ginzburg, Jonathan and Eliasmith, Chris and Lücking, Andy},
  year = {2024},
  booktitle = {Proceedings of the 28th Workshop on The Semantics and Pragmatics of Dialogue},
  series = {SemDial'24 -- TrentoLogue},
  location = {Università di Trento, Palazzo Piomarta, Rovereto}
}
-
Open Full-Time Position (E13) in Semi-Automated Thematic Text Classification as a Basis for Corpus Linguistic Value-Added Services
At the Institute of Computer Science (Prof. Dr. Alexander Mehler, TTLab, https://www.texttechnologylab.org/), Department of Computer Science and Mathematics at Goethe University Frankfurt, one position for a
Research Assistant (m/f/d)
(E 13 TV-G-U) is available from the earliest possible date for a period of three years within the project Semi-Automated Thematic Text Classification as a Basis for Corpus Linguistic Value-Added Services, with the opportunity to pursue a scientific qualification (PhD). The project is funded by the German Research Foundation (DFG). The salary group classification is based on the job characteristics determined by the collective labour agreement in effect for Goethe University (TV-G-U).
The aim of the project is to develop a deep learning-based topic classification system using the Wikipedia category system and data from the Wikidata project, and to develop and test topic models based on this classification system for the automatic classification of texts, including those from the Leibniz Institute for the German Language (IDS) in Mannheim. The project will be carried out in cooperation with the IDS and the Saxon Academy of Sciences in Leipzig. The research will focus on state-of-the-art AI methods, in particular generative AI methods.
The successful applicant is expected to contribute to the project and to participate actively in its courses, workshops, and events. We are looking for a highly qualified individual with a strong interest in research methods in the fields of AI and topic modeling as well as in the team-oriented development and application of innovative, research-oriented methods in the field of text modeling. With the Text Technology Lab, in which the position will be embedded, we offer a research-oriented, internationally focused working environment in the fields of computational humanities, multimodal computing, machine learning and artificial intelligence. This includes funding for conference attendance and individual career development.
Requirements
- Completed academic degree (Master’s or equivalent) in computer science, computational humanities, computational linguistics or a field related to text modeling and AI.
- Experience in the development and testing of NLP or AI methods.
- Extensive programming knowledge in Java, Python or similar.
- An interest in issues relating to information science is desirable but not essential.
Please send your application with the usual documents (cover letter, CV, copies of certificates) electronically, combined into a single PDF document, by September 20, 2024 to Prof. Dr. Alexander Mehler: mehler@em.uni-frankfurt.de.
-
New Publication Accepted for the 2nd Workshop on Legal Information Retrieval meets AI (LIRAI24)
Our paper, “Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles,” has been accepted to the 2nd Workshop on Legal Information Retrieval meets AI. In this work, we present an approach that combines embedding spaces, bootstrap aggregation (bagging), and Support Vector Regression (SVR) ensembles to retrieve legal passages without training or fine-tuning any deep learning models. On the German Dataset for Legal Information Retrieval (GerDaLIR), it reaches a recall of 0.849, compared with 0.803 and 0.829 for the baselines:
Kevin Bönisch and Alexander Mehler. 2024. Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles. Proceedings of the 2nd Legal Information Retrieval meets Artificial Intelligence Workshop LIRAI 2024. Accepted.
BibTeX:
@inproceedings{Boenisch:Mehler:2024,
  title = {Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles},
  author = {B\"{o}nisch, Kevin and Mehler, Alexander},
  year = {2024},
  booktitle = {Proceedings of the 2nd Legal Information Retrieval meets Artificial Intelligence Workshop LIRAI 2024},
  location = {Poznan, Poland},
  publisher = {CEUR-WS.org},
  address = {Aachen, Germany},
  series = {CEUR Workshop Proceedings},
  note = {accepted},
  abstract = {We introduce a retrieval approach leveraging Support Vector Regression (SVR) ensembles, bootstrap aggregation (bagging), and embedding spaces on the German Dataset for Legal Information Retrieval (GerDaLIR). By conceptualizing the retrieval task in terms of multiple binary needle-in-a-haystack subtasks, we show improved recall over the baselines (0.849 > 0.803 | 0.829) using our voting ensemble, suggesting promising initial results, without training or fine-tuning any deep learning models. Our approach holds potential for further enhancement, particularly through refining the encoding models and optimizing hyperparameters.},
  keywords = {legal information retrieval, support vector regression, word embeddings, bagging ensemble}
}
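For readers unfamiliar with the ingredients named in the abstract, the following is a rough sketch of how bagging and SVR can be combined for retrieval over an embedding space. It is not the pipeline from the paper, whose details are not given here. It assumes NumPy and scikit-learn, precomputed embedding vectors, and one possible reading of the "needle in a haystack" framing: each ensemble member is fitted with the query as the only positive example against a bootstrap sample of passages treated as negatives. The function name `rank_passages` and all hyperparameter values are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR


def rank_passages(query_vec, passage_vecs, n_models=16, sample_size=32, seed=0):
    """Rank passages for one query with a bagged SVR ensemble (illustrative only).

    query_vec:    1-D array, embedding of the query (assumed precomputed).
    passage_vecs: 2-D array, one embedding per passage.
    Returns (indices sorted from best to worst, averaged scores).
    """
    rng = np.random.default_rng(seed)
    n_passages = len(passage_vecs)
    scores = np.zeros(n_passages)

    for _ in range(n_models):
        # Bootstrap sample of the "haystack", drawn with replacement.
        idx = rng.integers(0, n_passages, size=min(sample_size, n_passages))

        # Training set: the query is the single "needle" (target 1.0),
        # the sampled passages act as negatives (target 0.0).
        X = np.vstack([query_vec[None, :], passage_vecs[idx]])
        y = np.zeros(len(X))
        y[0] = 1.0

        model = SVR(kernel="linear", C=1.0, epsilon=0.01)
        model.fit(X, y)

        # Every ensemble member scores the full collection.
        scores += model.predict(passage_vecs)

    # Aggregate by averaging the members' scores.
    scores /= n_models
    return np.argsort(-scores), scores
```

Calling `rank_passages(query, passages)` returns the passage indices ordered by averaged score. Averaging is only one way to aggregate; the abstract speaks of a voting ensemble, so counting how often a passage appears among each member's top results would be an equally plausible variant.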
Sign up to our mailing list to receive news updates.