Our paper, “Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles,” has been accepted to the 2nd Legal Information Retrieval meets Artificial Intelligence Workshop (LIRAI 2024). In this work, we present a retrieval approach that combines embedding spaces, bootstrap aggregation (bagging), and Support Vector Regression (SVR) ensembles to retrieve legal passages from the German Dataset for Legal Information Retrieval (GerDaLIR). Without training or fine-tuning any deep learning models, our voting ensemble improves recall over the baselines (0.849 versus 0.803 and 0.829):
Kevin Bönisch and Alexander Mehler. 2024. Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles. Proceedings of the 2nd Legal Information Retrieval meets Artificial Intelligence Workshop LIRAI 2024. Accepted.
BibTeX
@inproceedings{Boenisch:Mehler:2024,
title = {Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval
via Bagging and SVR Ensembles},
author = {B\"{o}nisch, Kevin and Mehler, Alexander},
year = {2024},
booktitle = {Proceedings of the 2nd Legal Information Retrieval meets Artificial
Intelligence Workshop LIRAI 2024},
location = {Poznan, Poland},
publisher = {CEUR-WS.org},
address = {Aachen, Germany},
series = {CEUR Workshop Proceedings},
note = {accepted},
abstract = {We introduce a retrieval approach leveraging Support Vector Regression
(SVR) ensembles, bootstrap aggregation (bagging), and embedding
spaces on the German Dataset for Legal Information Retrieval (GerDaLIR).
By conceptualizing the retrieval task in terms of multiple binary
needle-in-a-haystack subtasks, we show improved recall over the
baselines (0.849 > 0.803 | 0.829) using our voting ensemble, suggesting
promising initial results, without training or fine-tuning any
deep learning models. Our approach holds potential for further
enhancement, particularly through refining the encoding models
and optimizing hyperparameters.},
keywords = {legal information retrieval, support vector regression, word embeddings, bagging ensemble}
}
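The abstract describes the approach only at a high level, so the following Python sketch shows one way a bagged SVR voting ensemble over binary needle-in-a-haystack subtasks could look. It is not the authors' implementation. The function name `bagged_svr_retrieve`, the random toy embeddings, the choice of treating the query as the single positive example against bootstrap-sampled passages, and all hyperparameters are assumptions made for illustration. The only library calls used are NumPy and scikit-learn's `SVR`.

```python
import numpy as np
from sklearn.svm import SVR


def bagged_svr_retrieve(query_vec, passage_vecs, n_estimators=15,
                        sample_size=30, top_k=5, seed=0):
    """Rank passages for one query with a bagged, voting SVR ensemble.

    Hypothetical helper: each estimator solves one binary subtask in which
    the query embedding is the "needle" (target 1.0) and a bootstrap sample
    of passage embeddings is the "haystack" (target 0.0).
    """
    rng = np.random.default_rng(seed)
    n_passages = passage_vecs.shape[0]
    votes = np.zeros(n_passages, dtype=int)
    score_sum = np.zeros(n_passages)

    for _ in range(n_estimators):
        # Bagging: draw a bootstrap sample of the corpus (with replacement).
        idx = rng.choice(n_passages, size=sample_size, replace=True)
        X = np.vstack([query_vec, passage_vecs[idx]])
        y = np.concatenate([[1.0], np.zeros(sample_size)])

        # Assumed hyperparameters; the source does not specify them.
        model = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)

        # Score every passage and let this estimator vote for its top_k.
        scores = model.predict(passage_vecs)
        votes[np.argsort(-scores)[:top_k]] += 1
        score_sum += scores

    # Voting ensemble: sort by vote count, break ties by summed scores.
    order = np.lexsort((-score_sum, -votes))
    return order[:top_k], votes


# Toy data standing in for real passage and query embeddings (assumption).
demo_rng = np.random.default_rng(42)
passages = demo_rng.normal(size=(200, 32))
passages /= np.linalg.norm(passages, axis=1, keepdims=True)

relevant_idx = 17  # the passage the toy query is derived from
query = passages[relevant_idx] + 0.05 * demo_rng.normal(size=32)
query /= np.linalg.norm(query)

top, votes = bagged_svr_retrieve(query, passages)
print("top-k passage indices:", top.tolist())
print("votes for the relevant passage:", int(votes[relevant_idx]))
```

In this sketch, no encoder is trained or fine-tuned: the embeddings are taken as given, and only small SVR models are fitted per query. The encoding model, `n_estimators`, `sample_size`, and the SVR settings are the parts one would tune in practice.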