New Publication Accepted for the 2nd Workshop on Legal Information Retrieval meets AI (LIRAI24)

Our paper, “Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles,” has been accepted to the 2nd Workshop on Legal Information Retrieval meets AI (LIRAI 2024). In this work, we combine embedding spaces, bootstrap aggregation (bagging), and Support Vector Regression (SVR) ensembles to retrieve legal passages from the German Dataset for Legal Information Retrieval (GerDaLIR). Without training or fine-tuning any deep learning models, our voting ensemble improves recall over the baselines (0.849 > 0.803 | 0.829):

Kevin Bönisch and Alexander Mehler. 2024. Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval via Bagging and SVR Ensembles. Proceedings of the 2nd Legal Information Retrieval meets Artificial Intelligence Workshop LIRAI 2024. Accepted.
BibTeX
@inproceedings{Boenisch:Mehler:2024,
  title     = {Finding Needles in Emb(a)dding Haystacks: Legal Document Retrieval
               via Bagging and SVR Ensembles},
  author    = {B\"{o}nisch, Kevin and Mehler, Alexander},
  year      = {2024},
  booktitle = {Proceedings of the 2nd Legal Information Retrieval meets Artificial
               Intelligence Workshop LIRAI 2024},
  location  = {Poznan, Poland},
  publisher = {CEUR-WS.org},
  address   = {Aachen, Germany},
  series    = {CEUR Workshop Proceedings},
  note      = {accepted},
  abstract  = {We introduce a retrieval approach leveraging Support Vector Regression
               (SVR) ensembles, bootstrap aggregation (bagging), and embedding
               spaces on the German Dataset for Legal Information Retrieval (GerDaLIR).
               By conceptualizing the retrieval task in terms of multiple binary
               needle-in-a-haystack subtasks, we show improved recall over the
               baselines (0.849 > 0.803 | 0.829) using our voting ensemble, suggesting
               promising initial results, without training or fine-tuning any
               deep learning models. Our approach holds potential for further
               enhancement, particularly through refining the encoding models
               and optimizing hyperparameters.},
  keywords  = {legal information retrieval, support vector regression, word embeddings, bagging ensemble}
}
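
The needle-in-a-haystack framing described in the abstract can be illustrated with a minimal sketch. This is not the implementation from the paper. It is one possible reading, under several assumptions: random unit vectors stand in for real passage embeddings, each SVR is fitted with the query labeled 1 and a bootstrap sample of passages labeled 0, and the ensemble "vote" is a plain average of the predicted scores. The helper names (`bagged_svr_scores`, `top_k`, `make_toy_data`) and all hyperparameter values are hypothetical and chosen only for this toy setting.

```python
import numpy as np
from sklearn.svm import SVR


def bagged_svr_scores(query_vec, passage_vecs, n_estimators=15, bag_size=20, seed=0):
    """Average the predictions of SVRs fitted on (query -> 1, sampled passages -> 0)."""
    rng = np.random.default_rng(seed)
    n_passages = passage_vecs.shape[0]
    scores = np.zeros(n_passages)
    for _ in range(n_estimators):
        # Bootstrap sample (with replacement) of passages treated as "hay".
        bag = rng.choice(n_passages, size=bag_size, replace=True)
        X = np.vstack([query_vec, passage_vecs[bag]])
        y = np.concatenate([[1.0], np.zeros(bag_size)])
        # Assumed settings: linear kernel, narrow epsilon tube.
        model = SVR(kernel="linear", C=10.0, epsilon=0.01).fit(X, y)
        # Every passage in the haystack is scored by this ensemble member.
        scores += model.predict(passage_vecs)
    return scores / n_estimators


def top_k(scores, k):
    """Indices of the k highest-scoring passages, best first."""
    return np.argsort(-scores)[:k].tolist()


def make_toy_data(n_passages=200, dim=64, needle=17, seed=42):
    """Random unit vectors as stand-in embeddings; the query is a noisy copy of one passage."""
    rng = np.random.default_rng(seed)
    passages = rng.normal(size=(n_passages, dim))
    passages /= np.linalg.norm(passages, axis=1, keepdims=True)
    query = passages[needle] + 0.02 * rng.normal(size=dim)
    query /= np.linalg.norm(query)
    return query, passages


query, passages = make_toy_data()
scores = bagged_svr_scores(query, passages)
print(top_k(scores, 5))
```

In this toy setup the passage the query was derived from (index 17) is expected to rank near the top. Averaging over many small bags is what keeps the ranking stable when an individual bag happens to contain a relevant passage labeled as "hay". With real data, the embeddings would come from an encoding model, and both the encoder and the hyperparameters would need tuning, as the abstract notes.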