Mevlüt Bagci

Doctoral candidate
Goethe-Universität Frankfurt am Main
Robert-Mayer-Straße 10
D-60325 Frankfurt am Main
D-60054 Frankfurt am Main (use for package delivery)
Postfach / P.O. Box: 154

Publications

Total: 1

2022 (1)

  • G. Abrami, M. Bagci, L. Hammerla, and A. Mehler, “German Parliamentary Corpus (GerParCor),” in Proceedings of the Language Resources and Evaluation Conference, Marseille, France, 2022, pp. 1900-1906. https://aclanthology.org/2022.lrec-1.202

    Parliamentary debates represent a large and partly unexploited treasure trove of publicly accessible texts. In the German-speaking area, there is a certain deficit of uniformly accessible and annotated corpora covering all German-speaking parliaments at the national and federal level. To address this gap, we introduce the German Parliamentary Corpus (GerParCor). GerParCor is a genre-specific corpus of (predominantly historical) German-language parliamentary protocols from three centuries and four countries, including state and federal level data. In addition, GerParCor contains conversions of scanned protocols and, in particular, of protocols in Fraktur converted via an OCR process based on Tesseract. All protocols were preprocessed by means of the NLP pipeline of spaCy3 and automatically annotated with metadata regarding their session date. GerParCor is made available in the XMI format of the UIMA project. In this way, GerParCor can be used as a large corpus of historical texts in the field of political communication for various tasks in NLP.
    @InProceedings{Abrami:Bagci:Hammerla:Mehler:2022,
      author    = {Abrami, Giuseppe  and  Bagci, Mevlüt  and  Hammerla, Leon  and  Mehler, Alexander},
      title     = {German Parliamentary Corpus (GerParCor)},
      booktitle = {Proceedings of the Language Resources and Evaluation Conference},
      month     = {June},
      year      = {2022},
      address   = {Marseille, France},
      publisher = {European Language Resources Association},
      pages     = {1900--1906},
      abstract  = {Parliamentary debates represent a large and partly unexploited treasure trove of publicly accessible texts. In the German-speaking area, there is a certain deficit of uniformly accessible and annotated corpora covering all German-speaking parliaments at the national and federal level. To address this gap, we introduce the German Parliamentary Corpus (GerParCor). GerParCor is a genre-specific corpus of (predominantly historical) German-language parliamentary protocols from three centuries and four countries, including state and federal level data. In addition, GerParCor contains conversions of scanned protocols and, in particular, of protocols in Fraktur converted via an OCR process based on Tesseract. All protocols were preprocessed by means of the NLP pipeline of spaCy3 and automatically annotated with metadata regarding their session date. GerParCor is made available in the XMI format of the UIMA project. In this way, GerParCor can be used as a large corpus of historical texts in the field of political communication for various tasks in NLP.},
      url       = {https://aclanthology.org/2022.lrec-1.202},
      poster    = {https://www.texttechnologylab.org/wp-content/uploads/2022/06/GerParCor_LREC_2022.pdf},
      pdf       = {http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.202.pdf}
    }