The customized Europarl corpus was extracted from the Europarl corpus to support research in corpus-based translation studies. It contains 3,152,650 sentences in 21 European languages from 7 language (sub-)families and is encoded in TEI P5. For more details on this corpus, please refer to: [1]
Multilingual Text Classification using Information-Theoretic Features
PhD Thesis; Goethe University Frankfurt; 2014
If you use this corpus, please cite the thesis above together with the paper given in the BibTeX entry below.
Copyright: We are not aware of any copyright restrictions on this resource. If you notice any problems, please let us know.
Acknowledgements: This work is supported by the LOEWE Digital-Humanities Project at the Goethe University Frankfurt.
[Bibtex]
@InProceedings{Islam:Mehler:2012:a,
Author = {Islam, Md. Zahurul and Mehler, Alexander},
Title = {Customization of the Europarl Corpus for Translation
Studies},
BookTitle = {Proceedings of the 8th International Conference on
Language Resources and Evaluation (LREC)},
abstract = {Currently, the area of translation studies lacks
corpora by which translation scholars can validate
their theoretical claims, for example, regarding the
scope of the characteristics of the translation
relation. In this paper, we describe a customized
resource in the area of translation studies that mainly
addresses research on the properties of the translation
relation. Our experimental results show that the
Type-Token-Ratio (TTR) is not a universally valid
indicator of the simplification of translation.},
owner = {zahurul},
pdf = {http://www.lrec-conf.org/proceedings/lrec2012/pdf/729_Paper.pdf},
timestamp = {2012.02.02},
year = 2012
}
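
Since the corpus is distributed as TEI P5, the following is a minimal sketch of how sentences might be read from such a file with Python's standard library. The TEI namespace is the standard one; everything else is an assumption made for illustration: that sentences are wrapped in `<s>` elements, that each carries an `xml:lang` attribute, and the hand-written `SAMPLE` fragment itself, which is not taken from the corpus. Check the actual files and adjust the element and attribute names as needed.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace and the predefined XML namespace (for xml:lang).
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_sentences(xml_text):
    """Yield (language, text) for every <s> element in a TEI document.

    Assumption: sentences are marked up as <s xml:lang="..."> elements.
    The real corpus layout may differ.
    """
    root = ET.fromstring(xml_text)
    for s in root.iter(f"{{{TEI_NS}}}s"):
        # Join nested text nodes and normalise whitespace.
        text = " ".join("".join(s.itertext()).split())
        yield s.get(XML_LANG), text


# Hypothetical fragment for illustration only; not copied from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>
      <s xml:lang="en">Resumption of the session.</s>
      <s xml:lang="de">Wiederaufnahme der Sitzung.</s>
    </p>
  </body></text>
</TEI>"""

sentences = list(iter_sentences(SAMPLE))
for lang, text in sentences:
    print(lang, text)
```

For a real corpus file, `ET.parse(path).getroot()` could replace `ET.fromstring`; for very large files, `ET.iterparse` avoids loading the whole document into memory.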