Bangla Textbook Corpus

The Bangla textbook corpus was extracted from textbooks used for teaching in public schools in Bangladesh. It was collected in 2012 to support research on multilingual text readability analysis. The corpus contains 661 documents, 105,897 sentences, and 1,029,354 tokens, and is distributed in TEI P5 format. For more details on this corpus, please refer to the reference below.

Reference
Islam, Md. Zahurul
Multilingual Text Classification using Information-Theoretic Features
PhD Thesis; Goethe University Frankfurt; 2014
If you use this corpus, please cite the publication above.

Acknowledgements: The work is supported by the LOEWE Digital-Humanities Project at the Goethe University Frankfurt.

Download as ZIP archive (4.25 MB)
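
Since the corpus is distributed as TEI P5, a document from the archive can be inspected with a standard XML parser. The following is only a minimal sketch: the TEI namespace is the standard one for P5, but the element names used for sentences (`s`) and tokens (`w`) are assumptions, as is the inline sample, because the actual markup of the corpus files is not described here. Adjust the tag names after looking at a real file.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Standard namespace for TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def count_units(tei_xml: str, sentence_tag: str = "s", token_tag: str = "w") -> Tuple[int, int]:
    """Count sentence and token elements in one TEI document.

    The default tag names are assumptions (hypothetical for this corpus);
    pass different names if the files use other elements.
    """
    root = ET.fromstring(tei_xml)
    sentences = root.findall(f".//{{{TEI_NS}}}{sentence_tag}")
    tokens = root.findall(f".//{{{TEI_NS}}}{token_tag}")
    return len(sentences), len(tokens)


# Hypothetical miniature document, not taken from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <s><w>আমি</w> <w>পড়ি</w></s>
    <s><w>বই</w></s>
  </p></body></text>
</TEI>"""

if __name__ == "__main__":
    n_sentences, n_tokens = count_units(SAMPLE)
    print(n_sentences, n_tokens)
```

To process the whole archive, the same function could be applied to the text of each extracted file and the counts summed.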