text2ddc

About
We present text2ddc, a website and an API that generate labeled topic classifications based on the Dewey Decimal Classification (DDC), an international standard for topic classification in libraries.
text2ddc is a largely language-independent neural network-based classifier for DDC-related topic classification, which we optimized using a wide range of linguistic features to achieve an F-score of 87.4%. To show that our approach is language-independent, we evaluate text2ddc using up to 40 different languages.
We derive a topic model based on text2ddc, which generates probability distributions over semantic units for any input at the sense, word, and text level.
Unlike related approaches, these probabilities are estimated by text2ddc itself, so each dimension of the resulting vector representation is uniquely labeled by a DDC class.
In this way, we introduce a neural network-based Classifier-Induced Semantic Space.
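The idea of a vector whose dimensions are each labeled by a DDC class can be illustrated with a minimal sketch. This is not the text2ddc API or its model: the raw scores below are made up, the softmax normalization is an assumption about how classifier scores might be turned into probabilities, and the function names `labeled_topic_vector` and `top_classes` are hypothetical. Only the ten DDC main classes are used as labels here.

```python
import math

# The ten DDC main classes (top level of the hierarchy).
DDC_MAIN_CLASSES = [
    ("000", "Computer science, information & general works"),
    ("100", "Philosophy & psychology"),
    ("200", "Religion"),
    ("300", "Social sciences"),
    ("400", "Language"),
    ("500", "Science"),
    ("600", "Technology"),
    ("700", "Arts & recreation"),
    ("800", "Literature"),
    ("900", "History & geography"),
]


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def labeled_topic_vector(scores):
    """Map one raw score per DDC main class to a {DDC code: probability} dict.

    Hypothetical helper: in a real system the scores would come from a
    trained classifier; here they are supplied by the caller.
    """
    if len(scores) != len(DDC_MAIN_CLASSES):
        raise ValueError("expected one score per DDC main class")
    probs = softmax(scores)
    return {code: p for (code, _name), p in zip(DDC_MAIN_CLASSES, probs)}


def top_classes(vector, k=3):
    """Return the k most probable (DDC code, probability) pairs."""
    return sorted(vector.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up scores for an imaginary input text about physics and engineering.
example_scores = [0.2, -1.0, -2.0, 0.5, 0.1, 3.0, 2.0, -0.5, -1.5, 0.0]
vector = labeled_topic_vector(example_scores)

for code, prob in top_classes(vector):
    print(code, round(prob, 3))
```

Because every dimension carries a DDC code, two such vectors can be compared dimension by dimension, and the strongest dimensions can be read directly as topic labels.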
  • D. Baumartz, T. Uslu, and A. Mehler, “LTV: Labeled Topic Vector,” in Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics: System Demonstrations, August 20-26, Santa Fe, New Mexico, USA, 2018.
    [Bibtex]
    @InProceedings{Baumartz:Uslu:Mehler:2018,
        author    = {Daniel Baumartz and Tolga Uslu and Alexander Mehler},
        title     = {{LTV}: Labeled Topic Vector},
        booktitle = {Proceedings of {COLING 2018}, the 27th International Conference on Computational Linguistics: System Demonstrations, August 20-26},
        year      = {2018},
        address   = {Santa Fe, New Mexico, USA},
        publisher = {The COLING 2018 Organizing Committee},
        abstract  = {In this paper, we present LTV, a website and an API that generate labeled topic classifications based on the Dewey Decimal Classification (DDC), an international standard for topic classification in libraries. We introduce nnDDC, a largely language-independent neural network-based classifier for DDC-related topic classification, which we optimized using a wide range of linguistic features to achieve an F-score of 87.4%. To show that our approach is language-independent, we evaluate nnDDC using up to 40 different languages. We derive a topic model based on nnDDC, which generates probability distributions over semantic units for any input on sense-, word- and text-level. Unlike related approaches, however, these probabilities are estimated by means of nnDDC so that each dimension of the resulting vector representation is uniquely labeled by a DDC class. In this way, we introduce a neural network-based Classifier-Induced Semantic Space (nnCISS).},
        pdf = {https://www.texttechnologylab.org/wp-content/uploads/2018/06/coling2018.pdf}
    }