The following new publications have been accepted:
HCI International 2022 – Late Breaking Papers. Multimodality in Advanced Interaction Environments
- [1] Introduction to the 2nd Edition of “Semantic, Artificial and Computational Interaction Studies”
Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022)
- [2] Tafsir Dataset: A Novel Multi-Task Benchmark for Named Entity Recognition and Topic Modeling in Classical Arabic Literature
Total: 2
- C. Ebert, A. Lücking, and A. Mehler, “Introduction to the 2nd Edition of ‘Semantic, Artificial and Computational Interaction Studies’,” in HCI International 2022 – Late Breaking Papers. Multimodality in Advanced Interaction Environments, Cham, 2022, pp. 36–47.
“Behavioromics” is a term that has been invented to cover the study of multimodal interaction from various disciplines and points of view. These disciplines and points of view, however, lack a platform for exchange. The workshop session on “Semantic, artificial and computational interaction studies” provides such a platform. We motivate behavioromics, sketch its historical background, and summarize this year's contributions.
@inproceedings{Ebert:et:al:2022,
  abstract = "``Behavioromics'' is a term that has been invented to cover the study of multimodal interaction from various disciplines and points of view. These disciplines and points of view, however, lack a platform for exchange. The workshop session on ``Semantic, artificial and computational interaction studies'' provides such a platform. We motivate behavioromics, sketch its historical background, and summarize this year's contributions.",
  address = "Cham",
  author = "Ebert, Cornelia and L{\"u}cking, Andy and Mehler, Alexander",
  booktitle = "HCI International 2022 - Late Breaking Papers. Multimodality in Advanced Interaction Environments",
  editor = "Kurosu, Masaaki and Yamamoto, Sakae and Mori, Hirohiko and Schmorrow, Dylan D. and Fidopiastis, Cali M. and Streitz, Norbert A. and Konomi, Shin'ichi",
  isbn = "978-3-031-17618-0",
  pages = "36--47",
  publisher = "Springer Nature Switzerland",
  title = "Introduction to the 2nd Edition of ``Semantic, Artificial and Computational Interaction Studies''",
  doi = {10.1007/978-3-031-17618-0_3},
  year = "2022"
}
- S. Ahmed, R. van der Goot, M. Rehman, C. Kruse, Ö. Özsoy, A. Mehler, and G. Roig, “Tafsir Dataset: A Novel Multi-Task Benchmark for Named Entity Recognition and Topic Modeling in Classical Arabic Literature,” in Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 2022, pp. 3753–3768.
Various historical languages, which used to be lingua franca of science and arts, deserve the attention of current NLP research. In this work, we take the first data-driven steps towards this research line for Classical Arabic (CA) by addressing named entity recognition (NER) and topic modeling (TM) on the example of CA literature. We manually annotate the encyclopedic work of Tafsir Al-Tabari with span-based NEs, sentence-based topics, and span-based subtopics, thus creating the Tafsir Dataset with over 51,000 sentences, the first large-scale multi-task benchmark for CA. Next, we analyze our newly generated dataset, which we make open-source available, with current language models (lightweight BiLSTM, transformer-based MaChAmP) along a novel script compression method, thereby achieving state-of-the-art performance for our target task CA-NER. We also show that CA-TM from the perspective of historical topic models, which are central to Arabic studies, is very challenging. With this interdisciplinary work, we lay the foundations for future research on automatic analysis of CA literature.
@inproceedings{Ahmed:et:al:2022,
  title = "Tafsir Dataset: A Novel Multi-Task Benchmark for Named Entity Recognition and Topic Modeling in Classical {A}rabic Literature",
  author = {Ahmed, Sajawel and van der Goot, Rob and Rehman, Misbahur and Kruse, Carl and {\"O}zsoy, {\"O}mer and Mehler, Alexander and Roig, Gemma},
  booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
  month = oct,
  year = "2022",
  address = "Gyeongju, Republic of Korea",
  publisher = "International Committee on Computational Linguistics",
  url = "https://aclanthology.org/2022.coling-1.330",
  pages = "3753--3768",
  abstract = "Various historical languages, which used to be lingua franca of science and arts, deserve the attention of current NLP research. In this work, we take the first data-driven steps towards this research line for Classical Arabic (CA) by addressing named entity recognition (NER) and topic modeling (TM) on the example of CA literature. We manually annotate the encyclopedic work of Tafsir Al-Tabari with span-based NEs, sentence-based topics, and span-based subtopics, thus creating the Tafsir Dataset with over 51,000 sentences, the first large-scale multi-task benchmark for CA. Next, we analyze our newly generated dataset, which we make open-source available, with current language models (lightweight BiLSTM, transformer-based MaChAmP) along a novel script compression method, thereby achieving state-of-the-art performance for our target task CA-NER. We also show that CA-TM from the perspective of historical topic models, which are central to Arabic studies, is very challenging. With this interdisciplinary work, we lay the foundations for future research on automatic analysis of CA literature.",
}