New publication in SoftwareX

The following article was published in the journal SoftwareX:

Docker Unified UIMA Interface: New perspectives for NLP on big data

Giuseppe Abrami, Markos Genios, Filip Fitzermann, Daniel Baumartz and Alexander Mehler. 2025. Docker Unified UIMA Interface: New perspectives for NLP on big data. SoftwareX, 29:102033.
BibTeX
@article{Abrami:et:al:2025:a,
  title     = {Docker Unified UIMA Interface: New perspectives for NLP on big data},
  journal   = {SoftwareX},
  volume    = {29},
  pages     = {102033},
  year      = {2025},
  issn      = {2352-7110},
  doi       = {10.1016/j.softx.2024.102033},
  url       = {https://www.sciencedirect.com/science/article/pii/S2352711024004047},
  author    = {Giuseppe Abrami and Markos Genios and Filip Fitzermann and Daniel Baumartz
               and Alexander Mehler},
  keywords  = {duui, Docker, Kubernetes, UIMA, Distributed NLP},
  abstract  = {Processing large amounts of natural language text using machine
               learning-based models is becoming important in many disciplines.
               This demand is being met by a variety of approaches, resulting
               in the heterogeneous deployment of separate, partly incompatible,
               not natively scalable applications. To overcome the technological
               bottleneck involved, we have developed Docker Unified UIMA Interface,
               a system for the standardized, parallel, platform-independent,
               distributed and microservices-based solution for processing large
               and extensive text corpora with any NLP method. We present DUUI
               as a framework that enables automated orchestration of GPU-based
               NLP processes beyond the existing Docker Swarm cluster variant,
               and in addition to the adaptation to new runtime environments
               such as Kubernetes. Therefore, a new driver for DUUI is introduced,
               which enables the lightweight orchestration of DUUI processes
               within a Kubernetes environment in a scalable setup. In this way,
               the paper opens up novel text-technological perspectives for existing
               practices in disciplines that deal with the scientific analysis
               of large amounts of data based on NLP.}
}