Robert Mayer Str. 10
Tel: +49 69-798-28926
Office hours: Thursday, 3-4 PM
Almost any study in corpus linguistics boils down to constructing, annotating, representing, and analyzing linguistic data. The requirements for a suitable database often conflict:
- It should scale well with ever-growing corpora such as Wikipedia, while remaining flexible for annotation and editing.
- It should serve a broad spectrum of analyses by minimizing the need to transform data for any specific kind of analysis, while still being space-efficient.
- The data model should be able to mediate between standard formats without becoming overly generic and difficult to handle.
Designing and developing linguistic databases has become a major topic for me. Realizing that there is no such thing as the ultimate solution, I am interested in all kinds of database management systems and paradigms, including relational, graph, distributed, and NoSQL databases, as well as APIs for persistent storage.