
PolDi – Polish Diachronic Research Corpus

PolDi is a diachronic text corpus that was developed in the DFG projects "Corpus linguistics and diachronic syntax I-II".

The Polish Diachronic Research Corpus PolDi was originally developed at the University of Regensburg in the project "Corpus linguistics and diachronic syntax I" (PIs: Björn Hansen and Ernst Hansack) and then extended and further annotated at Humboldt-Universität zu Berlin in the project "Corpus linguistics and diachronic syntax II" (PI: Roland Meyer). Both projects were funded by the German Research Foundation (DFG).

 

PolDi draws on three sources:

  • First, the XML-annotated texts of the Korpus staropolski of the Academy of Sciences in Kraków, which contains virtually all relevant texts up to 1500. This corpus was kindly made available by Prof. Rafał Górski and Prof. W. Twardzik, Kraków.
  • Second, the so-called Göttingen corpus of Polish Baroque texts, containing digitised editions of Middle Polish texts, to which Prof. Gerd Hentschel, Oldenburg, kindly gave us access.
  • And third, some later digitised texts from online virtual libraries.

We gratefully acknowledge the contributors and their work.

 

PolDi's composition roughly follows the choices made in standard anthologies of Polish language history and must be called opportunistic. Since Polish libraries are digitising their collections on a large scale, and efforts are under way to provide usable OCR of these texts, it is foreseeable that truly enormous historical corpora of Polish will become available in the near future. Until then, PolDi will be useful as a comparatively large and partly annotated resource.

A number of individuals contributed to the present shape of the corpus, notably Arkadiusz Danszczyk, Björn Hansen and Roland Meyer.

 

PolDi uses the search and visualization architecture ANNIS. ANNIS is described in detail in the publication

 

If you wish to use the corpus, please refer to our new and permanent web address at http://hu.berlin/poldi and cite the following paper in your work:

Meyer, Roland (2012): The Construction and Application of Diachronic Slavonic Corpora in Linguistic Research – RRuDi (Russian) and PolDi (Polish). In: Hansen, Björn (ed.): Diachrone Aspekte slavischer Sprachen. (= Slavolinguistica 16), München/Berlin: Verlag Otto Sagner, 223-242.

 

You can access PolDi on our corpus server with the login poldi and the password poldi.