Corpus Linguistics
The project line "Spoken Corpus" uses the complex setting of developing and using a speech corpus for both teaching and research purposes.
Incentive:
No substantial corpora of spoken BCMS or Albanian are currently available. Developing them requires trained empirical, corpus and computational linguists. Such training within the framework of "Digital Humanities" is also vital for the employability of graduates.
Aims:
- Development of a joint curriculum or teaching plan (topics, handouts, training material, literature) for the whole workflow from data collection to analysis
- Creation of a joint working environment for corpus development and use, including the use of tools for collaboration (e.g. SLACK, Moodle, ...)
- Construction of a corpus of spoken language with all varieties of BCMS/Albanian, including bilinguals, heritage speakers and data from experiments (like CHILDES), equipped with a multi-layered linguistic and "social" annotation
- Training of students and staff in all respective domains of action
- Empowerment for fieldwork, also in politically sensitive regions
Domains of action:
A Tool Creation
B Corpus Design
C Corpus Creation
D Corpus Analysis
E Social Corpus Linguistics
Competencies
linguistic fieldwork, linguistic experimentation, transcription, corpus creation, sampling, annotation, tagging, scripting, statistics, socio-, ethno- and "simple" linguistic analysis, translation, project management, ... you name it
Formats
- Workshops for qualification of staff in Berlin and partner countries
- Individual staff training mobilities on topics A-E
- Individual student mobilities with integration in project
Steering Committee
- Bardh Rugova (Prishtina)
- Branimir Stanković (Niš)
- Ismail Palić (Sarajevo)
- Ivana Vučina (Beograd)
- Jelena Petković (Kragujevac)
- Philipp Wasserscheidt (HU)
Projects
- BosCO
Construction of a corpus of spoken language with all varieties of BCMS in Bosnia & Herzegovina as well as Bosnakian varieties outside BiH, including bilinguals, heritage speakers and data from experiments (like CHILDES), equipped with a multi-layered linguistic and "social" annotation. In collaboration with the University of Sarajevo and the Institut za Jezik Sarajevo.
- Voices of the city
Corpus of urban varieties of central and southern Serbian cities. In collaboration with the universities of Niš and Kragujevac.
- Corpus of Narratives
Corpus of spoken personal narratives. The material comes from ethnological fieldwork in the Southern Banat. Annotations are made on several levels, with a focus on narrative elements, the structure of personal narratives, and constructions in the sense of construction grammar. In collaboration with the Institute for Balkan Studies of the Serbian Academy of Sciences and Arts.
http://poincare.matf.bg.ac.rs/~andjelkaz/diwna/
- Corpus of Serbian in Hungary
Spoken and written corpus of the Serbian minority in Hungary. In collaboration with ELTE and the Serbian Institute, Budapest.