Korba - treebank

Middle Polish Dependency-Constituency Treebank

About the resource

The Middle Polish Dependency-Constituency Treebank was created between 2018 and 2024. It contains 2,018 syntactically annotated sentences and is still being expanded. Sentences for syntactic annotation were selected from the manually annotated subcorpus of the Electronic Corpus of 17th- and 18th-century Polish Texts. The current version mostly contains sentences of 10 to 50 tokens, selected from prose texts and containing no foreign-language elements or elements marked as uncertain or unrecognized. The annotation was carried out in the dependency format, following the principles of the Polish Dependency Bank (http://zil.ipipan.waw.pl/PDB; Wróblewska 2018 and 2020). The resource was then enriched with constituency information using the Hydra parser, following the approach of K. Krasnowska-Kieraś and M. Woliński (Krasnowska-Kieraś and Woliński 2023 and 2024).

Treebank

The team

Conceptual work and coordination: Aleksandra Wieczorek

Dependency annotation: Bożena Itoya; Emanuel Modrzejewski; Martyna Sabała-Bolek; Aleksandra Wieczorek

IT work: Dorota Komosińska

Consultations: Alina Wróblewska

Adding the constituency information using the Hydra parser: Katarzyna Krasnowska-Kieraś; Marcin Woliński

The Arboretum search engine: Marcin Woliński

Financing

The preparation of the first 1,015 sentences was financed as part of the project "The extending of the Electronic Corpus of 17th- and 18th-century Polish Texts and its integration with the Electronic Dictionary of the 17th- and 18th-century Polish" (funding: Ministry of Science and Higher Education, National Program for the Development of the Humanities; project number: 0413/NPRH7/H11/86/2018; duration: December 6, 2018 to December 5, 2023). Another 1,003 sentences were annotated as part of the project "Introduction to the study of word order in the Middle Polish sentence: the order of the adjective and adjective-like modifier" (funding: Ministry of Science and Higher Education, Miniatura; project number: 2023/07/X/HS2/00111; duration: July 11, 2023 to July 10, 2024). Both projects were carried out at the Institute of Polish Language of the Polish Academy of Sciences. The creation of the Arboretum search engine and the Hydra parser was financed under the project "Digital Research Infrastructure for the Arts and Humanities" (Dariah.lab, POIR.04.02.00-00-D006/20-00).