Middle Polish Dependency-Constituency Treebank


About the resource

The Middle Polish Dependency-Constituency Treebank was created between 2018 and 2024. It contains 2,018 syntactically annotated sentences and is still being expanded. Sentences for syntactic annotation were selected from the manually annotated subcorpus of the Electronic Corpus of 17th- and 18th-century Polish Texts. The current version mostly contains sentences of 10 to 50 tokens, selected from prose texts and containing no foreign-language elements and no elements marked as uncertain or unrecognized. The annotation was carried out in the dependency format, following the principles of the Polish Dependency Bank (http://zil.ipipan.waw.pl/PDB; Wróblewska 2018, 2020). The resource was then enriched with constituency information using the Hydra parser, following the approach of K. Krasnowska-Kieraś and M. Woliński (Krasnowska-Kieraś and Woliński 2023, 2024).

Treebank


The team

Conceptual work and coordination:
Aleksandra Wieczorek
Dependency annotation:
Bożena Itoya
Emanuel Modrzejewski
Martyna Sabała-Bolek
Aleksandra Wieczorek
IT work:
Dorota Komosińska
Consultations:
Alina Wróblewska
Adding the constituency information using the Hydra parser:
Katarzyna Krasnowska-Kieraś
Marcin Woliński
The Arboretum search engine:
Marcin Woliński

Financing

The preparation of the first 1,015 sentences was financed as part of the project "The extending of the Electronic Corpus of 17th- and 18th-century Polish Texts and its integration with the Electronic Dictionary of the 17th- and 18th-century Polish" (financing: Ministry of Science and Higher Education – National Program for the Development of the Humanities, project number: 0413/NPRH7/H11/86/2018, duration: December 6, 2018 – December 5, 2023). Another 1,003 sentences were annotated as part of the project "Introduction to the study of word order in the Middle Polish sentence – the order of the adjective and adjective-like modifier" (financing: Ministry of Science and Higher Education – Miniatura, project number: 2023/07/X/HS2/00111, duration: July 11, 2023 – July 10, 2024). Both projects were carried out at the Institute of Polish Language of the Polish Academy of Sciences. The creation of the Arboretum search engine and the Hydra parser was financed by the project "Digital Research Infrastructure for the Arts and Humanities" (Dariah.lab, POIR.04.02.00-00-D006/20-00).