Zeitschrift für Germanistische Linguistik, Journal Year: 2025, Volume and Issue: 53(1), P. 166 - 214
Published: April 10, 2025
Abstract Linguistic research frequently requires the categorization of language phenomena in corpus data (annotation). Since those may occur plentifully, a partial or full automation of the annotation process appears attractive. The filtering and recombination of existing annotation layers seems to further provide an elegant solution to the deduction of higher-level annotations. In this contribution, we show, using the example of German split particle verbs, that this approach results in a number of linguistic, technological, and epistemological challenges related to the precise definition of the various models employed and their interfaces. We argue that manual annotation is not merely a preprocessing task, but is itself central to the development of linguistic theory. We discuss why machine-based processing can neither mimic nor replace this process; why it generally cannot reach a level of precision that would be suitable for linguistic research without integration, adaptation, and correction; and how its blind application systematically skews crucial areas of research. We close with a suggestion of several best practice approaches which help to prevent or resolve incompatibilities and delays arising from common problems in corpus-based modeling.
Language: English