D 2022

Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset

MEDKOVÁ, Helena and Aleš HORÁK

Basic information

Original name

Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset

Authors

MEDKOVÁ, Helena (203 Czech Republic, guarantor, belonging to the institution) and Aleš HORÁK (203 Czech Republic, belonging to the institution)

Edition

Amsterdam, Towards a Knowledge-Aware AI : SEMANTiCS 2022 — Proceedings of the 18th International Conference on Semantic Systems, 13-15 September 2022, Vienna, Austria, p. 206-218, 13 pp. 2022

Publisher

IOS Press

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

60203 Linguistics

Country of publisher

Netherlands

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

References:

RIV identification code

RIV/00216224:14210/22:00126225

Organization unit

Faculty of Arts

ISBN

978-1-64368-320-1

UT WoS

001176503400015

Keywords in English

natural language understanding; coordinated verbs with shared argument; zeugma; BERT language model; dataset

Tags

International impact, Reviewed
Changed: 14/5/2024 10:18, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

Sentences in which two verbs share a single argument are a complex and highly ambiguous syntactic phenomenon. Detecting them requires considering the argument-sharing relations from both a syntactic and a semantic perspective. Such expressions can be ungrammatical constructions, denoted as zeugma, or idiomatic elliptical phrase combinations. Rule-based classification methods prove ineffective because the classification must reflect the meaning relations among the constituents of the analyzed sentence. This paper presents the development and evaluation of ZeugBERT, a language model fine-tuned for sentence classification from a pre-trained Czech transformer model for language representation. The model was trained on a newly prepared dataset of 7,849 Czech sentences, published together with the paper, to classify Czech syntactic structures containing coordinated verbs that share a valency argument (or an optional adjunct) in the context of coordination. ZeugBERT reaches 88% accuracy on the test set. The paper also describes the creation and annotation of the new dataset and offers a detailed error analysis of the developed classification model.

Links

MUNI/A/1184/2020, internal MU code
Name: Use of machine learning in detecting a shared argument in coordinated structures (Využití strojového učení při detekci společného argumentu v koordinovaných strukturách)
Investor: Masaryk University