Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset



MEDKOVÁ, Helena and Aleš HORÁK

Basic information

Original name

Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset

Authors

MEDKOVÁ, Helena (Czech Republic, guarantor, belonging to the institution) and Aleš HORÁK (Czech Republic, belonging to the institution)

Edition

Amsterdam, Towards a Knowledge-Aware AI : SEMANTiCS 2022 — Proceedings of the 18th International Conference on Semantic Systems, 13-15 September 2022, Vienna, Austria, pp. 206-218, 13 pp., 2022

Publisher

IOS Press

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

60203 Linguistics

Country of publisher

Netherlands

Confidentiality degree

Not subject to a state or trade secret

Publication form

Printed version ("print")

References:

URL

https://ebooks.iospress.nl/volumearticle/60724

RIV identification code

RIV/00216224:14210/22:00126225

Organization unit

Faculty of Arts

ISBN

978-1-64368-320-1

DOI

http://dx.doi.org/10.3233/SSW220022

UT WoS

001176503400015

Keywords in English

natural language understanding; coordinated verbs with shared argument; zeugma; BERT language model; dataset

Abstract

In the original language

Sentences in which two verbs share a single argument represent a complex and highly ambiguous syntactic phenomenon. When detecting such sentences, the argument-sharing relations must be considered from both a syntactic and a semantic perspective. Such expressions can be ungrammatical constructions, denoted as zeugma, or idiomatic elliptical phrase combinations. Rule-based classification methods prove ineffective because the classification has to reflect the meaning relations between the analyzed sentence constituents. This paper presents the development and evaluation of ZeugBERT, a language model fine-tuned for sentence classification on top of a pre-trained Czech transformer model for language representation. The model was trained on a newly prepared dataset of 7,849 Czech sentences, published together with this paper, to classify Czech syntactic structures containing coordinated verbs that share a valency argument (or an optional adjunct) in the context of coordination. ZeugBERT reaches 88% accuracy on the test set. The paper describes how the new dataset was created and annotated and offers a detailed error analysis of the developed classification model.

Links

MUNI/A/1184/2020, internal MU code

Name: Využití strojového učení při detekci společného argumentu v koordinovaných strukturách (Using Machine Learning to Detect a Shared Argument in Coordinated Structures)

Investor: Masaryk University

Cite

MEDKOVÁ, Helena and Aleš HORÁK. Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset. In Dimou, Anastasia; Neumaier, Sebastian; Pellegrini, Tassilo; Vahdati, Sahar. Towards a Knowledge-Aware AI : SEMANTiCS 2022 — Proceedings of the 18th International Conference on Semantic Systems, 13-15 September 2022, Vienna, Austria. Amsterdam: IOS Press, 2022, p. 206-218. ISBN 978-1-64368-320-1. Available from: https://dx.doi.org/10.3233/SSW220022.

@inproceedings{1874319,
   author = {Medková, Helena and Horák, Aleš},
   address = {Amsterdam},
   booktitle = {Towards a Knowledge-Aware AI : SEMANTiCS 2022 — Proceedings of the 18th International Conference on Semantic Systems, 13-15 September 2022, Vienna, Austria},
   doi = {10.3233/SSW220022},
   editor = {Dimou, Anastasia and Neumaier, Sebastian and Pellegrini, Tassilo and Vahdati, Sahar},
   keywords = {natural language understanding; coordinated verbs with shared argument; zeugma; BERT language model; dataset},
   howpublished = {printed version "print"},
   language = {eng},
   location = {Amsterdam},
   isbn = {978-1-64368-320-1},
   pages = {206--218},
   publisher = {IOS Press},
   title = {Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset},
   url = {https://ebooks.iospress.nl/volumearticle/60724},
   year = {2022}
}

TY  - CONF
ID  - 1874319
AU  - Medková, Helena
AU  - Horák, Aleš
PY  - 2022
TI  - Distinguishing the Types of Coordinated Verbs with a Shared Argument by means of New ZeugBERT Language Model and ZeugmaDataset
PB  - IOS Press
CY  - Amsterdam
SN  - 9781643683201
KW  - natural language understanding
KW  - coordinated verbs with shared argument
KW  - zeugma
KW  - BERT language model
KW  - dataset
UR  - https://ebooks.iospress.nl/volumearticle/60724
N2  - Sentences in which two verbs share a single argument represent a complex and highly ambiguous syntactic phenomenon. When detecting such sentences, the argument-sharing relations must be considered from both a syntactic and a semantic perspective. Such expressions can be ungrammatical constructions, denoted as zeugma, or idiomatic elliptical phrase combinations. Rule-based classification methods prove ineffective because the classification has to reflect the meaning relations between the analyzed sentence constituents. This paper presents the development and evaluation of ZeugBERT, a language model fine-tuned for sentence classification on top of a pre-trained Czech transformer model for language representation. The model was trained on a newly prepared dataset of 7,849 Czech sentences, published together with this paper, to classify Czech syntactic structures containing coordinated verbs that share a valency argument (or an optional adjunct) in the context of coordination. ZeugBERT reaches 88% accuracy on the test set. The paper describes how the new dataset was created and annotated and offers a detailed error analysis of the developed classification model.
ER  -

