
Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study

CINKOVA, Silvie, Ema KREJČOVÁ, Anna VERNEROVÁ and Vít BAISA

Basic information

Original name

Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study

Authors

CINKOVA, Silvie (203 Czech Republic), Ema KREJČOVÁ (203 Czech Republic), Anna VERNEROVÁ (203 Czech Republic) and Vít BAISA (203 Czech Republic, belonging to the institution)

Edition

Portorož, Slovenia, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 848-854, 7 pp., 2016

Publisher

European Language Resources Association (ELRA)

Other information

Language

English

Type of outcome

Article in proceedings

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Slovenia

Confidentiality degree

Not subject to state or trade secrecy

Publication form

electronic version available online

RIV identification code

RIV/00216224:14330/16:00090038

Organization unit

Faculty of Informatics

ISBN

978-2-9517408-9-1

Keywords in English

CPA; graded decisions; English; verbs; usage patterns; annotation; Likert scales

Tags


International impact, Reviewed
Changed: 27/5/2016 13:35, Mgr. et Mgr. Vít Baisa, Ph.D.

Abstract

In the original language

We present a pilot analysis of a new linguistic resource, VPS-GradeUp (available at http://hdl.handle.net/11234/1-1585). The resource contains 11,400 graded human decisions on usage patterns of 29 English lexical verbs, randomly selected from the Pattern Dictionary of English Verbs (PDEV; Hanks, 2000-2014). The sample was drawn at random but took into account the verbs' frequency and the number of senses their lemmas have in PDEV. The data set was created to observe inter-annotator agreement on PDEV patterns produced using Corpus Pattern Analysis (Hanks, 2013). Apart from the graded decisions, the data set also contains traditional Word-Sense-Disambiguation (WSD) labels. We analyze the associations between the graded annotation and the WSD annotation. The results of the respective annotations do not correlate with the size of the usage-pattern inventory of the respective verb lemmas, which makes the data set worth further linguistic analysis.
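The abstract's final claim is a correlation question: does an annotation result for a verb rise or fall with the number of patterns PDEV lists for that verb? As a minimal sketch, assuming one score and one pattern count per verb, Spearman's rank correlation could be computed as below. This record does not state which statistic the authors used; the function names and the five toy values are hypothetical illustrations, not data from VPS-GradeUp.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-verb values, for illustration only.
patterns = [2, 5, 9, 14, 20]               # number of PDEV patterns per verb
agreement = [0.61, 0.72, 0.55, 0.68, 0.59]  # some per-verb agreement score
print(round(spearman(patterns, agreement), 3))  # -0.3
```

Rank correlation is used here rather than plain Pearson because pattern counts are skewed across verbs and only a monotonic association is of interest; a value near zero would be consistent with "no correlation", though with only 29 verbs a significance test would also be needed.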

Links

LM2015071, research and development project
Name: Language Research Infrastructure in the Czech Republic (Jazyková výzkumná infrastruktura v České republice; Acronym: LINDAT-Clarin)
Investor: Ministry of Education, Youth and Sports of the CR
7F14047, research and development project
Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)
Investor: Ministry of Education, Youth and Sports of the CR