Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study

D 2016

CINKOVA, Silvie, Ema KREJČOVÁ, Anna VERNEROVÁ and Vít BAISA

Basic information

Original title

Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study

Authors

CINKOVA, Silvie (203 Czech Republic), Ema KREJČOVÁ (203 Czech Republic), Anna VERNEROVÁ (203 Czech Republic) and Vít BAISA (203 Czech Republic, belonging to the institution)

Edition

Portorož, Slovenia, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 848-854, 7 pp. 2016

Publisher

European Language Resources Association (ELRA)

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Slovenia

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

RIV identification code

RIV/00216224:14330/16:00090038

Organization unit

Faculty of Informatics

ISBN

978-2-9517408-9-1

Keywords in English

CPA; graded decisions; English; verbs; usage patterns; annotation; Likert scales

Tags

firank_B

Flags

International impact, Reviewed

Changed: 27 May 2016 13:35, Mgr. et Mgr. Vít Baisa, Ph.D.

Abstract

In the original language

We present a pilot analysis of a new linguistic resource, VPS-GradeUp (available at http://hdl.handle.net/11234/1-1585). The resource contains 11,400 graded human decisions on usage patterns of 29 English lexical verbs from the Pattern Dictionary of English Verbs (PDEV; Hanks, 2000-2014). The verbs were selected randomly, taking into account their frequency and the number of senses their lemmas have in PDEV. The data set was created to observe interannotator agreement on PDEV patterns produced using Corpus Pattern Analysis (CPA; Hanks, 2013). Apart from the graded decisions, the data set also contains traditional Word-Sense-Disambiguation (WSD) labels. We analyze the associations between the graded annotation and the WSD annotation. The results of the respective annotations do not correlate with the size of the usage pattern inventory for the respective verb lemmas, which makes the data set worth further linguistic analysis.

Links

LM2015071, research and development project

Name: Language research infrastructure in the Czech Republic (Acronym: LINDAT-Clarin)

Investor: Ministry of Education, Youth and Sports of the CR, LINDAT-Clarin project - Establishment and operation of the Czech node of the pan-European research infrastructure

7F14047, research and development project

Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)

Investor: Ministry of Education, Youth and Sports of the CR, Harvesting big text data for under-resourced languages

Cite

CINKOVA, Silvie, Ema KREJČOVÁ, Anna VERNEROVÁ and Vít BAISA. Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study. Online. In Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia: European Language Resources Association (ELRA), 2016, pp. 848-854. ISBN 978-2-9517408-9-1.

@inproceedings{1346038,
   author = {Cinkova, Silvie and Krejčová, Ema and Vernerová, Anna and Baisa, Vít},
   address = {Portorož, Slovenia},
   booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
   editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
   keywords = {CPA; graded decisions; English; verbs; usage patterns; annotation; Likert scales},
   howpublished = {electronic version available online},
   language = {eng},
   location = {Portorož, Slovenia},
   isbn = {978-2-9517408-9-1},
   pages = {848-854},
   publisher = {European Language Resources Association (ELRA)},
   title = {Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study},
   year = {2016}
}

TY  - CONF
ID  - 1346038
AU  - Cinkova, Silvie
AU  - Krejčová, Ema
AU  - Vernerová, Anna
AU  - Baisa, Vít
PY  - 2016
TI  - Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study
PB  - European Language Resources Association (ELRA)
CY  - Portorož, Slovenia
SN  - 9782951740891
KW  - CPA
KW  - graded decisions
KW  - English
KW  - verbs
KW  - usage patterns
KW  - annotation
KW  - Likert scales
N2  - We present a pilot analysis of a new linguistic resource, VPS-GradeUp (available at http://hdl.handle.net/11234/1-1585). The resource contains 11,400 graded human decisions on usage patterns of 29 English lexical verbs from the Pattern Dictionary of English Verbs (PDEV; Hanks, 2000-2014). The verbs were selected randomly, taking into account their frequency and the number of senses their lemmas have in PDEV. The data set was created to observe interannotator agreement on PDEV patterns produced using Corpus Pattern Analysis (CPA; Hanks, 2013). Apart from the graded decisions, the data set also contains traditional Word-Sense-Disambiguation (WSD) labels. We analyze the associations between the graded annotation and the WSD annotation. The results of the respective annotations do not correlate with the size of the usage pattern inventory for the respective verb lemmas, which makes the data set worth further linguistic analysis.
ER  -

CINKOVA, Silvie, Ema KREJČOVÁ, Anna VERNEROVÁ and Vít BAISA. Graded and Word-Sense-Disambiguation Decisions in Corpus Pattern Analysis: a Pilot Study. Online. In Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis. \textit{Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)}. Portorož, Slovenia: European Language Resources Association (ELRA), 2016, pp.~848-854. ISBN~978-2-9517408-9-1.
