Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications

D 2016

KOVÁŘ, Vojtěch

Basic information

Original title

Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications

Authors

KOVÁŘ, Vojtěch (203 Czech Republic, guarantor, belonging to the institution)

Edition

Brno, Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016, p. 127-134, 8 pp. 2016

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14330/16:00092356

Organization unit

Faculty of Informatics

ISBN

978-80-263-1095-2

ISSN

UT WoS

000466886400014

Keywords in English

NLP; inter-annotator agreement; low inter-annotator agreement; evaluation; application; application-based evaluation; word sketch; thesaurus; terminology

Changed: 13 May 2020 19:13, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original language

In "Low inter-annotator agreement = an ill-defined problem?", we argued that tasks with low inter-annotator agreement are very common in natural language processing (NLP) and deserve appropriate attention. We also outlined a preliminary solution for their evaluation. In "On evaluation of natural language processing tasks: Is gold standard evaluation methodology a good solution?", we argued for extrinsic, application-based evaluation of NLP tasks and against the gold standard methodology, which is currently almost the only one actually used in the NLP field. This paper brings a synthesis of the two: for three practical tasks that normally have such low inter-annotator agreement that they are considered almost irrelevant to any scientific evaluation, we introduce an application-based evaluation scenario which illustrates that it is not only possible to evaluate them in a scientific way, but that this type of evaluation is much more telling than the gold standard way.

Links

7F14047, research and development project

Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)

Investor: Ministry of Education, Youth and Sports of the Czech Republic, Harvesting big text data for under-resourced languages

Cite

KOVÁŘ, Vojtěch. Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016. Brno: Tribun EU, 2016, pp. 127-134. ISBN 978-80-263-1095-2.

@inproceedings{1365039,
   author = {Kovář, Vojtěch},
   address = {Brno},
   booktitle = {Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016},
   editor = {Aleš Horák and Pavel Rychlý and Adam Rambousek},
   keywords = {NLP; inter-annotator agreement; low inter-annotator agreement; evaluation; application; application-based evaluation; word sketch; thesaurus; terminology},
   howpublished = {printed version "print"},
   language = {eng},
   location = {Brno},
   isbn = {978-80-263-1095-2},
   pages = {127-134},
   publisher = {Tribun EU},
   title = {Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications},
   year = {2016}
}

TY  - CONF
ID  - 1365039
AU  - Kovář, Vojtěch
PY  - 2016
TI  - Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications
PB  - Tribun EU
CY  - Brno
SN  - 9788026310952
KW  - NLP
KW  - inter-annotator agreement
KW  - low inter-annotator agreement
KW  - evaluation
KW  - application
KW  - application-based evaluation
KW  - word sketch
KW  - thesaurus
KW  - terminology
N2  - In "Low inter-annotator agreement = an ill-defined problem?", we argued that tasks with low inter-annotator agreement are very common in natural language processing (NLP) and deserve appropriate attention. We also outlined a preliminary solution for their evaluation. In "On evaluation of natural language processing tasks: Is gold standard evaluation methodology a good solution?", we argued for extrinsic, application-based evaluation of NLP tasks and against the gold standard methodology, which is currently almost the only one actually used in the NLP field. This paper brings a synthesis of the two: for three practical tasks that normally have such low inter-annotator agreement that they are considered almost irrelevant to any scientific evaluation, we introduce an application-based evaluation scenario which illustrates that it is not only possible to evaluate them in a scientific way, but that this type of evaluation is much more telling than the gold standard way.
ER  -

KOVÁŘ, Vojtěch. Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications. In Aleš Horák, Pavel Rychlý, Adam Rambousek. \textit{Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016}. Brno: Tribun EU, 2016, pp.~127-134. ISBN~978-80-263-1095-2.
