
Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications

KOVÁŘ, Vojtěch

Basic information

Original name

Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications

Authors

KOVÁŘ, Vojtěch (203 Czech Republic, guarantor, belonging to the institution)

Edition

Brno, Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016, pp. 127-134, 8 pp., 2016

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Article in proceedings

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version

RIV identification code

RIV/00216224:14330/16:00092356

Organization unit

Faculty of Informatics

ISBN

978-80-263-1095-2

ISSN

UT WoS

000466886400014

Keywords in English

NLP; inter-annotator agreement; low inter-annotator agreement; evaluation; application; application-based evaluation; word sketch; thesaurus; terminology

Abstract

In the original

In Low inter-annotator agreement = an ill-defined problem?, we argued that tasks with low inter-annotator agreement are very common in natural language processing (NLP) and deserve appropriate attention, and we outlined a preliminary solution for their evaluation. In On evaluation of natural language processing tasks: Is gold standard evaluation methodology a good solution?, we argued for extrinsic, application-based evaluation of NLP tasks and against the gold standard methodology, which is currently almost the only one used in the NLP field. This paper brings a synthesis of the two: for three practical tasks whose inter-annotator agreement is normally so low that they are considered almost unsuitable for any scientific evaluation, we introduce an application-based evaluation scenario which illustrates that it is not only possible to evaluate them in a scientific way, but that this type of evaluation is much more telling than the gold standard approach.
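As background to the abstract's central notion, the following is a minimal sketch (not taken from the paper; the annotation data is invented for illustration) of Cohen's kappa, the measure most commonly used to quantify agreement between two annotators. Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance, which is the "low inter-annotator agreement" situation the paper addresses.

    # Illustrative sketch of Cohen's kappa for two annotators.
    # The label lists below are invented example data, not from the paper.
    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        """Cohen's kappa for two annotators labelling the same items."""
        assert len(labels_a) == len(labels_b)
        n = len(labels_a)
        # Observed agreement: fraction of items with identical labels.
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Expected chance agreement from each annotator's label distribution.
        freq_a = Counter(labels_a)
        freq_b = Counter(labels_b)
        p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
        return (p_o - p_e) / (1 - p_e)

    # Two hypothetical annotators judging thesaurus items as good or bad:
    ann1 = ["good", "good", "bad", "good", "bad", "good", "bad", "good"]
    ann2 = ["good", "bad", "bad", "good", "good", "good", "bad", "bad"]
    print(f"kappa = {cohens_kappa(ann1, ann2):.2f}")  # 0.25: weak agreement

With agreement this close to chance, a gold standard built from either annotator's labels is unreliable, which is why the paper proposes evaluating such tasks through their downstream applications instead.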

Links

7F14047, research and development project
Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)
Investor: Ministry of Education, Youth and Sports of the CR