KOVÁŘ, Vojtěch. Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016. Brno: Tribun EU, 2016, p. 127-134. ISBN 978-80-263-1095-2.
Basic information
Original name Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications
Authors KOVÁŘ, Vojtěch (203 Czech Republic, guarantor, belonging to the institution).
Edition Brno, Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016, p. 127-134, 8 pp. 2016.
Publisher Tribun EU
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
RIV identification code RIV/00216224:14330/16:00092356
Organization unit Faculty of Informatics
ISBN 978-80-263-1095-2
ISSN 2336-4289
UT WoS 000466886400014
Keywords in English NLP; inter-annotator agreement; low inter-annotator agreement; evaluation; application; application-based evaluation; word sketch; thesaurus; terminology
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 13/5/2020 19:13.
Abstract
In "Low inter-annotator agreement = an ill-defined problem?", we have argued that tasks with low inter-annotator agreement are very common in natural language processing (NLP) and deserve appropriate attention, and we have outlined a preliminary solution for their evaluation. In "On evaluation of natural language processing tasks: Is gold standard evaluation methodology a good solution?", we have argued for extrinsic, application-based evaluation of NLP tasks and against the gold standard methodology, which is currently almost the only one actually used in the NLP field. This paper brings a synthesis of the two: for three practical tasks whose inter-annotator agreement is normally so low that they are considered almost unsuitable for any scientific evaluation, we introduce an application-based evaluation scenario which illustrates that it is not only possible to evaluate them in a scientific way, but also that this type of evaluation is much more telling than the gold standard approach.
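For orientation, inter-annotator agreement of the kind discussed in the abstract is commonly quantified with Cohen's kappa. The following is a minimal illustrative sketch, not taken from the paper; the annotator labels and the function name are hypothetical, and the word-sketch example merely mirrors the kind of judgement the paper concerns.

```python
# Minimal sketch: Cohen's kappa for two annotators labelling the same items.
# Values well below ~0.6 are usually read as "low agreement".
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical example: two annotators judging word-sketch collocates as good/bad.
annotator_1 = ["good", "good", "bad", "good", "bad", "bad"]
annotator_2 = ["good", "bad", "bad", "good", "good", "bad"]
print(cohens_kappa(annotator_1, annotator_2))  # ~0.33, i.e. fairly low agreement
```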
Links
7F14047, research and development project. Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)
Investor: Ministry of Education, Youth and Sports of the CR