On Evaluation of Natural Language Processing Tasks: Is Gold Standard Evaluation Methodology a Good Solution?

D 2016

KOVÁŘ, Vojtěch, Miloš JAKUBÍČEK and Aleš HORÁK

Basic information

Original name

On Evaluation of Natural Language Processing Tasks: Is Gold Standard Evaluation Methodology a Good Solution?

Name in Czech

K evaluaci úkolů zpracování přirozeného jazyka: je metodologie používající "gold standardy" dobrým řešením?

Authors

KOVÁŘ, Vojtěch (203 Czech Republic, guarantor, belonging to the institution), Miloš JAKUBÍČEK (203 Czech Republic, belonging to the institution) and Aleš HORÁK (203 Czech Republic, belonging to the institution)

Edition

Rome, Proceedings of the 8th International Conference on Agents and Artificial Intelligence, p. 540-545, 6 pp. 2016

Publisher

SCITEPRESS

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Italy

Confidentiality degree

is not subject to a state or trade secret

Publication form

storage medium (CD, DVD, flash disk)

RIV identification code

RIV/00216224:14330/16:00087757

Organization unit

Faculty of Informatics

ISBN

978-989-758-172-4

Keywords (in Czech)

zpracování přirozeného jazyka; aplikace; vyhodnocování; evaluace

Keywords in English

Natural Language Processing; Applications; Evaluation

Abstract

In the original

The paper discusses problems in state-of-the-art evaluation methods used in natural language processing (NLP). Usually, some form of gold standard data is used to evaluate NLP tasks ranging from morphological annotation to semantic analysis. We discuss the problems and validity of this type of evaluation for various tasks and illustrate them with examples. We then propose using application-driven evaluation wherever possible. Although it is more expensive, more complicated and less precise, it is the only way to find out whether a particular tool is useful at all.

In Czech

Práce se zabývá problémy v metodologii vyhodnocování v oblasti zpracování přirozeného jazyka (NLP). Většinou jsou pro takové vyhodnocování používána tzv. "gold standard" data. Diskutujeme problémy a validitu tohoto přístupu a navrhujeme aplikačně orientovanou alternativu.

Links

GA15-13277S, research and development project

Name: Hyperintensionální logika pro analýzu přirozeného jazyka (Hyperintensional logic for natural language analysis)

Investor: Czech Science Foundation

7F14047, research and development project

Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)

Investor: Ministry of Education, Youth and Sports of the CR
