D 2010

Through Low-Cost Annotation to Reliable Parsing Evaluation

JAKUBÍČEK, Miloš, Vojtěch KOVÁŘ and Marek GRÁC

Basic information

Original name

Through Low-Cost Annotation to Reliable Parsing Evaluation

Authors

JAKUBÍČEK, Miloš (203 Czech Republic, belonging to the institution), Vojtěch KOVÁŘ (203 Czech Republic, guarantor, belonging to the institution) and Marek GRÁC (703 Slovakia, belonging to the institution)

Edition

Tokyo, PACLIC 24: Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation, pp. 555-562, 8 pp., 2010

Publisher

Waseda University

Other information

Language

English

Type of outcome

Article in proceedings

Field of Study

60200 6.2 Languages and Literature

Country of publisher

Japan

Confidentiality degree

not subject to a state or trade secret

Publication form

printed version

References:

RIV identification code

RIV/00216224:14330/10:00065887

Organization unit

Faculty of Informatics

ISBN

978-4-905166-00-9

Keywords in English

noun phrases;parsing;parser evaluation;annotation;inter-annotator agreement

Tags

International impact, Reviewed
Changed: 30/4/2014 10:04, RNDr. Pavel Šmerk, Ph.D.

Abstract

In original

In this paper, we present an application-driven, low-cost approach to building a multi-purpose language resource for Czech that builds on the results of previous work by various research teams active in natural language processing. We focus particularly on the first phase, which consists of extracting noun phrases from a morphologically annotated corpus and providing a simple, easy-to-use application for verifying them. For the extraction task, three Czech parsers were adapted and evaluated. Finally, we discuss the results achieved so far in the context of ongoing work and show that they are consistent and reliable.

Links

GAP401/10/0792, research and development project
Name: Temporální aspekty znalostí a informací (Temporal Aspects of Knowledge and Information)
Investor: Czech Science Foundation
LC536, research and development project
Name: Centrum komputační lingvistiky (Center for Computational Linguistics)
Investor: Ministry of Education, Youth and Sports of the CR, Centrum komputační lingvistiky (Center for Computational Linguistics)
248307, internal MU code
Name: Pattern Recognition-based Statistically Enhanced MT (Acronym: PRESEMT)
Investor: European Union, Pattern Recognition-based Statistically Enhanced MT, Cooperation