D 2013

Using Low-Cost Annotation to Train a Reliable Czech Shallow Parser

RADZISZEWSKI, Adam and Marek GRÁC

Basic information

Original name

Using Low-Cost Annotation to Train a Reliable Czech Shallow Parser

Authors

RADZISZEWSKI, Adam (616 Poland) and Marek GRÁC (703 Slovakia, guarantor, belonging to the institution)

Edition

Plzeň, Text, Speech, and Dialogue, p. 575-582, 8 pp. 2013

Publisher

Springer Berlin Heidelberg

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

60200 6.2 Languages and Literature

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

Impact factor

Impact factor: 0.402 in 2005

RIV identification code

RIV/00216224:14210/13:00069444

Organization unit

Faculty of Arts

ISBN

978-3-642-40584-6

ISSN

UT WoS

000337294900072

Keywords in English

corpus annotation; shallow parsing; Czech

Tags

Changed: 6/4/2015 22:16, Mgr. Vendula Hromádková

Abstract

In the original

Bushbank is a relatively new concept - a type of annotated corpus where annotation is driven by the use of automatic tools and the task of human annotators is limited to accepting or rejecting parts of their output. This makes it possible to obtain annotated corpora of considerable size at relatively low cost. In this paper we ask whether the Czech Bushbank is reliable enough to be used for an NLP task instead of a traditional corpus with high annotation rigour. We evaluate three different parsers using its shallow syntactic annotation, including a CRF chunker originally made for Polish. The results are very promising, showing that many practical applications could benefit from low-cost annotation.