D 2003

Text Corpus with Errors

PALA, Karel; Pavel RYCHLÝ and Pavel SMRŽ

Basic information

Original name

Text Corpus with Errors

Authors

PALA, Karel; Pavel RYCHLÝ and Pavel SMRŽ

Edition

Berlin, Text, Speech and Dialogue: Sixth International Conference, TSD 2003, p. 90-97, 8 pp. 2003

Publisher

Springer Verlag

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

References:

RIV identification code

RIV/00216224:14330/03:00009149

Organization unit

Faculty of Informatics

ISBN

3-540-20024-X

UT WoS

000186386400012

Keywords in English

error detection
Changed: 26/5/2004 15:13, doc. Mgr. Pavel Rychlý, Ph.D.

Abstract

In the original language

This paper describes Chyby, a Czech text corpus containing various kinds of errors: spelling, typographical, grammatical, stylistic, and lexical. We explain how Chyby was built and how the errors in it were discovered, marked, and annotated. We present the classification of the errors and give statistics on the error types. The tools for annotating the errors are also described. To the best of our knowledge, this is the first text corpus of this sort prepared for Czech.

Links

MSM 143300003, plan (intention)
Name: Human-computer interaction, dialog systems and assistive technologies (Interakce člověka s počítačem, dialogové systémy a asistivní technologie)
Investor: Ministry of Education, Youth and Sports of the CR