Classification of Errors in Text

JAKUBÍČEK, Miloš, Jan BUŠTA, Dana HLAVÁČKOVÁ and Karel PALA. Classification of Errors in Text. In RASLAN 2009 : Recent Advances in Slavonic Natural Language Processing. 1st ed. Brno: Masaryk University, 2009, p. 109-119. ISBN 978-80-210-5048-8.


Basic information
Original name	Classification of Errors in Text
Name in Czech	Klasifikace chyb v textu
Authors	JAKUBÍČEK, Miloš (203 Czech Republic, belonging to the institution), Jan BUŠTA (203 Czech Republic, belonging to the institution), Dana HLAVÁČKOVÁ (203 Czech Republic, belonging to the institution) and Karel PALA (203 Czech Republic, guarantor, belonging to the institution).
Edition	1st ed. Brno, RASLAN 2009 : Recent Advances in Slavonic Natural Language Processing, p. 109-119, 11 pp. 2009.
Publisher	Masaryk University

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	60200 6.2 Languages and Literature
Country of publisher	Czech Republic
Confidentiality degree	is not subject to a state or trade secret
Publication form	printed version "print"
RIV identification code	RIV/00216224:14330/09:00038386
Organization unit	Faculty of Informatics
ISBN	978-80-210-5048-8
UT WoS	000379213700015
Keywords (in Czech)	klasifikace chyb; chyby v textu
Keywords in English	errors in text; classification of errors
Tags	International impact, Reviewed
Changed by	Mgr. Michal Petr, učo 65024. Changed: 9/10/2019 22:33.

Abstract

This paper presents two classifications of errors in Czech texts. As a basic resource we use the Chyby ("Errors") corpus, which has been under continuous development since 1999-2000 ([1]). The corpus texts contain various kinds of errors: spelling, typographical, grammatical, semantic, lexical, and stylistic. The errors have been corrected manually and annotated according to a classification of errors (annotation scheme) developed for this purpose. For the annotation we implemented a tool named WinCorr. We mention the first annotation scheme and discuss the second one, which was designed recently to obtain a more adequate description of the errors occurring in texts. We also discuss the principles on which both classifications are based.

Abstract (in Czech)

Tento článek prezentuje dvě klasifikace chyb v českých textech. Základním zdrojem je korpus Chyby, který byl vytvořen v letech 1999-2000 ([1]). Tento korpus obsahuje různé druhy chyb jako např. pravopisné, typografické, gramatické, sémantické, lexikální a stylistické. Tyto chyby byly ručně opraveny a vyznačeny podle anotačního schématu pro klasifikaci chyb, která byla pro tento účel vyvinuta. Za účelem anotace byl vyvinut nástroj zvaný WinCorr. V článku je popsáno první anotační schéma i jeho revize navržená za účelem získání přesnějšího popisu chyb, které se v textech vyskytují. Předmětem diskuse jsou zároveň základní principy, na nichž obě anotace staví.

Links
LC536, research and development project	Name: Centrum komputační lingvistiky
LC536, research and development project	Investor: Ministry of Education, Youth and Sports of the CR, Centrum komputační lingvistiky
2C06009, research and development project	Name: Prostředky tvorby komplexní báze znalostí pro komunikaci se sémantickým webem v přirozeném jazyce (Acronym: COT-SEWing)
2C06009, research and development project	Investor: Ministry of Education, Youth and Sports of the CR
