D 2019

New Online Proofreader for Czech

HLAVÁČKOVÁ, Dana, Barbora HRABALOVÁ, Jakub MACHURA, Markéta MASOPUSTOVÁ, Vojtěch MRKÝVKA et al.

Basic information

Original name

New Online Proofreader for Czech

Authors

HLAVÁČKOVÁ, Dana (203 Czech Republic, guarantor, belonging to the institution), Barbora HRABALOVÁ (203 Czech Republic, belonging to the institution), Jakub MACHURA (203 Czech Republic, belonging to the institution), Markéta MASOPUSTOVÁ (203 Czech Republic, belonging to the institution), Vojtěch MRKÝVKA (203 Czech Republic, belonging to the institution), Marie VALÍČKOVÁ (203 Czech Republic, belonging to the institution) and Hana ŽIŽKOVÁ (203 Czech Republic, belonging to the institution)

Edition

Brno, Horák, Aleš; Rychlý, Pavel; Rambousek, Adam (eds.): Slavonic Natural Language Processing in the 21st Century, pp. 79-92, 14 pp., 2019

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

60203 Linguistics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

printed version "print"

RIV identification code

RIV/00216224:14210/19:00108376

Organization unit

Faculty of Arts

ISBN

978-80-263-1545-2

Keywords in English

checker; grammar; agreement; error; punctuation; mistake; tool; Czech
Changed: 14/1/2021 22:47, prof. Mgr. Pavel Kosek, Ph.D.

Abstract

In the original

This paper focuses on a new web-based language checker, a tool currently being developed to help users produce Czech texts with correct grammar and spelling. The paper describes the inner workings of the existing prototype and its modular structure, and looks ahead to its upcoming language-checking components. Currently, the punctuation module is able to insert nearly two-thirds of all commas in particular types of texts. If a sentence contains a connector, it is easier to find the position where the comma should be placed. However, detecting a boundary between two clauses without a connector, or between two members of a complex multiple element, is a harder task. Another module targets agreement, especially two types: agreement of pre- and post-nominal adjectives and subject-predicate agreement. The paper also introduces a module dealing with selected minor mistakes in Czech. Since all modules, as well as the tool itself, need quality testing data, the building of an annotated database of authentic sentences and errors is also described.

Links

MUNI/A/1061/2018, internal MU code
Name: Čeština v jednotě synchronie a diachronie - 2019
Investor: Masaryk University, Category A
TL02000146, research and development project
Name: Webový pravopisný, gramatický a typografický korektor pro český jazyk
Investor: Technology Agency of the Czech Republic