POPELÍNSKÝ, Lubomír, Tomáš PAVELEK and Tomáš PTÁČNÍK. On Disambiguation in Czech Corpora. Brno (CZE): FI MU. 012 pp. 2000.
Basic information
Original name On Disambiguation in Czech Corpora
Authors POPELÍNSKÝ, Lubomír, Tomáš PAVELEK and Tomáš PTÁČNÍK.
Edition Brno (CZE), 012 pp. 2000.
Publisher FI MU
Other information
Original language English
Type of outcome Research report
Field of Study 20206 Computer hardware and architecture
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
RIV identification code RIV/00216224:14330/00:00002818
Organization unit Faculty of Informatics
Keywords in English Lemma disambiguation; Corpus; Natural language processing; Machine learning
Tags corpus, Lemma disambiguation, machine learning, natural language processing
Changed by: doc. RNDr. Lubomír Popelínský, Ph.D., učo 1945. Changed: 25/2/2001 17:39.
Abstract
Lemma disambiguation means finding the basic form of a word, typically the nominative singular for nouns or the infinitive for verbs. We developed a multistrategy method for lemma disambiguation of unannotated text. The method combines inductive logic programming with instance-based learning. We present results for the most important subtasks of lemma disambiguation for the Czech language. Although no expert knowledge of Czech grammar was used, the accuracy reaches 90%, with a fraction of words remaining ambiguous. We also present first results for tag disambiguation.
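To illustrate the instance-based half of such an approach, the following is a minimal sketch and not the method of the report itself. It assumes a one-word context window on each side of the ambiguous form, a similarity-weighted vote over stored instances, and a toy training set built around the Czech form "ženu" (accusative of "žena", or first person singular of "hnát"). The names TRAINING, context_overlap and disambiguate are hypothetical, and the inductive logic programming component is not modelled here.

```python
from collections import Counter

# Hypothetical toy instances: (left neighbour, ambiguous form, right neighbour) -> lemma.
# These examples are invented for illustration and do not come from the report.
TRAINING = [
    (("vidím", "ženu", "v"), "žena"),
    (("mladou", "ženu", "a"), "žena"),
    (("já", "ženu", "stádo"), "hnát"),
    (("teď", "ženu", "krávy"), "hnát"),
]


def context_overlap(a, b):
    """Count how many of the two neighbouring words match (0, 1 or 2)."""
    return int(a[0] == b[0]) + int(a[2] == b[2])


def disambiguate(window):
    """Pick a lemma by a similarity-weighted vote over stored instances.

    Returns None when there is no contextual evidence or the vote is tied,
    so the word form is left ambiguous rather than guessed.
    """
    scores = Counter()
    for stored, lemma in TRAINING:
        if stored[1] == window[1]:  # only instances of the same word form vote
            scores[lemma] += context_overlap(window, stored)
    ranked = scores.most_common()
    if not ranked or ranked[0][1] == 0:
        return None  # unseen form or no matching context
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between candidate lemmas
    return ranked[0][0]
```

Returning None on a tie or on missing evidence mirrors the idea that some words may remain ambiguous instead of being forced to a lemma.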
Links
VS97028, research and development project
Name: Laboratoř zpracování přirozeného jazyka (s aplikacemi pro podporu výuky zrakově postižených)
Investor: Ministry of Education, Youth and Sports of the CR, Natural Language Processing Laboratory (with applications supporting education of people with limited sight)