Detailed Information on Publication Record
2000
On Disambiguation in Czech Corpora
POPELÍNSKÝ, Lubomír, Tomáš PAVELEK and Tomáš PTÁČNÍK
Basic information
Original name
On Disambiguation in Czech Corpora
Authors
POPELÍNSKÝ, Lubomír, Tomáš PAVELEK and Tomáš PTÁČNÍK
Edition
Brno (CZE), 12 pp., 2000
Publisher
FI MU
Other information
Language
English
Type of outcome
Research report
Field of Study
20206 Computer hardware and architecture
Country of publisher
Czech Republic
Confidentiality degree
Not subject to state or trade secrecy
RIV identification code
RIV/00216224:14330/00:00002818
Organization unit
Faculty of Informatics
Keywords in English
Lemma disambiguation; Corpus; Natural language processing; Machine learning
Changed: 25/2/2001 17:39, doc. RNDr. Lubomír Popelínský, Ph.D.
Abstract
In the original
Lemma disambiguation means finding the basic word form, typically the nominative singular for nouns or the infinitive for verbs. We developed a multistrategy method for lemma disambiguation of unannotated text. The method is based on a combination of inductive logic programming and instance-based learning. We present results for the most important subtasks of lemma disambiguation for the Czech language. Although no expert knowledge of Czech grammar has been used, the accuracy reaches 90%, with a fraction of words remaining ambiguous. We also present first results of tag disambiguation.
Links
VS97028, research and development project