KOVÁŘ, Vojtěch and Pavel RYCHLÝ. DMoG : A Data-Based Morphological Guesser. In Horák, Aleš; Rychlý, Pavel; Rambousek, Adam. Recent Advances in Slavonic Natural Language Processing (RASLAN 2021). Brno: Tribun EU, 2021, p. 135-138. ISBN 978-80-263-1670-1.
Basic information
Original name DMoG : A Data-Based Morphological Guesser
Authors KOVÁŘ, Vojtěch (203 Czech Republic, guarantor, belonging to the institution) and Pavel RYCHLÝ (203 Czech Republic, belonging to the institution).
Edition Brno, Recent Advances in Slavonic Natural Language Processing (RASLAN 2021), p. 135-138, 4 pp. 2021.
Publisher Tribun EU
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10200 1.2 Computer and information sciences
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
WWW Full text PDF, Workshop homepage
RIV identification code RIV/00216224:14330/21:00123251
Organization unit Faculty of Informatics
ISBN 978-80-263-1670-1
ISSN 2336-4289
Keywords in English Lemmatization; Morphological guesser; Morphological analysis; Morphological guessing
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 15/5/2024 10:10.
Abstract
We present a novel corpus-based approach to lemmatization of unknown words. The tool learns affix patterns from annotated data, and based on these patterns, it predicts other word forms that should be present in the corpus. A lemma candidate then comes from the pattern whose predictions are really found in the corpus. We present a prototype implementation and an initial evaluation on Czech, which shows promising results.
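The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Pattern` structure, the `guess_lemma` function, the two hand-written suffix paradigms, and the toy vocabulary are all hypothetical, and the patterns are listed by hand here rather than learned from annotated data. The sketch only shows the selection step: each pattern that matches the unknown word predicts sibling forms, and the lemma comes from the pattern with the most predictions attested in the corpus.

```python
from typing import NamedTuple, Optional


class Pattern(NamedTuple):
    """Hypothetical suffix paradigm: the lemma suffix plus the suffixes of its forms."""
    lemma_suffix: str
    form_suffixes: tuple


def guess_lemma(word: str, patterns: list, corpus_vocab: set) -> Optional[str]:
    """Return the lemma proposed by the pattern with the most attested predictions."""
    best_lemma = None
    best_score = 0
    for pattern in patterns:
        for suffix in pattern.form_suffixes:
            if not word.endswith(suffix):
                continue
            stem = word[: len(word) - len(suffix)]
            if not stem:
                continue
            # Forms the pattern predicts for this stem, excluding the input word.
            predicted = {stem + s for s in pattern.form_suffixes} - {word}
            # Count how many predicted forms actually occur in the corpus.
            score = sum(1 for form in predicted if form in corpus_vocab)
            if score > best_score:
                best_lemma, best_score = stem + pattern.lemma_suffix, score
    return best_lemma


# Toy data (assumed for illustration only): two simplified Czech noun paradigms.
PATTERNS = [
    Pattern("a", ("a", "y", "ě", "u", "o", "ou")),   # roughly the "žena" type
    Pattern("", ("", "u", "em", "y", "ů", "ům")),    # roughly the "hrad" type
]
VOCAB = {"vakcína", "vakcíny", "vakcínou"}

print(guess_lemma("vakcínu", PATTERNS, VOCAB))
```

In this toy run both paradigms match the ending "-u", but the first one has three predicted forms in the vocabulary and the second only one, so the first pattern supplies the lemma. A word with no attested predictions yields no guess.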
Links
LM2018101, research and development project
Name: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy (Acronym: LINDAT/CLARIAH-CZ)
Investor: Ministry of Education, Youth and Sports of the CR