KOPPEL, Kristina, Jelena KALLAS, Maria KHOKHLOVÁ, Vít SUCHOMEL, Vít BAISA and Jan MICHELFEIT. SkELL Corpora as a Part of the Language Portal Sonaveeb: Problems and Perspectives. Online. In Proceedings of the 6th Biennial Conference on Electronic Lexicography. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2019, p. 763-782. ISSN 2533-5626.
Basic information
Original name SkELL Corpora as a Part of the Language Portal Sonaveeb: Problems and Perspectives
Authors KOPPEL, Kristina (233 Estonia), Jelena KALLAS (233 Estonia), Maria KHOKHLOVÁ (643 Russian Federation), Vít SUCHOMEL (203 Czech Republic, guarantor, belonging to the institution), Vít BAISA (203 Czech Republic, belonging to the institution) and Jan MICHELFEIT (203 Czech Republic, belonging to the institution).
Edition Brno, Czech Republic, Proceedings of the 6th Biennial Conference on Electronic Lexicography, p. 763-782, 20 pp. 2019.
Publisher Lexical Computing CZ s.r.o.
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form electronic version available online
WWW Conference proceedings
RIV identification code RIV/00216224:14330/19:00111209
Organization unit Faculty of Informatics
ISSN 2533-5626
Keywords in English GDEX; SkELL; learner corpus; Estonian; Russian
Tags International impact, Reviewed
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 8/5/2020 09:25.
Abstract
The paper analyses the quality and presentation of authentic corpus sentences from Sketch Engine for Language Learning (SkELL) corpora (Baisa & Suchomel, 2014), using the example of Sonaveeb (Wordweb), a new language portal being developed at the Institute of the Estonian Language. Sonaveeb currently contains a total of 150,000 Estonian headwords, about 70,000 of which have Russian equivalents. Authentic corpus sentences are displayed for both languages. In some cases (e.g. terms, derived forms, compounds and multi-word expressions), corpus sentences are the only usage examples available on the portal. We describe the parameters of the Good Dictionary Examples (GDEX) configurations (Kilgarriff et al., 2008) for Estonian and Russian that were used to compile the etSkELL 2018 and ruSkELL 1.6 corpora, give an overview of an evaluation of the Estonian GDEX configuration, and outline the requirements for a user-friendly presentation of SkELL corpora as part of the language portal.
Links
LM2015071, research and development project. Name: Jazyková výzkumná infrastruktura v České republice (Language Research Infrastructure in the Czech Republic) (Acronym: LINDAT-Clarin)
Investor: Ministry of Education, Youth and Sports of the CR