BLAHUŠ, Marek, Michal CUKR, Ondřej HERMAN, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Jan KRAUS, Marek MEDVEĎ, Vlasta OHLÍDALOVÁ and Vít SUCHOMEL. Rapid Ukrainian-English Dictionary Creation Using Post-Edited Corpus Data. Online. In Marek Medveď, Michal Měchura, Carole Tiberius, Iztok Kosem, Jelena Kallas, Miloš Jakubíček, Simon Krek. Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2023, pp. 613-637. ISSN 2533-5626.
Other formats:
BibTeX
LaTeX
RIS
@inproceedings{2303577,
  author = {Blahuš, Marek and Cukr, Michal and Herman, Ondřej and Jakubíček, Miloš and Kovář, Vojtěch and Kraus, Jan and Medveď, Marek and Ohlídalová, Vlasta and Suchomel, Vít},
  address = {Brno, Czech Republic},
  booktitle = {Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference},
  editor = {Marek Medveď and Michal Měchura and Carole Tiberius and Iztok Kosem and Jelena Kallas and Miloš Jakubíček and Simon Krek},
  keywords = {Ukrainian; post-editing; dictionary; lexicography},
  howpublished = {electronic version "online"},
  language = {eng},
  location = {Brno, Czech Republic},
  pages = {613-637},
  publisher = {Lexical Computing CZ s.r.o.},
  title = {Rapid Ukrainian-English Dictionary Creation Using Post-Edited Corpus Data},
  url = {https://elex.link/elex2023/wp-content/uploads/114.pdf},
  year = {2023}
}
TY  - JOUR
ID  - 2303577
AU  - Blahuš, Marek
AU  - Cukr, Michal
AU  - Herman, Ondřej
AU  - Jakubíček, Miloš
AU  - Kovář, Vojtěch
AU  - Kraus, Jan
AU  - Medveď, Marek
AU  - Ohlídalová, Vlasta
AU  - Suchomel, Vít
PY  - 2023
TI  - Rapid Ukrainian-English Dictionary Creation Using Post-Edited Corpus Data
PB  - Lexical Computing CZ s.r.o.
CY  - Brno, Czech Republic
KW  - Ukrainian
KW  - post-editing
KW  - dictionary
KW  - lexicography
UR  - https://elex.link/elex2023/wp-content/uploads/114.pdf
N2  - This paper describes the development of a new corpus-based Ukrainian-English dictionary. The dictionary was built from scratch, we used no pre-existing dictionary data. A rapid dictionary development method was used which consists of generating dictionary parts directly from a large corpus, and of post-editing the automatically generated data by native speakers of Ukrainian (not professional lexicographers). The method builds on Baisa et al. (2019) which was improved and updated, and we used a different data management model. As the data source, a 3-billion-word Ukrainian web corpus from the TenTen series (Jakubíček et al., 2013) was used. The paper briefly describes the corpus, then we thoroughly explain the individual steps of the automatic generation–post-editing workflow, including the volume of the manual work needed for the particular phases in terms of person-days. We also present details about the newly created dictionary and discuss directions for its further development.
ER  -
BLAHUŠ, Marek, Michal CUKR, Ondřej HERMAN, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Jan KRAUS, Marek MEDVEĎ, Vlasta OHLÍDALOVÁ and Vít SUCHOMEL. Rapid Ukrainian-English Dictionary Creation Using Post-Edited Corpus Data. Online. In Marek Medveď, Michal Měchura, Carole Tiberius, Iztok Kosem, Jelena Kallas, Miloš Jakubíček, Simon Krek. \textit{Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference}. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2023, pp.~613--637. ISSN~2533-5626.