SUCHOMEL, Vít. csTenTen17, a Recent Czech Web Corpus. In Aleš Horák, Pavel Rychlý and Adam Rambousek. Proceedings of the Twelfth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2018. Brno: Tribun EU, 2018, pp. 111-123. ISBN 978-80-263-1517-9.
Other formats:
BibTeX
LaTeX
RIS
@inproceedings{1483790,
  author = {Suchomel, Vít},
  address = {Brno},
  booktitle = {Proceedings of the Twelfth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2018},
  editor = {Aleš Horák and Pavel Rychlý and Adam Rambousek},
  keywords = {Czech corpus; web corpus; text processing},
  howpublished = {printed version "print"},
  language = {eng},
  location = {Brno},
  isbn = {978-80-263-1517-9},
  pages = {111--123},
  publisher = {Tribun EU},
  title = {csTenTen17, a Recent Czech Web Corpus},
  url = {https://nlp.fi.muni.cz/raslan/2018/paper10-Suchomel.pdf},
  year = {2018}
}
TY  - JOUR
ID  - 1483790
AU  - Suchomel, Vít
PY  - 2018
TI  - csTenTen17, a Recent Czech Web Corpus
PB  - Tribun EU
CY  - Brno
SN  - 9788026315179
KW  - Czech corpus
KW  - web corpus
KW  - text processing
UR  - https://nlp.fi.muni.cz/raslan/2018/paper10-Suchomel.pdf
N2  - This article introduces csTenTen17, a very large Czech text corpus for language research, compiled from texts downloaded in 2015, 2016 and 2017. The corpus consists of 10.5 billion words, double the size of its predecessor from 2012. A brief comparison with other recent Czech corpora follows.
ER  -
SUCHOMEL, Vít. csTenTen17, a Recent Czech Web Corpus. In Aleš Horák, Pavel Rychlý and Adam Rambousek. \textit{Proceedings of the Twelfth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2018}. Brno: Tribun EU, 2018, pp.~111--123. ISBN~978-80-263-1517-9.