Masaryk University

List of publications

    2024

    1. KOVAŘÍK, František, Miloš JAKUBÍČEK, Vít SUCHOMEL, Michal CUKR and Jan KRAUS. Overview of Latin American and Iberian corpora in Sketch Engine. Santiago de Compostela, 2024, 6 pp. 5th OpenCor: Latin American and Iberian Languages Open Corpora Forum.

    2023

    1. BLAHUŠ, Marek, Miloš JAKUBÍČEK, Michal CUKR, Vojtěch KOVÁŘ and Vít SUCHOMEL. Development of Evidence-Based Grammars for Terminology Extraction in OneClick Terms. Online. In Marek Medveď, Michal Měchura, Carole Tiberius, Iztok Kosem, Jelena Kallas, Miloš Jakubíček, Simon Krek. Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2023, pp. 650-662. ISSN 2533-5626.
    2. BLAHUŠ, Marek, Michal CUKR, Ondřej HERMAN, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Jan KRAUS, Marek MEDVEĎ, Vlasta OHLÍDALOVÁ and Vít SUCHOMEL. Rapid Ukrainian-English Dictionary Creation Using Post-Edited Corpus Data. Online. In Marek Medveď, Michal Měchura, Carole Tiberius, Iztok Kosem, Jelena Kallas, Miloš Jakubíček, Simon Krek. Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2023, pp. 613-637. ISSN 2533-5626.
    3. SUCHOMEL, Vít, Miloš JAKUBÍČEK and Ondřej MATUŠKA. Web corpora for under-resourced languages. Online. In Corpus Linguistics (CL2023), 2023. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2023. ISSN 2533-5626.

    2022

    1. SUCHOMEL, Vít and Jan KRAUS. Semi-Manual Annotation of Topics and Genres in Web Corpora: The Cheap and Fast Way. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Proceedings of the Sixteenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2022. Brno: Tribun EU, 2022, pp. 141-148. ISBN 978-80-263-1752-4.

    2021

    1. SUCHOMEL, Vít. Genre Annotation of Web Corpora: Scheme and Issues. In Kohei Arai, Supriya Kapoor, Rahul Bhatia. Proceedings of the Future Technologies Conference (FTC) 2020, Volume 1. Vancouver, Canada: Springer Nature Switzerland AG, 2021, pp. 738-754. ISBN 978-3-030-63127-7. Available from: https://dx.doi.org/10.1007/978-3-030-63128-4_55.
    2. SUCHOMEL, Vít and Jan KRAUS. Website Properties in Relation to the Quality of Text Extracted for Web Corpora. In Horák, Rychlý, Rambousek. Recent Advances in Slavonic Natural Language Processing (RASLAN 2021). Brno: Tribun EU, 2021, pp. 167-175. ISBN 978-80-263-1670-1.

    2020

    1. JAKUBÍČEK, Miloš, Vojtěch KOVÁŘ, Pavel RYCHLÝ and Vít SUCHOMEL. Current Challenges in Web Corpus Building. Online. In Adrien Barbaresi, Felix Bildhauer, Roland Schafer and Egon Stemle. Proceedings of the 12th Web as Corpus Workshop. Marseille, France: European Language Resources Association, 2020, pp. 1-4. ISBN 979-10-95546-68-9.
    2. SUCHOMEL, Vít. Removing Spam from Web Corpora Through Supervised Learning and Semi-manual Classification of Web Sites. In Aleš Horák. Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020. Brno: Tribun EU, 2020, pp. 113-123. ISBN 978-80-263-1600-8.

    2019

    1. RAMBOUSEK, Adam, Aleš HORÁK, Vít BAISA and Vít SUCHOMEL. A New Approach for Semi-automatic Building and Extending a Multilingual Terminology Thesaurus. International Journal on Artificial Intelligence Tools. USA: World Scientific Publishing, 2019, vol. 28, no. 2, pp. 1-21. ISSN 0218-2130. Available from: https://dx.doi.org/10.1142/S0218213019500088.
    2. BAISA, Vít, Marek BLAHUŠ, Michal CUKR, Ondřej HERMAN, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Marek MEDVEĎ, Michal MĚCHURA, Pavel RYCHLÝ and Vít SUCHOMEL. Automating dictionary production: a Tagalog-English-Korean dictionary from scratch. Online. In Proceedings of the 6th Biennial Conference on Electronic Lexicography. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2019, pp. 805-818. ISSN 2533-5626.
    3. SUCHOMEL, Vít. Discriminating Between Similar Languages Using Large Web Corpora. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2019. Brno: Tribun EU, 2019, pp. 129-135. ISBN 978-80-263-1530-8.
    4. KOPPEL, Kristina, Jelena KALLAS, Maria KHOKHLOVA, Vít SUCHOMEL, Vít BAISA and Jan MICHELFEIT. SkELL Corpora as a Part of the Language Portal Sonaveeb: Problems and Perspectives. Online. In Proceedings of the 6th Biennial Conference on Electronic Lexicography. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2019, pp. 763-782. ISSN 2533-5626.

    2018

    1. SUCHOMEL, Vít. csTenTen17, a Recent Czech Web Corpus. In Aleš Horák, Pavel Rychlý and Adam Rambousek. Proceedings of the Twelfth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2018. Brno: Tribun EU, 2018, pp. 111-123. ISBN 978-80-263-1517-9.

    2017

    1. KALLAS, Jelena, Vít SUCHOMEL and Maria KHOKHLOVA. Automated Identification of Domain Preferences of Collocations. Online. In Iztok Kosem et al. Electronic Lexicography in the 21st Century. Proceedings of Elex 2017 Conference. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2017, pp. 309-320. ISSN 2533-5626.
    2. HaBiT system (software)
      PALA, Karel, Aleš HORÁK, Pavel RYCHLÝ, Vít SUCHOMEL, Vít BAISA, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Zuzana NEVĚŘILOVÁ, Adam RAMBOUSEK, Björn GAMBÄCK, Utpal SIKDAR and Lars BUNGUM. HaBiT system. 2017.
    3. SUCHOMEL, Vít. Removing spam from web corpora through supervised learning using FastText. Birmingham, 2017.

    2016

    1. RYCHLÝ, Pavel and Vít SUCHOMEL. Annotated Amharic Corpora. In Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala. Text, Speech, and Dialogue 19th International Conference, TSD 2016 Brno, Czech Republic, September 12–16, 2016 Proceedings. Switzerland: Springer International Publishing, 2016, pp. 295-302. ISBN 978-3-319-45509-9. Available from: https://dx.doi.org/10.1007/978-3-319-45510-5_34.
    2. HERMAN, Ondřej, Vít SUCHOMEL, Vít BAISA and Pavel RYCHLÝ. DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model. Online. In Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi. Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3). Osaka: Association for Natural Language Processing (ANLP), Osaka, Japan, 2016, pp. 114-118. ISBN 978-4-87974-716-7.
    3. SUCHOMEL, Vít and Pavel RYCHLÝ. Set of Ethiopian Web Corpora. 2016.
    4. FIŠER, Darja, Vít SUCHOMEL and Miloš JAKUBÍČEK. Terminology Extraction for Academic Slovene Using Sketch Engine. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016. Brno: Tribun EU, 2016, pp. 135-141. ISBN 978-80-263-1095-2.

    2015

    1. BAISA, Vít and Vít SUCHOMEL. Corpus Based Extraction of Hypernyms in Terminological Thesaurus for Land Surveying Domain. In Ninth Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2015, pp. 69-74. ISBN 978-80-263-0974-1.
    2. BAISA, Vít, Vít SUCHOMEL, Adam KILGARRIFF and Miloš JAKUBÍČEK. Sketch Engine for English Language Learning. In Corpus Linguistics 2015. 2015.
    3. RAMBOUSEK, Adam, Vít BAISA, Vít SUCHOMEL, Aleš HORÁK and Lucia KOCINCOVÁ. Terminologický tezaurus pro obor zeměměřictví a katastru nemovitostí: Certifikovaná metodika. 2015.
    4. BAISA, Vít and Vít SUCHOMEL. Turkic Language Support in Sketch Engine. In Proceedings of the international conference "Turkic Languages processing: TurkLang 2015". Kazan: Academy of Sciences of the Republic of Tatarstan Press, 2015, pp. 214-223. ISBN 978-5-9690-0262-3.

    2014

    1. ARTS, Tressy, Yonatan BELINKOV, Nizar HABASH, Adam KILGARRIFF and Vít SUCHOMEL. arTenTen: Arabic Corpus and Word Sketches. Journal of King Saud University-Computer and Information Sciences. Elsevier, 2014, vol. 2014, no. 26, pp. 381-395. ISSN 1319-1578. Available from: https://dx.doi.org/10.1016/j.jksuci.2014.06.009.
    2. KILGARRIFF, Adam, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Pavel RYCHLÝ and Vít SUCHOMEL. Finding Terms in Corpora for Many Languages with the Sketch Engine. Online. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics. Gothenburg, Sweden: The Association for Computational Linguistics, 2014, pp. 53-56. ISBN 978-1-937284-75-6.
    3. BOJAR, Ondřej, Vojtěch DIATKA, Pavel RYCHLÝ, Pavel STRAŇÁK, Vít SUCHOMEL, Aleš TAMCHYNA and Daniel ZEMAN. HindEnCorp – Hindi-English and Hindi-only Corpus for Machine Translation. Online. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland: European Language Resources Association (ELRA), 2014, pp. 3550-3555. ISBN 978-2-9517408-8-4.
    4. NEVĚŘILOVÁ, Zuzana and Vít SUCHOMEL. Intelligent Search and Replace for Czech Phrases. In Eighth Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2014, pp. 97-105. ISSN 2336-4289.
    5. HORÁK, Aleš, Adam RAMBOUSEK, Vít SUCHOMEL and Lucia KOCINCOVÁ. Semiautomatic Building and Extension of Terminological Thesaurus for Land Surveying Domain. In Aleš Horák, Pavel Rychlý. Eighth Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2014, pp. 129-137. ISSN 2336-4289.
    6. BAISA, Vít and Vít SUCHOMEL. SkELL: Web Interface for English Language Learning. In Eighth Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2014, pp. 63-70. ISSN 2336-4289.
    7. SUCHOMEL, Vít, Jan MICHELFEIT and Jan POMIKÁLEK. Text Tokenisation Using unitok. In Eighth Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2014, pp. 71-75. ISSN 2336-4289.
    8. KILGARRIFF, Adam, Vít BAISA, Jan BUŠTA, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Jan MICHELFEIT, Pavel RYCHLÝ and Vít SUCHOMEL. The Sketch Engine: ten years on. Lexicography. Springer Berlin Heidelberg, 2014, vol. 1, no. 1, pp. 7-36. ISSN 2197-4292. Available from: https://dx.doi.org/10.1007/s40607-014-0009-9.

    2013

    1. SRDANOVIĆ, Irena, Vít SUCHOMEL, Adam KILGARRIFF and Toshinobu OGISO. 百億語のコーパスを用いた日本語の語彙・文法情報のプロファイリング. Online. 2013, pp. 229-238.
    2. BELINKOV, Yonatan, Nizar HABASH, Adam KILGARRIFF, Noam ORDAN, Ryan ROTH and Vít SUCHOMEL. arTenTen: a new, vast corpus for Arabic. Online. In Eric Atwell and Andrew Hardie. Proceedings of WACL’2 Second Workshop on Arabic Corpus Linguistics. 2013, p. 20.
    3. BAISA, Vít and Vít SUCHOMEL. Intrinsic Methods for Comparison of Corpora. In A. Horák, P. Rychlý. RASLAN 2013 Recent Advances in Slavonic Natural Language Processing. 1st ed. Brno: Tribun EU, 2013, pp. 51-58. ISBN 978-80-263-0520-0.
    4. JAKUBÍČEK, Miloš, Adam KILGARRIFF, Vojtěch KOVÁŘ, Pavel RYCHLÝ and Vít SUCHOMEL. The TenTen Corpus Family. Online. In 7th International Corpus Linguistics Conference CL 2013. Lancaster, 2013, pp. 125-127.
    5. KILGARRIFF, Adam and Vít SUCHOMEL. Web Spam. Online. In Stefan Evert, Egon Stemle, Paul Rayson. Proceedings of the 8th Web as Corpus Workshop (WAC-8) @Corpus Linguistics 2013. 2013, pp. 46-52.

    2012

    1. BAISA, Vít and Vít SUCHOMEL. Detecting Spam in Web Corpora. In Aleš Horák, Pavel Rychlý. 6th Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2012, pp. 69-76. ISBN 978-80-263-0313-8.
    2. SUCHOMEL, Vít and Jan POMIKÁLEK. Efficient Web Crawling for Large Text Corpora. Online. In Adam Kilgarriff, Serge Sharoff. Proceedings of the seventh Web as Corpus Workshop (WAC7). Lyon, 2012, pp. 39-43.
    3. BAISA, Vít and Vít SUCHOMEL. Large Corpora for Turkic Languages and Unsupervised Morphological Analysis. Online. In Seniz Demir, Ilknur Durgar El-Kahlout, Mehmet Ugur Dogan. Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC'12). Istanbul, Turkey: European Language Resources Association (ELRA), 2012, pp. 28-32. ISBN 978-2-9517408-7-7.
    4. DOVUDOV, Gulshan, Vít SUCHOMEL and Pavel ŠMERK. POS Annotated 50M Corpus of Tajik Language. Online. In Proceedings of the Workshop on Language Technology for Normalisation of Less-Resourced Languages (SALTMIL 8/AfLaT 2012). Istanbul: European Language Resources Association (ELRA), 2012, pp. 93-98. ISBN 978-2-9517408-7-7.
    5. SUCHOMEL, Vít. Recent Czech Web Corpora. In Aleš Horák, Pavel Rychlý. 6th Workshop on Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2012, pp. 77-83. ISBN 978-80-263-0313-8.
    6. SpiderLing (software)
      SUCHOMEL, Vít. SpiderLing. 2012.
    7. DOVUDOV, Gulshan, Vít SUCHOMEL and Pavel ŠMERK. Towards 100M Morphologically Annotated Corpus of Tajik. In Aleš Horák, Pavel Rychlý. Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2012. Brno: Tribun EU, 2012, pp. 91-94. ISBN 978-80-263-0313-8.

    2011

    1. DOVUDOV, Gulshan, Jan POMIKÁLEK, Vít SUCHOMEL and Pavel ŠMERK. Building a 50M Corpus of Tajik Language. In Aleš Horák, Pavel Rychlý. Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2011. Brno: Tribun EU, 2011, pp. 89-95. ISBN 978-80-263-0077-9.
    2. Chared (software)
      POMIKÁLEK, Jan and Vít SUCHOMEL. Chared. 2011.
    3. POMIKÁLEK, Jan and Vít SUCHOMEL. chared: Character Encoding Detection with a Known Language. In Aleš Horák, Pavel Rychlý. RASLAN 2011. 5th ed. Brno, Czech Republic: Tribun EU, 2011, pp. 125-129. ISBN 978-80-263-0077-9.
    4. SUCHOMEL, Vít and Jan POMIKÁLEK. Practical Web Crawling for Text Corpora. In A. Horák, P. Rychlý. Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2011. Brno: Tribun EU, 2011, pp. 97-108. ISBN 978-80-263-0077-9.
Displayed: 24 April 2024, 23:06