Discriminating Between Similar Languages Using Large Web Corpora

SUCHOMEL, Vít. Discriminating Between Similar Languages Using Large Web Corpora. In Horák, Aleš and Rychlý, Pavel and Rambousek, Adam. Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2019. Brno: Tribun EU, 2019, p. 129-135. ISBN 978-80-263-1530-8.

Basic information
Original name	Discriminating Between Similar Languages Using Large Web Corpora
Authors	SUCHOMEL, Vít (203 Czech Republic, guarantor, belonging to the institution).
Edition	Brno, Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2019, p. 129-135, 7 pp. 2019.
Publisher	Tribun EU

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	10200 1.2 Computer and information sciences
Country of publisher	Czech Republic
Confidentiality degree	is not subject to a state or trade secret
Publication form	printed version "print"
RIV identification code	RIV/00216224:14330/19:00111666
Organization unit	Faculty of Informatics
ISBN	978-80-263-1530-8
ISSN	2336-4289
UT WoS	000604899800015
Keywords in English	language identification; discriminating similar languages; building web corpora
Changed by	Mgr. Michal Petr, učo 65024. Changed: 16/5/2022 15:28.

Abstract

This paper presents a method for discriminating between similar languages based on wordlists from large web corpora. The main benefits of the approach are language independence, a measure of confidence in the classification, and an easy-to-maintain implementation. The method is evaluated on the VarDial 2014 workshop data set. The resulting accuracy is comparable to that of other methods that performed successfully at the workshop. A tool implementing the method in Python can be obtained from the web site http://corpus.tools/.

Links
LM2015071, research and development project	Name: Jazyková výzkumná infrastruktura v České republice (Acronym: LINDAT-Clarin)
LM2015071, research and development project	Investor: Ministry of Education, Youth and Sports of the CR
