HERMAN, Ondřej, Vít SUCHOMEL, Vít BAISA and Pavel RYCHLÝ. DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model. Online. In Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi. Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3). Osaka: Association for Natural Language Processing (ANLP), Osaka, Japan, 2016, p. 114-118. ISBN 978-4-87974-716-7.
Basic information
Original name DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model
Authors HERMAN, Ondřej (203 Czech Republic, guarantor, belonging to the institution), Vít SUCHOMEL (203 Czech Republic, belonging to the institution), Vít BAISA (203 Czech Republic, belonging to the institution) and Pavel RYCHLÝ (203 Czech Republic, belonging to the institution).
Edition Osaka, Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), p. 114-118, 5 pp. 2016.
Publisher Association for Natural Language Processing (ANLP), Osaka, Japan
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form electronic version available online
RIV identification code RIV/00216224:14330/16:00092557
Organization unit Faculty of Informatics
ISBN 978-4-87974-716-7
Keywords in English language discrimination;expectation maximization;language model
Tags best; International impact, Reviewed
Changed by RNDr. Vít Suchomel, Ph.D., učo 139723, on 1/11/2017 12:13.
Abstract
In this paper we investigate two approaches to discriminating similar languages: the expectation-maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods. These methods reached accuracies of 86.6% and 88.3%, respectively, on set A of the DSL Shared Task 2016 competition.
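The first approach in the abstract can be illustrated with a generic EM loop for soft language assignment. This is a minimal sketch, not the authors' implementation: it assumes unlabeled sentences, initial per-language word distributions, and a fixed smoothing floor for unseen words; all function and variable names are illustrative.

```python
import math
from collections import defaultdict

def em_language_discrimination(sentences, langs, p_word, priors, iterations=10):
    """Illustrative EM for estimating P(word|language).

    sentences: list of token lists (unlabeled text)
    p_word:    dict lang -> dict word -> P(word|lang), initial estimates
    priors:    dict lang -> P(lang)
    """
    for _ in range(iterations):
        # E-step: softly assign each sentence to languages.
        soft_counts = {l: defaultdict(float) for l in langs}
        lang_mass = {l: 0.0 for l in langs}
        for sent in sentences:
            log_post = {}
            for l in langs:
                lp = math.log(priors[l])
                for w in sent:
                    # Small probability floor for unseen words (assumed smoothing).
                    lp += math.log(p_word[l].get(w, 1e-9))
                log_post[l] = lp
            # Normalize posteriors in log space to avoid underflow.
            m = max(log_post.values())
            post = {l: math.exp(log_post[l] - m) for l in langs}
            z = sum(post.values())
            for l in langs:
                gamma = post[l] / z
                lang_mass[l] += gamma
                for w in sent:
                    soft_counts[l][w] += gamma
        # M-step: re-estimate P(word|lang) and priors from soft counts.
        total = sum(lang_mass.values())
        for l in langs:
            priors[l] = lang_mass[l] / total
            denom = sum(soft_counts[l].values()) or 1.0
            p_word[l] = {w: c / denom for w, c in soft_counts[l].items()}
    return p_word, priors
```

With a slightly asymmetric initialization, each iteration sharpens the word distributions toward the languages that best explain the data, which is the intuition behind using EM to separate closely related languages.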
Links
MUNI/A/0945/2015, internal MU code. Name: Large-scale computing systems: models, applications and verification V.
Investor: Masaryk University, Category A
7F14047, research and development project. Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)
Investor: Ministry of Education, Youth and Sports of the CR