DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model

D 2016

HERMAN, Ondřej, Vít SUCHOMEL, Vít BAISA and Pavel RYCHLÝ

Basic information

Original name

DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model

Authors

HERMAN, Ondřej (203 Czech Republic, guarantor, belonging to the institution), Vít SUCHOMEL (203 Czech Republic, belonging to the institution), Vít BAISA (203 Czech Republic, belonging to the institution) and Pavel RYCHLÝ (203 Czech Republic, belonging to the institution)

Edition

Osaka, Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), p. 114-118, 5 pp. 2016

Publisher

Association for Natural Language Processing (ANLP), Osaka, Japan

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

References:

URL

RIV identification code

RIV/00216224:14330/16:00092557

Organization unit

Faculty of Informatics

ISBN

978-4-87974-716-7

Keywords in English

language discrimination;expectation maximization;language model

Abstract

In the original

In this paper we investigate two approaches to the discrimination of similar languages: the expectation-maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods. These methods reached accuracies of 86.6 % and 88.3 %, respectively, on set A of the DSL Shared Task 2016 competition.
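
The abstract names byte-level language models as one of the two approaches but gives no implementation details. The following is only a minimal sketch of the general idea, under assumptions that are not taken from the paper: one byte trigram model per language with add-one smoothing, and the label chosen by the highest log-probability. The class name `ByteNgramModel`, the function `classify`, the model order, and the toy training sentences are all hypothetical and serve illustration only.

```python
import math
from collections import Counter


class ByteNgramModel:
    """Illustrative byte-level n-gram model with add-one smoothing.

    This is a generic sketch, not the model described in the paper.
    """

    def __init__(self, order=3):
        self.order = order              # n-gram length in bytes (assumed value)
        self.ngram_counts = Counter()   # (context bytes, next byte) -> count
        self.context_counts = Counter() # context bytes -> count

    def _padded(self, text):
        # Pad with NUL bytes so the first real byte also has a full context.
        return b"\x00" * (self.order - 1) + text.encode("utf-8")

    def train(self, text):
        data = self._padded(text)
        for i in range(self.order - 1, len(data)):
            context = data[i - self.order + 1:i]
            self.ngram_counts[(context, data[i])] += 1
            self.context_counts[context] += 1

    def log_prob(self, text):
        data = self._padded(text)
        total = 0.0
        for i in range(self.order - 1, len(data)):
            context = data[i - self.order + 1:i]
            # Add-one smoothing over the 256 possible byte values.
            numerator = self.ngram_counts[(context, data[i])] + 1
            denominator = self.context_counts[context] + 256
            total += math.log(numerator / denominator)
        return total


def classify(models, text):
    """Return the label whose model assigns the text the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(text))


# Toy training data, invented for this example (not from the DSL corpus).
training = {
    "es": "el gato come pescado y bebe leche en la casa",
    "pt": "o gato come peixe e bebe leite em casa com você",
}
models = {}
for label, sentence in training.items():
    models[label] = ByteNgramModel(order=3)
    models[label].train(sentence)

print(classify(models, "leite"))
print(classify(models, "leche"))
```

Working on bytes rather than words avoids tokenization and handles any script uniformly, which is one reason such models resemble compression-based methods; a realistic system would need far more training text and a better smoothing scheme than add-one.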

Links

MUNI/A/0945/2015, internal MU code

Name: Rozsáhlé výpočetní systémy: modely, aplikace a verifikace V. (Large-scale computing systems: models, applications and verification V.)

Investor: Masaryk University, Category A

7F14047, research and development project

Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)

Investor: Ministry of Education, Youth and Sports of the CR

Cite

HERMAN, Ondřej, Vít SUCHOMEL, Vít BAISA and Pavel RYCHLÝ. DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model. Online. In Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi. Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3). Osaka: Association for Natural Language Processing (ANLP), Osaka, Japan, 2016, p. 114-118. ISBN 978-4-87974-716-7.

@inproceedings{1366107,
   author = {Herman, Ondřej and Suchomel, Vít and Baisa, Vít and Rychlý, Pavel},
   address = {Osaka},
   booktitle = {Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)},
   editor = {Preslav Nakov and Marcos Zampieri and Liling Tan and Nikola Ljubešić and Jörg Tiedemann and Shervin Malmasi},
   keywords = {language discrimination;expectation maximization;language model},
   howpublished = {electronic version available online},
   language = {eng},
   location = {Osaka},
   isbn = {978-4-87974-716-7},
   pages = {114--118},
   publisher = {Association for Natural Language Processing (ANLP), Osaka, Japan},
   title = {DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model},
   url = {https://aclanthology.info/pdf/W/W16/W16-4815.pdf},
   year = {2016}
}

TY  - CONF
ID  - 1366107
AU  - Herman, Ondřej
AU  - Suchomel, Vít
AU  - Baisa, Vít
AU  - Rychlý, Pavel
PY  - 2016
TI  - DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model
PB  - Association for Natural Language Processing (ANLP), Osaka, Japan
CY  - Osaka
SN  - 9784879747167
KW  - language discrimination;expectation maximization;language model
UR  - https://aclanthology.info/pdf/W/W16/W16-4815.pdf
N2  - In this paper we investigate two approaches to the discrimination of similar languages: the expectation-maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods. These methods reached accuracies of 86.6 % and 88.3 %, respectively, on set A of the DSL Shared Task 2016 competition.
ER  -

HERMAN, Ondřej, Vít SUCHOMEL, Vít BAISA and Pavel RYCHLÝ. DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model. Online. In Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubeši\'c, Jörg Tiedemann, Shervin Malmasi. \textit{Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)}. Osaka: Association for Natural Language Processing (ANLP), Osaka, Japan, 2016, p.~114-118. ISBN~978-4-87974-716-7.
