DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model

D 2016

HERMAN, Ondřej, Vít SUCHOMEL, Vít BAISA and Pavel RYCHLÝ

Basic information

Original title

DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model

Authors

HERMAN, Ondřej (203 Czech Republic, guarantor, belonging to the institution), Vít SUCHOMEL (203 Czech Republic, belonging to the institution), Vít BAISA (203 Czech Republic, belonging to the institution) and Pavel RYCHLÝ (203 Czech Republic, belonging to the institution)

Edition

Osaka, Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pp. 114-118, 5 pp., 2016

Publisher

Association for Natural Language Processing (ANLP), Osaka, Japan

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Czech Republic

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

Links

URL

RIV identification code

RIV/00216224:14330/16:00092557

Organization unit

Faculty of Informatics

ISBN

978-4-87974-716-7

Keywords in English

language discrimination;expectation maximization;language model

Tags

best

Flags

International impact, Reviewed

Changed: 1 November 2017 12:13, RNDr. Vít Suchomel, Ph.D.

Abstract

In the original

In this paper we investigate two approaches to the discrimination of similar languages: an expectation-maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods. These methods reached accuracies of 86.6 % and 88.3 %, respectively, on set A of the DSL Shared task 2016 competition.

Related projects

MUNI/A/0945/2015, internal MU code

Name: Large-scale computing systems: models, applications and verification V.

Investor: Masaryk University, Large-scale computing systems: models, applications and verification V., until 2020, Category A - Specific research - Student research projects

7F14047, research and development project

Name: Harvesting big text data for under-resourced languages (Acronym: HaBiT)

Investor: Ministry of Education, Youth and Sports of the Czech Republic, Harvesting big text data for under-resourced languages

Cite

HERMAN, Ondřej, Vít SUCHOMEL, Vít BAISA and Pavel RYCHLÝ. DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model. Online. In Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi. Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3). Osaka: Association for Natural Language Processing (ANLP), Osaka, Japan, 2016, pp. 114-118. ISBN 978-4-87974-716-7.

@inproceedings{1366107,
   author = {Herman, Ondřej and Suchomel, Vít and Baisa, Vít and Rychlý, Pavel},
   address = {Osaka},
   booktitle = {Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)},
   editor = {Preslav Nakov and Marcos Zampieri and Liling Tan and Nikola Ljubešić and Jörg Tiedemann and Shervin Malmasi},
   keywords = {language discrimination;expectation maximization;language model},
   howpublished = {electronic version available online},
   language = {eng},
   location = {Osaka},
   isbn = {978-4-87974-716-7},
   pages = {114--118},
   publisher = {Association for Natural Language Processing (ANLP), Osaka, Japan},
   title = {DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model},
   url = {https://aclanthology.info/pdf/W/W16/W16-4815.pdf},
   year = {2016}
}

TY  - CPAPER
ID  - 1366107
AU  - Herman, Ondřej
AU  - Suchomel, Vít
AU  - Baisa, Vít
AU  - Rychlý, Pavel
PY  - 2016
TI  - DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model
PB  - Association for Natural Language Processing (ANLP), Osaka, Japan
CY  - Osaka
SN  - 9784879747167
KW  - language discrimination
KW  - expectation maximization
KW  - language model
UR  - https://aclanthology.info/pdf/W/W16/W16-4815.pdf
N2  - In this paper we investigate two approaches to the discrimination of similar languages: an expectation-maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods. These methods reached accuracies of 86.6 % and 88.3 %, respectively, on set A of the DSL Shared task 2016 competition.
ER  -

HERMAN, Ondřej, Vít SUCHOMEL, Vít BAISA and Pavel RYCHLÝ. DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model. Online. In Preslav Nakov, Marcos Zampieri, Liling Tan, Nikola Ljubeši\'c, Jörg Tiedemann, Shervin Malmasi. \textit{Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)}. Osaka: Association for Natural Language Processing (ANLP), Osaka, Japan, 2016, pp.~114--118. ISBN~978-4-87974-716-7.
