Discriminating Between Similar Languages Using Large Web Corpora

D 2019

SUCHOMEL, Vít

Basic information

Original title

Discriminating Between Similar Languages Using Large Web Corpora

Authors

SUCHOMEL, Vít (203 Czech Republic, guarantor, belonging to the institution)

Edition

Brno, Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2019, p. 129-135, 7 pp. 2019

Publisher

Tribun EU

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10200 1.2 Computer and information sciences

Country of publisher

Czech Republic

Confidentiality

is not subject to a state or trade secret

Publication form

printed version "print"

References

URL

RIV identification code

RIV/00216224:14330/19:00111666

Organization unit

Faculty of Informatics

ISBN

978-80-263-1530-8

ISSN

UT WoS

000604899800015

Keywords in English

language identification; discriminating similar languages; building web corpora

Changed: 16 May 2022 15:28, Mgr. Michal Petr

Abstract

In the original language

This paper presents a method for discriminating similar languages based on wordlists from large web corpora. The main benefits of the approach are language independence, a measure of confidence of the classification, and an easy-to-maintain implementation. The method is evaluated on the VarDial 2014 workshop data set. The resulting accuracy is comparable to that of other methods that performed successfully at the workshop. A tool implementing the method in Python can be obtained from the web site http://corpus.tools/.
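
To make the idea of wordlist-based discrimination concrete, here is a minimal Python sketch. It is not the tool from http://corpus.tools/ and is not taken from the paper: the scoring function (average log relative frequency), the `floor` value for unseen words, the confidence measure (score margin over the runner-up) and the two tiny toy "corpora" are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def build_wordlist(corpus_tokens):
    """Map each word to its relative frequency in a (toy) corpus."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def score(tokens, wordlist, floor=1e-6):
    """Average log relative frequency; unseen words get the assumed `floor`."""
    if not tokens:
        return 0.0
    return sum(math.log(wordlist.get(t, floor)) for t in tokens) / len(tokens)


def classify(text, wordlists):
    """Return (best_language, margin); margin is the lead over the runner-up."""
    tokens = text.lower().split()
    ranked = sorted(
        ((score(tokens, wl), lang) for lang, wl in wordlists.items()),
        reverse=True,
    )
    best_score, best_lang = ranked[0]
    margin = best_score - ranked[1][0] if len(ranked) > 1 else float("inf")
    return best_lang, margin


# Toy stand-ins for wordlists derived from large web corpora (hypothetical data).
wordlists = {
    "cs": build_wordlist("to je dobrý den a já jsem tady".split()),
    "sk": build_wordlist("to je dobrý deň a ja som tu".split()),
}

print(classify("já jsem tady", wordlists))
print(classify("to je dobrý", wordlists))
```

In this sketch a large margin means the text contains words that are frequent in one wordlist and rare or missing in the other, while a margin near zero (as for the second call, whose words are shared by both toy wordlists) signals that the classification should not be trusted. Adding a language only requires adding another wordlist, which is one way to read the language independence claimed in the abstract.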

Links

LM2015071, research and development project

Name: Language Research Infrastructure in the Czech Republic (Acronym: LINDAT-Clarin)

Investor: Ministry of Education, Youth and Sports of the Czech Republic, LINDAT-Clarin project - Building and operating the Czech node of the pan-European research infrastructure