An efficient algorithm for building a distributional thesaurus

RYCHLÝ, Pavel and Adam KILGARRIFF. An efficient algorithm for building a distributional thesaurus. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions. Prague, Czech Republic: Association for Computational Linguistics, 2007, pp. 41-44. ISBN 978-1-932432-86-2.


Basic information
Original title	An efficient algorithm for building a distributional thesaurus
Title in Czech	Efektivní algoritmus pro vytváření distribučního thesauru
Authors	RYCHLÝ, Pavel (203 Czech Republic, guarantor) and Adam KILGARRIFF (826 United Kingdom of Great Britain and Northern Ireland).
Edition	Prague, Czech Republic, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pp. 41-44, 4 pp. 2007.
Publisher	Association for Computational Linguistics

Other information
Original language	English
Type of outcome	Proceedings paper
Field of study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Czech Republic
Confidentiality	is not subject to a state or trade secret
RIV code	RIV/00216224:14330/07:00019564
Organization unit	Faculty of Informatics
ISBN	978-1-932432-86-2
Keywords in English	text corpus; distributional thesaurus
Tags	distributional thesaurus, text corpus
Flags	International impact, Reviewed
Changed by	doc. Mgr. Pavel Rychlý, Ph.D., učo 3692. Changed: 11 February 2008 12:12.

Abstract

Gorman and Curran (2006) argue that thesaurus generation for billion+-word corpora is problematic as the full computation takes many days. We present an algorithm with which the computation takes under two hours. We have created, and made publicly available, thesauruses based on large corpora for (at time of writing) seven major world languages. The development is implemented in the Sketch Engine.

Abstract in Czech (translated)
Gorman and Curran (2006) argue that building a distributional thesaurus from a corpus of more than a billion words is problematic because the full computation can take many days. We present an algorithm that completes the computation within two hours.

Related projects
LC536, R&D project	Name: Center for Computational Linguistics
LC536, R&D project	Investor: Ministry of Education, Youth and Sports of the Czech Republic, Center for Computational Linguistics
1ET100300419, R&D project	Name: Intelligent models, algorithms, methods and tools for building the semantic web
1ET100300419, R&D project	Investor: Academy of Sciences of the Czech Republic, Intelligent models, algorithms, methods and tools for building the semantic web
2C06009, R&D project	Name: Tools for building a comprehensive knowledge base for natural-language communication with the semantic web (Acronym: COT-SEWing)
2C06009, R&D project	Investor: Ministry of Education, Youth and Sports of the Czech Republic, Tools for building a comprehensive knowledge base for natural-language communication with the semantic web
