Distributed Corpus Search

RYCHLÝ, Pavel, Radoslav RÁBARA and Ondřej HERMAN. Distributed Corpus Search. Online. In Piotr Banski, Marc Kupietz, Adrien Barbaresi, Hanno Biber, Evelyn Breiteneder, Simon Clematide, Andreas Witt. 6th Workshop on the Challenges in the Management of Large Corpora. Miyazaki, Japan: European Language Resource Association, 2018, pp. 10-13. ISBN 979-1-09-554614-6.


Basic information
Original title	Distributed Corpus Search
Authors	RYCHLÝ, Pavel (203 Czech Republic, belonging to the institution), Radoslav RÁBARA (703 Slovakia, belonging to the institution) and Ondřej HERMAN (203 Czech Republic, guarantor, belonging to the institution).
Edition	Miyazaki, Japan, 6th Workshop on the Challenges in the Management of Large Corpora, pp. 10-13, 4 pp. 2018.
Publisher	European Language Resource Association

Other information
Original language	English
Type of outcome	Proceedings paper
Field of study	10200 1.2 Computer and information sciences
Country of publisher	France
Confidentiality	not subject to a state or trade secret
Publication form	electronic version available online
WWW	CMLC-6 proceedings
RIV code	RIV/00216224:14330/18:00101008
Organizational unit	Faculty of Informatics
ISBN	979-1-09-554614-6
Keywords in English	distributed corpus search
Tags	International impact, Reviewed
Changed by	doc. Mgr. Pavel Rychlý, Ph.D., učo 3692. Changed: 23. 1. 2019 13:48.

Abstract

The amount of available linguistic data is growing fast, and so are the processing requirements. The current trend is towards parallel and distributed systems, but corpus management systems have been slow to follow it. In this article, we describe a work-in-progress distributed corpus management system that uses a large cluster of commodity machines. The implementation is based on the Manatee corpus management system and is written in the Go language. The features implemented so far are query evaluation, concordance building, concordance sorting and frequency distribution calculation. We evaluate the performance of the distributed system on a cluster of 65 commodity computers and compare it to the old implementation of Manatee. Compared to the old system, the distributed evaluation is 2.4 to 69.2 times faster in the concordance creation task, 56 to 305 times faster in the concordance sorting task and 27 to 614 times faster in the frequency distribution calculation. The results show that the system scales very well.
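
To illustrate the general idea behind a distributed frequency distribution calculation, the following Go sketch counts tokens in several corpus parts concurrently and merges the partial counts. It is a minimal illustration under stated assumptions, not the implementation described in the article: the names Shard, countShard and DistributedFreq are hypothetical, the shards are small in-memory slices rather than corpus parts on separate machines, and goroutines stand in for cluster workers.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Shard is a hypothetical stand-in for the part of a corpus held by one worker.
type Shard []string

// countShard computes the partial frequency distribution of a single shard.
func countShard(s Shard) map[string]int {
	counts := make(map[string]int)
	for _, tok := range s {
		counts[tok]++
	}
	return counts
}

// DistributedFreq counts every shard in its own goroutine (assumed here to
// play the role of a cluster worker) and merges the partial results.
func DistributedFreq(shards []Shard) map[string]int {
	// The channel is buffered so that no worker blocks while sending.
	partials := make(chan map[string]int, len(shards))
	var wg sync.WaitGroup
	for _, s := range shards {
		wg.Add(1)
		go func(s Shard) {
			defer wg.Done()
			partials <- countShard(s)
		}(s)
	}
	wg.Wait()
	close(partials)

	// Merge step: sum the partial counts token by token.
	total := make(map[string]int)
	for p := range partials {
		for tok, n := range p {
			total[tok] += n
		}
	}
	return total
}

func main() {
	shards := []Shard{
		{"the", "cat", "sat"},
		{"the", "dog"},
		{"a", "cat", "the"},
	}
	freq := DistributedFreq(shards)

	// Print tokens in alphabetical order for a stable listing.
	keys := make([]string, 0, len(freq))
	for k := range freq {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s\t%d\n", k, freq[k])
	}
}
```

Because addition is associative and commutative, the order in which the partial counts arrive does not affect the merged result, which is why this kind of task splits naturally across workers.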

Links
GA18-23891S, research and development project	Name: Hyperintensional Reasoning over Natural Language Texts
GA18-23891S, research and development project	Investor: Czech Science Foundation, Hyperintensional Reasoning over Natural Language Texts
LM2015071, research and development project	Name: Language Research Infrastructure in the Czech Republic (Acronym: LINDAT-Clarin)
LM2015071, research and development project	Investor: Ministry of Education, Youth and Sports of the Czech Republic, LINDAT-Clarin project - building and operation of the Czech node of the pan-European infrastructure for research
MUNI/A/0854/2017, internal MU code	Name: Large Computational Systems: Models, Applications and Verification VII.
MUNI/A/0854/2017, internal MU code	Investor: Masaryk University, Large Computational Systems: Models, Applications and Verification VII., until 2020_Category A - Specific research - Student research projects
