Detailed Information on Publication Record
2018
Distributed Corpus Search
RYCHLÝ, Pavel, Radoslav RÁBARA and Ondřej HERMAN
Basic information
Original name
Distributed Corpus Search
Authors
RYCHLÝ, Pavel (203 Czech Republic, belonging to the institution), Radoslav RÁBARA (703 Slovakia, belonging to the institution) and Ondřej HERMAN (203 Czech Republic, guarantor, belonging to the institution)
Edition
Miyazaki, Japan, 6th Workshop on the Challenges in the Management of Large Corpora, p. 10-13, 4 pp. 2018
Publisher
European Language Resources Association
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10200 1.2 Computer and information sciences
Country of publisher
France
Confidentiality degree
is not subject to a state or trade secret
Publication form
electronic version available online
RIV identification code
RIV/00216224:14330/18:00101008
Organization unit
Faculty of Informatics
ISBN
979-1-09-554614-6
Keywords in English
distributed corpus search
Tags
International impact, Reviewed
Changed: 23/1/2019 13:48, doc. Mgr. Pavel Rychlý, Ph.D.
Abstract
In the original
The amount of available linguistic data is growing fast, and so are the processing requirements. The current trend is towards parallel and distributed systems, but corpus management systems have been slow to follow it. In this article, we describe a work-in-progress distributed corpus management system that uses a large cluster of commodity machines. The implementation is based on the Manatee corpus management system and written in the Go language. Currently, the implemented features are query evaluation, concordance building, concordance sorting and frequency distribution calculation. We evaluate the performance of the distributed system on a cluster of 65 commodity computers and compare it to the old implementation of Manatee. The performance increase for the distributed evaluation in the concordance creation task ranges from 2.4 to 69.2 times compared to the old system, from 56 to 305 times for the concordance sorting task, and from 27 to 614 times for the frequency distribution calculation. The results show that the system scales very well.
Links
GA18-23891S, research and development project
LM2015071, research and development project
MUNI/A/0854/2017, MU internal code