Other formats:
BibTeX
LaTeX
RIS
@inproceedings{1420812,
  author = {Rychlý, Pavel and Rábara, Radoslav and Herman, Ondřej},
  address = {Miyazaki, Japan},
  booktitle = {6th Workshop on the Challenges in the Management of Large Corpora},
  editor = {Piotr Banski and Marc Kupietz and Adrien Barbaresi and Hanno Biber and Evelyn Breiteneder and Simon Clematide and Andreas Witt},
  keywords = {distributed corpus search},
  howpublished = {electronic version "online"},
  language = {eng},
  location = {Miyazaki, Japan},
  isbn = {979-1-09-554614-6},
  pages = {10--13},
  publisher = {European Language Resource Association},
  title = {Distributed Corpus Search},
  url = {http://lrec-conf.org/workshops/lrec2018/W17/pdf/book_of_proceedings.pdf},
  year = {2018}
}
TY  - JOUR
ID  - 1420812
AU  - Rychlý, Pavel
AU  - Rábara, Radoslav
AU  - Herman, Ondřej
PY  - 2018
TI  - Distributed Corpus Search
PB  - European Language Resource Association
CY  - Miyazaki, Japan
SN  - 9791095546146
KW  - distributed corpus search
UR  - http://lrec-conf.org/workshops/lrec2018/W17/pdf/book_of_proceedings.pdf
L2  - http://lrec-conf.org/workshops/lrec2018/W17/pdf/book_of_proceedings.pdf
N2  - Available amount of linguistic data raises fast and so do the processing requirements. The current trend is towards parallel and distributed systems, but corpus management systems have been slow to follow it. In this article, we describe the work in progress distributed corpus management system using a large cluster of commodity machines. The implementation is based on the Manatee corpus management system and written in the Go language. Currently, the implemented features are query evaluation, concordance building, concordance sorting and frequency distribution calculation. We evaluate the performance of the distributed system on a cluster of 65 commodity computers and compare it to the old implementation of Manatee. The performance increase for the distributed evaluation in the concordance creation task ranges from 2.4 to 69.2 compared to the old system, from 56 to 305 times for the concordance sorting task and from 27 to 614 for the frequency distribution calculation. The results show that the system scales very well.
ER  -
RYCHLÝ, Pavel, Radoslav RÁBARA and Ondřej HERMAN. Distributed Corpus Search. Online. In Piotr Banski, Marc Kupietz, Adrien Barbaresi, Hanno Biber, Evelyn Breiteneder, Simon Clematide, Andreas Witt. \textit{6th Workshop on the Challenges in the Management of Large Corpora}. Miyazaki, Japan: European Language Resource Association, 2018, pp.~10--13. ISBN~979-1-09-554614-6.