RYGL, Jan, Petr SOJKA, Michal RŮŽIČKA and Radim ŘEHŮŘEK. ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches : Digging for Nuggets of Wisdom in Text. In: HORÁK, Aleš, Pavel RYCHLÝ and Adam RAMBOUSEK (eds.). Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016. Brno: Tribun EU, 2016, pp. 79–87. ISBN 978-80-263-1095-2.
Other formats: BibTeX, RIS, LaTeX.
@inproceedings{1361540,
  author       = {Rygl, Jan and Sojka, Petr and Růžička, Michal and Řehůřek, Radim},
  address      = {Brno},
  booktitle    = {Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, {RASLAN} 2016},
  editor       = {Horák, Aleš and Rychlý, Pavel and Rambousek, Adam},
  keywords     = {ScaleText; vector space modelling; Latent Semantic Indexing; LSI; machine learning; scalable search; search system design; text mining},
  howpublished = {printed version},
  language     = {eng},
  location     = {Brno},
  isbn         = {978-80-263-1095-2},
  pages        = {79--87},
  publisher    = {Tribun EU},
  title        = {{ScaleText}: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches : Digging for Nuggets of Wisdom in Text},
  url          = {http://raslan2016.nlp-consulting.net/},
  year         = {2016}
}
TY  - CPAPER
ID  - 1361540
AU  - Rygl, Jan
AU  - Sojka, Petr
AU  - Růžička, Michal
AU  - Řehůřek, Radim
A2  - Horák, Aleš
A2  - Rychlý, Pavel
A2  - Rambousek, Adam
PY  - 2016
TI  - ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches : Digging for Nuggets of Wisdom in Text
T2  - Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016
SP  - 79
EP  - 87
PB  - Tribun EU
CY  - Brno
SN  - 9788026310952
LA  - eng
KW  - ScaleText
KW  - vector space modelling
KW  - Latent Semantic Indexing
KW  - LSI
KW  - machine learning
KW  - scalable search
KW  - search system design
KW  - text mining
UR  - http://raslan2016.nlp-consulting.net/
L2  - http://www.fi.muni.cz/usr/sojka/papers/rygl-sojka-ruzicka-rehurek-raslan2016.pdf
N2  - This paper describes the design of a new ScaleText system aimed at scalable semantic indexing of heterogeneous textual corpora. We discuss the design decisions that led to a modular system architecture for indexing and searching using semantic vectors of document segments (nuggets of wisdom). The prototype system implementation is evaluated by applying Latent Semantic Indexing (LSI) to the Enron corpus, and the Bpref measure is used to automate performance comparisons between different algorithms and system configurations.
ER  - 
RYGL, Jan, Petr SOJKA, Michal RŮŽIČKA and Radim ŘEHŮŘEK. ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches : Digging for Nuggets of Wisdom in Text. In: HORÁK, Aleš, Pavel RYCHLÝ and Adam RAMBOUSEK (eds.). \textit{Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016}. Brno: Tribun EU, 2016, pp.~79--87. ISBN~978-80-263-1095-2.