Information System of Masaryk University 

ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity ...


RYGL, Jan, Petr SOJKA, Michal RŮŽIČKA and Radim ŘEHŮŘEK. ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches : Digging for Nuggets of Wisdom in Text. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016. Brno: Tribun EU, 2016. p. 79-87, 9 pp. ISBN 978-80-263-1095-2.
Basic information
Original name ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches : Digging for Nuggets of Wisdom in Text
Authors RYGL, Jan (203 Czech Republic), Petr SOJKA (203 Czech Republic, guarantor, belonging to the institution), Michal RŮŽIČKA (203 Czech Republic, belonging to the institution) and Radim ŘEHŮŘEK (203 Czech Republic).
Edition Brno, Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016, p. 79-87, 9 pp. 2016.
Publisher Tribun EU
Other information
Original language English
Type of outcome article in proceedings
Field of Study Informatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
WWW Workshop homepage preprint
RIV identification code RIV/00216224:14330/16:00087632
Organization unit Faculty of Informatics
ISBN 978-80-263-1095-2
ISSN 2336-4289
Keywords (in Czech) ScaleText; modelování vektorovým prostorem; latentní sémantické indexování; LSI; strojové učení; škálovatelné vyhledávání; návrh vyhledávače; dolování textu
Keywords in English ScaleText; vector space modelling; Latent Semantic Indexing; LSI; machine learning; scalable search; search system design; text mining
Tags International impact, Reviewed
Changed by doc. RNDr. Petr Sojka, Ph.D., učo 2378. Changed: 29. 11. 2016 11:48.
Abstract
This paper describes the design of a new ScaleText system aimed at scalable semantic indexing of heterogeneous textual corpora. We discuss the design decisions that lead to a modular system architecture for indexing and searching using semantic vectors of document segments – nuggets of wisdom. The prototype system implementation is evaluated by applying Latent Semantic Indexing (LSI) to the Enron corpus, and the Bpref measure is used to automate the comparison of different algorithms and system configurations.
Links
MUNI/A/0892/2015, internal MU code
Name: Výzkum v aplikované informatice na FI MU (Acronym: VAIFIMU)
Investor: Masaryk University, Grant Agency of the Masaryk University, Category A
TD03000295, research and development project
Name: Inteligentní software pro sémantické hledání dokumentů (Acronym: ISSHD)
Investor: Technology Agency of the Czech Republic, OMEGA - Programme of support of applied social science research and experimental development
