D
2016
ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches : Digging for Nuggets of Wisdom in Text
RYGL, Jan, Petr SOJKA, Michal RŮŽIČKA and Radim ŘEHŮŘEK
Basic information
Original name
ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches : Digging for Nuggets of Wisdom in Text
Authors
RYGL, Jan (203 Czech Republic),
Petr SOJKA (203 Czech Republic, guarantor, belonging to the institution),
Michal RŮŽIČKA (203 Czech Republic, belonging to the institution) and Radim ŘEHŮŘEK (203 Czech Republic)
Edition
Brno, Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016, p. 79-87, 9 pp. 2016
Other information
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Czech Republic
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version "print"
RIV identification code
RIV/00216224:14330/16:00087632
Organization unit
Faculty of Informatics
Keywords (in Czech)
ScaleText; modelování vektorovým prostorem; latentní sémantické indexování; LSI; strojové učení; škálovatelné vyhledávání; návrh vyhledávače; dolování textu
Keywords in English
ScaleText; vector space modelling; Latent Semantic Indexing; LSI; machine learning; scalable search; search system design; text mining
Tags
International impact, Reviewed
In the original language
This paper describes the design of ScaleText, a new system for scalable semantic indexing of heterogeneous textual corpora. We discuss the design decisions that lead to a modular system architecture for indexing and searching using semantic vectors of document segments, the "nuggets of wisdom". The prototype implementation is evaluated by applying Latent Semantic Indexing (LSI) to the Enron corpus, and the Bpref measure is used to automate the comparison of different algorithms and system configurations.
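For readers unfamiliar with the Bpref measure named in the abstract, the following is a minimal sketch of one commonly used formulation (the variant that divides by min(R, N), where R and N are the numbers of judged relevant and judged non-relevant documents). This record does not say which variant the paper uses, so treat it as an illustration only; the function name `bpref` and the document identifiers are hypothetical.

```python
def bpref(ranking, relevant, nonrelevant):
    """Bpref for one query (illustrative sketch, not the paper's code).

    ranking     -- list of document ids, best first
    relevant    -- set of ids judged relevant
    nonrelevant -- set of ids judged non-relevant
    Documents in neither set are unjudged and are ignored.
    """
    r_count = len(relevant)
    if r_count == 0:
        return 0.0
    denom = min(r_count, len(nonrelevant))

    nonrel_seen = 0  # judged non-relevant documents ranked so far
    total = 0.0
    for doc in ranking:
        if doc in relevant:
            if denom == 0:
                # No judged non-relevant documents: nothing to penalise.
                total += 1.0
            else:
                # Penalise by the share of judged non-relevant documents
                # ranked above this relevant one, capped at the denominator.
                total += 1.0 - min(nonrel_seen, denom) / denom
        elif doc in nonrelevant:
            nonrel_seen += 1
    return total / r_count


# Hypothetical judgments: "x" is unjudged and does not affect the score.
score = bpref(["d1", "x", "d2", "n1", "d3"],
              relevant={"d1", "d2", "d3"},
              nonrelevant={"n1", "n2"})
print(round(score, 4))
```

Because unjudged documents are skipped, the score depends only on the relative order of judged documents, which is what makes the measure usable with incomplete relevance judgments.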
Links
MUNI/A/0892/2015, internal MU code | Name: Výzkum v aplikované informatice na FI MU (Acronym: VAIFIMU) | Investor: Masaryk University, Category A
TD03000295, research and development project | Name: Inteligentní software pro sémantické hledání dokumentů (Acronym: ISSHD) | Investor: Technology Agency of the Czech Republic