Detailed Information on Publication Record

RYGL, Jan, Petr SOJKA, Michal RŮŽIČKA and Radim ŘEHŮŘEK. ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches : Digging for Nuggets of Wisdom in Text. In Aleš Horák, Pavel Rychlý, Adam Rambousek. Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016. Brno: Tribun EU, 2016, p. 79-87. ISBN 978-80-263-1095-2.


Basic information
Original name	ScaleText: The Design of a Scalable, Adaptable and User-Friendly Document System for Similarity Searches : Digging for Nuggets of Wisdom in Text
Authors	RYGL, Jan (203 Czech Republic), Petr SOJKA (203 Czech Republic, guarantor, belonging to the institution), Michal RŮŽIČKA (203 Czech Republic, belonging to the institution) and Radim ŘEHŮŘEK (203 Czech Republic).
Edition	Brno, Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016, p. 79-87, 9 pp. 2016.
Publisher	Tribun EU

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Czech Republic
Confidentiality degree	is not subject to a state or trade secret
Publication form	printed version "print"
WWW	Workshop homepage; preprint
RIV identification code	RIV/00216224:14330/16:00087632
Organization unit	Faculty of Informatics
ISBN	978-80-263-1095-2
ISSN	2336-4289
UT WoS	000466886400009
Keywords (in Czech)	ScaleText; modelování vektorovým prostorem; latentní sémantické indexování; LSI; strojové učení; škálovatelné vyhledávání; návrh vyhledávače; dolování textu
Keywords in English	ScaleText; vector space modelling; Latent Semantic Indexing; LSI; machine learning; scalable search; search system design; text mining
Tags	International impact, Reviewed
Changed by	RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 13/5/2020 19:39.

Abstract
This paper describes the design of a new ScaleText system aimed at scalable semantic indexing of heterogeneous textual corpora. We discuss the design decisions that lead to a modular system architecture for indexing and searching using semantic vectors of document segments – nuggets of wisdom. The prototype system implementation is evaluated by applying Latent Semantic Indexing (LSI) on the Enron corpus. The Bpref measure is used to automate the comparison of the performance of different algorithms and system configurations.

Links
MUNI/A/0892/2015, internal MU code	Name: Výzkum v aplikované informatice na FI MU (Acronym: VAIFIMU)
MUNI/A/0892/2015, internal MU code	Investor: Masaryk University, Category A
TD03000295, research and development project	Name: Inteligentní software pro sémantické hledání dokumentů (Acronym: ISSHD)
TD03000295, research and development project	Investor: Technology Agency of the Czech Republic