Information System of Masaryk University (IS MU)
ŘEHŮŘEK, Radim. On Dimensionality of Latent Semantic Indexing for Text Segmentation. Proceedings of the International Multiconference on Computer Science and Information Technology. Wisła, Poland, 2007, vol. 2007, No 2, p. 347-356. ISSN 1896-7094.
Basic information
Original name On Dimensionality of Latent Semantic Indexing for Text Segmentation
Name in Czech K dimenzionalitě Latentního Sémantického Indexování pro segmentaci textu
Authors ŘEHŮŘEK, Radim (203 Czech Republic, guarantor).
Edition Proceedings of the International Multiconference on Computer Science and Information Technology, Wisła, Poland, 2007, 1896-7094.
Other information
Original language English
Type of outcome Article in a journal
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Poland
Confidentiality degree is not subject to a state or trade secret
RIV identification code RIV/00216224:14330/07:00022870
Organization unit Faculty of Informatics
Keywords in English text segmentation; LSI; latent semantic indexing
Tags latent semantic indexing, LSI, text segmentation
Tags International impact, Reviewed
Changed by RNDr. Radim Řehůřek, Ph.D., učo 39672. Changed: 4/12/2007 00:00.
Abstract
In this paper we propose features desirable in linear text segmentation algorithms for the Information Retrieval domain, with emphasis on improving high-similarity search of heterogeneous texts. We then describe a robust, purely statistical method, based on context overlap exploitation, that exhibits these features. Ways to automatically determine its internal parameter, the latent space dimensionality, are discussed and evaluated on a data set.
Links
LC536, research and development project
Name: Centrum komputační lingvistiky
Investor: Ministry of Education, Youth and Sports of the CR, Centrum komputační lingvistiky