ŘEHŮŘEK, Radim. On Dimensionality of Latent Semantic Indexing for Text Segmentation. Proceedings of the International Multiconference on Computer Science and Information Technology. Wisła, Poland, 2007, vol. 2007, No 2, p. 347-356. ISSN 1896-7094.
Other formats:
BibTeX:

@article{727390,
  author = {Řehůřek, Radim},
  title = {On Dimensionality of Latent Semantic Indexing for Text Segmentation},
  journal = {Proceedings of the International Multiconference on Computer Science and Information Technology},
  article_location = {Wisła, Poland},
  volume = {2007},
  number = {2},
  pages = {347--356},
  year = {2007},
  issn = {1896-7094},
  language = {eng},
  keywords = {text segmentation; LSI; latent semantic indexing},
  url = {http://www.papers2007.imcsit.org/}
}
RIS:

TY  - JOUR
ID  - 727390
AU  - Řehůřek, Radim
PY  - 2007
TI  - On Dimensionality of Latent Semantic Indexing for Text Segmentation
JF  - Proceedings of the International Multiconference on Computer Science and Information Technology
VL  - 2007
IS  - 2
SP  - 347
EP  - 356
SN  - 1896-7094
KW  - text segmentation
KW  - LSI
KW  - latent semantic indexing
UR  - http://www.papers2007.imcsit.org/
N2  - In this paper we propose features desirable of linear text segmentation algorithms for the Information Retrieval domain, with emphasis on improving high similarity search of heterogeneous texts. We proceed to describe a robust purely statistical method, based on context overlap exploitation, that exhibits these desired features. Ways to automatically determine its internal parameter of latent space dimensionality are discussed and evaluated on a data set.
ER  - 
LaTeX:

ŘEHŮŘEK, Radim. On Dimensionality of Latent Semantic Indexing for Text Segmentation. \textit{Proceedings of the International Multiconference on Computer Science and Information Technology}. Wisła, Poland, 2007, vol.~2007, No~2, p.~347--356. ISSN~1896-7094.