ŘEHŮŘEK, Radim. Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms. In Michael Mahoney, Ameet Talwalkar, Mehryan Mohri, Arthur Gretton. NIPS 2010 workshop on Low-rank Methods for Large-scale Machine Learning. 2010, 7 pp.
Other formats:   BibTeX LaTeX RIS
Basic information
Original name Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms
Name in Czech Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms
Authors ŘEHŮŘEK, Radim.
Edition NIPS 2010 workshop on Low-rank Methods for Large-scale Machine Learning, 7 pp. 2010.
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10000 1. Natural Sciences
Country of publisher Canada
Confidentiality degree is not subject to a state or trade secret
WWW workshop poster
Organization unit Faculty of Informatics
Keywords (in Czech) svd lsa lsi
Keywords in English svd lda lsi
Tags similarity of text documents
Tags International impact, Reviewed
Changed by Changed by: RNDr. Radim Řehůřek, Ph.D., učo 39672. Changed: 21/1/2011 15:43.
Abstract
With the explosion of the size of digital dataset, the limiting factor for decomposition algorithms is the \emph{number of passes} over the input, as the input is often stored out-of-core or even off-site. Moreover, we're only interested in algorithms that operate in \emph{constant memory} w.r.t. to the input size, so that arbitrarily large input can be processed. In this paper, we present a practical comparison of two such algorithms: a distributed method that operates in a single pass over the input vs. a streamed two-pass stochastic algorithm. The experiments track the effect of distributed computing, oversampling and memory trade-offs on the accuracy and performance of the two algorithms. To ensure meaningful results, we choose the input to be a real dataset, namely the whole of the English Wikipedia, in the application settings of Latent Semantic Analysis.
Links
LC536, research and development projectName: Centrum komputační lingvistiky
Investor: Ministry of Education, Youth and Sports of the CR, Centrum komputační lingvistiky
PrintDisplayed: 26/4/2024 15:18