2010
Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms
ŘEHŮŘEK, RadimBasic information
Original name
Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms
Name in Czech
Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms
Authors
ŘEHŮŘEK, Radim
Edition
NIPS 2010 workshop on Low-rank Methods for Large-scale Machine Learning, 7 pp. 2010
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10000 1. Natural Sciences
Country of publisher
Canada
Confidentiality degree
is not subject to a state or trade secret
Organization unit
Faculty of Informatics
Keywords (in Czech)
svd lsa lsi
Keywords in English
svd lda lsi
Tags
International impact, Reviewed
Changed: 21/1/2011 15:43, RNDr. Radim Řehůřek, Ph.D.
Abstract
V originále
With the explosion of the size of digital dataset, the limiting factor for decomposition algorithms is the \emph{number of passes} over the input, as the input is often stored out-of-core or even off-site. Moreover, we're only interested in algorithms that operate in \emph{constant memory} w.r.t. to the input size, so that arbitrarily large input can be processed. In this paper, we present a practical comparison of two such algorithms: a distributed method that operates in a single pass over the input vs. a streamed two-pass stochastic algorithm. The experiments track the effect of distributed computing, oversampling and memory trade-offs on the accuracy and performance of the two algorithms. To ensure meaningful results, we choose the input to be a real dataset, namely the whole of the English Wikipedia, in the application settings of Latent Semantic Analysis.
Links
LC536, research and development project |
|