D 2010

Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms

ŘEHŮŘEK, Radim

Basic information

Original name

Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms

Name in Czech

Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms

Authors

ŘEHŮŘEK, Radim

Edition

NIPS 2010 workshop on Low-rank Methods for Large-scale Machine Learning, 7 pp. 2010

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10000 1. Natural Sciences

Country of publisher

Canada

Confidentiality degree

is not subject to a state or trade secret

References:

Organization unit

Faculty of Informatics

Keywords (in Czech)

svd lsa lsi

Keywords in English

svd lda lsi

Tags

International impact, Reviewed
Changed: 21/1/2011 15:43, RNDr. Radim Řehůřek, Ph.D.

Abstract

V originále

With the explosion of the size of digital dataset, the limiting factor for decomposition algorithms is the \emph{number of passes} over the input, as the input is often stored out-of-core or even off-site. Moreover, we're only interested in algorithms that operate in \emph{constant memory} w.r.t. to the input size, so that arbitrarily large input can be processed. In this paper, we present a practical comparison of two such algorithms: a distributed method that operates in a single pass over the input vs. a streamed two-pass stochastic algorithm. The experiments track the effect of distributed computing, oversampling and memory trade-offs on the accuracy and performance of the two algorithms. To ensure meaningful results, we choose the input to be a real dataset, namely the whole of the English Wikipedia, in the application settings of Latent Semantic Analysis.

Links

LC536, research and development project
Name: Centrum komputační lingvistiky
Investor: Ministry of Education, Youth and Sports of the CR, Centrum komputační lingvistiky