Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms

ŘEHŮŘEK, Radim

Basic information

Original title

Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms

Title in Czech

Fast and Faster: A Comparison of Two Streamed Matrix Decomposition Algorithms

Authors

ŘEHŮŘEK, Radim

Edition

NIPS 2010 Workshop on Low-rank Methods for Large-scale Machine Learning, 7 pp., 2010

Other information

Language

English

Type of outcome

Proceedings paper

Field of study

10000 1. Natural Sciences

Country of publisher

Canada

Confidentiality

Not subject to state or trade secret

Links

workshop poster

Organizational unit

Faculty of Informatics

Keywords in Czech

svd lsa lsi

Keywords in English

svd lda lsi

Tags

similarity of text documents

Flags

International impact, Reviewed

Changed: 21. 1. 2011 15:43, RNDr. Radim Řehůřek, Ph.D.

Abstract

In the original

With the explosion in the size of digital datasets, the limiting factor for decomposition algorithms is the number of passes over the input, as the input is often stored out-of-core or even off-site. Moreover, we are only interested in algorithms that operate in constant memory with respect to the input size, so that arbitrarily large inputs can be processed. In this paper, we present a practical comparison of two such algorithms: a distributed method that operates in a single pass over the input vs. a streamed two-pass stochastic algorithm. The experiments track the effect of distributed computing, oversampling and memory trade-offs on the accuracy and performance of the two algorithms. To ensure meaningful results, we choose the input to be a real dataset, namely the whole of the English Wikipedia, in the application setting of Latent Semantic Analysis.
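To make the idea of a streamed two-pass stochastic decomposition concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes the input arrives as dense column blocks (documents as columns) that can be iterated over twice, and it uses a generic randomized range finder followed by a small Gram-matrix eigendecomposition. The function name `two_pass_svd` and its parameters `chunks`, `num_rows`, `rank` and `oversample` are hypothetical and chosen for illustration only. Memory use depends on the number of rows and the requested rank, not on the number of columns.

```python
import numpy as np


def two_pass_svd(chunks, num_rows, rank, oversample=10, seed=0):
    """Approximate the top left singular vectors and singular values.

    chunks: zero-argument callable returning a fresh iterator over column
            blocks of the input, each of shape (num_rows, block_width).
    Assumes num_rows >= rank + oversample.
    """
    rng = np.random.default_rng(seed)
    width = rank + oversample

    # Pass 1: accumulate the random projection Y = A @ Omega block by block.
    y = np.zeros((num_rows, width))
    for block in chunks():
        omega = rng.standard_normal((block.shape[1], width))
        y += block @ omega
    q, _ = np.linalg.qr(y)  # orthonormal basis for the sampled range

    # Pass 2: accumulate the small Gram matrix X = (Q^T A)(Q^T A)^T.
    x = np.zeros((width, width))
    for block in chunks():
        b = q.T @ block
        x += b @ b.T

    # Eigenvalues of X are the squared singular values of Q^T A.
    evals, evecs = np.linalg.eigh(x)
    order = np.argsort(evals)[::-1][:rank]
    s = np.sqrt(np.clip(evals[order], 0.0, None))
    u = q @ evecs[:, order]
    return u, s
```

Under these assumptions, a caller would pass something like `lambda: (a[:, i:i + 40] for i in range(0, a.shape[1], 40))` as `chunks`. The `oversample` parameter corresponds to the oversampling trade-off mentioned in the abstract: a larger value costs more memory and time but tends to improve accuracy.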

Related projects

LC536, research and development project

Name: Centrum komputační lingvistiky (Centre for Computational Linguistics)

Investor: Ministry of Education, Youth and Sports of the Czech Republic, Centrum komputační lingvistiky
