Gensim -- Statistical Semantics in Python
ŘEHŮŘEK, Radim and Petr SOJKA. Gensim -- Statistical Semantics in Python. In EuroScipy 2011, Paris. 2011.
Basic information | |
---|---|
Original name | Gensim -- Statistical Semantics in Python |
Name in Czech | Gensim -- statistická sémantika v Pythonu |
Authors | ŘEHŮŘEK, Radim (203 Czech Republic, guarantor, belonging to the institution) and Petr SOJKA (203 Czech Republic, belonging to the institution). |
Edition | EuroScipy 2011, Paris, 2011. |
Other information | |
---|---|
Original language | English |
Type of outcome | Presentations at conferences |
Field of Study | 10201 Computer sciences, information science, bioinformatics |
Country of publisher | France |
Confidentiality degree | is not subject to a state or trade secret |
WWW | poster; conference programme |
RIV identification code | RIV/00216224:14330/11:00053512 |
Organization unit | Faculty of Informatics |
Keywords (in Czech) | statistická sémantika;gensim;Python;LDA;SVD |
Keywords in English | statistical semantics;gensim;Python;LDA;SVD |
Tags | International impact, Reviewed |
Changed by | doc. RNDr. Petr Sojka, Ph.D., učo 2378. Changed: 17/4/2012 22:37. |
Abstract |
---|
Gensim is a pure Python library that fights on two fronts: 1) digital document indexing and similarity search; and 2) fast, memory-efficient, scalable algorithms for Singular Value Decomposition (SVD) and Latent Dirichlet Allocation (LDA). The connection between the two is unsupervised semantic analysis of plain text in digital collections. Gensim was created for large digital libraries, but its underlying algorithms for large-scale, distributed, online SVD and LDA are like the Swiss Army knife of data analysis: they are also useful on their own, outside the domain of Natural Language Processing. |
Abstract (in Czech) |
---|
Gensim je knihovna naprogramovaná v jazyce Python, která je užitečná na dvou frontách: 1) pro indexaci elektronických dokumentů a pro podobnostní hledání; a 2) pro rychlou, paměťově omezenou a efektivní škálovatelnou implementaci algoritmů pro Singular Value Decomposition a Latent Dirichlet Allocation. Vazba mezi oběma užitími je sémantická analýza textů (bez učitele) v rozsáhlých digitálních kolekcích a knihovnách. Gensim byl vytvořen pro velké digitální knihovny, ale jím implementované algoritmy pro velké, distribuované, online užití SVD a LDA jsou švýcarským nožíkem analýzy dat a jako takové jsou užitečné i mimo doménu Natural Language Processing. |
Links | |
---|---|
LC536, research and development project | Name: Centrum komputační lingvistiky. Investor: Ministry of Education, Youth and Sports of the CR, Centrum komputační lingvistiky |
250503, internal MU code | Name: The European Digital Mathematics Library (Acronym: EuDML). Investor: European Union |