POMIKÁLEK, Jan and Radim ŘEHŮŘEK. The Influence of Preprocessing Parameters on Text Categorization. International Journal of Applied Science, Engineering and Technology, 2007, 4/2007, No 1, p. 430-434. ISSN 1307-4318.
Basic information
Original name The Influence of Preprocessing Parameters on Text Categorization
Name in Czech Vliv parametrů předzpracování na kategorizaci textu
Authors POMIKÁLEK, Jan (203 Czech Republic, guarantor) and Radim ŘEHŮŘEK (203 Czech Republic).
Edition International Journal of Applied Science, Engineering and Technology, 2007, 1307-4318.
Other information
Original language English
Type of outcome article in a journal
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Thailand
Confidentiality degree is not subject to a state or trade secret
RIV identification code RIV/00216224:14330/07:00022183
Organization unit Faculty of Informatics
UT WoS 000260422800082
Keywords in English machine learning; text categorization; preprocessing; feature selection
Tags feature selection, machine learning, preprocessing, text categorization
Tags International impact, Reviewed
Changed by: RNDr. Radim Řehůřek, Ph.D., učo 39672. Changed: 29. 3. 2010 18:48.
Abstract
Results of a large-scale study of the mutual influence of preprocessing parameters in automated text categorization are presented and analyzed. These parameters include the choice of tokenizer, feature selection, stemming, term weighting, and data amount, in combination with various machine learning algorithms.
Abstract (in Czech)
Výsledek studie o vzájemném vlivu parametrů předzpracování na automatickou kategorizaci textu. Sledované parametry zahrnují tokenizaci, výběr rysů, váhování, stemming a objem dat v kombinaci s několika metodami strojového učení.
Links
LC536, research and development project. Name: Centrum komputační lingvistiky
Investor: Ministry of Education, Youth and Sports of the CR, Basic Research Center
2C06009, research and development project. Name: Prostředky tvorby komplexní báze znalostí pro komunikaci se sémantickým webem v přirozeném jazyce (Acronym: COT-SEWing)
Investor: Ministry of Education, Youth and Sports of the CR, Information technologies for knowledge society