ŽIŽKA, Jan and Arnošt SVOBODA. Customers' Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth. Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis. Brno: Mendel University in Brno, Czech Republic, 2015, vol. 63, no. 6, pp. 2229-2237. ISSN 1211-8516. doi:10.11118/actaun201563062229.
Basic information
Original title Customers' Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth
Authors ŽIŽKA, Jan (203 Czech Republic, guarantor) and Arnošt SVOBODA (203 Czech Republic, home institution).
Edition Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Brno, Mendel University in Brno, Czech Republic, 2015, 1211-8516.
Additional information
Original language English
Result type Article in a scholarly journal
Field 10201 Computer sciences, information science, bioinformatics
Publisher country Czech Republic
Confidentiality not subject to state or trade secrecy
RIV code RIV/00216224:14560/15:00085489
Organizational unit Faculty of Economics and Administration
DOI http://dx.doi.org/10.11118/actaun201563062229
Keywords in English text mining; customer opinion analysis; decision trees; decision rules; windowing; large data volumes; machine learning; computational complexity; training-set size
Flags Peer-reviewed
Changed by Mgr. Daniela Marcollová, učo 111148. Changed on: 5. 5. 2016 10:53.
Abstract
A shortage of data is not the only data mining problem; having too much data can cause difficulty as well. An experimental investigation of how the number of reviews influences the knowledge mined from text documents showed, unsurprisingly, a strong dependence of computation time on data volume. As the volume of hotel-service reviews steadily increased, the CPU time of the text mining process grew strongly non-linearly, while the knowledge, expressed as generated semantically relevant words, kept increasing too, although by progressively smaller increments. Among other uses, the revealed relevant words (or phrases composed of them) can serve as significant keywords for information retrieval or for defining more detailed topics hidden in text documents. After the research described above, which aimed at revealing relevant words that represent the reviews, a follow-up series of experiments was started to mine knowledge that is more understandable to humans: automatically discovering significant phrases composed of relevant words. To find the phrases, a method of analyzing n-grams (here, contiguous sequences of n words) was applied to reviews written in English, Spanish, German, and Russian. Procedures similar to those described in this article were used, with the same decision-trees/rules tool, the same data source, and windows each containing a constant 100,000 reviews. From the semantic point of view, unlike the 1-grams described in this paper, the best phrases were provided by 3-grams, for example "breakfast very good" (a positive phrase) and "no free Internet" (a negative phrase). Details can be found in Žižka and Dařena (2015).
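The n-gram notion used in the abstract (a contiguous sequence of n words) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the study selected relevant phrases with a decision-trees/rules tool, whereas this sketch only counts how often each 3-gram occurs. The function name `ngrams`, the lowercasing and whitespace tokenization, and the three toy reviews are assumptions made for illustration.

```python
from collections import Counter


def ngrams(text, n):
    """Return all contiguous n-word sequences in `text`.

    Tokenization is an assumption: lowercase, split on whitespace.
    """
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical toy reviews, not taken from the study's data.
reviews = [
    "breakfast very good and staff friendly",
    "breakfast very good but no free Internet",
    "no free Internet in the room",
]

# Count every 3-gram across all reviews.
counts = Counter(g for review in reviews for g in ngrams(review, 3))

for phrase, freq in counts.most_common(2):
    print(freq, phrase)
```

Raw frequency is used here only as a stand-in; it says nothing about whether a phrase separates positive from negative reviews, which is what a trained classifier would provide.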