ŽIŽKA, Jan and Arnošt SVOBODA. Customers' Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth. Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis. Brno: Mendel University in Brno, Czech Republic, 2015, vol. 63, No 6, p. 2229-2237. ISSN 1211-8516. doi:10.11118/actaun201563062229.
Basic information
Original name Customers' Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth
Authors ŽIŽKA, Jan (203 Czech Republic, guarantor) and Arnošt SVOBODA (203 Czech Republic, belonging to the institution).
Edition Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Brno, Mendel University in Brno, Czech Republic, 2015, 1211-8516.
Other information
Original language English
Type of outcome Article in a journal
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Czech Republic
Confidentiality degree is not subject to a state or trade secret
RIV identification code RIV/00216224:14560/15:00085489
Organization unit Faculty of Economics and Administration
Doi http://dx.doi.org/10.11118/actaun201563062229
Keywords in English text mining; customer opinion analysis; decision trees; decision rules; windowing; large data volumes; machine learning; computational complexity; training-set size
Tags Reviewed
Changed by Mgr. Daniela Marcollová, učo 111148. Changed: 5. 5. 2016 10:53.
Abstract
A shortage of data is not the only data mining problem: having too much data can cause difficulties as well. An experimental investigation of how the number of reviews influences the knowledge mined from text documents demonstrated, above all, the unsurprising strong dependence of computation time on data volume. As the volume of hotel-service reviews increased, the CPU time of the text mining process grew strongly non-linearly, while the knowledge, expressed as generated semantically relevant words, kept increasing as well, although by progressively smaller increments. Among other uses, the revealed relevant words (or phrases composed of them) can serve as significant keywords for information retrieval or for defining more detailed topics hidden in text documents. After the research described above, which aimed at revealing relevant words that represented the reviews, a follow-up series of experiments was started to mine knowledge that is more understandable to humans by automatically discovering significant phrases composed of relevant words. To find the phrases, a method of analyzing n-grams (here, contiguous sequences of n words) was applied to reviews written in English, Spanish, German, and Russian. The procedures were similar to those described in this article and used the same decision-tree/rule tool, the same data source, and windows of a constant size of 100,000 reviews. From the semantic point of view, and unlike the 1-grams described in this paper, the best phrases were provided by 3-grams, for example "breakfast very good" (a positive phrase) and "no free Internet" (a negative phrase). Details can be found in Žižka and Dařena (2015).
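The n-gram notion used in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract states that relevant phrases were selected with a decision-tree/rule tool, whereas the sketch below only extracts contiguous word sequences and counts them. The function name `ngrams`, the whitespace tokenization, the lowercasing, and the three toy reviews are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(text, n):
    """Return all contiguous n-word sequences of a text.

    Assumption: words are separated by whitespace and compared
    case-insensitively; a real pipeline would tokenize more carefully.
    """
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical toy reviews, not taken from the article's data set.
reviews = [
    "Breakfast very good and staff friendly",
    "No free Internet in the room",
    "Breakfast very good but no free Internet",
]

# Count every 3-gram across all reviews.
counts = Counter(gram for review in reviews for gram in ngrams(review, 3))

# Show the 3-grams that occur in more than one place.
for gram, freq in counts.items():
    if freq > 1:
        print(gram, freq)
```

Raw frequency is only a stand-in here; in the described experiments the relevance of a word or phrase came from the induced decision trees and rules, not from counts alone.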