Other formats: BibTeX, RIS, LaTeX
@inproceedings{598608,
  author    = {Blaťák, Jan},
  title     = {First-order Frequent Patterns in Text Mining},
  booktitle = {EPIA'05, 12th Portuguese Conference on Artificial Intelligence},
  edition   = {1},
  address   = {Covilha, Portugal},
  location  = {Covilha, Portugal},
  publisher = {Institute of Electrical and Electronics Engineers, Inc.},
  year      = {2005},
  pages     = {344--350},
  isbn      = {0-7803-9365-1},
  keywords  = {machine learning; first-order frequent patterns; text mining; distributed mining},
  language  = {eng}
}
TY  - CONF
ID  - 598608
AU  - Blaťák, Jan
PY  - 2005
TI  - First-order Frequent Patterns in Text Mining
T2  - EPIA'05, 12th Portuguese Conference on Artificial Intelligence
PB  - Institute of Electrical and Electronics Engineers, Inc.
CY  - Covilha, Portugal
SP  - 344
EP  - 350
SN  - 0780393651
KW  - machine learning
KW  - first-order frequent patterns
KW  - text mining
KW  - distributed mining
N2  - In this paper a universal framework for mining long first-order frequent patterns in text data is presented. It consists of RAP, an ILP system for mining maximal first-order frequent patterns, and two types of redefined background knowledge. Two methods of using generated patterns for solving text mining tasks are described: propositionalization and CBA (class based association). A new variant of the CBA rule based classifier is proposed. The framework is used for solving three text mining tasks: information extraction from biomedical texts, context-sensitive text correction of English and morphological disambiguation of Czech. The distributed mining of frequent patterns is described and its influence on mining in text is discussed. It is shown that frequent patterns as new features for propositionalization usually provide better results than CBA.
ER  - 
BLAŤÁK, Jan. First-order Frequent Patterns in Text Mining. In \textit{EPIA'05, 12th Portuguese Conference on Artificial Intelligence}. 1st ed. Covilha, Portugal: Institute of Electrical and Electronics Engineers, Inc., 2005, pp.~344--350. ISBN~0-7803-9365-1.
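The abstract reports that frequent patterns used as new features for propositionalization usually give better results than CBA. The following is a minimal Python sketch of what the propositionalization step means, under strong simplifying assumptions: documents are plain token lists, and the patterns are hypothetical propositional tests (`contains`, `contains_followed_by`) standing in for the first-order patterns that RAP actually mines. The support threshold `min_support` and the toy data are likewise illustrative assumptions. This is not the author's implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "document" is modelled as a list of lowercase tokens (toy assumption).
Document = List[str]
# A pattern is modelled as a boolean test on a document. This is a simplified
# propositional stand-in for a first-order pattern found by an ILP system.
Pattern = Callable[[Document], bool]


def contains(word: str) -> Pattern:
    """Hypothetical pattern: `word` occurs somewhere in the document."""
    def test(doc: Document) -> bool:
        return word in doc
    return test


def contains_followed_by(first: str, second: str) -> Pattern:
    """Hypothetical pattern: `first` is immediately followed by `second`."""
    def test(doc: Document) -> bool:
        return any(a == first and b == second for a, b in zip(doc, doc[1:]))
    return test


def support(pattern: Pattern, docs: Sequence[Document]) -> int:
    """Number of documents in which the pattern holds."""
    return sum(1 for doc in docs if pattern(doc))


def propositionalize(
    docs: Sequence[Document], patterns: Dict[str, Pattern], min_support: int
) -> Tuple[List[str], List[List[int]]]:
    """Keep only frequent patterns and turn each one into a 0/1 feature column."""
    frequent = [name for name, p in patterns.items() if support(p, docs) >= min_support]
    rows = [[int(patterns[name](doc)) for name in frequent] for doc in docs]
    return frequent, rows


# Usage on hypothetical toy data.
docs = [
    ["the", "gene", "encodes", "a", "protein"],
    ["this", "gene", "encodes", "kinase"],
    ["we", "thank", "the", "reviewers"],
]
patterns = {
    "gene_encodes": contains_followed_by("gene", "encodes"),
    "protein": contains("protein"),
    "thank": contains("thank"),
}
names, rows = propositionalize(docs, patterns, min_support=2)
print(names, rows)
```

With `min_support=2`, only `gene_encodes` is frequent, so each document becomes a one-column 0/1 row. The resulting attribute-value table could then be passed to any standard propositional learner, which is the general idea behind using mined patterns as new features.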