Detailed Information on Publication Record
2013
Avoiding Anomalies in Data Stream Learning
GAMA, Joao, Petr KOSINA and Ezilda ALMEIDA
Basic information
Original name
Avoiding Anomalies in Data Stream Learning
Authors
GAMA, Joao (620 Portugal), Petr KOSINA (203 Czech Republic, guarantor, belonging to the institution) and Ezilda ALMEIDA (620 Portugal)
Edition
Berlin Heidelberg, Discovery Science, Proceedings of 16th International Conference DS 2013, p. 49-63, 15 pp. 2013
Publisher
Springer
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
Germany
Confidentiality degree
is not subject to a state or trade secret
Publication form
printed version "print"
Impact factor
Impact factor: 0.402 in 2005
RIV identification code
RIV/00216224:14330/13:00070032
Organization unit
Faculty of Informatics
ISBN
978-3-642-40896-0
ISSN
UT WoS
000340562100004
Keywords in English
Data Streams; Rule Learning; Anomaly Detection
Tags
International impact, Reviewed
Changed: 7/1/2019 14:02, RNDr. Pavel Šmerk, Ph.D.
Abstract
In the original
The presence of anomalies in data compromises data quality and can reduce the effectiveness of learning algorithms. Standard data mining methodologies refer to data cleaning as a pre-processing before the learning task. The problem of data cleaning is exacerbated when learning in the computational model of data streams. In this paper we present a streaming algorithm for learning classification rules able to detect contextual anomalies in the data. Contextual anomalies are surprising attribute values in the context defined by the conditional part of the rule. For each example we compute the degree of anomaliness based on the probability of the attribute-values given the conditional part of the rule covering the example. The examples with high degree of anomaliness are signaled to the user and not used to train the classifier. The experimental evaluation in real-world data sets shows the ability to discover anomalous examples in the data. The main advantage of the proposed method is the ability to inform the context and explain why the anomaly occurs.
Links
LG13010, research and development project
MUNI/A/0758/2011, internal MU code