D 2013

Avoiding Anomalies in Data Stream Learning

GAMA, Joao, Petr KOSINA and Ezilda ALMEIDA

Basic information

Original name

Avoiding Anomalies in Data Stream Learning

Authors

GAMA, Joao (620 Portugal), Petr KOSINA (203 Czech Republic, guarantor, belonging to the institution) and Ezilda ALMEIDA (620 Portugal)

Edition

Berlin Heidelberg, Discovery Science, Proceedings of 16th International Conference DS 2013, p. 49-63, 15 pp. 2013

Publisher

Springer

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

Germany

Confidentiality degree

Not subject to state or trade secret

Publication form

printed version "print"

References:

Impact factor

Impact factor: 0.402 in 2005

RIV identification code

RIV/00216224:14330/13:00070032

Organization unit

Faculty of Informatics

ISBN

978-3-642-40896-0

ISSN

UT WoS

000340562100004

Keywords in English

Data Streams; Rule Learning; Anomaly Detection

Tags

International impact, Reviewed
Changed: 7/1/2019 14:02, RNDr. Pavel Šmerk, Ph.D.

Abstract

In the original

The presence of anomalies in data compromises data quality and can reduce the effectiveness of learning algorithms. Standard data mining methodologies treat data cleaning as a pre-processing step before the learning task. The problem of data cleaning is exacerbated when learning in the computational model of data streams. In this paper we present a streaming algorithm for learning classification rules that is able to detect contextual anomalies in the data. Contextual anomalies are surprising attribute values in the context defined by the conditional part of the rule. For each example we compute the degree of anomaliness based on the probability of the attribute values given the conditional part of the rule covering the example. Examples with a high degree of anomaliness are signaled to the user and are not used to train the classifier. The experimental evaluation on real-world data sets shows the ability to discover anomalous examples in the data. The main advantage of the proposed method is its ability to report the context and explain why the anomaly occurs.
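
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the record does not give the scoring formula, so the code below assumes categorical attributes, a Laplace-smoothed frequency estimate of P(value | rule), and a score equal to the largest per-attribute surprise (1 minus that probability). The names `RuleStats`, `process`, `threshold` and `warmup` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


class RuleStats:
    """Value counts for examples covered by one rule (hypothetical helper)."""

    def __init__(self, condition):
        self.condition = condition          # predicate: example dict -> bool
        self.counts = defaultdict(Counter)  # attribute -> value -> count
        self.n = 0                          # examples used to train this rule

    def covers(self, example):
        return self.condition(example)

    def prob(self, attr, value):
        # Assumed estimator: Laplace-smoothed P(attr = value | rule covers example).
        distinct = len(set(self.counts[attr]) | {value})
        return (self.counts[attr][value] + 1) / (self.n + distinct)

    def most_surprising(self, example):
        # Assumed score: the attribute with the largest surprise, 1 - P(value | rule).
        surprises = {a: 1.0 - self.prob(a, v) for a, v in example.items()}
        attr = max(surprises, key=surprises.get)
        return attr, surprises[attr]

    def update(self, example):
        for a, v in example.items():
            self.counts[a][v] += 1
        self.n += 1


def process(stream, rule, threshold=0.8, warmup=5):
    """Flag covered examples whose score exceeds the threshold; learn from the rest."""
    flagged = []
    for example in stream:
        if not rule.covers(example):
            continue
        attr, score = rule.most_surprising(example)
        if rule.n >= warmup and score > threshold:
            # Signaled to the user together with the attribute that explains it;
            # the example is not used to update the rule's statistics.
            flagged.append((example, attr, score))
        else:
            rule.update(example)
    return flagged
```

Returning the offending attribute alongside the score mirrors the stated advantage of the method: the rule's condition gives the context, and the surprising attribute value gives the explanation. The warm-up count is an added assumption that avoids flagging examples before the rule has seen enough data.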

Links

LG13010, research and development project
Name: Representation of the Czech Republic in the European Research Consortium for Informatics and Mathematics (Acronym: ERCIM-CZ)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/0758/2011, internal MU code
Name: Involvement of Faculty of Informatics students in the international scientific community (Acronym: SKOMU)
Investor: Masaryk University, Category A