Avoiding Anomalies in Data Stream Learning

GAMA, Joao, Petr KOSINA and Ezilda ALMEIDA. Avoiding Anomalies in Data Stream Learning. Online. In Johannes Furnkranz, Eyke Hullermeier, Tomoyuki Higuchi. Discovery Science, Proceedings of 16th International Conference DS 2013. Berlin Heidelberg: Springer, 2013. p. 49-63. ISBN 978-3-642-40896-0. Available from: https://dx.doi.org/10.1007/978-3-642-40897-7_4. [cited 2024-04-23]

Basic information
Original name	Avoiding Anomalies in Data Stream Learning
Authors	GAMA, Joao (620 Portugal), Petr KOSINA (203 Czech Republic, guarantor, belonging to the institution) and Ezilda ALMEIDA (620 Portugal)
Edition	Berlin Heidelberg, Discovery Science, Proceedings of 16th International Conference DS 2013, p. 49-63, 15 pp. 2013.
Publisher	Springer

Other information
Original language	English
Type of outcome	Proceedings paper
Field of Study	10201 Computer sciences, information science, bioinformatics
Country of publisher	Germany
Confidentiality degree	is not subject to a state or trade secret
Publication form	printed version "print"
Impact factor	0.402 in 2005
RIV identification code	RIV/00216224:14330/13:00070032
Organization unit	Faculty of Informatics
ISBN	978-3-642-40896-0
ISSN	0302-9743
Doi	http://dx.doi.org/10.1007/978-3-642-40897-7_4
UT WoS	000340562100004
Keywords in English	Data Streams; Rule Learning; Anomaly Detection
Tags	firank_B
Tags	International impact, Reviewed
Changed by	RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 7/1/2019 14:02.

Abstract

The presence of anomalies in data compromises data quality and can reduce the effectiveness of learning algorithms. Standard data mining methodologies treat data cleaning as a pre-processing step performed before the learning task. The problem of data cleaning is exacerbated when learning in the computational model of data streams. In this paper we present a streaming algorithm for learning classification rules that is able to detect contextual anomalies in the data. Contextual anomalies are surprising attribute values in the context defined by the conditional part of a rule. For each example we compute the degree of anomaliness based on the probability of its attribute values given the conditional part of the rule covering the example. Examples with a high degree of anomaliness are signaled to the user and are not used to train the classifier. The experimental evaluation on real-world data sets shows the ability to discover anomalous examples in the data. The main advantage of the proposed method is its ability to report the context and explain why the anomaly occurs.
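
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the scoring formula, so the score used here (the mean of 1 - P(value | rule) over attributes, with Laplace-smoothed frequencies for categorical attributes) is an assumption, as are the names RuleContext, threshold, min_seen and p_min. The sketch keeps statistics for the examples covered by one rule, flags an example whose attribute values are improbable in that context, skips training on it, and lists the attributes that made it surprising.

```python
from collections import Counter, defaultdict


class RuleContext:
    """Statistics of the examples covered by one rule (hypothetical helper)."""

    def __init__(self, threshold=0.8, min_seen=5):
        self.n = 0                           # examples learned so far
        self.counts = defaultdict(Counter)   # attribute -> value -> count
        self.threshold = threshold           # assumed cut-off for flagging
        self.min_seen = min_seen             # warm-up before flagging starts

    def value_probability(self, attr, value):
        # Laplace-smoothed estimate of P(attr = value | rule covers example).
        seen = self.counts[attr]
        k = len(seen) + (0 if value in seen else 1)
        return (seen[value] + 1) / (self.n + k)

    def anomaly_score(self, example):
        # Assumed score: average "surprise" (1 - probability) over attributes.
        if not example:
            return 0.0
        probs = [self.value_probability(a, v) for a, v in example.items()]
        return sum(1.0 - p for p in probs) / len(probs)

    def surprising_attributes(self, example, p_min=0.1):
        # Explanation: attributes whose value is improbable in this context.
        return [a for a, v in example.items()
                if self.value_probability(a, v) < p_min]

    def process(self, example):
        """Return True if the example is flagged; flagged examples are not learned."""
        if self.n >= self.min_seen and self.anomaly_score(example) > self.threshold:
            return True
        for a, v in example.items():
            self.counts[a][v] += 1
        self.n += 1
        return False


ctx = RuleContext()
for _ in range(20):
    ctx.process({"color": "red", "size": "S"})      # typical covered examples

odd = {"color": "blue", "size": "XL"}
flagged = ctx.process(odd)                          # flagged, statistics unchanged
reasons = ctx.surprising_attributes(odd)            # which attributes were surprising
```

In a full rule learner each rule would hold its own context and only the rule covering the incoming example would be consulted; numeric attributes would need a different probability estimate than the frequency counts assumed here.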

Links
LG13010, research and development project	Name: Representation of the Czech Republic in the European Research Consortium for Informatics and Mathematics (Acronym: ERCIM-CZ)
LG13010, research and development project	Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/0758/2011, internal MU code	Name: Involvement of students of the Faculty of Informatics in the international scientific community (Acronym: SKOMU)
MUNI/A/0758/2011, internal MU code	Investor: Masaryk University, Category A
