GAMA, Joao, Petr KOSINA and Ezilda ALMEIDA. Avoiding Anomalies in Data Stream Learning. Online. In Johannes Furnkranz, Eyke Hullermeier, Tomoyuki Higuchi. Discovery Science, Proceedings of 16th International Conference DS 2013. Berlin Heidelberg: Springer, 2013. p. 49-63. ISBN 978-3-642-40896-0. Available from: https://dx.doi.org/10.1007/978-3-642-40897-7_4. [cited 2024-04-23]
Basic information
Original name Avoiding Anomalies in Data Stream Learning
Authors GAMA, Joao (620 Portugal), Petr KOSINA (203 Czech Republic, guarantor, belonging to the institution) and Ezilda ALMEIDA (620 Portugal)
Edition Berlin Heidelberg, Discovery Science, Proceedings of 16th International Conference DS 2013, p. 49-63, 15 pp. 2013.
Publisher Springer
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher Germany
Confidentiality degree is not subject to a state or trade secret
Publication form printed version "print"
Impact factor 0.402 in 2005
RIV identification code RIV/00216224:14330/13:00070032
Organization unit Faculty of Informatics
ISBN 978-3-642-40896-0
ISSN 0302-9743
Doi http://dx.doi.org/10.1007/978-3-642-40897-7_4
UT WoS 000340562100004
Keywords in English Data Streams; Rule Learning; Anomaly Detection
Tags firank_B
Tags International impact, Reviewed
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 7/1/2019 14:02.
Abstract
The presence of anomalies in data compromises data quality and can reduce the effectiveness of learning algorithms. Standard data mining methodologies treat data cleaning as a pre-processing step before the learning task. The problem of data cleaning is exacerbated when learning in the computational model of data streams. In this paper we present a streaming algorithm for learning classification rules that is able to detect contextual anomalies in the data. Contextual anomalies are surprising attribute values in the context defined by the conditional part of the rule. For each example we compute the degree of anomaliness based on the probability of the attribute values given the conditional part of the rule covering the example. Examples with a high degree of anomaliness are signaled to the user and not used to train the classifier. The experimental evaluation on real-world data sets shows the ability to discover anomalous examples in the data. The main advantage of the proposed method is the ability to report the context and explain why the anomaly occurs.
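The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the class name RuleContext, the Laplace-smoothed probability estimate, the mean aggregation, the threshold and the warm-up count are all assumptions made for illustration, and only categorical attributes are handled. It keeps value counts for the examples covered by one rule, scores each new example by how improbable its attribute values are in that context, reports which attributes were surprising, and skips training on flagged examples.

```python
from collections import Counter, defaultdict


class RuleContext:
    """Value statistics for the examples covered by one rule (hypothetical helper)."""

    def __init__(self, threshold=0.9, min_examples=10):
        self.threshold = threshold        # assumed cut-off, not taken from the paper
        self.min_examples = min_examples  # assumed warm-up before anything is flagged
        self.n = 0                        # number of examples used for training
        self.counts = defaultdict(Counter)  # attribute -> value -> count

    def value_probability(self, attr, value):
        # Laplace-smoothed estimate of P(attr = value | rule covers the example).
        seen = self.counts[attr]
        distinct = len(seen) + (0 if value in seen else 1)
        return (seen[value] + 1) / (self.n + distinct)

    def anomaly_score(self, example):
        # Assumed aggregation: mean of (1 - probability) over all attributes.
        scores = [1.0 - self.value_probability(a, v) for a, v in example.items()]
        return sum(scores) / len(scores) if scores else 0.0

    def process(self, example):
        """Return (is_anomaly, surprising_attributes); train only on normal examples."""
        if self.n >= self.min_examples and self.anomaly_score(example) > self.threshold:
            surprising = [a for a, v in example.items()
                          if 1.0 - self.value_probability(a, v) > self.threshold]
            return True, surprising  # signaled to the user, not used for training
        for attr, value in example.items():
            self.counts[attr][value] += 1
        self.n += 1
        return False, []


if __name__ == "__main__":
    ctx = RuleContext()
    for _ in range(20):
        ctx.process({"color": "red", "size": "small"})
    print(ctx.process({"color": "blue", "size": "huge"}))
```

Returning the list of surprising attributes alongside the flag mirrors the advantage claimed in the abstract: the rule supplies the context, and the low-probability attribute values explain why the example was flagged.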
Links
LG13010, research and development project. Name: Zastoupení ČR v European Research Consortium for Informatics and Mathematics (Acronym: ERCIM-CZ)
Investor: Ministry of Education, Youth and Sports of the CR
MUNI/A/0758/2011, internal MU code. Name: Zapojení studentů Fakulty informatiky do mezinárodní vědecké komunity (Acronym: SKOMU)
Investor: Masaryk University, Category A