D 2022

On Usefulness of Outlier Elimination in Classification Tasks

HETLEROVIĆ, Dušan, Lubomír POPELÍNSKÝ, P. BRAZDIL, C. SOARES, F. FREAITAS et al.

Basic information

Original name

On Usefulness of Outlier Elimination in Classification Tasks

Name in Czech

On Usefulness of Outlier Elimination in Classification Tasks

Authors

HETLEROVIĆ, Dušan (703 Slovakia, belonging to the institution), Lubomír POPELÍNSKÝ (203 Czech Republic, belonging to the institution), P. BRAZDIL, C. SOARES and F. FREAITAS

Edition

Rennes, International Symposium on Intelligent Data Analysis 2022, pp. 143–156, 14 pp., 2022

Publisher

Springer

Other information

Language

English

Type of outcome

Article in proceedings

Field of Study

10201 Computer sciences, information science, bioinformatics

Confidentiality degree

not subject to a state or trade secret

Publication form

electronic version available online

Impact factor

Impact factor: 0.402 in 2005

RIV identification code

RIV/00216224:14330/22:00126186

Organization unit

Faculty of Informatics

ISBN

978-3-031-01332-4

ISSN

UT WoS

000937256100012

Keywords in English

Outlier elimination; Metalearning; Average ranking; Reduction of portfolios

Tags

International impact, Reviewed
Changed: 28/3/2023 12:48, RNDr. Pavel Šmerk, Ph.D.

Abstract

Original

Although outlier detection/elimination has been studied before, few comprehensive studies exist on when exactly this technique is useful as preprocessing in classification tasks. The objective of our study is to fill this gap. We performed experiments with 12 different outlier elimination methods (OEMs) and 10 classification algorithms on 50 datasets. The results were then processed by the proposed reduction method, whose aim is to identify the most useful workflows for a given set of tasks (datasets). The reduction method identified just three OEMs that are generally useful for the given set of tasks. We have shown that including these OEMs is indeed useful, as it leads to a lower loss in accuracy, and the difference, 0.5% on average, is quite significant.
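The record does not describe how the reduction method works, so the following is only a minimal Python sketch of the general idea, not the authors' algorithm. It assumes a workflow is an (OEM, classifier) pair, that accuracies for every workflow on every dataset are available, that "loss in accuracy" means the gap between the best workflow overall and the best workflow kept in the portfolio, and it swaps in a simple greedy selection ordered by average rank. All function names and the toy data are hypothetical.

```python
from statistics import mean

# Hypothetical layout: accuracy[dataset][workflow], where a workflow is an
# (outlier elimination method, classifier) pair; "none" = no elimination.
# Assumes every dataset lists the same set of workflows.
Workflow = tuple[str, str]
Accuracy = dict[str, dict[Workflow, float]]


def average_ranks(acc: Accuracy) -> dict[Workflow, float]:
    """Rank workflows per dataset (1 = best, ties share the mean rank)
    and average those ranks over all datasets."""
    totals: dict[Workflow, float] = {}
    for scores in acc.values():
        ordered = sorted(scores.items(), key=lambda kv: -kv[1])
        i = 0
        while i < len(ordered):
            # Extend j over the run of scores tied with position i.
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            tied_rank = (i + 1 + j + 1) / 2  # mean of ranks i+1 .. j+1
            for wf, _ in ordered[i : j + 1]:
                totals[wf] = totals.get(wf, 0.0) + tied_rank
            i = j + 1
    return {wf: total / len(acc) for wf, total in totals.items()}


def mean_loss(acc: Accuracy, portfolio: list[Workflow]) -> float:
    """Best accuracy over all workflows minus the best accuracy reachable
    with the portfolio only, averaged over datasets (assumed definition)."""
    return mean(
        max(scores.values()) - max(scores[wf] for wf in portfolio)
        for scores in acc.values()
    )


def greedy_reduce(acc: Accuracy, tolerance: float = 0.0) -> list[Workflow]:
    """Hypothetical greedy reduction: repeatedly add the workflow that lowers
    the mean loss most (ties broken by average rank) until loss <= tolerance."""
    ranks = average_ranks(acc)
    candidates = sorted(ranks, key=ranks.get)
    portfolio: list[Workflow] = []
    while candidates:
        best = min(candidates, key=lambda wf: mean_loss(acc, portfolio + [wf]))
        portfolio.append(best)
        candidates.remove(best)
        if mean_loss(acc, portfolio) <= tolerance:
            break
    return portfolio


# Toy accuracies (invented for illustration, not results from the paper).
acc: Accuracy = {
    "d1": {("none", "knn"): 0.80, ("lof", "knn"): 0.84, ("none", "tree"): 0.78},
    "d2": {("none", "knn"): 0.70, ("lof", "knn"): 0.69, ("none", "tree"): 0.75},
    "d3": {("none", "knn"): 0.90, ("lof", "knn"): 0.92, ("none", "tree"): 0.88},
}

if __name__ == "__main__":
    print(greedy_reduce(acc))  # [('lof', 'knn'), ('none', 'tree')]
```

On this toy table, the workflow with the best average rank, `("lof", "knn")`, alone leaves a mean loss of about 0.02, because it loses on `d2`. Adding `("none", "tree")` closes that gap, so the sketch keeps two of the three workflows. A nonzero `tolerance` trades a small accuracy loss for a smaller portfolio.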
