Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results

SINGHA ROY, Nivir Kanti and Bruno ROSSI

Basic information

Original name

Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results

Authors

SINGHA ROY, Nivir Kanti (50 Bangladesh) and Bruno ROSSI (380 Italy, guarantor, belonging to the institution)

Edition

Not specified, 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2017, p. 426-429, 4 pp. 2017

Publisher

IEEE

Other information

Language

English

Type of outcome

Proceedings paper

Field of Study

10201 Computer sciences, information science, bioinformatics

Country of publisher

United States of America

Confidentiality degree

is not subject to a state or trade secret

Publication form

electronic version available online

URL

http://ieeexplore.ieee.org/document/8051382/

RIV identification code

RIV/00216224:14330/17:00100027

Organization unit

Faculty of Informatics

ISBN

978-1-5386-2140-0

DOI

http://dx.doi.org/10.1109/SEAA.2017.71

UT WoS

000426074600063

Keywords in English

cost-sensitive strategies; data imbalance; software bug severity classification; software bug triaging process; support vector machine; SVM classifier

Abstract

In the original

Context: Software Bug Severity Classification can help to improve the software bug triaging process. However, severity levels present a high level of data imbalance that needs to be taken into account. Aim: We investigate cost-sensitive strategies in multi-class bug severity classification to counteract data imbalance. Method: We transform datasets from three severity classification papers to a common format, totaling 17 projects. We test different cost-sensitive strategies to penalize majority classes. We adopt a Support Vector Machine (SVM) classifier that we also compare to a baseline "majority class" classifier. Results: A model weighting classes based on the inverse of instance frequencies yields a statistically significant improvement (low effect size) over the standard unweighted SVM model in the assembled dataset. Conclusions: Data imbalance should be taken more into consideration in future severity classification research papers.
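
The two ideas named in the abstract, class weights based on the inverse of instance frequencies and a "majority class" baseline, can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the exact weighting formula or the tooling used, so the normalization below (n_samples / (n_classes * class_count), a common "balanced" heuristic), the function names, and the severity labels are all assumptions made for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Assumed formula: n_samples / (n_classes * class_count). The paper's
    exact formula is not stated in this record.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def majority_class_baseline(train_labels, test_labels):
    """Always predict the most frequent training class; return it and its accuracy."""
    majority = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for y in test_labels if y == majority)
    return majority, hits / len(test_labels)


# Hypothetical, made-up severity labels (not data from the paper).
labels = ["normal"] * 80 + ["major"] * 15 + ["critical"] * 5

weights = inverse_frequency_weights(labels)
majority, accuracy = majority_class_baseline(labels, labels)

print(weights)             # rare classes receive the largest weights
print(majority, accuracy)  # the baseline looks accurate only because of the imbalance
```

A dictionary of this shape could, for example, be passed as the `class_weight` argument of scikit-learn's `SVC` so that errors on rare severity levels cost more during training; whether the authors used that library is not stated here.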

Cite

SINGHA ROY, Nivir Kanti and Bruno ROSSI. Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results. Online. In 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2017. Not specified: IEEE, 2017, p. 426-429. ISBN 978-1-5386-2140-0. Available from: https://dx.doi.org/10.1109/SEAA.2017.71.

@inproceedings{1408417,
   author = {Singha Roy, Nivir Kanti and Rossi, Bruno},
   address = {Not specified},
   booktitle = {43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2017},
   doi = {10.1109/SEAA.2017.71},
   keywords = {cost-sensitive strategies; data imbalance; software bug severity classification; software bug triaging process; support vector machine; SVM classifier},
   howpublished = {electronic version available online},
   language = {eng},
   location = {Not specified},
   isbn = {978-1-5386-2140-0},
   pages = {426-429},
   publisher = {IEEE},
   title = {Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results},
   url = {http://ieeexplore.ieee.org/document/8051382/},
   year = {2017}
}

TY  - CONF
ID  - 1408417
AU  - Singha Roy, Nivir Kanti
AU  - Rossi, Bruno
PY  - 2017
TI  - Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results
PB  - IEEE
CY  - Not specified
SN  - 9781538621400
KW  - cost-sensitive strategies
KW  - data imbalance
KW  - software bug severity classification
KW  - software bug triaging process
KW  - support vector machine
KW  - SVM classifier
UR  - http://ieeexplore.ieee.org/document/8051382/
N2  - Context: Software Bug Severity Classification can help to improve the software bug triaging process. However, severity levels present a high level of data imbalance that needs to be taken into account. Aim: We investigate cost-sensitive strategies in multi-class bug severity classification to counteract data imbalance. Method: We transform datasets from three severity classification papers to a common format, totaling 17 projects. We test different cost-sensitive strategies to penalize majority classes. We adopt a Support Vector Machine (SVM) classifier that we also compare to a baseline "majority class" classifier. Results: A model weighting classes based on the inverse of instance frequencies yields a statistically significant improvement (low effect size) over the standard unweighted SVM model in the assembled dataset. Conclusions: Data imbalance should be taken more into consideration in future severity classification research papers.
ER  -

SINGHA ROY, Nivir Kanti and Bruno ROSSI. Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results. Online. In \textit{43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2017}. Not specified: IEEE, 2017, p.~426--429. ISBN~978-1-5386-2140-0. Available from: https://dx.doi.org/10.1109/SEAA.2017.71.
