Detailed Information on Publication Record
2017
Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results
SINGHA ROY, Nivir Kanti and Bruno ROSSI
Basic information
Original name
Cost-Sensitive Strategies for Data Imbalance in Bug Severity Classification: Experimental Results
Authors
SINGHA ROY, Nivir Kanti (50 Bangladesh) and Bruno ROSSI (380 Italy, guarantor, belonging to the institution)
Edition
Not specified, 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2017, p. 426-429, 4 pp. 2017
Publisher
IEEE
Other information
Language
English
Type of outcome
Proceedings paper
Field of Study
10201 Computer sciences, information science, bioinformatics
Country of publisher
United States of America
Confidentiality degree
is not subject to a state or trade secret
Publication form
electronic version available online
References:
RIV identification code
RIV/00216224:14330/17:00100027
Organization unit
Faculty of Informatics
ISBN
978-1-5386-2140-0
UT WoS
000426074600063
Keywords in English
cost-sensitive strategies; data imbalance; software bug severity classification; software bug triaging process; support vector machine; SVM classifier
Tags
International impact, Reviewed
Changed: 20/11/2019 10:02, Bruno Rossi, PhD
Abstract
In the original
Context: Software Bug Severity Classification can help to improve the software bug triaging process. However, severity levels present a high level of data imbalance that needs to be taken into account. Aim: We investigate cost-sensitive strategies in multi-class bug severity classification to counteract data imbalance. Method: We transform datasets from three severity classification papers to a common format, totaling 17 projects. We test different cost-sensitive strategies to penalize majority classes. We adopt a Support Vector Machine (SVM) classifier that we also compare to a baseline "majority class" classifier. Results: A model weighting classes based on the inverse of instance frequencies yields a statistically significant improvement (low effect size) over the standard unweighted SVM model in the assembled dataset. Conclusions: Data imbalance should be taken more into consideration in future severity classification research papers.
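The inverse-frequency class weighting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `inverse_frequency_weights`, the sample severity labels, and the normalization (dividing by the number of classes so that a balanced dataset gives every class a weight of 1.0) are assumptions made for illustration only.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, proportional to 1 / class frequency.

    Hypothetical helper: the normalization n_samples / (n_classes * count)
    is an assumption, chosen so that balanced data yields weights of 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up severity labels, heavily skewed toward the majority class "normal".
labels = ["normal"] * 80 + ["major"] * 15 + ["critical"] * 5
weights = inverse_frequency_weights(labels)

# Rare classes receive larger weights, so misclassifying them costs more.
for severity, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{severity}: {weight:.3f}")
```

A dictionary of this shape could then be supplied to a classifier that accepts per-class weights, for example through the `class_weight` parameter of scikit-learn's `SVC`, although the record does not state which SVM implementation the paper used.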