Effect of training-sample size and classification difficulty on the accuracy of genomic predictors

J 2010

POPOVICI, Vlad; Weijie CHEN; Brandon G GALLAS; Christos HATZIS; Weiwei SHI et al.

Basic information

Original title

Effect of training-sample size and classification difficulty on the accuracy of genomic predictors

Authors

Edition

Breast Cancer Research, London, Great Britain, BIOMED CENTRAL LTD, 2010, 1465-5411

Other information

Language

English

Result type

Article in a scholarly journal

Confidentiality

Not subject to state or trade secrecy

Impact factor

Impact factor: 5.785

Marked for transfer to RIV

DOI

https://doi.org/10.1186/bcr2468

UT WoS

000276986300011

Modified: 4 March 2013 15:05, doc. Ing. Vlad Popovici, PhD

Abstract

In the original language

Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
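
The model counts in the abstract follow from a simple cross product: five feature-selection methods times eight classifiers gives 40 predictors per endpoint, and three endpoints give 120 models. The following minimal Python sketch only illustrates that arithmetic. The labels `fs_1`, `clf_1`, `endpoint_1`, and so on are hypothetical placeholders, since the abstract states the counts but not the names of the methods, classifiers, or endpoints.

```python
from itertools import product

# Hypothetical placeholder labels: the abstract gives only the counts,
# not the names of the methods, classifiers, or endpoints.
feature_selectors = [f"fs_{i}" for i in range(1, 6)]  # 5 univariate feature-selection methods
classifiers = [f"clf_{i}" for i in range(1, 9)]       # 8 classifiers
endpoints = [f"endpoint_{i}" for i in range(1, 4)]    # 3 clinical endpoints

# One predictor = one (feature-selection method, classifier) pair.
predictors = list(product(feature_selectors, classifiers))

# One model = one predictor trained for one endpoint.
models = [(ep, fs, clf) for ep in endpoints for fs, clf in predictors]

print(len(predictors))  # 40
print(len(models))      # 120
```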

Cite

POPOVICI, Vlad; Weijie CHEN; Brandon G GALLAS; Christos HATZIS; Weiwei SHI; Frank W SAMUELSON; Yuri NIKOLSKY; Marina TSYGANOVA; Alex ISHKIN; Tatiana NIKOLSKAYA; Kenneth R HESS; Vicente VALERO; Daniel BOOSER; Mauro DELORENZI; Gabriel N HORTOBAGYI; Leming SHI; W Fraser SYMMANS and Lajos PUSZTAI. Effect of training-sample size and classification difficulty on the accuracy of genomic predictors. Breast Cancer Research. London, Great Britain: BIOMED CENTRAL LTD, 2010, vol. 12, no. 1, 13 pp. ISSN 1465-5411. Available from: https://doi.org/10.1186/bcr2468.

@article{1090304,
   author = {Popovici, Vlad and Chen, Weijie and Gallas, Brandon G and Hatzis, Christos and Shi, Weiwei and Samuelson, Frank W and Nikolsky, Yuri and Tsyganova, Marina and Ishkin, Alex and Nikolskaya, Tatiana and Hess, Kenneth R and Valero, Vicente and Booser, Daniel and Delorenzi, Mauro and Hortobagyi, Gabriel N and Shi, Leming and Symmans, W Fraser and Pusztai, Lajos},
   address = {London, Great Britain},
   number = {1},
   doi = {10.1186/bcr2468},
   language = {eng},
   issn = {1465-5411},
   journal = {Breast Cancer Research},
   title = {Effect of training-sample size and classification difficulty on the accuracy of genomic predictors},
   volume = {12},
   year = {2010}
}

TY  - JOUR
ID  - 1090304
AU  - Popovici, Vlad
AU  - Chen, Weijie
AU  - Gallas, Brandon G
AU  - Hatzis, Christos
AU  - Shi, Weiwei
AU  - Samuelson, Frank W
AU  - Nikolsky, Yuri
AU  - Tsyganova, Marina
AU  - Ishkin, Alex
AU  - Nikolskaya, Tatiana
AU  - Hess, Kenneth R
AU  - Valero, Vicente
AU  - Booser, Daniel
AU  - Delorenzi, Mauro
AU  - Hortobagyi, Gabriel N
AU  - Shi, Leming
AU  - Symmans, W Fraser
AU  - Pusztai, Lajos
PY  - 2010
TI  - Effect of training-sample size and classification difficulty on the accuracy of genomic predictors
JF  - Breast Cancer Research
VL  - 12
IS  - 1
PB  - BIOMED CENTRAL LTD
SN  - 1465-5411
N2  - Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
ER  -

POPOVICI, Vlad; Weijie CHEN; Brandon G GALLAS; Christos HATZIS; Weiwei SHI; Frank W SAMUELSON; Yuri NIKOLSKY; Marina TSYGANOVA; Alex ISHKIN; Tatiana NIKOLSKAYA; Kenneth R HESS; Vicente VALERO; Daniel BOOSER; Mauro DELORENZI; Gabriel N HORTOBAGYI; Leming SHI; W Fraser SYMMANS and Lajos PUSZTAI. Effect of training-sample size and classification difficulty on the accuracy of genomic predictors. \textit{Breast Cancer Research}. London, Great Britain: BIOMED CENTRAL LTD, 2010, vol.~12, no.~1, 13 pp. ISSN~1465-5411. Available from: https://doi.org/10.1186/bcr2468.
