Model evaluation
- qualitative: following the definition of data mining (Piatetsky-Shapiro, Fayyad, 1990s): how new, interesting, useful and understandable the model is; whether it does (or does not) correspond to expectations (common sense) and to the knowledge of an expert
- quantitative

Evaluation for different machine learning tasks
- clustering: are the number of clusters and the structure appropriate?
- associations: which rule is interesting?
- outlier detection: top N outliers
- classification and regression

Classification
Training set -> Learning algorithm -> Model/Hypothesis/Classifier; input attributes of a test instance -> predicted class label
- accuracy: how often the model returns the correct class label
- speed: of learning and of testing
- robustness: making correct predictions given noisy data or data with missing values
- scalability: efficiency for large amounts of data

Classification
- main criterion: how successful the model is on data
- a principal decision: what data to use for the most accurate estimate of model accuracy
- most common (but correct?) choices: learning data, test set, cross-validation, leave-one-out
- Is there any other possibility, maybe better?
  bootstrapping, splitting data into disjoint parts, ...

Confusion matrix
- TP, TN, FP, FN: the numbers of true positive, true negative, false positive and false negative predictions
- P, N: the numbers (cardinalities) of positive and negative examples

Evaluation measures
- (overall) accuracy: Acc = (TP + TN) / (TP + TN + FP + FN)
- error rate (misclassification rate): Err = (wFP * FP + wFN * FN) / (TP + TN + FP + FN), where wFP, wFN are the weights of FP and FN errors; with the default weights wFP = wFN = 1, Err = 1 - Acc
- precision: TP / (TP + FP)
- sensitivity (true positive rate, recall): TP / (TP + FN)
- specificity (true negative rate): TN / (TN + FP)

Evaluation measures
- accuracy for a single class (computed over P or N separately)
- F-measures combine precision and recall
- F, F1, F-score: the harmonic mean of precision and recall, F1 = 2 * precision * recall / (precision + recall)
- F_beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall), where beta is a non-negative real number

Evaluation measures for comparing classifiers
- learning curve: accuracy as a function of the number of iterations
- ROC curve: relation between the true positive rate and the false positive rate

Sampling
- holdout: split data randomly into learning and test sets, e.g. 2/3 vs. 1/3
- stratified sampling: preserve the relative frequency of classes in the samples
- random (sub)sampling: the holdout method repeated k times; the overall accuracy estimate is the average of the accuracies obtained in the iterations
- bootstrapping
- undersampling/oversampling of a class: for handling imbalanced data
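The confusion-matrix measures above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course materials; the function names are our own.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classification task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

def measures(tp, tn, fp, fn, w_fp=1.0, w_fn=1.0):
    """Overall measures from the confusion matrix; default weights give Err = 1 - Acc."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (w_fp * fp + w_fn * fn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # sensitivity, true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(measures(tp, tn, fp, fn))  # accuracy 0.75, precision 0.75, recall 0.75
```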
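The F-beta formula can likewise be written directly from the slide; beta = 1 recovers F1, the harmonic mean, while beta > 1 weights recall more heavily. A sketch under those definitions:

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 gives F1."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(0.75, 0.75))        # F1 equals precision = recall here: 0.75
print(f_beta(0.5, 1.0, beta=2))  # beta = 2 favours recall
```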
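The sampling schemes above (holdout, k-fold cross-validation with leave-one-out as the special case k = n, and bootstrapping) can be sketched as follows. This is an illustrative skeleton, assuming a caller-supplied `evaluate(learning, test)` function that trains on the learning part and returns an accuracy; no such function appears in the original text.

```python
import random

def holdout_split(data, test_fraction=1 / 3, seed=0):
    """Random holdout split, e.g. 2/3 learning vs. 1/3 test."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (learning, test)

def cross_validation(data, k, evaluate):
    """k-fold cross-validation: average accuracy over the k folds.

    k = len(data) gives leave-one-out.
    """
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test = folds[i]
        learning = [x for j, fold in enumerate(folds) if j != i for x in fold]
        accuracies.append(evaluate(learning, test))
    return sum(accuracies) / k

def bootstrap_sample(data, seed=0):
    """Bootstrap: draw len(data) items with replacement."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in data]
```

Random subsampling is then simply `holdout_split` called k times with different seeds, averaging the resulting accuracies.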