Statistical methods in biology and medicine
• A group of mathematical methods concerning the collection, analysis and interpretation of data
• A complete description of the world is both impossible and impractical (statistics is a tool for reducing the variability of the data)
• Statistics creates mathematical models of reality that can be helpful in making decisions
• It works correctly only when the assumptions of its methods are met

Kinds of statistics
• Descriptive (population-wide) - works with the data of the whole surveyed population (e.g. census, medical registry)
• Inductive - conclusions based on sample data (obtained from a part of the target population) are extrapolated to the whole population (assumption: random selection of the sample)

Statistics as a data processing tool
• „Raw data" are often difficult to grasp
• Descriptive statistics can make the data (of a given sample) understandable
• [Table: example of raw data - patient code and number, adrenaline and noradrenaline levels, site of hypokinesis (base / apex / diffuse), and ERα 397/PvuII and ERα 351/XbaI genotypes (CC/CT/TT and AA/AG/GG)]

Kinds of data
• Continuous (always quantitative) - the parameter can theoretically take any value in a given interval (e.g. mean arterial pressure: 0-∞; ejection fraction: 0-100 %)
• Ratio data - both the difference and the ratio of two values can be computed (e.g. body weight)
• Interval data - only differences, but not ratios, of two values can be determined (e.g.
IQ score)
• Categorical (usually qualitative) - the parameter can take only certain specified values (e.g. blood group: 0, A, B, AB; sex: male, female; a disease is present/absent)
• Ordinal data - are categorical, but quantitative (they can be ordered - e.g. heart failure classification NYHA I-IV)
• Count data - can be ordered and form a linearly increasing row (e.g. number of children in a family: 0, 1, 2, ...) - they are often treated as continuous data
• Binary data - only two possibilities (e.g. patients / healthy controls)

The distribution of continuous data - histograms
• The distribution of a continuous parameter can be visualized graphically (e.g. using histograms)
• The values usually cluster around some numbers
• [Figure: histogram of the heights of 30 people; x axis: height in cm (139.5-189.5), y axis: frequency]

Description of continuous data
Measures of central tendency
• The arithmetic mean (x̄) - sum of the values divided by their number (n)
• The median (= 50% quantile) - cuts the ordered values in half
• The mode - the most frequent value
Measures of variability
• variance (σ²)
• standard deviation (SD, σ)
• coefficient of variation (CV): CV = σ/x̄
• standard error of the mean (SE, SEM = σ/√n)
• min-max (= range)
• quartiles (lower 25 %, median, upper 75 %)
• skewness
• kurtosis

The probability distribution of a continuous random variable
• Probability density function - each value of the (continuously) quantifiable variable (x axis) is linked to its probability density (y axis)
• [Figure: negatively (left) skewed, normal, and positively (right) skewed distributions]

Examples of continuous data distribution
• [Figure: histograms with the corresponding probability density functions]

Other ways of graphical visualisation
• Box-and-whisker plots - instead of the median e.g. the mean can be used; instead of the quartiles („box"), ± k·σ
• [Table: critical z values - α = 0.10: 1.28; α = 0.05: 1.65; α = 0.01: 2.33]

Two-sided tests
• H0 is symmetric: there is no difference between drug A and drug B (i.e.
A is neither better nor worse than B)
• They can reveal differences in both directions
• They are usually more suitable - we do not know the result a priori, and we are interested in both possible effects

Tests for continuous data, 2 samples - examples
• Paired (dependent)
• parametric: paired Student's t-test
• non-parametric: Wilcoxon paired test, sign test
• Unpaired (independent)
• parametric: unpaired Student's t-test
• non-parametric: Mann-Whitney U-test, Kolmogorov-Smirnov test

Tests for continuous data, more than 2 samples - examples
• Paired
• parametric: repeated measures ANOVA (Analysis Of VAriance) - RM ANOVA
• non-parametric: Friedman test („ANOVA")
• Unpaired
• parametric: one-way ANOVA (and its variants)
• non-parametric: Kruskal-Wallis test („ANOVA")
• When ANOVA rejects H0, it is necessary to find out which specific samples differ from each other - post hoc tests

Choose the best test
In a clinical trial, patients take either a new drug to treat epilepsy or a placebo. The study is randomized (the study group is randomly drawn). Only patients who have at least one and at most ten seizures in three months are included. The study evaluates the number of seizures during the first year of treatment.
A. Paired t-test
B. Unpaired t-test ✓
C. Mann-Whitney U-test
D. Sign test
E. Repeated measures ANOVA

ANOVA
• Analysis of variance
• tests a null hypothesis about more than two samples
• requirements: normal distribution, equal standard deviations
• requires further analyses to find out which sample is different

Nonparametric „ANOVA"
• Kruskal-Wallis test (unpaired)
• Friedman test (paired)

Multiple comparisons problem
When we perform more tests at once, the probability that some of them will give a statistically significant result only due to chance (i.e. a type I error - H0 is wrongly refuted) increases (e.g. during post hoc tests following ANOVA). For example, when performing 10 tests at α = 0.05, the probability that none of them will give a significant result (given that H0 is true in all of them) equals (1 − α)¹⁰ ≈ 60 %, i.e.
in about 40 % of cases some H0 is wrongly rejected. That is why multiple comparisons corrections (Bonferroni, Benjamini-Hochberg, ...) are applied to further decrease α (and thus make the criteria for refuting H0 stricter).
• Bonferroni correction: the initial α is divided by the number of tests (or, alternatively, all p-values are multiplied by the number of tests with α left unchanged) - very „conservative"

Post hoc tests in ANOVA
• Each group with each other („football matches")
• Bonferroni correction: α / [n (n − 1) / 2]
• Tukey, Scheffé (ANOVA)
• Dunn (Kruskal-Wallis)
• Nemenyi (Friedman)
• Each group with the control group
• Bonferroni correction: α / (n − 1)
• a priori, we are not interested in comparing the other groups between themselves
• Dunnett (ANOVA)
• Dunnett rank-sum (nonparametric tests)

„Manual" multiple testing correction
• Useful in situations where there is no standardized post hoc test in the statistical software (e.g. genetic tests - a parameter in many candidate polymorphisms, comparing categorical data in more groups)
• Bonferroni: α is divided by the number of tests (k)
• Bonferroni-Holm: each test has a different α-value. The test with the lowest p-value uses α(corr) = α/k, the second one α/(k − 1), the third one α/(k − 2), ... until the last one, where it equals α
• Benjamini-Hochberg (FDR): each test has a different α-value. The test with the lowest p-value uses α(corr) = α/k, the second one α/(k/2), the third one α/(k/3), ... until the last one, where it equals α
• When we find p > α(corr), the results of the following tests are not statistically significant either
• Alternatively, we may leave α unchanged and create p(corr)-values by multiplying the p-values by the respective denominators (those dividing α in the examples above)

Tests for categorical data
• From a contingency table, its probability under the assumption that H0 is valid (i.e. the p-value) can be determined, as well as the effect size - e.g.
the association between a mutation and a disease (expressed as RR - relative risk; OR - odds ratio)
• Sometimes a reduction of larger tables into a 2×2 table is advantageous [this is especially suitable for ordinal data - e.g. heart failure staging NYHA I-IV can be transformed into binary data as mild failure (NYHA I+II) and severe failure (NYHA III+IV)]

              disease   healthy
  mutation       50        2
  none            4       48

• A paired design can also be used (typically presence/absence of a feature in time), e.g. smoking before & after:

  Before \ After   non-smoker   smoker
  non-smoker           20          5
  smoker               16          9

Relative risk and odds ratio in 2×2 tables
• probability = wins / (wins + losses); odds = wins / losses
• for a table with exposure status in rows (exposed: a, b; not exposed: c, d) and event occurrence in columns (yes / no):
• Relative Risk RR = [a / (a + b)] / [c / (c + d)]
• Odds Ratio OR = (a/b) / (c/d) = ad / (cb)
• RR is suitable for prospective studies, while in OR the study design is not important
• If the dependent (modelled) variable is the same (e.g. the event in the table), the values of RR and OR are similar when the occurrence of the event is low
• RR is more intuitive; OR is more universal and commonly used e.g. in logistic regression
• It is always necessary to determine which variable will be independent and which one dependent
(www.statpearls.com)

Tests for categorical data - examples
• Paired
• 2×2 contingency table: McNemar test
• more categories/measurements †: Cochran Q test (binary data, more measurements); sign test (ordinal data, two measurements)
• Unpaired
• 2×2 contingency table: chi-square (χ²) test *, Fisher exact test
• more categories †: chi-square (χ²) test *, Cochran-Armitage test (3×2 table, ordinal data)
† when H0 is rejected, a series of tests for 2×2 tables with an appropriate multiple testing correction must follow
* under the assumption of certain minimal counts in each cell of the table (ca. n > 5)

Example
The study aimed to investigate an association between the blood group in the ABO system (A, B, AB and 0) and the presence of acute complications of blood transfusion. How many fields does the respective contingency table have?
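The RR and OR formulas from the 2×2 table slides above can be checked on the mutation example (a = 50, b = 2, c = 4, d = 48) with a short plain-Python sketch:

```python
# Relative risk and odds ratio for a 2x2 table:
#              disease  healthy
# mutation        a=50     b=2
# none            c=4      d=48
a, b, c, d = 50, 2, 4, 48

rr = (a / (a + b)) / (c / (c + d))  # risk of disease with vs. without the mutation
odds_ratio = (a * d) / (b * c)      # equivalent to (a/b) / (c/d)

print(f"RR = {rr:.1f}")          # RR = 12.5
print(f"OR = {odds_ratio:.1f}")  # OR = 300.0
```

Note how strongly OR (300) exceeds RR (12.5) here: the disease is common among the exposed, so the "rare event" condition under which RR ≈ OR does not hold.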
Example
In the previous case, the χ² test yielded p < 0.05 and a series of post hoc tests for 2×2 tables („each with each") followed. One of the tests showed a higher number of complications in the patients with group AB compared to group A, p = 0.05 (5 %). How will the p-value change when the Bonferroni correction is applied (here p, not the α-value, is corrected)? The result should be in percent (a natural number), rounded to whole percent if needed.

Regression models
• „Regression towards the mean" (Francis Galton) - but the methods were already developed by Carl Friedrich Gauss
• The goal is to estimate the value of the modelled variable (dependent variable = regressand) using other known parameters (factors = regressors - categorical or continuous variables)
• The contribution of individual factors may be assessed separately (univariate models) or together in mutual interaction (multivariate models)
• For each factor, its effect size with confidence intervals may be determined (usually 95 %, i.e. the interval that contains the value with 95 % probability)
• Assumption: factors are independent
• Most often:
• Linear regression (dependent variable is continuous - e.g. fasting plasma glucose)
• Logistic regression (dependent variable is binary - e.g.
a disease)
• Cox regression (dependent variable is survival - survival time and endpoint)

Assessing the contribution of factors
• Linear regression - regression coefficient β (standardized, unstandardized) and its 95% confidence interval (CI)
• Unlike in correlation, it is important which variable is independent and which one dependent
• When the regressor is categorical, it is in fact ANOVA
• Logistic regression - OR and 95% CI
• Cox regression - hazard ratio (HR) and 95% CI

Interpretation of regression models
• When β ± 95% CI includes 0, the contribution of the factor is not significant (below 0 the value of the outcome is decreased, above 0 it is increased)
• For OR and HR, the same is true when the 95% CI includes 1 (below 1 the probability of the outcome is decreased, above 1 it is increased)
• The 95% CI can thus replace the p-value
• When the independent variable is categorical, one category has to be set as the reference one and regression coefficients / OR / HR are attributed to each other category
• When the independent variable is continuous, β / OR / HR corresponds to 1 unit (e.g. 1 year of age - this assumes a linear effect, otherwise it is better to categorize)

Choose the right statement
In a cross-sectional study including 700 hospitalized patients between 80 and 90 years of age, signs of cognitive disability were found in 40 %. The association with potential risk factors (age, hypertension, diabetes) was assessed using univariate logistic regression. The presence of cognitive disability was associated with: age (OR = 1.20; 95% CI = 1.12-1.40 per each year of age), hypertension (OR = 1.40; 95% CI = 1.20-1.78) and diabetes (OR = 2.80; 95% CI = 2.00-6.40).
A. Age is not a statistically significant factor for cognitive disability
B. The probability of cognitive disability occurrence is two times higher in diabetics than in hypertensive patients
C. Age, diabetes and hypertension are mutually independent risk factors
D.
When we test the statistical significance of the associations, the p-value is < 0.05 in all cases
E. We may conclude that the factors lead to cognitive dysfunction

What to do with ordinal data?
• Tests for categorical data, ANOVA (but: we ignore the order)
• Nonparametric tests (with many categories)
• Dichotomization and tests for binary data (often in medicine)
• Special tests - Cochran-Armitage (typically genetics), sign test

Survival analysis
• A group of methods assessing the occurrence of a given event (endpoint) in a typically decreasing number of study group members („survivors")
• What is assessed?
• Endpoint - occurs only once (if it can occur more times, the first occurrence is usually assessed)
• Censored data
• still alive at the end of the study (the event did not occur)
• lost from the study
• died of another cause
• Time of follow-up (survival time)

Methods of survival analysis
• Life tables
• Kaplan-Meier graphs
• Log-rank test
• Gehan-Wilcoxon test
• Cox regression

[Figure: Kaplan-Meier curves of cumulative proportion surviving vs. survival time (months), with complete and censored observations marked, for three groups: no atherosclerosis, insignificant atherosclerosis, CAD. Tests for survival (log-rank test, Gehan-Wilcoxon test): no atherosclerosis vs. insignificant atherosclerosis: p = 6×10⁻³; no atherosclerosis vs. CAD: p = 2×10⁻⁵; insignificant atherosclerosis vs. CAD: p = NS]

Choose the right answer
Four patients enrolled in a study investigating the re-occurrence of myocardial infarction (endpoint).
In the following years, these events took place consecutively: one patient moved to Argentina and was thus lost to follow-up, one suffered the infarction and the next month died in a car accident, a further one died of lung cancer, and the last one lived until the end of the study in full health. The last point of the Kaplan-Meier curve is at the value of:

Cluster analysis
• multidimensional analysis (1 parameter = 1 dimension)
• measure of distance
• amalgamation algorithm
• data standardization is necessary to assess different parameters together (to unify the scales, all parameters are expressed in the same units - multiples of the standard deviation of the distribution, i.e. the z-score; mean = 0, SD = 1)
• k-means clustering
• hierarchical tree (dendrogram)

Choose the right answer
A lonely island is visited by anthropologists, who discover human skulls of unknown origin there. They use cluster analysis to assign them to one of the nearby human populations. Besides the genetic markers, they also measure the cranial index (in percent, mean = 85, SD = 10), the facial index (in percent, mean = 80, SD = 5) and the braincase volume (in cm³, mean = 1500, SD = 200). What happens if the data are not standardized before the analysis?
A. Nothing, standardization is used for better visualization of the data.
B. The braincase volume will not be relevant for the analysis.
C. Cluster analysis will not be technically possible.
D. The assignment to a cluster will depend mainly on the braincase volume.
E. The mutual correlation of cranial and facial index will increase.
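Why standardization matters for cluster analysis can be sketched in a few lines of plain Python (the two skulls below are hypothetical measurements; the means and SDs are those from the example above). Without z-scores, the Euclidean distance between two skulls is dominated by the braincase volume, the parameter with by far the largest scale:

```python
import math

# Hypothetical measurements for two skulls:
# (cranial index %, facial index %, braincase volume cm^3)
skull_1 = (85.0, 80.0, 1500.0)
skull_2 = (95.0, 85.0, 1700.0)

# Population means and SDs from the example above
means = (85.0, 80.0, 1500.0)
sds = (10.0, 5.0, 200.0)

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def z_scores(values):
    # z-score: (value - mean) / SD  ->  mean = 0, SD = 1
    return tuple((v - m) / s for v, m, s in zip(values, means, sds))

# Raw distance: the 200 cm^3 volume difference swamps the 10 % and 5 % index differences
print(f"{euclidean(skull_1, skull_2):.2f}")                      # 200.31
# Standardized distance: each parameter differs by exactly 1 SD, so all three contribute equally
print(f"{euclidean(z_scores(skull_1), z_scores(skull_2)):.2f}")  # 1.73
```

In the raw data the distance (≈ 200.3) is almost entirely the volume term, so clustering would depend mainly on braincase volume; after standardization each parameter contributes one unit, which is exactly the point of the exam question above.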