Lecture 9:
Replicability of Research and Meta-analysis
31. 11. 2020 | PSYn4790 | Psychometrics: Measurement in Psychology
Department of Psychology, Faculty of Social Studies, MU
Hynek Cígler & Vít Gabrhel | hynek.cigler@mail.muni.cz
„No isolated experiment, however
significant in itself, can suffice for the
experimental demonstration of any
natural phenomenon.“
Fisher, 1971, p. 13
Fisher, R. (1971). The Design of Experiments. New York: Hafner Publishing Company.
Meta-analysis
„Science is cumulative in nature, yet we approach studies
not as one of many but in isolation, as if each stood on its own.“
◦ (Chalmers, as cited in Borenstein et al., 2009)
Narrative review
◦ An expert summarizes the findings on a given topic and draws a conclusion.
◦ Subjectivity.
◦ The decision-making process is not described (and so cannot be replicated).
◦ Breaks down when the number of sources is large.
◦ A narrative cannot adequately capture the variability of effect sizes.
Meta-analysis
Since the 1990s, a shift toward meta-analysis and systematic review.
◦ A process of systematically searching for, appraising, and then synthesizing data from a large number of sources.
Systematic review.
◦ Clearly defined criteria for selecting studies and a transparent description.
◦ The choice of criteria still involves some degree of subjectivity.
◦ Usually includes a meta-analysis.
Meta-analysis.
◦ A statistical synthesis of previous research.
◦ The weight of each study is determined by external (mathematical) rules.
◦ The goal is to estimate the „summary effect size“.
Meta-analysis: Concepts
Effect size
◦ Summary effect: a weighted mean of effect sizes computed according to stated rules.
◦ In effect, an estimate of the „true effect“.
Precision of the summary effect: the total N.
Weight of an individual study: the N of that study.
Homogeneity: the degree of consistency across studies.
Significance of the summary effect: often interpreted graphically as well.
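To make the idea of a weighted summary effect concrete, here is a minimal sketch for correlation coefficients. It assumes a fixed-effect model, Fisher's z transformation, and inverse-variance weights (for Fisher's z the variance is 1/(N − 3), so the weight grows with the study's N). The function name and the example studies are hypothetical and do not come from the lecture.

```python
import math

def fixed_effect_summary(studies):
    """Fixed-effect summary correlation.

    studies: list of (r, n) pairs. Returns (summary_r, ci_low, ci_high).
    """
    # Fisher z-transform each correlation; its sampling variance is 1 / (n - 3).
    zs = [math.atanh(r) for r, _ in studies]
    weights = [n - 3 for _, n in studies]  # inverse-variance weights
    total_weight = sum(weights)
    z_bar = sum(w * z for w, z in zip(weights, zs)) / total_weight
    # Precision of the summary depends on the total weight (roughly the total N).
    se = math.sqrt(1.0 / total_weight)
    low, high = z_bar - 1.96 * se, z_bar + 1.96 * se
    # Back-transform to the correlation metric.
    return math.tanh(z_bar), math.tanh(low), math.tanh(high)

# Hypothetical studies: (observed r, sample size).
example = [(0.30, 40), (0.10, 200), (0.25, 80)]
summary, ci_low, ci_high = fixed_effect_summary(example)
print(f"summary r = {summary:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

The largest study pulls the summary toward its own estimate, which is exactly what "weight given by the N of the study" means in practice.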
Meta-analysis: Example
Meta-analysis: Problems and solutions
Can an entire research area be represented by a single number?
◦ Are we studying one (fixed) effect or a population of (random) effects?
Source studies.
◦ Biased original studies, omission of important studies. Garbage in, garbage out.
◦ Comparing the incomparable?
◦ Differing effect sizes and interpretations of tests.
Quality of the meta-analyses actually conducted.
◦ Insufficient control of the quality of the original studies and insufficient correction for publication bias.
Analytic vs. exploratory meta-analysis.
A meta-analysis is only as good as the individual studies it includes.
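The fixed versus random question can be illustrated numerically. The sketch below computes Cochran's Q, the I² index, and a random-effects mean using the DerSimonian-Laird estimator of the between-study variance. That estimator is one common choice and is an assumption here; the lecture does not name a specific method. All names and numbers are hypothetical.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, I^2, DerSimonian-Laird tau^2, and both pooled means."""
    w = [1.0 / v for v in variances]          # fixed-effect weights
    sum_w = sum(w)
    mean_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Q: weighted squared deviations of study effects from the fixed-effect mean.
    q = sum(wi * (yi - mean_fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)             # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    mean_random = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {"Q": q, "df": df, "I2": i2, "tau2": tau2,
            "fixed": mean_fixed, "random": mean_random}

# Hypothetical standardized effects and their sampling variances.
result = heterogeneity([0.1, 0.5, 0.9], [0.01, 0.01, 0.01])
print(result)
```

When tau² is estimated as zero, the random-effects mean coincides with the fixed-effect mean; the larger tau² is, the more evenly the studies are weighted.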
Meta-analysis: Funnel plot
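A funnel plot places each study's effect size against its standard error and draws pseudo 95% limits around the pooled effect. The following sketch only computes those coordinates and flags studies that fall outside the funnel; it assumes a fixed-effect pooled estimate as the funnel's center, and the data are hypothetical.

```python
def funnel_points(effects, standard_errors):
    """Return the pooled effect and, per study, its funnel limits."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    rows = []
    for y, se in zip(effects, standard_errors):
        # Pseudo 95% limits widen as the standard error grows (the funnel shape).
        low, high = pooled - 1.96 * se, pooled + 1.96 * se
        rows.append({"effect": y, "se": se, "low": low, "high": high,
                     "inside": low <= y <= high})
    return pooled, rows

# Hypothetical data: three precise null studies and one small study with a large effect.
pooled, rows = funnel_points([0.0, 0.0, 0.0, 1.0], [0.05, 0.05, 0.05, 0.30])
for row in rows:
    print(row)
```

Small studies with large effects that sit on only one side of the funnel are the visual pattern usually read as a hint of publication bias.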
Source studies?
Strong preference for statistically significant results.
◦ 92% of published results in psychology are statistically significant (Fanelli, 2010).
◦ An increase especially between 1990 and 2007 (Fanelli, 2012).
→ Confirmation bias in publication.
Bakker, Van Dijk, & Wicherts (2012): 13 meta-analyses with 281 studies.
◦ Median N = 40; statistical power = 0.35; d = 0.5.
Fraley & Marks (2007): meta-analysis of correlational studies of personality
◦ Median: N = 120, statistical power = 0.65, r = 0.21.
„Consequently, if all effects reported in published studies were true, only 35%
would be replicable in similarly underpowered studies.“ (Asendorpf et al. 2013, s. 110)
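The power figures above can be approximately reconstructed. The sketch below uses a normal approximation to a two-sided test at α = 0.05. It assumes that N = 40 means two equal groups of 20, and for the correlation it uses Fisher's z; both are assumptions made for illustration, not details given on the slide.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()

def power_two_sided(noncentrality, alpha=0.05):
    """Approximate power of a two-sided z-test with the given noncentrality."""
    z_crit = _NORMAL.inv_cdf(1 - alpha / 2)
    upper = 1 - _NORMAL.cdf(z_crit - noncentrality)
    lower = _NORMAL.cdf(-z_crit - noncentrality)
    return upper + lower

def power_two_groups(d, n_total, alpha=0.05):
    """Power to detect Cohen's d with two equal groups (assumed design)."""
    n_per_group = n_total / 2
    return power_two_sided(d * math.sqrt(n_per_group / 2), alpha)

def power_correlation(r, n, alpha=0.05):
    """Power to detect a correlation r, via Fisher's z (SE = 1 / sqrt(n - 3))."""
    return power_two_sided(math.atanh(r) * math.sqrt(n - 3), alpha)

print(round(power_two_groups(0.5, 40), 2))     # compare with 0.35 above
print(round(power_correlation(0.21, 120), 2))  # compare with 0.65 above
```

Under these assumptions the results land close to the reported values; small differences are expected because the approximation ignores the t distribution.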
Nothing new under the sun...
Cohen, J. (1962). The statistical power of abnormal-social
psychological research: A review. The Journal
of Abnormal and Social Psychology, 65(3), 145–153.
doi:10.1037/h0045186
◦ Replicability estimate: statistical power of 50%.
◦ Recommendation: increase power to 80%.
And others...
Replicability
of (psychological) research
V. Gabrhel (probably, n.d.)
Radical skepticism I.
Radical skepticism II:
Estimating the reproducibility of psychological science
„We conducted a large-scale, collaborative
effort to obtain an initial estimate of the
reproducibility of psychological science.“
100 studies and the results of their replications
◦ Psychological Science
◦ Journal of Personality and Social Psychology
◦ Journal of Experimental Psychology: Learning,
Memory, and Cognition
Open Science Collaboration. (2015). Estimating
the reproducibility of psychological science.
Science, 349(6251), aac4716.
https://doi.org/10.1126/science.aac4716
Alexander A. Aarts, Joanna E. Anderson, Christopher J. Anderson, Peter R. Attridge, Angela Attwood, Jordan Axt, Molly
Babel, Štěpán Bahník, Erica Baranski, Michael Barnett-Cowan,Elizabeth Bartmess, Jennifer Beer, Raoul Bell, Heather
Bentley, Leah Beyan, Grace Binion, Denny Borsboom, Annick Bosch, Frank A. Bosco, Sara D. Bowman, Mark J. Brandt, Erin
Braswell, Hilmar Brohmer, Benjamin T. Brown, Kristina Brown, Jovita Brüning, Ann Calhoun-Sauls, Shannon P.
Callahan, Elizabeth Chagnon, Jesse Chandler, Christopher R. Chartier, Felix Cheung, Cody D. Christopherson, Linda
Cillessen, Russ Clay, Hayley Cleary, Mark D. Cloud, Michael Cohn, Johanna Cohoon,Simon Columbus, Andreas Cordes, Giulio
Costantini, Leslie D. Cramblet Alvarez, Ed Cremata, Jan Crusius, Jamie DeCoster, Michelle A. DeGaetano, Nicolás Della
Penna, Bobby den Bezemer, Marie K. Deserno, Olivia Devitt, Laura Dewitte, David G. Dobolyi, Geneva T. Dodson, M. Brent
Donnellan, Ryan Donohue, Rebecca A. Dore, Angela Dorrough, Anna Dreber, Michelle Dugas, Elizabeth W. Dunn, Kayleigh
Easey, Sylvia Eboigbe, Casey Eggleston, Jo Embley, Sacha Epskamp, Timothy M. Errington, Vivien Estel, Frank J.
Farach, Jenelle Feather, Anna Fedor, Belén Fernández-Castilla, Susann Fiedler, James G. Field, Stanka A. Fitneva, Taru
Flagan, Amanda L. Forest, Eskil Forsell, Joshua D. Foster, Michael C. Frank, Rebecca S. Frazier, Heather Fuchs, Philip
Gable, Jeff Galak, Elisa Maria Galliani, Anup Gampa, Sara Garcia, Douglas Gazarian, Elizabeth Gilbert, Roger Giner-Sorolla,
Andreas Glöckner, Lars Goellner, Jin X. Goh, Rebecca Goldberg, Patrick T. Goodbourn, Shauna Gordon-McKeon,
Bryan Gorges, Jessie Gorges, Justin Goss, Jesse Graham, James A. Grange, Jeremy Gray, Chris Hartgerink, Joshua
Hartshorne, Fred Hasselman, Timothy Hayes, Emma Heikensten, Felix Henninger, John Hodsoll,Taylor Holubar, Gea
Hoogendoorn, Denise J. Humphries, Cathy O.-Y. Hung, Nathali Immelman, Vanessa C. Irsik, Georg Jahn, Frank Jäkel, Marc
Jekel, Magnus Johannesson, Larissa G. Johnson, David J. Johnson, Kate M. Johnson, William J. Johnston, Kai Jonas, Jennifer
A. Joy-Gaba, Heather Barry Kappes, Kim Kelso, Mallory C. Kidwell, Seung Kyung Kim, Matthew Kirkhart, Bennett
Kleinberg, Goran Knežević,Franziska Maria Kolorz, Jolanda J. Kossakowski, Robert Wilhelm Krause, Job Krijnen, Tim
Kuhlmann, Yoram K. Kunkels, Megan M. Kyc, Calvin K. Lai, Aamir Laique, Daniël Lakens,Kristin A. Lane, Bethany
Lassetter, Ljiljana B. Lazarević, Etienne P. LeBel, Key Jung Lee,Minha Lee, Kristi Lemm, Carmel A. Levitan, Melissa Lewis, Lin
Lin, Stephanie Lin,Matthias Lippold, Darren Loureiro, Ilse Luteijn, Sean Mackinnon, Heather N. Mainard,Denise C.
Marigold, Daniel P. Martin, Tylar Martinez, E.J. Masicampo, Josh Matacotta,Maya Mathur, Michael May, Nicole
Mechin, Pranjal Mehta, Johannes Meixner, Alissa Melinger, Jeremy K. Miller, Mallorie Miller, Katherine Moore, Marcus
Möschl, Matt Motyl, Stephanie M. Müller, Marcus Munafo, Koen I. Neijenhuijs, Taylor Nervi, Gandalf Nicolas, Gustav
Nilsonne, Brian A. Nosek, Michèle B. Nuijten, Catherine Olsson, Colleen Osborne, Lutz Ostkamp, Misha Pavel, Ian S. Penton-Voak,
Olivia Perna, Cyril Pernet, Marco Perugini, R. Nathan Pipitone, Michael Pitts, Franziska Plessow, Jason M.
Prenoveau, Rima-Maria Rahal, Kate A. Ratliff, David Reinhard, Frank Renkewitz,Ashley A. Ricker, Anastasia Rigney, Andrew
M. Rivers, Mark Roebke, Abraham M. Rutchick, Robert S. Ryan, Onur Sahin, Anondah Saide, Gillian M. Sandstrom, David
Santos, Rebecca Saxe, René Schlegelmilch, Kathleen Schmidt, Sabine Scholz,Larissa Seibel, Dylan Faulkner
Selterman, Samuel Shaki, William B. Simpson, H. Colleen Sinclair, Jeanine L. M. Skorinko, Agnieszka Slowik, Joel S.
Snyder, Courtney Soderberg,Carina Sonnleitner, Nick Spencer, Jeffrey R. Spies, Sara Steegen, Stefan Stieger, Nina
Strohminger, Gavin B. Sullivan, Thomas Talhelm, Megan Tapia, Anniek te Dorsthorst,Manuela Thomae, Sarah L. Thomas, Pia
Tio, Frits Traets, Steve Tsang, Francis Tuerlinckx, Paul Turchan, Milan Valášek, Anna E. van 't Veer, Robbie Van Aert, Marcel
van Assen, Riet van Bork, Mathijs van de Ven, Don van den Bergh, Marije van der Hulst,Roel van Dooren, Johnny van
Doorn, Daan R. van Renswoude, Hedderik van Rijn, Wolf Vanpaemel, Alejandro Vásquez Echeverría, Melissa
Vazquez, Natalia Velez, Marieke Vermue, Mark Verschoor, Michelangelo Vianello, Martin Voracek, Gina Vuu, Eric-Jan
Wagenmakers, Joanneke Weerdmeester, Ashlee Welsh, Erin C. Westgate, Joeri Wissink,Michael Wood, Andy Woods, Emily
Wright, Sining Wu, Marcel Zeelenberg, Kellylynn Zuni
Radical skepticism II:
Estimating the reproducibility of psychological science
Original effect sizes:
◦ Mean effect size: M_r = 0.403; SD = 0.188
◦ Statistical significance: 97% of studies with p < 0.05
Replication effect sizes:
◦ Mean effect size: M_r = 0.197; SD = 0.257
◦ Statistical significance: 36% of studies with p < 0.05
The effect size from the original study fell within the
95% confidence interval of the replication
in 47% of cases.
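The last criterion, whether the original effect lies inside the replication's 95% confidence interval, is easy to express in code. This is a minimal sketch for correlations using Fisher's z. The replication sample sizes are hypothetical; only the two mean effect sizes are borrowed from the slide as illustrative inputs.

```python
import math
from statistics import NormalDist

def replication_ci(r_replication, n_replication, level=0.95):
    """Confidence interval for a replication correlation via Fisher's z."""
    z = math.atanh(r_replication)
    se = 1.0 / math.sqrt(n_replication - 3)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def original_within_replication_ci(r_original, r_replication, n_replication):
    """True if the original effect falls inside the replication's CI."""
    low, high = replication_ci(r_replication, n_replication)
    return low <= r_original <= high

# Hypothetical replication sample sizes; effect sizes echo the means on the slide.
for n in (50, 100):
    print(n, replication_ci(0.197, n), original_within_replication_ci(0.403, 0.197, n))
```

The example also shows a weakness of this criterion: a small replication has a wide interval, so it "captures" the original effect more easily than a large, precise one.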
Origins of the current crisis
Daryl J. Bem: Feeling the Future (2011)
John Bargh: (Elderly) priming (2013)
Questionable research practices
„In a poll of more than 2000 psychologists, prevalences of ‘Deciding whether to
collect more data after looking to see whether the results were significant’ and
‘Stopping data collection earlier than planned because one found the result that
one had been looking for’ were subjectively estimated at 61% and 39%,
respectively.“
◦ John, Loewenstein, & Prelec, as cited in Asendorpf et al., 2013
Questionable research practices.
Fraudulent vs. questionable conduct?
◦ Fraud is typically limited to cases in which researchers create false data.
◦ In contrast, QRPs typically involve the exclusion of data that are inconsistent with a
theoretical hypothesis. QRPs are treated differently than fraud because QRPs can sometimes
be used for legitimate purposes. (John, Loewenstein, & Prelec, 2012)
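Why "deciding whether to collect more data after looking" matters can be shown with a small simulation. The sketch below generates data under a true null effect and compares a single test at the planned sample size with repeated testing after every 10 observations, stopping at the first p < 0.05. The one-sample z-test with known SD = 1, the peeking schedule, and the seed are simplifying assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

_NORMAL = NormalDist()

def z_test_p(sample):
    """Two-sided p-value of a one-sample z-test (H0: mean = 0, known SD = 1)."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return 2 * (1 - _NORMAL.cdf(abs(z)))

def false_positive_rate(peek_at, n_sims=2000, seed=1):
    """Share of null simulations declared significant at any of the looks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data = []
        for n in peek_at:
            while len(data) < n:
                data.append(rng.gauss(0.0, 1.0))  # the true effect is zero
            if z_test_p(data) < 0.05:
                hits += 1  # the researcher stops and reports a "finding"
                break
    return hits / n_sims

print("single test at n = 50:", false_positive_rate([50]))
print("peeking every 10 observations:", false_positive_rate([10, 20, 30, 40, 50]))
```

With a single planned test the false-positive rate stays near the nominal 5%, while peeking inflates it well beyond that, even though no data were fabricated.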
Reproducibility, replicability, generalizability
Reproducibility
◦ „Researcher B must have the following: (a) the raw data; (b) the code book (variable names and labels,
value labels, and codes for missing data); and (c) knowledge of the analyses that were performed by
Researcher A (e.g. the syntax of a statistics program).“
Replicability
◦ „The finding can be obtained with other random samples drawn from a multidimensional space that
captures the most important facets of the research design. In psychology, the facets typically include the
following: (a) individuals (or dyads or groups); (b) situations (natural or experimental); (c)
operationalizations (experimental manipulations, methods, and measures); and (d) time points.“
Generalizability
◦ „It does not depend on an originally unmeasured variable that has a systematic effect. In psychology,
generalizability is often demonstrated by showing that a potential moderator variable has no effect on
a group difference or correlation.“
Asendorpf et al. (2013)
So where is the catch?
(John, Loewenstein, & Prelec, 2012) (Simmons, Nelson, & Simonsohn, 2011)
Examples of non-replicable effects
Priming (social priming).
◦ elderly priming, Macbeth effect, cleanliness priming, money priming...
Ego depletion.
Power posing.
Selected aspects of the facial-feedback hypothesis
◦ „smiling will make you feel happier“
Marshmallow test
A Multilab Preregistered Replication of the
Ego-Depletion Effect
◦ Hagger, M. S., et al. (2016). A Multilab
Preregistered Replication of the Ego-Depletion
Effect. Perspectives on
Psychological Science, 11(4), 546–573.
„Although a meta-analysis of ego-depletion
experiments found a
medium-sized effect, subsequent meta-analyses
have questioned the size and
existence of the effect and identified
instances of possible bias. [...] Multiple
laboratories (k = 23, total N = 2,141)
conducted replications of a
standardized ego-depletion protocol
[...] the size of the ego-depletion effect
was small with 95% confidence
intervals (CIs) that encompassed zero
(d = 0.04, 95% CI [−0.07, 0.15]).“
Many Labs 1
◦ Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R.
B., Jr., Bahník, Š., Bernstein, M. J., . . . Nosek, B.
A. (2014). Investigating variation in replicability:
A “many labs” replication project. Social
Psychology, 45(3), 142-152.
„This research tested variation in the
replicability of 13 classic and contemporary
effects across 36 independent samples totaling
6,344 participants [...] We compared whether
the conditions such as lab versus online or US
versus international sample predicted effect
magnitudes. By and large they did not.“
Many Labs 2
◦ Klein, R. A., et al. (2018). Many Labs 2:
Investigating Variation in Replicability Across
Samples and Settings. Advances in Methods and
Practices in Psychological Science, 1(4), 443–
490.
„Across settings, the Q statistic indicated significant
heterogeneity in 11 (39%) of the replication effects,
and most of those were among the findings with the
largest overall effect sizes; only 1 effect that was near
zero in the aggregate showed significant
heterogeneity according to this measure. [...]
Moderation tests indicated that very little
heterogeneity was attributable to the order in which
the tasks were performed or whether the tasks were
administered in lab versus online. [...] Cumulatively,
variability in the observed effect sizes was
attributable more to the effect being studied than to
the sample or setting in which it was studied.“
Many Labs 4
◦ Klein, R. A., et al. (2019, December 11).
Many Labs 4: Failure to Replicate Mortality
Salience Effect With and Without Original
Author Involvement.
◦ preprint
„We (N = 21 Labs and N = 2,220 participants)
experimentally tested whether original author
involvement improved replicability of a classic
finding from Terror Management Theory
(Greenberg et al., 1994). Our results were nondiagnostic
of whether original author involvement
improves replicability because we were unable to
replicate the finding under any conditions. This
suggests that the original finding was either a
false positive or the conditions necessary to obtain
it are not yet understood or no longer exist.“
Many Labs 5
◦ Ebersole, C.R., et al. (2020). Many Labs 5:
Testing Pre-Data-Collection Peer Review as
an Intervention to Increase Replicability.
Advances in Methods and Practices in
Psychological Science, 3(3), 309–331.
„If these [replication] studies use methods that are
unfaithful to the original study or ineffective in eliciting
the phenomenon of interest, then a failure to replicate
may be a failure of the protocol rather than a challenge
to the original finding. Formal pre-data-collection peer
review by experts may address shortcomings and
increase replicability rates. [...] Overall, following the
preregistered analysis plan, we found that the revised
protocols produced effect sizes similar to those of the
RP:P protocols (Δr = .002 or .014, depending on analytic
approach).“
The replication crisis is not confined to psychology.
◦ Kaplan, R.M., Irvin, V.L. (2015). Likelihood of Null
Effects of Large NHLBI Clinical Trials Has
Increased over Time. PLoS ONE 10(8): e0132382.
„We identified all large NHLBI supported RCTs
between 1970 and 2012 evaluating drugs or
dietary supplements for the treatment or
prevention of cardiovascular disease. Trials
were included if direct costs >$500,000/year,
participants were adult humans, and the
primary outcome was cardiovascular risk,
disease or death. [...] The number NHLBI trials
reporting positive results declined after the
year 2000. Prospective declaration of
outcomes in RCTs, and the adoption of
transparent reporting standards, as required
by clinicaltrials.gov, may have contributed to
the trend toward null findings.“
Disclaimer
Susan Fiske: „methodological terrorism“, „self-appointed data police“.
Controversy.
Loss of trust in science.
Personal responsibility of researchers?
„That is how it was done...“
Common practice.
A Multilab Preregistered Replication of the
Ego-Depletion Effect
◦ Hagger, M. S., et al. (2016). A Multilab
Preregistered Replication of the Ego-Depletion
Effect. Perspectives on
Psychological Science, 11(4), 546–573.
A multi-site preregistered paradigmatic test of the
ego depletion effect
◦ Vohs, K. (2020, November). A multi-site
preregistered paradigmatic test of the ego
depletion effect. Psychological Science.
◦ Preprint.
„We conducted a preregistered multi-laboratory
project (k = 36; N = 3531) to assess the size and
robustness of ego depletion effects using a novel
replication method, termed the paradigmatic
replication approach. [...] non-significant result, d =
0.06. Confirmatory Bayesian meta-analyses using an
informed prior hypothesis (δ = 0.30; SD = 0.15) found
the data were four times more likely under the null
than the alternative hypothesis. Hence, preregistered
analyses did not find evidence for a depletion effect.“
Recommendations: Design and analysis
Reduce measurement error
◦ By increasing sample size
◦ By increasing statistical power
◦ By increasing the reliability of the measurement instrument
◦ By correctly applying corrections for multiple comparisons
◦ Procedures such as the Bonferroni correction reduce statistical power
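The trade-off in the last point can be quantified. The sketch below applies the Bonferroni correction (dividing α by the number of tests) and shows how the power of each individual test drops as the number of comparisons grows. It uses a normal approximation, and the noncentrality value is a hypothetical one chosen to give roughly 80% power for a single test.

```python
from statistics import NormalDist

_NORMAL = NormalDist()

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests

def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def power_two_sided(noncentrality, alpha):
    """Approximate power of a two-sided z-test."""
    z_crit = _NORMAL.inv_cdf(1 - alpha / 2)
    return (1 - _NORMAL.cdf(z_crit - noncentrality)) + _NORMAL.cdf(-z_crit - noncentrality)

NONCENTRALITY = 2.8  # hypothetical: about 80% power for one test at alpha = 0.05
for n_tests in (1, 5, 10):
    alpha = bonferroni_alpha(0.05, n_tests)
    print(n_tests, round(alpha, 4), round(power_two_sided(NONCENTRALITY, alpha), 2))
```

The correction keeps the family-wise error rate under control, but each additional comparison must be paid for with a larger sample if power is to be maintained.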
From "p < 0.05" to...
◦ ... reporting the exact value of p
◦ ... an emphasis on effect-size indices
◦ ... an emphasis on confidence intervals and the like.
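Reporting an effect size with its confidence interval takes only a few lines. The sketch below computes Cohen's d for two independent groups with a pooled standard deviation and a large-sample approximation of its standard error. The approximation formula and the toy data are assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, variance

def cohens_d_with_ci(group_a, group_b, level=0.95):
    """Cohen's d (group_b minus group_a) with an approximate confidence interval."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    d = (mean(group_b) - mean(group_a)) / math.sqrt(pooled_var)
    # Large-sample approximation of the standard error of d.
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return d, d - z_crit * se, d + z_crit * se

# Hypothetical toy data for two groups.
control = [1, 2, 3, 4, 5]
treatment = [2, 3, 4, 5, 6]
d, low, high = cohens_d_with_ci(control, treatment)
print(f"d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With samples this small the interval is very wide and includes zero, which conveys the uncertainty far better than a bare "p > 0.05".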
Recommendations: Publication process
Study authors and researchers: transparency.
◦ A literature review that addresses the replication status of the finding so far.
◦ Are there earlier replication studies? Was the original result replicated? And so on.
◦ Justification of the chosen sample size
◦ Publishing data, analysis procedures, work in progress, and preregistrations
◦ Conducting replications, taking part in discussions within the research community, etc.
Journals, reviewers, editors: support for good research practices.
◦ Publishing replications and supporting authors in this work
◦ Moving away from confirmation bias in the publication process
Recommendations: Methodology instructors
In other words: what should students demand of their teachers?
Rigorous teaching of methodology, statistical data analysis, and so on.
◦ Statistical power, effect size, generalizability, etc.
◦ Information about the replicability of effects when teaching other courses.
Support for transparency.
◦ Publishing data, scripts, and so on; analyzing such data sets.
Support for student replications.
◦ A benefit for students and for the field.
Support for critical thinking.
◦ Does the study contain all the essential information? Did the researchers choose a suitable procedure for testing
the stated hypothesis? Are the conclusions interpreted correctly?
◦ At the level of individual studies as well as within meta-analyses
Recommendations: Institutions
Changing the publish-or-perish policy:
◦ The number of publications and the impact factor as the decisive variables in awarding
grants, hiring, and career advancement
Alternatives:
◦ Recognizing and supporting replication work
◦ Devoting part of research funding to replication
Recommendations: The field
A shift from effects to theories.
A shift from individual studies to the aggregation of research knowledge.
Greater emphasis on the manner, quality, and substance of measurement.
◦ With respect to the attribute being measured.
Greater standardization of research instruments.
Adequate statistical procedures.
Examples of good practice
https://www.cos.io/initiatives/prereg
https://aspredicted.org/
Sample size
Use of larger data sets.
Careful power analysis.
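A basic a priori power analysis answers the question "how many participants do I need?". The sketch below gives the per-group sample size for comparing two independent groups, using the normal-approximation formula n = 2((z₁₋α/₂ + z_power) / d)². The formula slightly underestimates what an exact t-test calculation would give, and the effect sizes used are conventional illustrative values, not recommendations from the lecture.

```python
import math
from statistics import NormalDist

def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate per-group n for a two-sided, two-group comparison of means."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    z_power = normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

for effect in (0.8, 0.5, 0.2):
    print(f"d = {effect}: about {n_per_group(effect)} participants per group")
```

Halving the expected effect size roughly quadruples the required sample, which is why optimistic effect-size guesses so often lead to underpowered studies.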
A 21 Word Solution
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2012).
A 21 Word Solution. SSRN.
http://dx.doi.org/10.2139/ssrn.2160588
Checking previous findings
P-CHECKER P-HACKER