Chris Fraley and Simine Vazire (F&V) published a very interesting paper in PLOS ONE in which they propose to evaluate journals based on the sample size and statistical power of the studies they publish.
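
The gist of the F&V proposal can be captured in a few lines of R: take a journal's typical (median) sample size and ask how much power it buys for a benchmark effect, here r = .2. This is only a sketch with made-up medians and the pwr package, not the authors' own analysis code.

```r
library(pwr)

# Hypothetical median sample sizes for three journals (illustrative numbers only)
median_n <- c(JournalA = 50, JournalB = 120, JournalC = 300)

# Power to detect a correlation of r = .2 at alpha = .05 given each journal's median N
sapply(median_n, function(n) pwr.r.test(n = n, r = 0.2, sig.level = .05)$power)
```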

[a slightly adapted version of this blog post is now in press at QJEP: see https://osf.io/ycag9/ for the manuscript and R scripts] In this blog post, I'll explain how p-hacking will not lead to a peculiar prevalence of p-values just below .05 (e.g., in the 0.045-0.05 range) in the literature at large, but will instead lead to a difficult-to-identify increase in the Type 1 error rate across the 0.00-0.05 range.
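
To see why, here is a minimal simulation sketch (not the code from the manuscript linked above): data are generated under a true null, a t-test is run after every 10 observations, and data collection stops at the first significant result. The false positives produced by this form of optional stopping are not confined to the 0.045-0.05 bin.

```r
set.seed(42)
nsim  <- 10000
looks <- seq(10, 100, by = 10)      # interim analyses at n = 10, 20, ..., 100

one_run <- function() {
  x <- rnorm(max(looks))            # data simulated under a true null
  for (n in looks) {
    p <- t.test(x[1:n])$p.value     # test after every 10 added observations
    if (p < .05) return(p)          # optional stopping: stop at the first p < .05
  }
  NA                                # never reached significance
}

final_p <- replicate(nsim, one_run())
sig <- final_p[!is.na(final_p)]
length(sig) / nsim   # overall Type 1 error rate, well above the nominal .05
mean(sig > .045)     # proportion of these false positives in the 0.045-0.05 bin
hist(sig, breaks = 20, main = "Significant p-values after optional stopping")
```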

I'd like to gratefully acknowledge Anton Kühberger and his co-authors, who provided extremely helpful comments and suggestions on an earlier draft of this blog post, patiently answered my questions, and shared additional analyses. I'd also like to thank Marcel van Assen, Christina Bergmann, JP de Ruiter, and Uri Simonsohn for comments and suggestions.

This Thursday I’ll be giving a workshop on good research practices in Leuven, Belgium. The other guest speaker at the workshop is Eric-Jan Wagenmakers, so I thought I’d finally dive into the relationship between Bayes factors and p-values, to be prepared to talk in the same workshop as such an expert on Bayesian statistics and methodology.
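
As a warm-up, the two statistics can be put side by side for the same data. This is a sketch assuming the BayesFactor package, which provides a default JZS Bayes factor for a t-test.

```r
library(BayesFactor)

set.seed(1)
x <- rnorm(50, mean = 0.3)    # one sample drawn with a modest true effect
t.test(x)$p.value             # p-value for H0: mu = 0
ttestBF(x)                    # default JZS Bayes factor for the same one-sample test
```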

I have no idea how many people make the effort to reproduce a meta-analysis in their spare time. What I do know, based on my personal experiences of the last week, is that A) it’s too much work to reproduce a meta-analysis, primarily due to low reporting standards, and B) we need to raise the bar when doing meta-analyses.
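
For comparison, the analysis itself is the easy part once the summary statistics are actually reported. Below is a sketch with hypothetical numbers using the metafor package; the means, standard deviations, and sample sizes in this toy data frame are exactly the values that are so often missing or ambiguous in the original papers.

```r
library(metafor)

# Hypothetical summary statistics for three two-group studies
dat <- data.frame(m1i = c(5.2, 4.9, 5.5), sd1i = c(1.1, 1.3, 1.0), n1i = c(40, 25, 60),
                  m2i = c(4.8, 4.8, 5.0), sd2i = c(1.2, 1.2, 1.1), n2i = c(40, 25, 60))

# Compute standardized mean differences and their sampling variances
dat <- escalc(measure = "SMD", m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)

rma(yi, vi, data = dat)   # random-effects meta-analysis
```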

I'll be teaching my first statistics lectures this September together with Chris Snijders. While preparing some assignments, I gained a new appreciation for clearly explaining the basics of statistics.
Recently some statisticians have argued we have to lower the widely used p < .05 threshold. David Colquhoun got me thinking about this by posting a manuscript here, but Valen Johnson’s paper in PNAS is probably better known. They both suggest a p < .001 threshold would lower the false discovery rate.
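
The core of their argument is easy to reproduce. The false discovery rate depends on the alpha level, the power of the studies, and the prior probability that a tested hypothesis is true; the numbers below are illustrative assumptions, not values taken from either paper.

```r
# Expected proportion of "discoveries" that are false positives
fdr <- function(alpha, power, prior_true) {
  false_pos <- alpha * (1 - prior_true)   # true nulls that reach significance
  true_pos  <- power * prior_true         # true effects that reach significance
  false_pos / (false_pos + true_pos)
}

fdr(alpha = .05,  power = .80, prior_true = .10)   # ~0.36: a third of discoveries are false
fdr(alpha = .001, power = .80, prior_true = .10)   # ~0.01: far fewer false discoveries
```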

You might have looked at your data while the data collection was still in progress, and have been tempted to stop the study because the result was already significant. Alternatively, you might have analyzed your data, only to find the result was not yet significant, and decided to collect additional data. There are good ethical arguments to do this.
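
One way to make such interim looks legitimate is to pay for them with a stricter alpha level at each look. A minimal simulation sketch: with two equally spaced looks and a Pocock-style boundary of roughly .0294 per look, the overall Type 1 error rate stays close to .05.

```r
set.seed(123)
nsim <- 10000
pocock_alpha <- .0294   # nominal alpha per look for two equally spaced looks

hits <- replicate(nsim, {
  x  <- rnorm(100)                  # data under a true null
  p1 <- t.test(x[1:50])$p.value     # interim look at n = 50
  p2 <- t.test(x)$p.value           # final look at n = 100
  (p1 < pocock_alpha) || (p2 < pocock_alpha)
})

mean(hits)   # overall Type 1 error rate, close to the nominal .05
```
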
Most of this post is inspired by a lecture on probabilities by Ellen Evers during a PhD workshop we taught (together with Job van Wolferen and Anna van ‘t Veer) called ‘How do we know what’s likely to be true’. I’d heard this lecture before (we taught the same workshop in Eindhoven a year ago), but now she extended her talk to the probability of observing a mix of significant and non-significant findings.
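
The calculation itself is a simple binomial one. Assuming, for illustration, that each study has 50% power, the probability of observing any particular mix of significant and non-significant results follows directly:

```r
power <- 0.5   # assumed power of each individual study (illustrative)

dbinom(5, size = 5, prob = power)   # all five studies significant: ~.03
dbinom(3, size = 5, prob = power)   # exactly three of five significant: ~.31
pbinom(2, size = 5, prob = power)   # two or fewer significant: ~.50
```
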
An often-heard criticism of null-hypothesis significance testing is that the null is always false. The idea is that average differences between two samples will never be exactly zero (there will practically always be a tiny difference, even if it is only 0.001). Furthermore, if the sample size is large enough, tiny differences can be statistically significant. Both these statements are correct, but they do not mean the null is never true.
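
To get a feel for how large "large enough" has to be: detecting a true difference of 0.001 standard deviations with 80% power requires on the order of 15 million participants per group (an illustrative calculation with base R's power.t.test).

```r
# Sample size per group needed to detect d = 0.001 with 80% power at alpha = .05
power.t.test(delta = 0.001, sd = 1, sig.level = .05, power = .80)
```
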
[Now with update for STATA by my colleague Chris Snijders] [Now with update about using the MBESS package for within-subject designs] [Now with an update on using ESCI] Confidence intervals are confusing intervals. I have nightmares where my students will ask me what they are, and then I try to define them, and I mumble something nonsensical, and they all point at me and laugh.
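
For what it's worth, computing one is the easy part. Below is a sketch with simulated data: base R's t.test() gives a confidence interval for the raw mean difference, and, assuming the MBESS ci.smd() interface, a noncentral-t interval can be placed around the standardized mean difference.

```r
set.seed(99)
g1 <- rnorm(30, mean = 0.5)
g2 <- rnorm(30, mean = 0)

t.test(g1, g2)$conf.int   # 95% CI for the raw mean difference

library(MBESS)
d <- (mean(g1) - mean(g2)) / sqrt((var(g1) + var(g2)) / 2)   # Cohen's d (equal n)
ci.smd(smd = d, n.1 = 30, n.2 = 30, conf.level = .95)        # 95% CI around d
```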