Shifting the Evidence

An excellent paper published a few years ago, Sifting the Evidence, highlighted many of the problems inherent in significance testing and the use of P-values. One particular problem was the use of arbitrary thresholds (typically P < 0.05) to divide results into “significant” and “non-significant”. More recently, there has been a lot of coverage of the problems of reproducibility in science, and in particular of distinguishing true effects from false positives. Confusion about what P-values actually tell us may contribute to this.

It is often not made clear whether research is exploratory or confirmatory. This distinction is now commonly made in genetic epidemiology, where individual studies routinely report “discovery” and “replication” samples. That in itself is helpful – it’s all too common for post-hoc analyses (e.g., of sub-groups within a sample) to be described as having been based on a priori hypotheses. This is sometimes called HARKing (Hypothesising After the Results are Known), which can make it seem like results were expected (and therefore more likely to be true), when in fact they were unexpected (and therefore less likely to be true). In other words, a P-value alone is often not very informative in telling us whether an observed effect is likely to be true – we also need to take into account whether it conforms with our prior expectations.

[Image: ‘Statistical power is truth power’]

One way we can do this is by taking into account the pre-study probability that the effect or association being investigated is real. This is difficult, of course, because we can’t know this with certainty. However, what we can perhaps estimate is the extent to which a study is exploratory (the first to address a particular question, or use a newly-developed methodology) or confirmatory (the latest in a long series of studies addressing the same basic question). Broer et al. (2013) describe a simple way to take this into account and increase the likelihood that a reported finding is actually true. Their basic point is that the likelihood that a claimed finding is actually true (which they call the positive predictive value, or PPV) is related to three things: the prior probability (i.e., whether the study is exploratory or confirmatory), the statistical power (i.e., the probability of finding an effect if it really exists), and the Type I error rate (i.e., the P-value or significance threshold used). We have recently described the problems associated with low statistical power in neuroscience (Button et al., 2013).

What Broer and colleagues show is that if we adjust the P-value threshold we use, depending on whether a study is exploratory or confirmatory, we can dramatically increase the likelihood that a claimed finding is true. For highly exploratory research, with a very low prior probability, they suggest a P-value threshold of 1 × 10⁻⁷. Where the prior probability is uncertain or difficult to estimate, they suggest a value of 1 × 10⁻⁵. Only for highly confirmatory research, where the prior probability is high, do they suggest that a “conventional” value of 0.05 is appropriate.
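The arithmetic behind these recommendations is easy to check for yourself. The standard formula is PPV = (power × prior) / (power × prior + α × (1 − prior)). Here is a minimal sketch in Python; the function name and the example numbers are ours for illustration, not taken from Broer and colleagues:

```python
def ppv(prior, power, alpha):
    """Positive predictive value: the probability that a 'significant'
    finding reflects a true effect.

    prior: pre-study probability that the effect is real
    power: probability of detecting the effect if it is real (1 - beta)
    alpha: Type I error rate (the significance threshold used)
    """
    true_positives = power * prior           # real effects correctly detected
    false_positives = alpha * (1 - prior)    # null effects wrongly declared significant
    return true_positives / (true_positives + false_positives)

# Exploratory study, very low prior: at alpha = 0.05 most "findings" are false,
# but a stringent threshold rescues the PPV.
exploratory_lenient = ppv(prior=0.001, power=0.8, alpha=0.05)
exploratory_strict = ppv(prior=0.001, power=0.8, alpha=1e-7)

# Confirmatory study, high prior: the conventional 0.05 threshold does fine.
confirmatory = ppv(prior=0.5, power=0.8, alpha=0.05)
```

With a prior of 0.001 and 80% power, the conventional 0.05 threshold yields a PPV of under 2%, while the 1 × 10⁻⁷ threshold pushes it above 99% — which is exactly the logic behind tailoring the threshold to the exploratory or confirmatory nature of the study.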

Psychologists are notorious for having an unhealthy fixation on P-values, and particularly the 0.05 threshold. This is unhelpful for lots of reasons, and many journals now discourage or even ban the use of the word “significant”. The genetics literature that Broer and colleagues draw on has learned these lessons from bitter experience. However, if we are going to use thresholds, it makes sense that these reflect the exploratory or confirmatory nature of our research question. Fewer findings might pass these new thresholds, but those that do will be much more likely to be true.

References:

Broer L, Lill CM, Schuur M, Amin N, Roehr JT, Bertram L, Ioannidis JP, van Duijn CM. (2013). Distinguishing true from false positives in genomic studies: p values. Eur J Epidemiol; 28(2): 131-8.

Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, Munafò MR. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci; 14(5): 365-76.

Posted by Marcus Munafò, with thanks to Mark Stokes at Oxford University for the ‘Statistical power is truth power’ image.

Health Technology Assessment report finds computer and other electronic aids can help people stop smoking

Smoking continues to be the greatest single preventable cause of premature illness and death in developed countries. Although rates of smoking have fallen, over 20% of the adult population in the UK continues to smoke. Anything which can be done to help people stop smoking will therefore have substantial public health benefits.

More and more people now have access to computers and other electronic devices (such as mobile phones), and there is growing interest in whether these can be used to prompt or support attempts to stop smoking. This could be by providing a prompt to quit, by reaching smokers who would otherwise use no support, and/or by supporting adherence to smoking cessation medication (e.g., nicotine replacement therapy).

A recent Health Technology Assessment review assessed the effectiveness of internet sites, computer programs, mobile telephone text messages and other electronic aids for helping smokers to quit, and/or to reduce relapse to smoking among those who had quit.

Methods

The reviewers conducted a systematic review of the literature from 1980 to 2009 and found 60 randomised controlled trials (RCTs) and quasi-RCTs evaluating smoking cessation programmes that utilised computer, internet, mobile telephone or other electronic aids. The review was restricted to studies of adult smokers.

The primary outcomes were smoking abstinence, measured in two ways: point prevalence abstinence and prolonged abstinence. The first is typically available in more studies (because it is easier to measure) but is a rather liberal measure of abstinence (since the smoker need only be abstinent at the point the assessment is made to count as having quit). The latter is more conservative (since it requires the smoker to have been abstinent for an extended period to count as having quit), and is generally the preferred measure. Smoking abstinence at the longest follow-up available in each study was used, again because this is the most conservative measure.

Results

Combining the data from the 60 trials indicated that, overall, the use of computer and other electronic aids increased quit rates for both prolonged (pooled RR = 1.32, 95% CI 1.21 to 1.45) and point prevalence (pooled RR = 1.14, 95% CI 1.07 to 1.22) abstinence at longest follow-up, compared with no intervention or generic self-help materials.
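As a rough illustration of how pooled risk ratios like these are produced, here is a minimal fixed-effect inverse-variance pooling of study-level risk ratios in Python. The numbers are made up, and the review itself used a more sophisticated network meta-analysis, so this is only a sketch of the general idea:

```python
import math

def pooled_rr(rrs, ses):
    """Fixed-effect inverse-variance pooling of study-level risk ratios.

    rrs: the risk ratio reported by each study
    ses: the standard error of each study's log risk ratio
    Returns the pooled RR and its 95% confidence interval.
    """
    log_rrs = [math.log(rr) for rr in rrs]
    weights = [1 / se**2 for se in ses]       # inverse-variance weights
    log_pooled = sum(w * lr for w, lr in zip(weights, log_rrs)) / sum(weights)
    se_pooled = 1 / math.sqrt(sum(weights))
    ci = (math.exp(log_pooled - 1.96 * se_pooled),
          math.exp(log_pooled + 1.96 * se_pooled))
    return math.exp(log_pooled), ci

# Two hypothetical trials with equal precision
estimate, (ci_low, ci_high) = pooled_rr([1.2, 1.4], [0.10, 0.10])
```

With equal weights the pooled estimate is simply the geometric mean of the study risk ratios; more precise studies (smaller standard errors) pull the pooled estimate towards themselves.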

The authors also looked at whether studies which aided cessation differed from those which prompted cessation, and found no evidence of any difference in the effect size between these. The effectiveness of the interventions also did not appear to vary with respect to mode of delivery or the concurrent use of non-electronic co-interventions (e.g., nicotine replacement therapies).

Conclusions

The review concluded that computer and other electronic aids do indeed increase the likelihood of cessation compared with no intervention or generic self-help materials, but the effect is small. However, even a small effect is likely to have important public health benefits, given the large number of people who smoke and the impact of smoking on health. The authors also note that uncertainty remains around the comparative effectiveness of different types of electronic intervention, which will require further study.

The authors argue that further research is needed on the relative benefits of different forms of delivery for electronic aids, the content of delivery, and the acceptability of these technologies for smoking cessation with subpopulations of smokers, particularly disadvantaged groups. More evidence is also required on how electronic aids developed and tested in research settings are applied in routine practice and in the community.

Link

Chen YF, Madan J, Welton N, Yahaya I, Aveyard P, Bauld L, Wang D, Fry-Smith A, Munafò MR. (2012). Effectiveness and cost-effectiveness of computer and other electronic aids for smoking cessation: a systematic review and network meta-analysis. Health Technol Assess; 16(38): 1-205, iii-v. doi: 10.3310/hta16380.

This article first appeared on the Mental Elf website on 11th March 2013, and is posted by Marcus Munafò.