Denis Jabaudon writes:

I was thinking that perhaps you could help me with the following “paradox?” that I often find myself in when discussing with students (I am a basic neuroscientist and my unit of counting is usually cells or animals):

When performing a “pilot” study on say 5 animals, and finding an “almost significant” result, or a “trend”, why is it incorrect to add another 5 animals to that sample and to look at the P value then?

Notwithstanding the bias towards false positives that this induces (we would not add 5 animals if there were no trend), which I understand, why would the correct procedure be to start again from scratch with 10 animals?

Why do these first 5 results (or hundreds of patients depending on context) need to be discarded?

If you have any information on this it would be greatly appreciated; this is such a common practice that I’d like to have good arguments to counter it.

This one comes up a lot, in one form or another. My quick answer is as follows:

1. Statistical significance doesn’t answer any relevant question. Forget statistical significance and p-values. The goal is not to reject a null hypothesis; the goal is to estimate the treatment effect or some other parameter of your model.

2. You can do Bayesian analysis. Adding more data is just fine; you simply account for it in your posterior distribution. Further discussion here.
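Point 2 can be made concrete with a small sketch (a conjugate normal model with known measurement SD; all the numbers are illustrative, not from any real experiment): updating the posterior with the first 5 animals and then again with the next 5 gives exactly the same posterior as analyzing all 10 at once, so nothing needs to be discarded.

```python
import random

def normal_update(mu0, tau0_sq, data, sigma=1.0):
    # Conjugate update for a normal mean with known SD sigma:
    # prior N(mu0, tau0_sq) -> posterior (mean, variance).
    post_prec = 1.0 / tau0_sq + len(data) / sigma**2
    post_mean = (mu0 / tau0_sq + sum(data) / sigma**2) / post_prec
    return post_mean, 1.0 / post_prec

random.seed(0)
first5 = [random.gauss(0.3, 1.0) for _ in range(5)]  # illustrative pilot animals
next5 = [random.gauss(0.3, 1.0) for _ in range(5)]   # the 5 added later

# Sequential: the posterior after the first 5 becomes the prior for the next 5.
m1, v1 = normal_update(0.0, 10.0, first5)
m_seq, v_seq = normal_update(m1, v1, next5)

# Batch: all 10 animals analyzed at once.
m_all, v_all = normal_update(0.0, 10.0, first5 + next5)

print(m_seq, m_all)  # identical up to floating point
```

This is just the precision-weighted-average formula applied twice; the precisions add, so the order in which the data arrive doesn't matter.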

3. If you keep gathering data long enough, you'll eventually reach statistical significance at any specified level, but that's fine. True effects are not zero (or, even if they are, there's always systematic measurement error of one sort or another).
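A quick simulation illustrates point 3 (a sketch with made-up numbers: a tiny true effect of 0.1 SD, tested with a known-SD z-test): because the true effect is nonzero, the z-statistic drifts upward like 0.1√n, so any fixed significance threshold is eventually crossed.

```python
import math
import random

random.seed(1)
delta = 0.1    # tiny but nonzero true effect, in SD units (illustrative)
z_crit = 1.96  # two-sided 5% threshold for a z-test with known SD = 1

n, total, z = 0, 0.0, 0.0
while n < 100_000:
    total += random.gauss(delta, 1.0)  # one more animal's measurement
    n += 1
    z = total / math.sqrt(n)           # z-statistic for H0: true mean = 0
    if abs(z) > z_crit:
        break

print(f"|z| first exceeds {z_crit} at n = {n}")
```

The crossing point varies with the random seed and the size of the true effect, but the crossing itself is essentially guaranteed once n is large enough.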

In short, adding more animals to your experiment is fine. The problem is in using statistical significance to make decisions about what to conclude from your data.
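The false-positive bias the letter mentions (and sets aside) can also be checked directly by simulation. A sketch under made-up stopping rules: test 5 animals with a known-SD z-test, declare significance if p < 0.05, and if there is a "trend" (0.05 ≤ p < 0.25), add 5 more animals and re-test all 10. Even with no true effect at all, this procedure's overall false-positive rate exceeds the nominal 5%.

```python
import math
import random

def p_two_sided(z):
    # Two-sided p-value for a standard-normal z-statistic.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

random.seed(2)
n_sims, false_pos = 20_000, 0
for _ in range(n_sims):
    first5 = [random.gauss(0.0, 1.0) for _ in range(5)]  # true effect is zero
    p5 = p_two_sided(sum(first5) / math.sqrt(5))
    if p5 < 0.05:
        false_pos += 1
    elif p5 < 0.25:  # a "trend": add 5 more animals and re-test all 10
        all10 = first5 + [random.gauss(0.0, 1.0) for _ in range(5)]
        if p_two_sided(sum(all10) / math.sqrt(10)) < 0.05:
            false_pos += 1

rate = false_pos / n_sims
print(f"false-positive rate = {rate:.3f} (nominal 0.05)")
```

The inflation here is modest (roughly 7% instead of 5% under these particular rules), but it grows if you allow yourself more rounds of "just add a few more animals" — which is exactly why the problem lies in the significance-testing decision rule, not in the extra data.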