Controversies in vaping statistics, leading to a general discussion of dispute resolution in science

Episode 2

Brad Rodu writes:

The Journal of the American Heart Association on June 5, 2019, published a bogus research article, “Electronic cigarette use and myocardial infarction among adults in the US Population Assessment of Tobacco and Health [PATH],” by Dharma N. Bhatta and Stanton A. Glantz (here).

Drs. Bhatta and Glantz used PATH Wave 1 survey data to claim that e-cigarette use caused heart attacks. However, the public use data shows that 11 of the 38 current e-cigarette users in their study had a heart attack years before they first started using e-cigarettes.

The article misrepresents the research record; presents a demonstrably inaccurate analysis; and omits critical information with respect to (a) when survey participants were first told that they had a heart attack, and (b) when participants first started using e-cigarettes. The article represents a significant departure from accepted research practices.

For more background, see this news article by Jayne O’Donnell, “Study linking vaping to heart attacks muddied amid spat between two tobacco researchers,” which discusses the controversy and also gives some background on Rodu and Glantz.

I was curious, so I followed the instructions on Rodu’s blog to download the data and run the R script. I did not try to follow all the code; I just ran it. Here’s what pops up:

This indeed appears consistent with Rodu’s statement that “11 of the 38 current e-cigarette users were first told that they had a heart attack years before they started using e-cigarettes.” The above table only has 34 people, not 38; I asked Rodu about this and he said that the table doesn’t include the 4 participants who had missing info on age at first heart attack or age at first use of e-cigarettes.
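
For readers who want to poke at this themselves, the core check seems simple enough. Here is a minimal sketch of what it could look like in R. To be clear, this is not Rodu’s actual script, and the variable names (current_ecig_user, ever_mi, age_first_mi, age_first_ecig) are placeholders I made up, not the real PATH codebook names:

```r
# Sketch only: assumes a data frame `path` with one row per adult respondent
# and made-up column names standing in for the PATH public-use variables.
library(dplyr)

path %>%
  filter(current_ecig_user == 1, ever_mi == 1) %>%      # current e-cig users reporting an MI
  mutate(mi_first = age_first_mi < age_first_ecig) %>%  # did the MI predate e-cig use?
  count(mi_first)  # NA rows correspond to missing ages, like Rodu's 4 excluded participants
```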

How does this relate to the published paper by Bhatta and Glantz? I clicked on the link and took a look.

Here’s the relevant data discussion from Bhatta and Glantz:

As discussed above, we cannot infer temporality from the cross‐sectional finding that e‐cigarette use is associated with having had an MI and it is possible that first MIs occurred before e‐cigarette use. PATH Wave 1 was conducted in 2013 to 2014, only a few years after e‐cigarettes started gaining popularity on the US market around 2007. To address this problem we used the PATH questions “How old were you when you were first told you had a heart attack (also called a myocardial infarction) or needed bypass surgery?” and the age when respondents started using e‐cigarettes and cigarettes (1) for the very first time, (2) fairly regularly, and (3) every day. We used current age and age of first MI to select only those people who had their first MIs at or after 2007 (Table S6). While the point estimates for the e‐cigarette effects (as well as other variables) remained about the same as for the entire sample, these estimates were no longer statistically significant because of a small number of MIs among e‐cigarette users after 2007. . . .
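
As I read it, the restriction in that last sentence amounts to a back-of-the-envelope calendar calculation: the approximate year of the first MI is the survey year minus the years elapsed since it. In code, under the same made-up variable names as above:

```r
# Approximate calendar year of first MI from current age and age at first MI.
# Wave 1 was fielded in 2013-2014, so the result is off by up to a year or so.
survey_year <- 2014
path$year_first_mi <- survey_year - (path$age - path$age_first_mi)
path_post2007 <- subset(path, year_first_mi >= 2007)  # the subsample in their Table S6
```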

And here’s the relevant table (from an earlier version of Bhatta and Glantz, sent to me by Rodu):

699 patients with MIs, of whom 38 were vaping.

Table 1 of the paper shows the descriptive statistics at Wave 1 baseline; 643 (2.4%) adults reported that they had a myocardial infarction. Out of those 643 people, a weighted 10.2% were former e-cigarette users, 1.6% were some-day e-cigarette users, and 1.5% were every-day e-cigarette users. 1.6% + 1.5% = 3.1%, and 3.1% of 643 is about 20, not 34 or 38. It seems that the discrepancy here arises from comparing weighted proportions with raw numbers, an issue that often arises with survey data and does not necessarily imply any problems with the published analysis.
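
To see how weighted percentages and raw counts can pull apart, here is a toy example with made-up numbers (nothing to do with PATH):

```r
# Five respondents, two of whom vape; the survey weights upweight the non-vapers.
vapes  <- c(1, 1, 0, 0, 0)
weight <- c(0.5, 0.5, 2, 2, 2)

mean(vapes)                   # raw proportion: 0.40
weighted.mean(vapes, weight)  # weighted proportion: 1/7, about 0.14
# Multiplying the weighted 14% by n = 5 gives 0.7 "people," not the raw count of 2.
```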

But Rodu’s criticism seems more serious. Bhatta and Glantz are making causal claims based on correlation between heart problems and e-cigarette use, so it does seem like it would be appropriate for them to exclude from their analysis the people who didn’t start e-cigarette use until after their heart attacks. Even had they done this, I could see concerns with any results—the confounding with cigarette smoking is the 800-pound gorilla in the room, and any attempt to adjust for this confounding will necessarily depend strongly on the model being used for this adjustment—but removing those 11 people from the analysis, that seems like a freebie.

Is it appropriate for Rodu to describe Bhatta and Glantz’s article as “bogus”? That seems a bit strong. It seems like a real article with a data issue that Rodu found, and the solution would seem to be to perform a corrected analysis removing the data from the people who had heart problems before they started vaping. This won’t make the resulting findings bulletproof but it will at least fix this one problem, and that’s something. One step at a time, right?
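
In code, the fix is a filter before refitting. A sketch, again with my made-up variable names, and with a plain logistic regression standing in for whatever survey-weighted model Bhatta and Glantz actually fit:

```r
# Drop respondents whose first MI clearly preceded their first e-cigarette use.
drop <- with(path, !is.na(age_first_mi) & !is.na(age_first_ecig) &
                   age_first_mi < age_first_ecig)
path_clean <- path[!drop, ]

# Stand-in model; the published analysis used survey weights (e.g., via the
# survey package) and a longer covariate list.
fit <- glm(ever_mi ~ current_ecig_user + current_smoker + age + sex,
           family = binomial, data = path_clean)
```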

Episode 1

Rodu has had earlier clashes with this research group.

Last year, he sent me the following email:

An article recently published in the journal Pediatrics claimed that teen experimental smokers who were e-cigarette triers or past-30-day users at baseline were more likely to be regular smokers one year later than experimental smokers who hadn’t used e-cigs. The authors used regression analysis of a publicly available longitudinal FDA survey dataset (baseline ~2013, follow-up survey one year later). Although the authors used lifetime cigarette consumption to restrict their study to experimental smokers at baseline (LCC ranging from one puff but never a whole cigarette to 99 cigarettes), they ignored this baseline variable as a confounder in their analysis. When I reproduced their analysis and added the LCC variable, the positive results for e-cigarettes essentially disappeared, negating the authors’ core claim.

I [Rodu] called in my blog (here and here) for retraction of this study because the analysis was fatally flawed, and I published a comment on the journal’s website (here). The authors dismissed my criticism, responding with the strange explanation that LCC at baseline is a mediator rather than a confounder. The journal editors apparently believe that the authors’ response is adequate; I believe it is nonsensical.

I believe that this study uses faulty statistics to make unfounded causal claims that will be used to justify public health policies and regulatory actions.

Rodu added:

In my second blog post (here), I stated that “Chaffee et al. called our addition of the LCC information a ‘statistical trick.’” They used that term in a response appearing on the Pediatrics website from March 30 to April 23 (here, courtesy of Wayback Machine). Yesterday a completely new response appeared with the same March 30 date; “statistical trick” disappeared and “mediator” appeared (here).

I agree with Rodu that in this study you should be adjusting for lifetime cigarette consumption at baseline. How exactly to perform this adjustment is a statistical and substantive question, but I’m inclined to agree that not performing the adjustment is a mistake. So, yeah, this seems like a problem. Also, a pre-treatment exposure variable is not a mediator, and “statistical tricks” are OK by me!
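
In regression terms, the dispute is about a single covariate. Here is a sketch of the two models; everything is a hypothetical placeholder (the data frame teens, ecig_wave1 for baseline e-cigarette use, lcc_wave1 for baseline lifetime cigarette consumption, smoker_wave2 for regular smoking at follow-up), and the real analysis presumably involves survey weights and more covariates:

```r
# Without the baseline confounder, roughly as in the published analysis:
fit_without <- glm(smoker_wave2 ~ ecig_wave1 + age + sex,
                   family = binomial, data = teens)

# With lifetime cigarette consumption at baseline added, as Rodu suggests:
fit_with <- glm(smoker_wave2 ~ ecig_wave1 + lcc_wave1 + age + sex,
                family = binomial, data = teens)

# Rodu's claim is that the coefficient on ecig_wave1 shrinks toward zero
# (and loses statistical significance) in the second model.
```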

I was curious enough about this to want to dig in more—if nothing else, this seemed like a great example of measurement error in regression and the perils of partial adjustment for a confounder. It can be good to work on a live example where there is active controversy, rather than reanalyzing the Electric Company example and the LaLonde data one more time.
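
Here is the kind of simulation I have in mind: a confounder that drives both the exposure and the outcome, but that we only observe with noise. Adjusting for the noisy version removes some of the bias, not all of it:

```r
# Simulation: partial adjustment for a mismeasured confounder.
set.seed(123)
n <- 1e5
u     <- rnorm(n)                 # true confounder (think: smoking intensity)
u_obs <- u + rnorm(n)             # the noisy version we actually measure
z     <- rbinom(n, 1, plogis(u))  # exposure driven by the true confounder
y     <- u + rnorm(n)             # outcome driven by u; no true effect of z

coef(lm(y ~ z))["z"]          # unadjusted: a large spurious "effect"
coef(lm(y ~ z + u_obs))["z"]  # noisy adjustment: smaller, but still biased away from 0
coef(lm(y ~ z + u))["z"]      # adjusting for the true confounder: approximately 0
```

In this toy world the noisy measurement has reliability 0.5, and roughly half the confounding bias survives the adjustment: real, but partial.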

So I asked Rodu for the data, and shared it with some colleagues. Unfortunately we got tangled in the details—this often happens with real survey data! We contacted the authors of the paper in question to clear up some questions, and they, like Rodu, were very helpful. Everyone involved was direct and open. However, the data were still a mess and eventually we gave up trying to figure out exactly what was happening. As far as I’m concerned, this is still an open problem, and a student with some persistence should be able to get this all to work.

So, for now, I’d say that Rodu’s statistical point is valid and that the authors should redo the analysis as he suggests. Or maybe some third party can do so, if they’re willing to put in the effort.

Where there’s smoking, there’s fire

Tobacco research is a mess, and it’s been a mess forever.

On one side, you have industry-funded work. Notoriously, in past decades the cigarette industry was not just sponsoring biased studies (forking paths, file drawers, etc.); they were actively spreading disinformation, purposely polluting scientific and public discourse with the goal of delaying or reducing the impact of public awareness of the dangers of smoking, and delaying or reducing the impact of public regulation of cigarettes and smoking.

On the other side, the malign effects of smoking, and the addictive nature of nicotine, have been known for so long that anti-smoking studies are sometimes not subject to strict scrutiny. Anti-smoking researchers are the good guys, right?

There’s still a lot of debate about second-hand smoke, and I don’t really know what to think. Being trapped in a car with two heavy smokers is one thing; working in a large office space where one or two people are smoking is a much smaller exposure.

There are similar controversies regarding studies of social behavior. When, a couple decades ago, cities started banning smoking in restaurants, bars, and other indoor places, there were lots of people who were saying this was a bad idea, Prohibition Doesn’t Work, etc.—but it seems that these indoor smoking bans worked fine. Lots of smokers wanted to quit and didn’t mind the inconvenience.

So, moving to these recent disputes: both sides are starting with strong positions and potential conflicts of interest. But these data questions are specific enough that they should be resolvable.

How to resolve scientific disputes?

But this gets us to the other problem with science, which is that it does not have clear mechanisms for dispute resolution. As we’ve discussed many times in this space, retraction is not scalable, twitter fights are a disaster, we can’t rely on funding agencies to save us—certainly not in this example!

I get lots of emails from people who see me as a sort of court of last resort, a trusted third party who will look at the evidence and report my conclusions without fear or favor, and that’s fine—but I’m just one person, and I make mistakes too!

One could imagine some sort of loose confederation of vetters—various people like me who’d look at the evidence in individual disputes. But is that scalable? And if it became more formal, I’d be concerned that it would be subject to the same distortions as the existing power structure. Can you imagine: a dispute-resolution committee in social psychology, under the supervision of Robert Sternberg, Susan Fiske, and the editorial board of Perspectives on Psychological Science? Fox in the goddamn chicken coop.

It may be that, right now, PubPeer is the best thing going, and maybe it can be souped up in some way to be even more useful. I have some concern that PubPeer can be gamed in the same way as Amazon reviews—but even a gamed PubPeer could be better than nothing.