Should we eat 7 portions of fruit and veg a day?

Today's big health news story is a new study showing that eating 5 portions of fruit and veg a day is not enough, and that we all need to eat at least 7.

According to The Guardian, "Eating at least seven portions of fresh fruit and vegetables a day was linked to a 42% lower risk of death from all causes." That's one of my pet hates in health reporting right there: nothing you do lowers your risk of death from all causes. Your risk of death remains 100% no matter what you eat. Few things are more certain in medicine.

But pedantry aside, did the study really show that eating at least 7 portions prolongs life?

Well, no. There are a number of problems with the study.

The most obvious, and serious, problem with the study is that it is observational. What we are looking at here is correlation, and as I'm sure we all know, correlation does not equal causation. The study found that people who ate at least 7 portions of fruit and veg a day lived longer than those who ate little fruit and veg, but that doesn't mean that the fruit and veg was the cause of their longevity. You might imagine that people who eat lots of fruit and veg are less prone to disease for many other reasons: perhaps they take more exercise, are richer, smoke less, and so on.

In fact if you look at table 1 in the paper, you find that there are indeed strong correlations between intake of fruit and veg and other risk factors. People who ate more fruit and veg were more likely to be female, work in non-manual occupations, have a degree or equivalent qualification, be a lifelong non-smoker, be physically active, and avoid excessive alcohol intake.

Now, that in itself doesn't completely invalidate the results. The other risk factors, known as confounding variables if you want a bit of statistics-speak, can be taken into account in the analysis. And indeed that's what the researchers attempted to do.

But when you have such strong confounding, it is really important to be sure that you are measuring the confounding variables very carefully so that the adjustment can be as complete as possible. And sadly, the confounding variables were not very carefully measured.

For example, smokers were simply categorised as current smokers, ex-smokers, or never smokers. There was no distinction between someone who smokes a few cigarettes at the weekend and someone with a 60-a-day habit. Would adjusting more carefully for smoking have changed the results of the analysis? Quite possibly, but we can't know, because the paper only reports the crude adjustment.
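To see why a crude smoking category can leave confounding behind, here is a minimal sketch in Python. Every number in it is invented purely for illustration and none comes from the paper. It assumes that, among current smokers, the risk of death depends only on how heavily someone smokes, and that people who eat lots of fruit and veg are more likely to be light smokers. Fruit and veg has no effect at all in this toy example, yet lumping all current smokers together makes it look protective.

```python
# All figures below are hypothetical, chosen only to illustrate residual confounding.

# Assumed risk of death among current smokers: depends ONLY on smoking intensity.
RISK = {"light": 0.10, "heavy": 0.30}

# Assumed mix of light and heavy smokers within each diet group.
MIX = {
    "high_fruit": {"light": 0.8, "heavy": 0.2},
    "low_fruit": {"light": 0.2, "heavy": 0.8},
}


def crude_risk(diet_group):
    """Average risk in a diet group when all current smokers are lumped together."""
    return sum(share * RISK[intensity] for intensity, share in MIX[diet_group].items())


def crude_risk_ratio():
    """Apparent effect of diet after adjusting only for 'current smoker'."""
    return crude_risk("high_fruit") / crude_risk("low_fruit")


def stratum_risk_ratio(intensity):
    """Effect of diet within one smoking-intensity stratum (1.0 by construction)."""
    return RISK[intensity] / RISK[intensity]


print("Risk ratio with crude smoking category:", round(crude_risk_ratio(), 2))
for intensity in RISK:
    print("Risk ratio among", intensity, "smokers:", stratum_risk_ratio(intensity))
```

With these made-up inputs, the crude comparison suggests that the high fruit and veg group has roughly half the risk, while the comparison within light smokers or within heavy smokers shows no difference at all. Whether anything like this happened in the actual study can't be determined from what the paper reports.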

Another very strong risk factor for mortality is socioeconomic status. Rich people live longer than poor people. I suspect rich people are also more likely to eat 7 or more portions of fruit and veg a day than poor people. If we are to conclude anything about the effect of dietary fruit and veg, it is really important to disentangle those variables very carefully indeed.

Sadly, that's not what the researchers did: like smoking, socioeconomic status was measured very crudely, by categorising social class simply as "manual", "non-manual", or "other". So a call centre worker on a zero-hours contract would be in the same category as the CEO of a FTSE 100 company. I'm pretty sure adjusting more carefully for socioeconomic status would have given different results, though again, we can't know for sure, as the analyses weren't reported.

And it's possible that there were other important confounding variables that they didn't measure at all. Intake of salty or fatty foods comes immediately to mind, but no doubt there are others.

So all in all, it is impossible to infer causality from a study such as this. The study has shown correlation, but far more careful adjustment for confounding variables would be required before you could even begin to conclude that it was likely that the relationship between diet and mortality was causal (and even then, you could never be sure).

Another problem with the way the results have been reported in the media is that the finding of a "42% reduction in the risk of death" (or if you want to be more correct about it, a hazard ratio of 0.58) did not come from the primary analysis. It came from a sensitivity analysis. In the primary analysis, the hazard ratio was 0.67, which is not as impressive as 0.58. Well, I say the primary analysis. I'm assuming the primary analysis is model 2 as reported in table 2 of the paper, though the authors don't actually state which analysis they had pre-specified as primary, which is disappointing. The analysis that gave a hazard ratio of 0.58 was described as a sensitivity analysis, which says to me it was not the primary analysis.
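For anyone wondering how a hazard ratio turns into a headline percentage, the arithmetic is just one minus the hazard ratio. The short Python sketch below shows that conversion for the two hazard ratios quoted above. The function name is made up for this illustration, and strictly speaking a hazard ratio describes the rate of death over the follow-up period rather than a lifetime "risk of death", so the percentage is a loose shorthand.

```python
def percent_reduction(hazard_ratio):
    """Convert a hazard ratio into the 'percent lower' figure used in headlines."""
    return (1 - hazard_ratio) * 100


# Hazard ratios quoted in the text: sensitivity analysis and (assumed) primary analysis.
for label, hr in [("sensitivity analysis", 0.58), ("model 2", 0.67)]:
    print(label, "-> about", round(percent_reduction(hr)), "% lower")
```

So the headline figure of 42% corresponds to the hazard ratio of 0.58, and the hazard ratio of 0.67 would have given a headline of about 33%.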

To be fair to the journalists who covered this story, the focus on the sensitivity analysis was not necessarily their fault: the "42% reduction in the risk of death" was specifically highlighted in the press release put out by the BMJ group (sadly I can't link to that, but I've seen it by email). It strikes me as a little naughty for those writing press releases to focus on numbers that don't come from the primary analysis.

But one thing about the way this has been reported in the media that fills me with despair is the message that we need to eat 7 portions a day because 5 is not enough. Now, the paper has shown that eating 7 portions a day is associated with (but doesn't necessarily cause) significantly lower mortality than eating less than 1 portion a day. But if we are going to conclude that 5 portions a day is not enough, then even if we ignore all the massive problems with inferring causality from observational data, don't you think the analysis ought to show a statistically significant difference in mortality between those who ate 5 portions a day and those who ate 7 or more?

Well, it didn't. In the same analysis that showed a hazard ratio for 7 or more portions a day of 0.67 (95% CI 0.58 to 0.78), the hazard ratio for 5 to < 7 portions a day was 0.70 (95% CI 0.63 to 0.79). You will notice that those confidence intervals are massively overlapping. The authors do not report a statistical test for the difference between those categories, but my back-of-the-envelope calculations based on those confidence intervals tell me that the difference is not even close to being statistically significant. Even in their sensitivity analysis, which gave the more impressive hazard ratio, the difference between the 5 to < 7 and 7 or more groups was not statistically significant.
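One way to do a back-of-the-envelope calculation of this kind is sketched below in Python. It is not necessarily the calculation behind the statement above, just a common approximation: recover the standard error of each log hazard ratio from its 95% confidence interval, then compare the two with a z-test. It assumes the two estimates are independent, which is not strictly true here because both are measured against the same reference group, so treat the result as a rough guide only. The function names are invented for this sketch.

```python
import math


def log_hr_and_se(hr, lower, upper, z_crit=1.96):
    """Return the log hazard ratio and its standard error, recovered from a 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return math.log(hr), se


def compare_hazard_ratios(a, b):
    """Approximate z-test for the difference between two hazard ratios.

    Each argument is a tuple (hr, lower, upper).
    ASSUMPTION: the two estimates are treated as independent.
    """
    log_a, se_a = log_hr_and_se(*a)
    log_b, se_b = log_hr_and_se(*b)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (log_a - log_b) / se_diff
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


seven_or_more = (0.67, 0.58, 0.78)  # 7 or more portions a day
five_to_seven = (0.70, 0.63, 0.79)  # 5 to < 7 portions a day

z, p = compare_hazard_ratios(seven_or_more, five_to_seven)
print("z =", round(z, 2), " two-sided p =", round(p, 2))
```

With the confidence intervals quoted above, this gives a z statistic of roughly -0.5 and a two-sided p-value far above the conventional 0.05 threshold, which is in line with the conclusion that the difference between the two groups is nowhere near statistically significant.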

I could go on, and talk about other problems such as measuring fruit and veg consumption on only a single day based on retrospective reporting, but I think I'll stop there.

Despite the problems with the study, I'm still pretty sure that a diet with plenty of fruit and veg is probably good for you. But I for one will not be obsessing about how many portions a day I am eating.

