Olympics medals table: a statistician's view
Much has been made in recent days of the standing of various countries in the Olympics medals table. As I write this, the USA is in the lead with 81 medals, China is second with 77, and Russia is third with 53.
Now, it's fairly obvious that that doesn't necessarily prove that those 3 countries are the most sportingly talented countries in the world. They are all countries with huge populations, so you would expect them to win more medals than smaller countries. (The excellent statistics programme on BBC Radio 4 "More or Less" covered this concept recently, and it's well worth a listen.)
So if we want to see which countries are the most talented at sports, we could look at the number of medals per capita, taking account of the population of each country. A helpful website called "Medals per Capita" has done just that, and you can see the medals per capita league table here. What's striking about that table is that the top few places are dominated by small countries. Tiny Grenada is in the top spot, and Jamaica, Estonia, Slovenia, and Cyprus are also in the top 10 places.
It seems that, while looking at the crude number of medals may give an unfair advantage to large countries, calculating a simple rate per capita probably gives an advantage to small countries. It makes the assumption that the number of medals should be in direct proportion to population size, which, when you think about it, doesn't seem completely reasonable. Would you really expect country A to win 50 times as many medals as country B if it has 50 times the population? More medals, yes, but probably not as many as 50 times more. Each country still has only one squad of elite Olympic athletes to train, so it doesn't seem reasonable that the expected advantage in terms of medals should be in direct proportion to the population size.
But it gets more complicated than that. As mentioned on More or Less, we should also consider the relative financial resources of different countries. You would expect richer countries to win more medals than poorer countries. So we could also look at medals per unit of GDP. The helpful Medals per Capita website has done that as well. Again, it is noteworthy that the top spots are dominated by tiny countries. The problem here is similar to the problem of medals per capita, because GDP is highly correlated with population size. So that table also doesn't seem completely fair.
So, what tools does a statistician have to solve this problem? I think the answer is to develop a realistic statistical model to predict the number of medals each country should get, based on its population and GDP, but not making restrictive assumptions about number of medals being in proportion to either.
This is not a completely trivial task. The problem with the number of medals is that it is highly skewed: most countries have a small number of medals, and there are just a few, such as the US and China, with very many medals. This means that using a linear regression model to predict the number of medals probably won't be appropriate, as it assumes normally distributed errors. A log transformation helps to some extent (analysing the logarithm of the number of medals rather than the actual number), but the transformed data still don't give a good fit to the normality assumptions underlying linear regression analysis.
Number of medals is a type of count data, and those of you who have studied statistics will be thinking at this point that we could model the medals as a Poisson variable. That seems like a good plan, until you try it and check the assumptions behind that model. It turns out that the data suffer from a phenomenon known as over-dispersion (this means that the data are more highly variable than would be expected from a true Poisson distribution), so the Poisson analysis also isn't appropriate.
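A quick way to see what over-dispersion means is to compare the variance of the counts with their mean: for a true Poisson variable the two are equal, so a ratio well above 1 is a warning sign. The sketch below uses made-up medal counts (not the real table) and a hypothetical helper name, and it ignores the predictors, so it is only a rough first check rather than the formal check you would run on a fitted model.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson-like data."""
    return variance(counts) / mean(counts)

# Hypothetical medal counts, invented for illustration:
# many small values and a few very large ones.
example_counts = [1, 1, 1, 2, 2, 3, 4, 5, 8, 12, 30, 77, 81]

ratio = dispersion_ratio(example_counts)
print(f"variance / mean = {ratio:.1f}")
```

A ratio far above 1, as these skewed counts produce, is the pattern that makes a plain Poisson model suspect.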
Fortunately, there is a distribution designed for over-dispersed count data, known as the negative binomial distribution. So to calculate the predicted number of medals for each country, I used a negative binomial regression model.
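For readers who want the formula, one common parameterisation of the negative binomial (often called NB2; the symbols \mu and \alpha are just illustrative notation, not values from this analysis) keeps the same mean as the Poisson but adds an extra dispersion term to the variance:

```latex
\operatorname{Var}(Y) = \mu \quad \text{(Poisson)}
\qquad \text{versus} \qquad
\operatorname{Var}(Y) = \mu + \alpha \mu^{2}, \quad \alpha > 0 \quad \text{(negative binomial)}
```

As \alpha shrinks towards zero the two variances coincide, so the Poisson model is the limiting special case.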
It turns out that the logarithm of the population size is a better predictor of the number of medals than the straightforward population. This fits the idea that the expected number of medals probably won't be directly proportional to the population size, but will rise more slowly than the population.
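To spell out why a log-transformed predictor behaves this way: in a count regression with a log link (an assumption on my part here, though it is the usual choice for negative binomial models) and natural logarithms, a coefficient \beta_1 on log population becomes a power of the population. The symbols \beta_0 and \beta_1 are generic placeholders.

```latex
\mathbb{E}[\text{medals}]
  = \exp\bigl(\beta_0 + \beta_1 \log(\text{population})\bigr)
  = e^{\beta_0} \, \text{population}^{\beta_1}
```

With \beta_1 = 1 this reduces to the direct proportionality assumed by a medals per capita table; any \beta_1 between 0 and 1 means the expected number of medals rises with population, but more slowly.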
GDP, as I've already mentioned, is highly correlated with population size, and we are already including population size in the model, so it makes sense to use GDP per capita (i.e. GDP divided by the population size) in the model instead of GDP itself. And in fact, again, a log transformation works better than untransformed GDP per capita.
So, after fitting the model, we have an equation for the number of medals each country would be expected to win, based on its population size and GDP. That equation is:
number of medals = exp(-10.17 + 0.501 log(population) + 0.383 log(GDP per capita))
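As a minimal sketch of how that equation can be evaluated, the function below plugs in the three coefficients. It assumes that "log" means the natural logarithm, that population is a head count, and that GDP per capita is in US dollars; none of those units is stated explicitly, and the function name and input values are hypothetical.

```python
import math

# Coefficients copied from the equation above.
INTERCEPT = -10.17
B_LOG_POPULATION = 0.501
B_LOG_GDP_PER_CAPITA = 0.383

def predicted_medals(population, gdp_per_capita):
    """Expected medals, assuming natural logs and the units the model was fitted with."""
    linear = (INTERCEPT
              + B_LOG_POPULATION * math.log(population)
              + B_LOG_GDP_PER_CAPITA * math.log(gdp_per_capita))
    return math.exp(linear)

# Hypothetical inputs: population in people, GDP per capita in US dollars (assumed units).
small = predicted_medals(1_300_000, 17_000)
large = predicted_medals(50 * 1_300_000, 17_000)

print(f"small country: {small:.1f} medals expected")
print(f"50x the population: {large / small:.1f} times as many medals expected")
```

Under these assumptions, a country with 50 times the population but the same GDP per capita is expected to win only about 7 times as many medals, not 50 times as many.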
I don't claim that that is necessarily the best statistical model you could fit here (and suggestions for how it could be improved welcome via the comments form below!), but it does seem to me to be a reasonable one that takes account of the observed relationship between medals, population size, and GDP, without making too many assumptions.
So, who is now in the lead? Well, using data current as of lunchtime on Thursday, the winner is Jamaica, with 341% of the number of medals that would be expected for a country of its size and GDP. The full list (which of course may look different by the time the Olympics are over) is here:
| Rank | Country | Medals won | Predicted medals | % of predicted medals |
|---|---|---|---|---|
| 46 | Trinidad and Tobago | 1 | 1.9 | 53 |
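The last column and the ranking can be reproduced with a few lines of arithmetic. In this sketch only the Trinidad and Tobago figures come from the table; the other two entries are invented placeholders, and the function name is hypothetical.

```python
def percent_of_predicted(medals_won, predicted):
    """Medals won as a percentage of the model's prediction."""
    return 100 * medals_won / predicted

# Only the Trinidad and Tobago row comes from the table above;
# the other two entries are invented placeholders.
rows = [
    ("Trinidad and Tobago", 1, 1.9),
    ("Hypothetical country X", 6, 2.0),
    ("Hypothetical country Y", 10, 12.5),
]

# Rank countries from most to least over-performing relative to prediction.
ranked = sorted(rows, key=lambda r: percent_of_predicted(r[1], r[2]), reverse=True)
for rank, (country, won, predicted) in enumerate(ranked, start=1):
    print(f"{rank}. {country}: {percent_of_predicted(won, predicted):.0f}%")
```

A value above 100% means a country has won more medals than the model predicts for its population and GDP per capita; a value below 100% means fewer.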
I think this is a fairer league table than the crude number of medals, medals per capita, or medals per GDP, but (and again you will already know this if you have listened to More or Less) it still doesn't give the whole picture. There are two rather important variables not accounted for in the analysis.
The first of them is the extent to which Olympic sports figure among national sporting priorities. This accounts for India's poor position: while India is indeed a proud and successful sporting nation, the sport that raises the greatest passions there is cricket, which is not an Olympic sport. Jamaicans, on the other hand, are rather good at running, and there are many running events at the Olympics where they can pick up medals.
The other important variable is the amount of money actually spent on training elite athletes. Just because a country has a high GDP doesn't necessarily mean it will choose to prioritise spending on elite sports.
So even the above table, while fairer than most other tables you see, still doesn't really tell you about which nation can claim the greatest sporting prowess. All it can really tell you is which nation has the greatest combination of sporting prowess and a willingness to spend money on elite sports.