I’ve been a bit quiet lately for a number of reasons – I’ve been dipping into Neverwinter, have just finished Deus Ex: Human Revolution with the help of this fantastic video guide, plus the usual rigours of life. Blogging-wise, I put time into a defence-of-sorts for Metacritic and its relationship to video games (which will be posted later on) but reached a point that gave me pause. One of the big claims against the reliability of Metacritic (for video games, anyway) is that because the underlying data is weighted in a black-box fashion when turned into a metascore, it can’t be trusted.
There was the Full Sail study that tried to uncover these hidden weightings behind Metacritic’s calculations, but Metacritic indicated it was way off and Kotaku noted there were some flaws in the data behind the work. That study seemed to me to step past a larger question: how much does the weighted metascore – Metacritic’s adjusted average, which takes all the professional critic reviews, weights them and turns them into a single score – differ from the simple unweighted average of those same critic scores?
After all, if the hidden weights caused large differences, then they are having a large impact and there’s value in knowing how each professional gaming site is weighted. But if there is little difference between the metascore and its unweighted average equivalent, then we could conclude that Metacritic’s weightings have little real impact on the final score.
To my knowledge, no-one has looked at this. So I did.
Obviously, the following original data set was sourced from Metacritic and I make no claim to the authorship of that information. The aim of this research is to analyse Metacritic’s data as a partial examination of its composed metascore number. This should be considered fair use and fair dealing with the data set. No monetary benefit exists to me from my analysis of this data.
A range of Xbox 360, PlayStation 3 and PC titles have been included in this analysis – all up, 8668 titles were collected, but only 4256 games in this list had a metascore created and could be analysed. Results for platforms such as the Wii, Wii U, Vita, DS etc. have not been examined.
Just Answer the Damn Question In The Title!
To get to it: on average, the metascore only differs from the unweighted average by -0.3 within this sample. In short, Metacritic’s weighting makes almost no difference overall compared to simply averaging the data without the weights. (For those interested in such things, the median difference is -0.25.)
There are some extremes: the metascore ranges from 9.2 points below the unweighted average (for Suzuki Alstare Extreme Racing (PC)) to 6.5 points above (for Open Kart (PC)). But these are outliers – the majority of titles see very little difference between their metascore and the unweighted average. The standard deviation for this difference is 1.32, meaning that (assuming a normal distribution) 68% of metascores fall within +/- 1.32 points of the unweighted average, while 95% fall within +/- 2.64 points.
Here’s a chart showing the spread of those differences, ordered lowest to highest:
What does have an impact on the size of the difference is how many reviews were used to create the metascore: the fewer the reviews, the more likely a large difference. Here are the same results as above, but sorted by the number of included reviews from largest to smallest. On the left side are titles like Heavy Rain (PS3) with 107 reviews and Uncharted 2: Among Thieves (PS3) with 105 reviews; on the right are titles with only the bare minimum of 4 reviews. See how things start to swing as the number of reviews drops:
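The pattern described above – the gap shrinking as the review count grows – can be sketched by bucketing the absolute differences by review count. Every number below is invented purely to illustrate the shape of the analysis:

```python
# Hypothetical illustration: mean absolute gap between metascore and
# unweighted average, bucketed by how many reviews fed the metascore.
from statistics import mean

# (review_count, metascore - unweighted average) pairs, invented for the sketch
rows = [
    (4, -4.1), (4, 3.2), (5, -2.5), (6, 1.8),
    (20, -0.9), (25, 0.6), (40, -0.4),
    (80, -0.2), (105, 0.1), (107, -0.1),
]

def bucket(n):
    # Crude buckets: very few, moderate, and many reviews.
    if n < 10:
        return "4-9 reviews"
    if n < 50:
        return "10-49 reviews"
    return "50+ reviews"

spread = {}
for n, diff in rows:
    spread.setdefault(bucket(n), []).append(abs(diff))

for label, gaps in spread.items():
    print(f"{label:>14}: mean |diff| = {mean(gaps):.2f}")
```

With real data, the first bucket would show the widest swings and the last bucket differences close to zero, matching the chart.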
I’d also hypothesise that the titles with the greatest differences between metascore and unweighted average tend to be ‘older’ titles. Metacritic has no doubt finessed its approach over time, making large differences less likely for newer titles.
A Tiny Anchor
Going back a bit, you know how I mentioned that the average difference is -0.3? And looking at that first chart, doesn’t it seem a touch heavier on the negative side? This would indicate to me that Metacritic’s weights tend more often to downweight values (i.e. reduce them) than upweight them (i.e. increase them). About 60% of the differences between the metascore and the unweighted average are negative, even if only slightly.
This makes a degree of sense – there are lots of games review sites out there that haven’t established a reliable reputation. There’s a logic to downweighting those sites and minimising their ability to wildly influence the metascore until they have established themselves in Metacritic’s eyes.
But again, it doesn’t seem to be a particularly big effect. (Of course, if you are a studio that only gets paid if you reach a certain Metacritic score you’d probably prefer every site was upweighted rather than the reverse.)
(EDIT: I did have the thought that the slight negative skew might also be a result of more downweighted sites reviewing games than upweighted ones – after all, the smaller sites chasing viewers / reputation might try to get more reviews out to keep that regular content flowing and might also review games that larger sites might ignore. But that’s just a hypothesis.)
A Note On Fallout: New Vegas
One of the more famous cases of Metacritic’s influence being criticised occurred with Fallout: New Vegas. Obsidian had been promised a bonus from Bethesda if Fallout: New Vegas reached an 85 metascore, but this didn’t happen – it got an 84 metascore on PC. Metacritic got a bit of a kicking in some circles for somehow depriving Obsidian of that bonus, but if you take the unweighted average of the 39 review scores that made up the metascore for this title, you get 83.6. So even if Bethesda had based the bonus on a raw average of 85 from critics / professional review sites, Obsidian still wouldn’t have received their money.
Criticising Metacritic over that particular event seems to be a case of shooting the messenger rather than looking at the source.
If you have concerns about the size and hidden nature of the weightings Metacritic uses to create the metascore, there’s little need to worry. In the majority of cases (67%) you can look at a metascore and know that it sits within +/- 1 point of the unweighted average of the professional critic reviews listed, especially if the metascore is based on a good number of reviews. There are other valid concerns around Metacritic, but the idea that site weightings are somehow distorting the metascore shouldn’t be a factor outside of exceptional cases.
Next up, a defence (of sorts) of Metacritic for video games…
Having occasionally used Metacritic to influence my purchasing decisions, I’ve long wondered how their weighting system might affect things. Thanks for looking into this.
Pretty cool stuff. Here’s something I’d like to see, and since you have the data, maybe you can do the analysis for me? 😉
What would Metacritic look like under a Rotten Tomatoes type of system? Take each review, apply a threshold marking it good or bad (maybe 70%?), and then use the percentage of “good” reviews as the final score.
How do those results compare to the straight average of the scores? Is there a threshold value that best approximates the average?
That’s not a bad idea. I’ll try that out once I’ve finished this batch of analysis.
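For anyone who wants to try the experiment described in the comment above, the Rotten Tomatoes-style recalculation could be sketched as follows. The review scores and the 70-point threshold are invented examples, not Metacritic data:

```python
# Sketch of a Rotten Tomatoes-style score: mark each review "good" if it
# clears a threshold, then report the percentage of good reviews.
def tomato_score(critic_scores, threshold=70):
    """Percent of reviews at or above the threshold, rounded to an int."""
    good = sum(1 for s in critic_scores if s >= threshold)
    return round(100 * good / len(critic_scores))

reviews = [85, 72, 68, 90, 74, 55, 80]  # invented example scores
print(tomato_score(reviews))                # 5 of 7 reviews clear 70
print(tomato_score(reviews, threshold=80))  # 3 of 7 clear 80
```

Sweeping `threshold` over a range and comparing each result against the straight average would answer the commenter’s second question about which threshold best approximates the mean.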
Pingback: This Week in Video Game Criticism: From Man Caves to Tropes vs Women » Gaming News Alerts
Pingback: Are Day 1 Game Reviews Positively Biased? Short Answer: Yes | Vicarious Existence
Pingback: In Defence of Metacritic for Video Games | Vicarious Existence
Pingback: Metacritic: The Data | Vicarious Existence