A Gamereactor study in Spain has shown that review scores are slightly inflated. It used a pre-trained multilingual BERT model to analyse the sentiment of 16,000 reviews. On a 1-10 scale the midpoint is 5.5, but many people treat 7 as their middle point:
The first conclusion is that, as expected, average scores are above 7, both in human and robot reviews. But there is also a deviation in every multiplatform website, meaning that human scores are inflated between 3.7% and 5.4%. The analysis goes on isolating different factors, as you can see in the interactive visualisation online TCC. The second conclusion revolves around how human scores tend to extremes, and what it means for every company.
The study covered only a handful of Spanish sites, albeit a wider range of authors, so I’d be interested to see how the scores would differ for English-speaking sites. And, based on what I’ve read about sentiment analysis lately, I wonder how accurate the model’s scores are.
In general, it’s not enough to see a high score and assume the game or product is good. With 5-star ratings, for example, I tend to read the 1-2 star reviews to see exactly what went wrong, and I add up the 4-star and 5-star ratings. If those make up less than 75% of all ratings, I’d be a little sceptical. I also read the reviews themselves to make sure people are actually critiquing the game or product, not the delivery or some kind of personal vendetta.
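As a rough illustration of that 75% check, here is a minimal Python sketch. The function names and the example counts are made up for the illustration; only the idea of adding up the 4-star and 5-star ratings and comparing them against 75% of all ratings comes from the rule of thumb above.

```python
from typing import Mapping


def positive_share(star_counts: Mapping[int, int]) -> float:
    """Return the fraction of all ratings that are 4 or 5 stars."""
    total = sum(star_counts.values())
    if total == 0:
        raise ValueError("no ratings to evaluate")
    positive = star_counts.get(4, 0) + star_counts.get(5, 0)
    return positive / total


def looks_doubtful(star_counts: Mapping[int, int], threshold: float = 0.75) -> bool:
    """True when 4-5 star ratings make up less than `threshold` of all ratings."""
    return positive_share(star_counts) < threshold


# Hypothetical rating counts keyed by star value (not real data).
example_counts = {1: 12, 2: 8, 3: 10, 4: 30, 5: 40}

share = positive_share(example_counts)
print(f"{share:.0%} of ratings are 4-5 stars")
print("Worth a closer look" if looks_doubtful(example_counts) else "Looks fine")
```

A number like this is only a first filter. It says nothing about whether the low ratings are about the product or about the delivery, so reading the 1-2 star reviews still matters.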