Predicting Success of Steam Games
Michal Trněný, Masaryk University, 2016

Task
- Investigate how well the post-release success of games can be predicted from basic information about the games (available before release)
- No data regarding previews, early reviews, user activity on social networks…

Steam
- Largest platform for selling PC games
- Since 2004
- Over 10,000 games in total

Steam Charts
- Tracks how many players are in game every hour
- Monthly statistics
- Since July 2012
- Not a very accurate measure of success (?) (however, Steam Charts is the best source with sufficient history)

SteamSpy
- Tracks number of owners
- Since April 2015
- No access to history
- Manually collected data for about 1,400 games
- Still not a good measure of success?

Data
- About 4,700 games from July 2012 to July 2016
- Early Access and Free-to-play titles omitted completely
- Info such as genres, price, release date, descriptions, languages, features
- Plus thumbnail and screenshots of every game
  - K-means used to extract the 5 most dominant colors
  - First screenshot and thumbnail

Screenshots

Subsetting
- Pick only games whose publisher has a history of at least 2 games
- Covers 30-40 % of games

Notable attributes
- Previous games: how many, max and min players of any game, Gini index
- Developer, Publisher (limited to max 53, rest is "other")
- Text descriptions
- Price
- HW requirements: disk space, RAM, GPU
- Languages
- Number of colors on thumbnail and first screenshot + dominant thumbnail color

Results
- Classification (10 classes)
  - Random Forest
  - Accuracy: 46 % / 44 %
  - Within ±1 of actual: 87 % / 84 %
- Regression (values 0-15)
  - SVM
  - Correlation coefficient: 0.80
  - MAE: 1.24
  - RMSE: 1.67
  - Within ±1 of actual: 53 % / 52 %
  - Within ±2 of actual: 82 % / 77 %
- Training, validation, test set (60 %, 20 %, 20 %)

Classification
Confusion matrix (rows: predicted class, columns: reference class)

Prediction |  1   2   3   4   5   6   7   8   9  10
         1 | 79  32   8   4   0   0   0   0   0   0
         2 | 42  43  21   5   3   0   0   1   0   0
         3 |  0   2   1   0   1   0   0   0   0   0
         4 |  1   0   4   3   5   0   1   1   0   0
         5 |  1   0   3   1   2   0   0   0   2   0
         6 |  1   1   0   1   3   2   2   0   0   0
         7 |  0   0   0   1   1   2   0   0   1   1
         8 |  0   0   0   0   0   0   0   1   0   0
         9 |  0   0   0   0   0   0   0   0   0   0
        10 |  0   0   0   0   0   0   0   0   0   0
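
The classification figures can be cross-checked against the confusion matrix above. The sketch below is not the author's code. It copies the matrix as printed (rows as predicted classes, columns as reference classes) and computes the share of games on the diagonal and within one class of it. The helper name `share_within` is made up for this illustration.

```python
# Confusion matrix as printed on the slide:
# rows are predicted classes 1-10, columns are reference (actual) classes 1-10.
CONFUSION = [
    [79, 32, 8, 4, 0, 0, 0, 0, 0, 0],
    [42, 43, 21, 5, 3, 0, 0, 1, 0, 0],
    [0, 2, 1, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 4, 3, 5, 0, 1, 1, 0, 0],
    [1, 0, 3, 1, 2, 0, 0, 0, 2, 0],
    [1, 1, 0, 1, 3, 2, 2, 0, 0, 0],
    [0, 0, 0, 1, 1, 2, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def share_within(matrix, tolerance):
    """Fraction of games whose predicted class is at most `tolerance` classes from the actual one."""
    total = sum(sum(row) for row in matrix)
    hits = sum(
        count
        for predicted, row in enumerate(matrix)
        for actual, count in enumerate(row)
        if abs(predicted - actual) <= tolerance
    )
    return hits / total


n_games = sum(sum(row) for row in CONFUSION)
accuracy = share_within(CONFUSION, 0)    # exact class match (diagonal only)
within_one = share_within(CONFUSION, 1)  # diagonal plus the two adjacent bands

print(f"games: {n_games}")
print(f"accuracy: {accuracy:.1%}")
print(f"within ±1: {within_one:.1%}")
```

On the 283 games in the matrix this gives roughly 46 % exact accuracy and 87 % within ±1, which matches the first number in each pair reported for the Random Forest classifier.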
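
The "previous games" attributes (how many, max and min players, Gini index) can be illustrated with a small sketch. The slides do not say exactly how the Gini index was computed, so this assumes the usual Gini coefficient of inequality, taken over the player counts of a publisher's earlier games. The function names and sample numbers are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 when all are equal, larger when one value dominates."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on ascending-sorted data:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def history_features(previous_player_counts):
    """Summarise a publisher's earlier games (assumed input: one player count per earlier game)."""
    return {
        "n_previous": len(previous_player_counts),
        "max_players": max(previous_player_counts),
        "min_players": min(previous_player_counts),
        "gini": gini(previous_player_counts),
    }


# Hypothetical publisher with three earlier games.
features = history_features([1200, 300, 4500])
print(features)
```

A publisher with one hit and several flops gets a high Gini value, while a publisher with uniformly performing games gets a value near zero. This is one plausible reason to include such a feature alongside the max and min.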