| Student's name | SCIPER |
|---|---|
| Alice Reymond | 325763 |
| Lorie Xu | 327573 |
| Valentin Porchet | 347219 |
Milestone 1: 10% of the final grade
This is a preliminary milestone to let you set up goals for your final project and assess the feasibility of your ideas.
We have chosen to create our own dataset for this project through scraping, using a modified version of the Steam-Games-Scraper from FronkonGames. While pre-built datasets generated by this script are available on Kaggle, our variant offers greater flexibility and allows us to include more recent data on successful releases, such as Clair Obscur: Expedition 33. We expect the data to be highly accurate because the script retrieves information from two reliable sources: the official Steam Web API for direct game data, and SteamSpy for the remaining data. However, certain specific events on the Steam platform and within its community may introduce some bias. For instance:

- The “free weekend effect”: some lesser-known games are made free for a weekend, and they are then counted as “owned” by every player who tried them during that period, which inflates ownership figures.
- Review bombing: a coordinated effort by an online community to sabotage a game with negative reviews. This is usually handled on each game's public store page, but probably not in the raw data fetched from the APIs.

We will only be able to detect these effects once our custom dataset has been built.
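To make the review-bombing concern more concrete, here is a minimal sketch of one possible detection heuristic. It assumes we can obtain a game's daily count of negative reviews; the scraper output does not necessarily include such a time series, so the input is hypothetical. A day is flagged when its count exceeds the mean of the preceding days by more than a chosen number of standard deviations. The function name `flag_review_bombs` and the `window` and `threshold` defaults are our own illustrative choices, not part of any library.

```python
from statistics import mean, pstdev


def flag_review_bombs(daily_negative, window=7, threshold=3.0):
    """Return the indices of days whose negative-review count is an outlier.

    Hypothetical heuristic: a day is flagged when its count exceeds the mean
    of the previous `window` days by more than `threshold` standard
    deviations of that same window.
    """
    flagged = []
    for day in range(window, len(daily_negative)):
        history = daily_negative[day - window:day]
        baseline = mean(history)
        # Floor the spread at one review so a perfectly flat history
        # does not turn tiny fluctuations into alerts.
        spread = max(pstdev(history), 1.0)
        if daily_negative[day] > baseline + threshold * spread:
            flagged.append(day)
    return flagged


# Made-up daily counts with one artificial spike on day 7.
daily_negative = [10, 12, 9, 11, 10, 13, 11, 250, 40, 12]
print(flag_review_bombs(daily_negative))
```

Using a trailing window rather than statistics over the whole series keeps the baseline local, so a game whose review volume grows steadily is not flagged simply for growing.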
Our visualization is intended to help game developers understand what makes a game popular, based purely on data and statistics. This will be achieved through multiple tools, for example:
- Game-specific statistics, such as, but not limited to:
  - a heatmap showing how a game's audience is distributed across the world
  - insights on text reviews, positive vs. negative reviews, and recommendations
  - peak playing times depending on the period of the year or on new game patches/versions
- A bar chart race visualizing the most played games on Steam over the years.
- A world map giving country-specific statistics about games (most played games, categories, genres).
- A correlation between games' purchase peaks and discounts or release dates.
- A world map that allows the user to explore general gaming trends in each country.
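The bar chart race needs one ranked "frame" per year. A minimal sketch of that data preparation step follows, assuming hypothetical `(year, game, value)` records where `value` stands for whichever popularity metric we settle on (for example peak concurrent players). The helper `bar_chart_race_frames` is hypothetical, and the animation itself is left out.

```python
from collections import defaultdict


def bar_chart_race_frames(records, top_k=10):
    """Turn (year, game, value) records into one ranked frame per year.

    Values for the same game and year are summed; each frame keeps the
    top_k games in descending order, breaking ties alphabetically.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for year, game, value in records:
        totals[year][game] += value

    frames = {}
    for year in sorted(totals):
        ranked = sorted(totals[year].items(), key=lambda item: (-item[1], item[0]))
        frames[year] = ranked[:top_k]
    return frames


# Hypothetical records: placeholder game names and made-up values.
records = [
    (2022, "Game A", 100), (2022, "Game B", 250), (2022, "Game C", 50),
    (2023, "Game A", 300), (2023, "Game B", 120), (2023, "Game A", 20),
]
for year, frame in bar_chart_race_frames(records, top_k=2).items():
    print(year, frame)
```

Each frame is a plain list of `(game, value)` pairs, which keeps the output easy to serialize to JSON for whatever charting library renders the animation.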
As we have not built our custom dataset yet, we rely on the pre-built Kaggle dataset generated by the Steam Games Scraper script mentioned above to present a small preview of the data in the screenshot above. The image shows a few of the most interesting columns of data available for each game, with the games sorted by most owned. There are over 40 columns of data for every game on the Steam platform. We intend to keep only the data relevant to our website by excluding columns that we will not use in our visualizations (required age, movies, support email address, …). We will also restrict the dataset to the first couple hundred most popular, recent, or played games in order to filter out small, less relevant titles.
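The trimming and filtering steps described above could look roughly like the following sketch. It assumes rows are dictionaries keyed by the Kaggle CSV headers; the names `Required age`, `Movies`, `Support email` and `Estimated owners` are assumptions about that file, as is the `"lower - upper"` format of the ownership ranges. It ranks games by the lower bound of their estimated owners, which is only one of the popularity criteria we mention.

```python
import csv
import io

# Assumed header names for columns we do not plan to visualize.
UNUSED_COLUMNS = {"Required age", "Movies", "Support email"}


def owners_lower_bound(value):
    """Parse an ownership range such as '20000000 - 50000000' into its lower bound."""
    return int(value.split("-")[0].strip())


def select_top_games(rows, top_n=200):
    """Drop unused columns and keep the top_n games by estimated owners."""
    trimmed = [
        {key: val for key, val in row.items() if key not in UNUSED_COLUMNS}
        for row in rows
    ]
    # list.sort is stable even with reverse=True, so games with equal
    # ranges keep their original order.
    trimmed.sort(key=lambda row: owners_lower_bound(row["Estimated owners"]), reverse=True)
    return trimmed[:top_n]


# Tiny in-memory stand-in for the real CSV (placeholder rows); in practice
# the file would be opened with open(path, newline="", encoding="utf-8").
sample = io.StringIO(
    "Name,Estimated owners,Required age,Movies\n"
    "Game A,0 - 20000,0,\n"
    "Game B,20000000 - 50000000,17,\n"
    "Game C,1000000 - 2000000,0,\n"
)
top = select_top_games(csv.DictReader(sample), top_n=2)
print([row["Name"] for row in top])
```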
In a paper by Hu et al. [1], the researchers collected a dataset similar to ours and applied multiple machine learning techniques to predict the popularity of Steam games. Their approach was to analyze the data in order to predict how favourable the reviews given to each game would be. They also correlated the number of platforms a game is compatible with against its popularity and ratings.
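As a much simpler first step than the machine learning models in [1], we could look at the kind of relationship they describe between platform support and reception with a plain Pearson correlation. The sketch below assumes boolean `Windows`, `Mac` and `Linux` columns and integer `Positive`/`Negative` review counts (assumed column names); the helper names are hypothetical, and this is not a reproduction of the method in [1].

```python
from math import sqrt

PLATFORMS = ("Windows", "Mac", "Linux")  # assumed boolean columns


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def platform_count(game):
    """Number of operating systems a game supports."""
    return sum(bool(game[p]) for p in PLATFORMS)


def positive_ratio(game):
    """Share of positive reviews, or 0.0 for a game with no reviews."""
    total = game["Positive"] + game["Negative"]
    return game["Positive"] / total if total else 0.0


# Hypothetical games with made-up values.
games = [
    {"Windows": True, "Mac": False, "Linux": False, "Positive": 60, "Negative": 40},
    {"Windows": True, "Mac": True, "Linux": False, "Positive": 70, "Negative": 30},
    {"Windows": True, "Mac": True, "Linux": True, "Positive": 80, "Negative": 20},
]
r = pearson([platform_count(g) for g in games], [positive_ratio(g) for g in games])
print(f"platforms vs. positive ratio: r = {r:.2f}")
```

The same `pearson` helper could be reused for the correlation between purchase peaks and discounts, provided we can obtain aligned time series for both. A correlation alone would not show that supporting more platforms causes better reviews.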
For this project, we plan to use easy-to-understand graphics and novel correlation pairings in order to better pinpoint the underlying causes of a game's popularity, going beyond surface-level analysis. We also want to make our data visualization website about video games as playful as possible; after all, we're gamers before we are developers. Our project is inspired by SteamDB, a large database of Steam's game catalogue that provides diverse information on each game, such as genres, price history, rating history, peak concurrent users (CCU), and much more. We will also take inspiration from various types of graph animations, such as bar chart races, and build our own from our dataset.
[1] Hu, W., Wang, Y. and Xia, R. Machine Learning-Based Steam Platform Game’s Popularity Analysis, 2024. DOI: 10.5220/0012853800004547
Milestone 2: 10% of the final grade
Milestone 3: 80% of the final grade

Late policy:
- < 24h: 80% of the grade for the milestone
- < 48h: 70% of the grade for the milestone