Project of Data Visualization (COM-480)

Student's name	SCIPER
Alice Reymond	325763
Lorie Xu	327573
Valentin Porchet	347219

Milestone 1 • Milestone 2 • Milestone 3

Milestone 1 (20th March, 5pm)

10% of the final grade

This is a preliminary milestone to let you set up goals for your final project and assess the feasibility of your ideas.

Dataset

We have chosen to build our own dataset for this project through scraping, using a modified version of the Steam-Games-Scraper from FronkonGames. While pre-built datasets generated by this script are available on Kaggle, our variant offers greater flexibility and lets us include more recent data on successful releases, such as Clair Obscur: Expedition 33. We expect the data to be accurate because the script retrieves information from two reliable sources: the official Steam Web API for direct game data, and SteamSpy for other data. However, certain specific events within the Steam platform and its community may introduce some bias. For instance:
- The “free weekend effect”: some obscure games are free for a weekend, and they are then counted as “owned” for everyone in the database (more detailed explanations).
- Review bombing: a coordinated effort by an online community to sabotage a game. This is usually handled on the games' public web pages, but probably not in the raw data fetched from the APIs.
We will only be able to detect these cases once our custom dataset has been built.
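As a rough illustration of how records from the two sources could be combined, here is a minimal sketch. It assumes each source yields one dictionary per game keyed by Steam app id, and that Steam Web API fields take precedence on conflict. The function name, field names, and precedence rule are all hypothetical and are not taken from the Steam-Games-Scraper script.

```python
from typing import Any


def merge_game_records(
    steam_api: dict[int, dict[str, Any]],
    steamspy: dict[int, dict[str, Any]],
) -> dict[int, dict[str, Any]]:
    """Combine per-game records from two sources, keyed by app id.

    Assumption: fields from the Steam Web API win on conflict, and
    SteamSpy only fills in fields the first source does not provide.
    """
    merged: dict[int, dict[str, Any]] = {}
    for appid, record in steam_api.items():
        extra = steamspy.get(appid, {})
        # Later keys override earlier ones, so `record` has priority.
        merged[appid] = {**extra, **record}
    return merged


# Placeholder records for illustration only, not real data.
steam_api = {
    10: {"name": "Game A", "price": 9.99},
    20: {"name": "Game B", "price": 0.0},
}
steamspy = {10: {"owners": 1500000, "price": 1.0}}

merged = merge_game_records(steam_api, steamspy)
```

Games missing from the second source simply keep their first-source fields, which makes gaps easy to spot later during exploratory analysis.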

Problematic

Our visualization is intended to help game developers understand what makes a game become popular based purely on data and statistics. This will be achieved with multiple tools, for example:

Game-specific statistics, such as, but not limited to:
- a heatmap that shows how the audience for a game is distributed across the world
- insights on text reviews, positive vs. negative reviews, and recommendations
- peak playing times depending on the period of the year or on new game patches/versions.
A bar chart race to visualize the most played games on Steam, across the years.
A map of the world that gives country-specific statistics about games (most played games, categories, genres).
A chart correlating games’ purchase peaks with discounts or release dates.
A world map that allows the user to view general gaming tendencies in each country.
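To give an idea of the data preparation behind the bar chart race mentioned in the list above, here is a minimal sketch that turns flat (year, game, players) rows into one ranked frame per year. The row layout, the function name, and the sample values are assumptions made for illustration; the real dataset may expose different fields.

```python
from collections import defaultdict


def race_frames(rows, top_k=3):
    """Group (year, game, players) rows into per-year top-k rankings."""
    by_year = defaultdict(list)
    for year, game, players in rows:
        by_year[year].append((game, players))

    frames = {}
    for year in sorted(by_year):
        # Highest player count first; keep only the top_k entries.
        ranked = sorted(by_year[year], key=lambda gp: gp[1], reverse=True)
        frames[year] = ranked[:top_k]
    return frames


# Placeholder rows for illustration only, not real data.
rows = [
    (2023, "Game A", 500), (2023, "Game B", 800), (2023, "Game C", 300),
    (2024, "Game A", 900), (2024, "Game B", 700), (2024, "Game C", 950),
]

frames = race_frames(rows, top_k=2)
```

Each frame can then be handed to the front-end animation, which only has to interpolate bar positions between consecutive years.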

Exploratory Data Analysis

As we have not built our custom dataset yet, we rely on an existing dataset generated by the Steam Games Scraper script mentioned above to present a small preview of the data in the screenshot above. The image shows the first few most interesting columns of data that we can get for each game (the games are sorted by Most Owned). There are over 40 columns of data for every game on the Steam platform. We intend to keep only the data relevant to our website by excluding columns that we won’t use in our visualizations (required age, movies, support email address, …). We will also restrict the dataset to the few hundred most popular, most recent, or most played games in order to filter out smaller, less relevant games.
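The filtering described above could look like the following minimal sketch. It assumes each game is a dictionary with a numeric `owners` field; the column names and the `select_top_games` helper are hypothetical and only illustrate the idea of dropping unused columns and keeping the top entries.

```python
def select_top_games(games, keep_columns, sort_key, limit):
    """Keep the `limit` highest-ranked games and only the listed columns."""
    ranked = sorted(games, key=lambda g: g.get(sort_key, 0), reverse=True)
    return [
        {col: game[col] for col in keep_columns if col in game}
        for game in ranked[:limit]
    ]


# Placeholder records for illustration only, not real data.
games = [
    {"name": "Game A", "owners": 1200, "required_age": 0, "support_email": "a@example.com"},
    {"name": "Game B", "owners": 5000, "required_age": 16, "support_email": "b@example.com"},
    {"name": "Game C", "owners": 300, "required_age": 0, "support_email": "c@example.com"},
]

top = select_top_games(games, keep_columns=["name", "owners"], sort_key="owners", limit=2)
```

Changing `sort_key` would let the same helper select the most recent or most played games instead, provided the corresponding field exists in the data.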

Related work

In their paper, Hu et al. [1] used a dataset similar to ours and applied multiple machine learning techniques to predict the popularity of Steam games. Their approach was to analyze the data to predict the favourability of the reviews given to the games. They also correlated the number of platforms each game is compatible with against its popularity and ratings.

For this project, we plan to use easy-to-understand graphics and novel correlation pairings to better pinpoint the underlying causes of a game’s popularity, in contrast to more surface-level analyses. We also want to make our data visualization website about video games as playful as possible; after all, we’re gamers before we are developers. Our project is inspired by SteamDB, a huge database of Steam’s game inventory that provides diverse information on each game, such as genres, price history, ratings history, peak number of concurrent users (CCU) and much more. We will also take inspiration from various types of graph animations, such as bar chart races, and build our own with the dataset.

References

[1] Hu, W., Wang, Y. and Xia, R. Machine Learning-Based Steam Platform Game’s Popularity Analysis, 2024. DOI: 10.5220/0012853800004547

Milestone 2 (17th April, 5pm)

10% of the final grade

Milestone 3 (29th May, 5pm)

80% of the final grade

Late policy

< 24h: 80% of the grade for the milestone
< 48h: 70% of the grade for the milestone

