
com-480-data-visualization/STEAMingHot

 
 


Project of Data Visualization (COM-480)

Student's name SCIPER
Alice Reymond 325763
Lorie Xu 327573
Valentin Porchet 347219

Milestone 1 • Milestone 2 • Milestone 3

Milestone 1 (20th March, 5pm)

10% of the final grade

This is a preliminary milestone to let you set up goals for your final project and assess the feasibility of your ideas.

Dataset

We have chosen to build our own dataset for this project through scraping, using a modified version of the Steam-Games-Scraper from FronkonGames. While pre-built datasets generated by this script are available on Kaggle, our variant offers greater flexibility and lets us include more recent data on successful releases, such as Clair Obscur: Expedition 33. The data accuracy is very high because the script retrieves information from two highly reliable sources: the official Steam Web API for direct game data, and SteamSpy for other data. However, certain very specific events within the Steam platform and its community may introduce some bias. For instance:

  • The “free weekend effect”: some obscure games are free for a weekend and are then counted as “owned” for everyone in the database (more detailed explanations).
  • Review bombing: a coordinated effort by an online community to sabotage a game. This is usually handled on the games’ public web pages, but probably not in the raw data fetched from the APIs.

We will only be able to catch these cases once our custom dataset has been built.
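As a rough illustration of how the two biases above might be flagged once the dataset exists, here is a minimal sketch. The record fields (`owners_low`, `avg_playtime_min`), the thresholds, and the per-day negative review counts are all assumptions made for this example; they are not the scraper's actual output format.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class GameRecord:
    """Hypothetical, simplified view of one scraped game."""
    name: str
    owners_low: int          # lower bound of the estimated owner range
    avg_playtime_min: float  # average playtime per owner, in minutes


def looks_like_free_weekend(game, min_owners=1_000_000, max_playtime_min=30.0):
    """Flag games with many owners but almost no playtime.

    Both thresholds are arbitrary placeholders to tune on real data.
    """
    return game.owners_low >= min_owners and game.avg_playtime_min <= max_playtime_min


def review_bomb_days(daily_negative, factor=5.0):
    """Return the indices of days whose negative review count exceeds
    `factor` times the median daily count (with a floor of 1)."""
    if not daily_negative:
        return []
    threshold = factor * max(median(daily_negative), 1)
    return [day for day, count in enumerate(daily_negative) if count > threshold]
```

Flagged games would still need a manual check, since a high owner count with low playtime can also come from bundles or giveaways.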

Problematic

Our visualization is intended to help game developers understand what makes a game become popular based purely on data and statistics. This will be achieved with multiple tools, for example:

  • Game-specific statistics, such as, but not limited to:
    • a heatmap that shows how the target audience for a game is distributed across the world
    • insights on text reviews, positive vs. negative reviews, recommendations
    • peak playing times depending on periods of the year or new game patches/versions.
  • A bar chart race to visualize the most played games on Steam, across the years.
  • A map of the world that gives country-specific statistics about games (most played games, categories, genres).
  • A correlation between games’ purchase peaks and discounts or release dates.
  • A world map that allows the user to view general gaming tendencies in each country.
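To give an idea of the data preparation behind the bar chart race listed above, here is a minimal sketch that turns flat records into one ranked frame per year. The `(year, game, players)` tuple layout and the function name are assumptions for illustration; the real columns will depend on our final dataset.

```python
from collections import defaultdict


def bar_chart_race_frames(rows, top_n=10):
    """Group (year, game, players) rows by year and keep the top_n games
    of each year, sorted by player count in descending order."""
    by_year = defaultdict(list)
    for year, game, players in rows:
        by_year[year].append((game, players))
    return {
        year: sorted(entries, key=lambda entry: entry[1], reverse=True)[:top_n]
        for year, entries in sorted(by_year.items())
    }


# Placeholder data, not real Steam figures.
sample = [
    (2023, "Game A", 500),
    (2023, "Game B", 900),
    (2024, "Game A", 700),
    (2024, "Game C", 300),
]
frames = bar_chart_race_frames(sample, top_n=1)
```

Each frame can then be handed to the front end, which only has to animate the transition between consecutive years.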

Exploratory Data Analysis

As we have not built our custom dataset yet, we rely on this dataset, produced by the Steam Games Scraper script mentioned above, to present a small preview of the data in the screenshot above (sorry, it is tiny). The image shows the first few most interesting columns available for each game, with the games sorted by Most Owned. There are over 40 columns of data for every game on the Steam platform. We intend to keep only the data relevant to our website by excluding columns that we won’t use in our visualizations (required age, movies, support email address, …). We will also restrict the dataset to the top few hundred most popular, recent, or played games in order to filter out smaller, irrelevant games.
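The filtering described above could look roughly like the sketch below, written with the standard `csv` module only. The column names (`Name`, `Peak CCU`, `Required age`, `Movies`, `Support email`) and the default limit of 200 are assumptions for this example and should be checked against the actual scraper output.

```python
import csv
import io

# Columns we do not plan to visualize (assumed names).
DROPPED_COLUMNS = {"Required age", "Movies", "Support email"}


def to_int(value):
    """Parse a numeric cell, treating blank or malformed values as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def load_top_games(csv_file, sort_column="Peak CCU", limit=200):
    """Read games from an open CSV file, drop unused columns, and keep
    the `limit` games with the highest value in `sort_column`."""
    reader = csv.DictReader(csv_file)
    rows = [
        {key: value for key, value in row.items() if key not in DROPPED_COLUMNS}
        for row in reader
    ]
    rows.sort(key=lambda row: to_int(row.get(sort_column)), reverse=True)
    return rows[:limit]


# Placeholder rows, not real Steam data.
sample_csv = io.StringIO(
    "Name,Peak CCU,Required age,Support email\n"
    "Game A,120,0,a@example.com\n"
    "Game B,950,17,b@example.com\n"
    "Game C,,0,c@example.com\n"
)
top_games = load_top_games(sample_csv, limit=2)
```

Sorting on a numeric column keeps the sketch simple; ranking by owners would first require parsing the estimated owner ranges.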

Related work

In a paper by Hu et al. [1], the researchers used a dataset similar to ours and applied multiple machine learning techniques to predict the popularity of Steam games. Their approach was to analyze the data to predict the favourability of the reviews given to the games. They also correlated the number of platforms the games were compatible with against their popularity and ratings.

For this project, we plan to use easy-to-understand graphics and novel correlation pairings in order to better pinpoint the underlying causes of a game’s popularity, going beyond more surface-level analysis. We also want to make our data visualization website about video games as playful as possible; after all, we’re gamers before we are developers. Our project is inspired by SteamDB, a huge database of Steam’s game inventory filled with diverse information on each game, such as genres, price history, ratings history, peak number of concurrent users (CCU) and much more. We will also take inspiration from various types of graph animations, such as bar chart races, and build our own with the dataset.
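A correlation pairing like the ones mentioned above could start from a plain Pearson coefficient between two per-game series. The sketch below is only an illustration: the pairing of discount percentage with sales uplift is a hypothetical example, and the numbers are placeholders rather than real measurements.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Placeholder values: discount in percent vs. relative sales uplift.
discounts = [0, 10, 25, 50, 75]
uplift = [1.0, 1.2, 1.9, 3.1, 4.8]
r = pearson(discounts, uplift)
```

A coefficient close to 1 or -1 only indicates a linear relationship, so each pairing would still need a scatter plot before we draw conclusions from it.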

References

[1] Hu, W., Wang, Y. and Xia, R. Machine Learning-Based Steam Platform Game’s Popularity Analysis, 2024. DOI: 10.5220/0012853800004547

Milestone 2 (17th April, 5pm)

10% of the final grade

Milestone 3 (29th May, 5pm)

80% of the final grade

Late policy

  • < 24h: 80% of the grade for the milestone
  • < 48h: 70% of the grade for the milestone

About

Initial group project repository for the data visualization course at EPFL


Languages

  • Python 38.6%
  • Jupyter Notebook 22.1%
  • HTML 21.1%
  • CSS 18.2%