Scrapy project for data capture of vgchartz

hechmik/vgchartzScrape

 
 

vgchartzfull - A crawler to download data from Global Videogame Sales

vgchartz-full-crawler.py is a Python 3 crawler script based on BeautifulSoup. It creates a CSV dataset covering more than 57,000 games, based on data from the VGChartz site.
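To illustrate the general idea of turning an HTML table into CSV rows, here is a minimal sketch. It is not the project's code: it uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra dependencies, and the HTML snippet, the class `TableRowParser` and the function `html_table_to_csv` are all made up for this example. VGChartz's real markup is not shown here and will differ.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows without <td> cells (e.g. header rows) are skipped
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the <td> cells found in `html` as CSV text, one line per table row."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample markup, not taken from the VGChartz site.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td>1</td><td>Example Game A</td></tr>
  <tr><td>2</td><td>Example Game B</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML), end="")
```

A real crawler would also need to fetch the pages, follow pagination and handle missing cells, none of which is covered by this sketch.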

Output

The dataset is saved to the file specified in cfg/resources.json, by default "dataset/vgsales.csv".
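As a rough sketch of how such an output path could be resolved, the snippet below reads a JSON config and falls back to the default path named above. The key name `output_file` and the helper `load_output_path` are assumptions made for illustration. The actual structure of cfg/resources.json is not described in this README, so check the file itself before relying on this.

```python
import json
from pathlib import Path

DEFAULT_OUTPUT = "dataset/vgsales.csv"  # default path named in this README


def load_output_path(config_path="cfg/resources.json", key="output_file"):
    """Return the CSV output path from the config, or the default.

    NOTE: `key` is a hypothetical name; the real resources.json may use another.
    """
    config_file = Path(config_path)
    if not config_file.is_file():
        return Path(DEFAULT_OUTPUT)
    with config_file.open(encoding="utf-8") as fh:
        config = json.load(fh)
    if not isinstance(config, dict):
        return Path(DEFAULT_OUTPUT)
    return Path(config.get(key, DEFAULT_OUTPUT))


if __name__ == "__main__":
    print(load_output_path())
```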

Install & execution

You will need the dependencies listed in requirements.txt.

They can be installed with pip.

  # Install dependencies
  $> pip install -r requirements.txt
  
  # Run
  $> python vgchartzfull.py
  

Dictionary

The dataset is composed of these fields, and the data is collected with this methodology.

| Field | Description |
| --- | --- |
| Rank | Ranking of overall sales |
| Name | The game's name |
| Genre | Genre of the game |
| Platform | Platform of the game's release (e.g. PC, PS4) |
| Developer | Developer of the game |
| Publisher | Publisher of the game |
| Vgchartz_Score | Score on the VGChartz site |
| Critic_Score | Score given by critics |
| User_Score | Score given by users of the VGChartz site |
| Total_Shipped | Total worldwide shipments (in millions) |
| Total_Sales | Total worldwide sales (in millions) |
| NA_Sales | Sales in North America (in millions) |
| EU_Sales | Sales in Europe (in millions) |
| JP_Sales | Sales in Japan (in millions) |
| Other_Sales | Sales in the rest of the world (in millions) |
| Release_Date | Year of the game's release |
| Last_Update | Last update of this record |
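Once the CSV exists, it can be read with the standard library. The sketch below assumes the CSV header uses exactly the field names listed above, which this README does not confirm. The helpers `to_float`, `regional_total` and `read_dataset` are hypothetical, and the sample row is invented placeholder data, not real sales figures.

```python
import csv
import io

# Field names as listed in the dictionary above (assumed to match the CSV header).
EXPECTED_FIELDS = [
    "Rank", "Name", "Genre", "Platform", "Developer", "Publisher",
    "Vgchartz_Score", "Critic_Score", "User_Score", "Total_Shipped",
    "Total_Sales", "NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales",
    "Release_Date", "Last_Update",
]
REGION_FIELDS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]


def to_float(value):
    """Convert a CSV cell to float; return None for blank or non-numeric cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def regional_total(row):
    """Sum the regional sales columns that hold a number (in millions)."""
    known = [v for v in (to_float(row.get(f)) for f in REGION_FIELDS) if v is not None]
    return round(sum(known), 2) if known else None


def read_dataset(fh):
    """Read rows as dicts, failing early if an expected column is missing."""
    reader = csv.DictReader(fh)
    missing = [f for f in EXPECTED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Invented placeholder row, for demonstration only.
SAMPLE_CSV = (
    ",".join(EXPECTED_FIELDS) + "\n"
    "1,Example Game,Action,PC,Example Dev,Example Pub,,,,N/A,2.0,1.0,0.5,0.25,0.25,2000,\n"
)

if __name__ == "__main__":
    for row in read_dataset(io.StringIO(SAMPLE_CSV)):
        print(row["Name"], regional_total(row))
```

To read the real file, replace `io.StringIO(SAMPLE_CSV)` with `open("dataset/vgsales.csv", newline="", encoding="utf-8")`, adjusting the path if cfg/resources.json points elsewhere.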

TODO

  • Remap the columns according to the values selected in resources.json
  • Add some unit tests
  • Dockerize (w/ alpine-python) to ease use and avoid installations
  • Publish on Docker Hub

Links

Greetings

Thanks to Chris Albon
