mbzhuang/TrafficAnalysisEnglishWikipedia

Traffic Analysis of English Wikipedia Website

About the project

The goal of this project is to construct, analyze, and publish a dataset of monthly traffic on English Wikipedia from January 1, 2008 through September 30, 2017. Traffic data is acquired from two different Wikimedia REST API endpoints, combined into a single dataset, and then visualized to show how both mobile and main (desktop) site traffic changed from 2008 to 2017.

License of source data

API

Pagecounts API (documentation, endpoint) provides access to desktop and mobile traffic data from January 2008 through July 2016.

Pageviews API (documentation, endpoint) provides access to desktop, mobile web, and mobile app traffic data from July 2015 through September 2017.

Both are licensed under the CC-BY-SA 3.0 and GFDL licenses.
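
To make the two endpoints concrete, here is a minimal sketch of how monthly request URLs for each API could be built and fetched. The URL templates, the `agent="user"` default, the timestamp format, and the helper names (`pagecounts_url`, `pageviews_url`, `fetch_json`) are assumptions based on the public Wikimedia REST API documentation, not code taken from PageviewAnalysis.ipynb; check them against the API documentation before relying on them.

```python
import json
import urllib.request

# URL templates as described in the Wikimedia REST API docs (assumption: verify before use).
PAGECOUNTS_TEMPLATE = (
    "https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/aggregate/"
    "{project}/{access}/{granularity}/{start}/{end}"
)
PAGEVIEWS_TEMPLATE = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/"
    "{project}/{access}/{agent}/{granularity}/{start}/{end}"
)


def pagecounts_url(access, start, end, project="en.wikipedia.org"):
    """Build a monthly Pagecounts URL; access is e.g. 'desktop-site' or 'mobile-site'."""
    return PAGECOUNTS_TEMPLATE.format(
        project=project, access=access, granularity="monthly", start=start, end=end
    )


def pageviews_url(access, start, end, agent="user", project="en.wikipedia.org"):
    """Build a monthly Pageviews URL; access is e.g. 'desktop', 'mobile-web', 'mobile-app'.

    agent='user' is assumed here so that spider/crawler traffic is left out.
    """
    return PAGEVIEWS_TEMPLATE.format(
        project=project, access=access, agent=agent,
        granularity="monthly", start=start, end=end,
    )


def fetch_json(url, user_agent="example-traffic-analysis (contact: you@example.com)"):
    """Fetch a URL and decode the JSON body. The User-Agent value is a placeholder."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `pagecounts_url("desktop-site", "2008010100", "2016080100")` yields a URL for the desktop Pagecounts series; the timestamps use the `YYYYMMDDHH` form, and the exact end value needed to include the final month is an assumption to confirm against the API documentation.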

Steps of the project

*The steps below are implemented in PageviewAnalysis.ipynb.

  1. Data acquisition: retrieve raw datasets from the Pagecounts API and the Pageviews API and save them as JSON files in the JSON_Data folder. Note that data from the Pageviews API excludes spiders/crawlers, while data from the Pagecounts API does not.

  2. Data processing: read the JSON files and process the raw data into a final csv file, en-wikipedia_traffic_200801-201709.csv.

    The final csv file has 8 columns:

    | Column                  | Value     |
    |-------------------------|-----------|
    | year                    | YYYY      |
    | month                   | MM        |
    | pagecount_all_views     | num_views |
    | pagecount_desktop_views | num_views |
    | pagecount_mobile_views  | num_views |
    | pageview_all_views      | num_views |
    | pageview_desktop_views  | num_views |
    | pageview_mobile_views   | num_views |

  3. Data analysis: read the csv file, then analyze and visualize the traffic data. The resulting plot is saved as PageviewPlot.png.
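
The data processing step can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: it assumes each JSON record carries a `timestamp` string beginning with `YYYYMM` plus a `count` field (Pagecounts) or a `views` field (Pageviews), that mobile web and mobile app views are summed into `pageview_mobile_views`, and that months with no data for a source are filled with 0. The helper names `monthly_totals`, `build_rows`, and `write_csv` are hypothetical.

```python
import csv

COLUMNS = [
    "year", "month",
    "pagecount_all_views", "pagecount_desktop_views", "pagecount_mobile_views",
    "pageview_all_views", "pageview_desktop_views", "pageview_mobile_views",
]


def monthly_totals(items, value_key):
    """Map (year, month) -> total of value_key, using the first six timestamp digits."""
    totals = {}
    for item in items:
        timestamp = item["timestamp"]
        key = (timestamp[:4], timestamp[4:6])
        totals[key] = totals.get(key, 0) + item[value_key]
    return totals


def build_rows(pc_desktop, pc_mobile, pv_desktop, pv_mobile_web, pv_mobile_app):
    """Combine the five raw series into one row per month, sorted chronologically."""
    pc_d = monthly_totals(pc_desktop, "count")
    pc_m = monthly_totals(pc_mobile, "count")
    pv_d = monthly_totals(pv_desktop, "views")
    # Assumption: mobile pageviews are the sum of mobile-web and mobile-app views.
    pv_m = monthly_totals(pv_mobile_web, "views")
    for key, views in monthly_totals(pv_mobile_app, "views").items():
        pv_m[key] = pv_m.get(key, 0) + views

    rows = []
    for key in sorted(set(pc_d) | set(pc_m) | set(pv_d) | set(pv_m)):
        year, month = key
        # Assumption: months outside a source's coverage are recorded as 0.
        pcd, pcm = pc_d.get(key, 0), pc_m.get(key, 0)
        pvd, pvm = pv_d.get(key, 0), pv_m.get(key, 0)
        rows.append({
            "year": year,
            "month": month,
            "pagecount_all_views": pcd + pcm,
            "pagecount_desktop_views": pcd,
            "pagecount_mobile_views": pcm,
            "pageview_all_views": pvd + pvm,
            "pageview_desktop_views": pvd,
            "pageview_mobile_views": pvm,
        })
    return rows


def write_csv(rows, path):
    """Write the combined rows to a csv file with the eight columns listed above."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Each argument to `build_rows` would be the list of records loaded from the corresponding file in JSON_Data, and the result can be passed to `write_csv(rows, "en-wikipedia_traffic_200801-201709.csv")`. Keying on `(year, month)` strings keeps the `YYYY` and `MM` columns zero-padded and makes the chronological sort straightforward.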

Organization of the project

The project has the following structure:

TrafficAnalysisEnglishWikipedia/
  |- JSON_Data/
     |- pagecounts_desktop-site_200801-201607.json
     |- pagecounts_mobile-site_200801-201607.json 
     |- pageviews_desktop-site_201507-201709.json
     |- pageviews_mobile-web_201507-201709.json
     |- pageviews_mobile-app_201507-201709.json
  |- LICENSE
  |- PageviewAnalysis.ipynb
  |- PageviewPlot.png
  |- README.md
  |- en-wikipedia_traffic_200801-201709.csv

*This is an assignment for Data 512, University of Washington.
