Mining Social Media

Finding great stories in Internet Data

About

Mining Social Media will show you the kind of data that can be mined on the social web, the insights that can be gained from it, and the limitations of its scope. You’ll learn how to find out what kind of data is available on popular social media juggernauts like Facebook and Twitter and how to recognize the value of what is measured.

Practical exercises interweave with conceptual lessons that cover ways to use Python to extract data from social media sources, analyze it, and make sense of it visually. You’ll learn how to write a script that taps into an API, how to scrape data from websites, and even how to analyze data from an automated Twitter bot.

This repository holds the code and data for the exercises detailed in the book. The book is set to publish at the end of 2019.

Getting started

Computer setup

1. Make sure you have the OS-level dependencies installed

  • Python 3
  • more to come

2. Clone this repo

git clone https://github.com/lamthuyvo/social-media-data-book.git
cd social-media-data-book

3. Install the required Python libraries

Optional but recommended: make a virtual environment using venv.
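
As a minimal sketch of that step, the standard library's venv module can also be called from Python, which is roughly what running `python3 -m venv .venv` on the command line does. The helper name `make_virtualenv` and the `.venv` folder name are assumptions chosen for illustration; the repository does not prescribe either.

```python
import venv
from pathlib import Path


def make_virtualenv(env_dir=".venv", with_pip=True):
    """Create a virtual environment at env_dir and return its path.

    The default folder name ".venv" is an assumption, not a repository convention.
    """
    env_path = Path(env_dir)
    # venv.create builds the environment; with_pip=True also bootstraps pip into it.
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

Calling `make_virtualenv()` from the repository root creates the environment. It still needs to be activated in your shell before installing libraries: `source .venv/bin/activate` on macOS and Linux, or `.venv\Scripts\activate` on Windows.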

[more details about the computer setup to come]

Data files

While most code files are hosted in this repository, some data files were too large to be included here. Below are instructions on how to access them:

  • askscience_submissions.csv: This file is required for the data exercises in chapters 8 and 9. If you're working with a downloaded version of this repository, you will first need to create a data folder inside the chapter08_09 folder, then download the data file askscience_submissions.csv and, lastly, place it inside the data folder. You can download the file here. The data was provided by data archivist Jason Baumgartner and represents a small sliver of the data he makes available to academics and researchers at Pushshift.io.

  • iranian_tweets_csv_hashed.csv: This file is required for the data exercises in chapter 10. If you're working with a downloaded version of this repository, you will first need to create a data folder inside the chapter10 folder, then download the data file iranian_tweets_csv_hashed.csv and, lastly, place it inside the data folder. You can download the file here or directly from Twitter. You can find more information about this data on Twitter's elections integrity page.
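
The folder layout described in the two items above can be sketched in Python. This is a minimal sketch, assuming `repo_root` points at your local copy of the repository; `ensure_data_dirs` and `missing_data_files` are hypothetical helper names and are not part of the repository. The script does not download anything, it only creates the data folders and reports which files still need to be put in place.

```python
from pathlib import Path

# Chapter folder -> data file name, as listed in the instructions above.
DATA_FILES = {
    "chapter08_09": "askscience_submissions.csv",
    "chapter10": "iranian_tweets_csv_hashed.csv",
}


def ensure_data_dirs(repo_root):
    """Create a data folder inside each chapter folder if it is missing."""
    created = []
    for chapter in DATA_FILES:
        data_dir = Path(repo_root) / chapter / "data"
        data_dir.mkdir(parents=True, exist_ok=True)
        created.append(data_dir)
    return created


def missing_data_files(repo_root):
    """Return the expected data file paths that are not in place yet."""
    expected = [
        Path(repo_root) / chapter / "data" / name
        for chapter, name in DATA_FILES.items()
    ]
    return [path for path in expected if not path.is_file()]
```

For example, `ensure_data_dirs(".")` followed by `missing_data_files(".")`, run from the repository root, lists the files you still have to download and move into their data folders.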

Content breakdown

[More to come]

Contact

Please feel free to contact me on GitHub or via [email protected]
