2018-12-04

Medium Data Science articles

This week's dataset was submitted by Matthew Hendrickson, thanks! Credit also goes to Kanishka Misra, who wanted to work with some text-based data via the tidytext package.

Data

Data was originally scraped by Harrison Jansma and submitted as a Kaggle dataset.

The dataset originally consisted of 1.4 million stories from 95 of Medium’s most popular story tags. Every story was published between August 1st, 2017 and August 1st, 2018.

For each story, Harrison collected all of the information present on a Medium story-card.

Here is the full list of the information he was able to collect for each story: Title, Sub-Title, Author, Publication, Date, Tags, Read-Time, Claps-Received, Story-URL, and Author-URL.

Given that this file was ~660 MB, I filtered the dataset to only articles with at least one of these tags: AI, Artificial Intelligence, big data, data, data science, data visualization, deep learning, and machine learning. Feel free to use the original dataset if you want a deeper dive.
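As a rough illustration of the tag filter described above, here is a minimal Python sketch. It is not the code actually used to prepare this week's file. The record layout (a `tags` list per story), the sample titles, and the helper name `has_kept_tag` are all assumptions made for the example; the original Kaggle file may be structured differently.

```python
# Tags to keep, lower-cased so the comparison is case-insensitive.
KEEP_TAGS = {
    "ai",
    "artificial intelligence",
    "big data",
    "data",
    "data science",
    "data visualization",
    "deep learning",
    "machine learning",
}


def has_kept_tag(story_tags, keep=KEEP_TAGS):
    """Return True if at least one of the story's tags is in the keep set."""
    return any(tag.strip().lower() in keep for tag in story_tags)


# Made-up records for illustration only (hypothetical layout, not real rows).
stories = [
    {"title": "Example A", "tags": ["Machine Learning", "Python"]},
    {"title": "Example B", "tags": ["Travel"]},
    {"title": "Example C", "tags": ["AI", "Data"]},
]

# Keep a story when any of its tags matches the list above.
filtered = [story for story in stories if has_kept_tag(story["tags"])]
print([story["title"] for story in filtered])
```

A story is kept when any one of its tags matches, so an article tagged both "Machine Learning" and "Python" stays in even though "Python" is not on the list.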

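| column | description |
| --- | --- |
| title | Title of the article |
| subtitle | Subtitle of the article |
| image | Whether a header image is present |
| author | Author's name |
| publication | Publication that the article was published under |
| year | Year of publication |
| month | Month of publication |
| day | Day of publication |
| reading_time | Estimated reading time in minutes |
| claps | Number of claps (similar to likes, but one person can clap many times) |
| url | URL of the article |
| author_url | URL of the author's Medium page |
| tag_ai | Whether the article is tagged "AI" |
| tag_artificial_intelligience | Whether the article is tagged "artificial intelligence" |
| tag_big_data | Whether the article is tagged "big data" |
| tag_data | Whether the article is tagged "data" |
| tag_data_science | Whether the article is tagged "data science" |
| tag_data_visualization | Whether the article is tagged "data visualization" |
| tag_deep_learning | Whether the article is tagged "deep learning" |
| tag_machine_learning | Whether the article is tagged "machine learning" |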
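
To show how these columns might be combined in practice, the sketch below parses a couple of rows with Python's standard library, builds a publication date from `year`, `month`, and `day`, and collects the tags that are set for each article. The two sample rows are invented for illustration, only a subset of the columns is included, and the assumption that the `tag_*` columns are stored as `0`/`1` should be checked against the actual file.

```python
import csv
import io
from datetime import date

# Invented sample rows (not real data), using a subset of the columns above.
# Assumption: tag_* columns hold 1 when the article carries the tag, else 0.
SAMPLE = """title,year,month,day,reading_time,claps,tag_ai,tag_data_science,tag_machine_learning
Example A,2017,8,1,5,120,1,0,1
Example B,2018,3,15,3,0,0,1,0
"""


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    published = date(int(row["year"]), int(row["month"]), int(row["day"]))
    # Collect the names of the tag columns that are set, minus the "tag_" prefix.
    tags = [
        name[len("tag_"):]
        for name, value in row.items()
        if name.startswith("tag_") and value == "1"
    ]
    return {
        "title": row["title"],
        "published": published,
        "reading_time": int(row["reading_time"]),
        "claps": int(row["claps"]),
        "tags": tags,
    }


articles = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
for article in articles:
    print(article["title"], article["published"].isoformat(), article["tags"])
```

Matching on the `tag_` prefix rather than on individual column names keeps the sketch independent of how each tag column is spelled in the file.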