PyVideo Scrapers

Scrapy spiders that generate JSON files similar to those in the original pyvideo-data repo.

Python Version

Python 3.4+

Usage

Scraping YouTube playlist

After activating the virtual environment, simply call:

scrapy runspider videodata/spiders/youtube_playlist.py \
    -a playlist_id=<playlist_id> \
    [-a api_key=<google_api_key>] \
    [-s OUTPUT_DIR=<output_root_directory>]

where:

playlist_id is the value of the list query parameter in the YouTube playlist URL (example: https://www.youtube.com/playlist?list=PLqtzN042QpfcOm_sOXxAixvNs9QWhhX5w, where the playlist ID is PLqtzN042QpfcOm_sOXxAixvNs9QWhhX5w)
google_api_key is a secret key for Google APIs (required only if the public API usage quota is exhausted) - for more information on obtaining an API key, see: https://support.google.com/cloud/answer/6158862
output_root_directory is the root directory where the scraping results will be stored (default: <current-working-directory>/scraped_data)
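
As an illustration of the parameters above, here is a minimal Python sketch that pulls the playlist ID out of a playlist URL and assembles the matching command line. The helper names extract_playlist_id and build_command are hypothetical and not part of this repository; the sketch only builds the argument list and does not run Scrapy.

```python
import shlex
from urllib.parse import urlparse, parse_qs


def extract_playlist_id(playlist_url):
    """Return the value of the `list` query parameter of a playlist URL.

    Hypothetical helper, not part of the scraper itself.
    """
    query = parse_qs(urlparse(playlist_url).query)
    values = query.get("list")
    if not values:
        raise ValueError("no 'list' query parameter in URL: " + playlist_url)
    return values[0]


def build_command(playlist_id, api_key=None, output_dir=None):
    """Assemble the scrapy argument list shown in the usage section.

    Optional arguments are left out when they are None, mirroring the
    bracketed (optional) parts of the command. Hypothetical helper.
    """
    args = [
        "scrapy", "runspider", "videodata/spiders/youtube_playlist.py",
        "-a", "playlist_id=" + playlist_id,
    ]
    if api_key is not None:
        args += ["-a", "api_key=" + api_key]
    if output_dir is not None:
        args += ["-s", "OUTPUT_DIR=" + output_dir]
    return args


if __name__ == "__main__":
    url = "https://www.youtube.com/playlist?list=PLqtzN042QpfcOm_sOXxAixvNs9QWhhX5w"
    command = build_command(extract_playlist_id(url))
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(" ".join(shlex.quote(part) for part in command))
```

The resulting list could also be passed to subprocess.run from within the activated virtual environment, assuming the command is started from the repository root so that the relative spider path resolves.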

