Scraping Bestselling Books on the NYT Bestselling List

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites.

The New York Times publishes best-seller lists of books that are selling well. Pulling information from that page is a handy way to find good books to read.

I used a couple of tools to complete this project. The bulk of the code is written in Python. Experience with HTML and CSS was particularly useful for finding the class names that narrow my searches when parsing.

I also used a few Python packages and libraries: Requests (to send HTTP requests easily), Beautiful Soup (to parse HTML documents), pandas (to build data structures that are easy to read and understand), and os (which provides functions for interacting with the operating system).

Outline of the Project

Used Requests to download the page
Used Beautiful Soup (bs4) to parse the HTML and extract information
Converted the extracted data into a pandas DataFrame
Wrote the DataFrame to a CSV file
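
The first two steps of the outline can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names, the sample HTML, and every class name in it (`book`, `title`, `author`, `publisher`, `weeks`) are placeholders I made up, since the real class names have to be read from the live page.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.nytimes.com/books/best-sellers/"

# Hypothetical markup: these tags and class names are placeholders,
# not the ones the NYT page actually uses.
SAMPLE_HTML = """
<ol>
  <li class="book">
    <h3 class="title">EXAMPLE TITLE ONE</h3>
    <p class="author">Example Author</p>
    <p class="publisher">Example Press</p>
    <p class="weeks">3 weeks on the list</p>
  </li>
  <li class="book">
    <h3 class="title">EXAMPLE TITLE TWO</h3>
    <p class="author">Another Author</p>
    <p class="publisher">Sample House</p>
    <p class="weeks">New this week</p>
  </li>
</ol>
"""


def download_page(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_books(html):
    """Extract one dictionary per book from the (assumed) markup above."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for item in soup.find_all("li", class_="book"):
        books.append({
            "title": item.find("h3", class_="title").get_text(strip=True),
            "author": item.find("p", class_="author").get_text(strip=True),
            "publisher": item.find("p", class_="publisher").get_text(strip=True),
            "weeks_on_list": item.find("p", class_="weeks").get_text(strip=True),
        })
    return books


# Parse the sample so the sketch runs without network access.
books = parse_books(SAMPLE_HTML)
for book in books:
    print(book["title"], "by", book["author"])
```

To run it against the real page, `parse_books(download_page(URL))` would be the call, once the selectors are replaced with the class names found by inspecting the page.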

Scraped https://www.nytimes.com/books/best-sellers/
Compiled a list of the categories provided, such as Hardcover Nonfiction, Advice, How To, etc.
For each category, the top 15 books were selected, and each book's title, author, publisher, and time on the list were collected
The information was loaded into a pandas DataFrame and then written to CSV files
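
The last two steps, keeping the top 15 books of a category and writing them to a CSV file, might look something like the sketch below. The function name `save_category`, the column names, and the file-naming scheme are my own assumptions for illustration and may not match the files in this repository.

```python
import os

import pandas as pd

# Assumed column names, chosen to mirror the fields listed above.
COLUMNS = ["title", "author", "publisher", "weeks_on_list"]


def save_category(books, category, out_dir, top_n=15):
    """Write the first top_n books of one category to <out_dir>/<category>.csv."""
    os.makedirs(out_dir, exist_ok=True)  # create the output folder if needed
    frame = pd.DataFrame(books[:top_n], columns=COLUMNS)
    # Hypothetical naming scheme: "Hardcover Nonfiction" -> "hardcover-nonfiction.csv"
    filename = category.lower().replace(" ", "-") + ".csv"
    path = os.path.join(out_dir, filename)
    frame.to_csv(path, index=False)  # index=False keeps row numbers out of the file
    return path
```

Calling `save_category(books, "Hardcover Nonfiction", "bestsellers-data")` once per category would leave one CSV file per list, where `books` is a list of dictionaries with the keys above.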

All of the functions are thoroughly commented for ease of understanding.

Resources Used:

Beautiful Soup Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Requests Documentation: https://docs.python-requests.org/en/master/
Reading Relative File Paths in Python: https://www.youtube.com/watch?v=B3M1bQD1Xyk&pp=sAQA


anishshriram/scraping-nyt-bestsellers