Collect Amazon product IDs and save in *.json #48

johndoe-dev00 · 2021-04-08T16:59:59Z

Use case
Scrape the Amazon product ID of each book so that the Amazon product pages can later be scraped for review information.

Functionality
I scrape the Amazon product IDs ("ASINs") from the "Buy" button on the https://www.blinkist.com/en/books/... pages.
The ASIN is required to generate the Amazon product links: https://www.amazon.com/dp/<ASIN>
If available, the product IDs are stored in the *.json files.
This feature needs to be enabled through the command-line switch --get-amazon-url.
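
A minimal sketch of the ASIN handling described above, not the code from this PR. It assumes the "Buy" button links to an Amazon URL containing `/dp/<ASIN>` or `/gp/product/<ASIN>`, and that an ASIN is 10 uppercase alphanumeric characters; neither is confirmed by the PR. The names `extract_asin` and `amazon_product_link` are hypothetical.

```python
import re
from typing import Optional

# Assumption: the ASIN is a 10-character uppercase alphanumeric token that
# follows "/dp/" or "/gp/product/" in the Buy button's link.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)")


def extract_asin(buy_url: str) -> Optional[str]:
    """Return the ASIN found in buy_url, or None if there is none."""
    match = ASIN_PATTERN.search(buy_url)
    return match.group(1) if match else None


def amazon_product_link(asin: str) -> str:
    """Build the product link in the form given in the PR description."""
    return f"https://www.amazon.com/dp/{asin}"
```

Returning `None` instead of raising keeps the "if available" behaviour: a book without a recognisable Amazon link simply gets no product ID.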

I also added a "category_id" to the *.json files, which represents the index of the scraped category.
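
As a rough illustration of how the two extra fields could end up in a book's *.json data: only the key "category_id" comes from this PR. The key "amazon_id" and the function `add_optional_fields` are hypothetical, as is the shape of the book dictionary.

```python
import json
from typing import Optional


def add_optional_fields(book: dict, category_index: int, asin: Optional[str]) -> dict:
    """Return a copy of book with category_id and, if known, the Amazon ID."""
    enriched = dict(book)
    enriched["category_id"] = category_index
    if asin is not None:
        # Hypothetical key name; the PR does not state which key it uses.
        enriched["amazon_id"] = asin
    return enriched


def to_json(book: dict) -> str:
    """Serialise the book data the way it might be written to a *.json file."""
    return json.dumps(book, indent=2, ensure_ascii=False)
```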

leoncvlt · 2021-04-17T16:13:06Z

This looks fine, but is there any reason to have this other than "it's a cool feature"? I'm a huge advocate of the "do one thing and do it well" philosophy, and I see this tool mainly as a scraper of the Blinkist material. What's the advantage in lengthening the scrape by visiting one extra page for each book just to get the Amazon ASIN? Even if put behind an optional flag, it adds more logic and arguments to keep track of in the scrape_book_data method and others.

johndoe-dev00 · 2021-04-21T16:47:37Z

Well, I guess my use case is a bit different from the usual offline reading.
I find the Blinkist smartphone app pretty crappy when it comes to deciding which book to listen to next.
So I want to listen to the books with the most Amazon reviews. With the Amazon IDs I gathered, I was able to scrape Amazon and generate a list of all Blinkist books, ordered by number of Amazon reviews:

https://htmlpreview.github.io/?https://github.com/johndoe-dev00/blinkist-books-sorted-by-amazon-reviews/blob/main/!index.html
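
The ordering step can be sketched roughly as follows. This is an illustration, not the code behind the linked list: it assumes each book is a dictionary, and the key "review_count" and the function `sort_by_review_count` are hypothetical.

```python
from typing import Dict, List


def sort_by_review_count(books: List[Dict]) -> List[Dict]:
    """Return the books ordered by review count, most reviewed first.

    Books without a known count are treated as having zero reviews,
    so they end up at the bottom of the list.
    """
    return sorted(books, key=lambda book: book.get("review_count") or 0, reverse=True)
```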

Other people might have similar use cases that involve a book's Amazon ID, which is why I thought I would create a pull request.
If you think this is out of scope for the intended use case, feel free to reject the pull request.

johndoe-dev00 and others added 3 commits April 8, 2021 16:59

added CLI option to scrape Amazon ASIN from 'Buy book' button

8ba7282

improved 'buy' button wait handling + some cleanup

8f7151b

Update README.md

f4aca02
