Collect Amazon product IDs and save in *.json #48

Open
wants to merge 3 commits into
base: master


johndoe-dev00
Contributor

Use case
Scrape Amazon product IDs from each book in order to later scrape the Amazon product pages for the review information.

Functionality
I scrape the Amazon product ID ("ASIN") from the "Buy" button on the https://www.blinkist.com/en/books/... pages.
The ASIN is required to generate the Amazon product link https://www.amazon.com/dp/<ASIN>.
If available, the product IDs are stored in the *.json files.
This feature must be enabled with the command-line switch --get-amazon-url.
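
As a rough illustration of the steps above (not the code in this pull request), the ASIN handling could be sketched like this. The sketch assumes the "Buy" link contains the ASIN after `/dp/` or `/gp/product/`. The function names and the `amazon_id` / `amazon_url` JSON keys are hypothetical. Only the `--get-amazon-url` switch and the `https://www.amazon.com/dp/<ASIN>` link format come from the description.

```python
import argparse
import re

# Assumption: the "Buy" link carries the 10-character ASIN after /dp/ or
# /gp/product/. The real link format on the Blinkist pages may differ.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)")


def extract_asin(buy_url):
    """Return the ASIN found in buy_url, or None if there is none."""
    match = ASIN_PATTERN.search(buy_url)
    return match.group(1) if match else None


def amazon_product_url(asin):
    """Build the Amazon product link from an ASIN."""
    return f"https://www.amazon.com/dp/{asin}"


def add_amazon_fields(book, buy_url):
    """Attach the ASIN and product link to a book dict when one is found.

    The key names are hypothetical, chosen only for this sketch.
    """
    asin = extract_asin(buy_url)
    if asin:
        book["amazon_id"] = asin
        book["amazon_url"] = amazon_product_url(asin)
    return book


def build_parser():
    """Parser with the opt-in switch; the extra scrape runs only if it is set."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--get-amazon-url",
        action="store_true",
        help="also collect the Amazon product ID for each book",
    )
    return parser
```

With this shape, books without a recognisable "Buy" link are left unchanged, which matches the "if available" behaviour described above.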

I also added a "category_id" field to the *.json files; it represents the index of the scraped category.

@leoncvlt
Owner

This looks fine, but is there any reason to have this other than "it's a cool feature"? I'm a huge advocate of the "do one thing and do it well" philosophy, and I see this tool mainly as a scraper of the Blinkist material. What's the advantage of lengthening the scrape by visiting one extra page for each book just to get the Amazon ASIN? Even behind an optional flag, it adds more logic and arguments to keep track of in the scrape_book_data method and others.

@johndoe-dev00
Contributor Author

Well, I guess my use case is a bit different from the usual offline reading.
I find the Blinkist smartphone app pretty crappy when it comes to deciding which book to listen to next.
So I want to listen to the books with the most Amazon reviews. With the Amazon IDs I gathered, I was able to scrape Amazon and generate a list of all Blinkist books, ordered by number of Amazon reviews:

Other people might have similar use cases that involve a book's Amazon ID; that's why I thought I would create a pull request.
If you think this is outside the scope of the intended use case, feel free to reject the pull request.
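
For reference, the ordering step mentioned above could look roughly like the following. This is only a sketch: it assumes each book is a dict that already carries a review count under a hypothetical `review_count` key, filled in by a separate Amazon scrape that is not shown here.

```python
def rank_by_reviews(books):
    """Return the books sorted by review count, most-reviewed first.

    Books with a missing or empty count are treated as having zero reviews,
    so they end up at the bottom of the list.
    """
    return sorted(books, key=lambda b: b.get("review_count") or 0, reverse=True)


# Placeholder records, for illustration only.
books = [
    {"title": "Book A", "review_count": 12},
    {"title": "Book B"},
    {"title": "Book C", "review_count": 340},
]
ranked = rank_by_reviews(books)
```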
