Datasets #2
Comments
Possible starting point here. One approach could be to collect every ISBN we can from Open Library (and any other source) and then send those ISBNs as API requests to ISBNdb, but we can decide on this once we have seen which variables each dataset provides. |
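A minimal sketch of the "collect ISBNs first, query later" idea, assuming each source can be exported as a plain list of ISBN strings. The helper names (`normalize_isbn`, `merge_isbns`, `batched`), the sample data, and the batch size of 100 are all hypothetical; nothing here calls Open Library or ISBNdb.

```python
from itertools import islice


def normalize_isbn(raw):
    """Strip hyphens/spaces and uppercase a trailing 'x'; return None if the shape is wrong."""
    isbn = raw.replace("-", "").replace(" ", "").upper()
    if len(isbn) == 13 and isbn.isdigit():
        return isbn
    if len(isbn) == 10 and isbn[:9].isdigit() and (isbn[9].isdigit() or isbn[9] == "X"):
        return isbn
    return None


def merge_isbns(*sources):
    """Merge several iterables of raw ISBN strings, dropping duplicates and malformed entries."""
    seen = set()
    merged = []
    for source in sources:
        for raw in source:
            isbn = normalize_isbn(raw)
            if isbn is not None and isbn not in seen:
                seen.add(isbn)
                merged.append(isbn)
    return merged


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical sample data standing in for real dataset exports
open_library = ["978-1-60945-078-6", "0-306-40615-2"]
other_source = ["9781609450786", "not-an-isbn"]

unique = merge_isbns(open_library, other_source)
print(unique)
for batch in batched(unique, 100):
    print(len(batch))  # each batch would become one round of API requests
```

Note that this only removes exact duplicates after normalization. An ISBN-10 and the ISBN-13 of the same book still count as two entries unless everything is converted to ISBN-13 first.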
Book Database Size Comparison
|
Comparison of Open Library and ISBNdb Metadata
|
This seems useful for handling ISBNs: https://github.com/xlcnd/isbnlib |
About Goodreads Data
For example, a larger ID such as https://www.goodreads.com/book/show/223810007 points to a newly uploaded book (published on January 19, 2025) that does not yet have any reviews. So maybe we can go through the IDs sequentially, scrape all available book data, and match each book with its ISBN. Reference of Scraper Version |
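A rough sketch of what walking the IDs sequentially could look like. The URL pattern is the one from the link above; `crawl_ids`, the `fetch` callable, and the one-second default delay are assumptions for illustration, and the real scraping step is left to whatever scraper we end up using.

```python
import time

BASE_URL = "https://www.goodreads.com/book/show/{book_id}"


def book_url(book_id):
    """Build the book page URL for a numeric Goodreads book ID."""
    return BASE_URL.format(book_id=book_id)


def crawl_ids(start, stop, fetch, delay=1.0):
    """Call fetch(url) for every ID in [start, stop) and collect non-empty results.

    `fetch` is a hypothetical callable that returns parsed book data, or None
    when the ID does not exist or the page could not be parsed.
    """
    results = {}
    for book_id in range(start, stop):
        data = fetch(book_url(book_id))
        if data is not None:
            results[book_id] = data
        time.sleep(delay)  # pause between requests to avoid hammering the site
    return results


# Stand-in fetcher so the sketch runs offline; swap in a real scraper later
def fake_fetch(url):
    return {"url": url} if url.endswith("2") else None


print(crawl_ids(1, 4, fake_fetch, delay=0))
```

Passing `fetch` in as a parameter keeps the ID loop separate from the parsing logic, so the loop can be tested without touching the network.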
@yuetongwu7 I think we can reach the book using an ISBN. For example, I tried the ISBN (9781609450786) of My Brilliant Friend and it actually led me to the book. Python code (we can tweak it to get the book ID and then feed it into the maria-antoniak package, OR we can scrape the rating info directly ourselves):
from bs4 import BeautifulSoup
import requests


def get_goodreads_info(isbn):
    """Search Goodreads for an ISBN and return the first matching title."""
    url = f"https://www.goodreads.com/search?q={isbn}"
    headers = {
        "User-Agent": "Mozilla/5.0"
    }
    # A timeout keeps one stalled request from hanging the whole loop
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")
        # Look the title element up once and reuse it
        title_tag = soup.select_one("a.bookTitle span")
        book_title = title_tag.text.strip() if title_tag else "Not found"
        return {"ISBN": isbn, "Title": book_title}
    return None


# Example usage
isbns = ["9780143126560", "9780062316097"]
for isbn in isbns:
    book_info = get_goodreads_info(isbn)
    print(book_info)
|
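One way the "get the book ID" tweak could look: pull the numeric ID out of a book link once we have one. This is only a sketch. The `/book/show/<id>` shape matches the Goodreads URL quoted earlier in this thread, but the slug and query string in the second example are made up, and `extract_book_id` is a hypothetical helper.

```python
import re

# Matches the numeric ID that follows /book/show/ in a Goodreads book link
BOOK_ID_PATTERN = re.compile(r"/book/show/(\d+)")


def extract_book_id(href):
    """Return the numeric book ID found in a link, or None if there is none."""
    match = BOOK_ID_PATTERN.search(href)
    return int(match.group(1)) if match else None


print(extract_book_id("https://www.goodreads.com/book/show/223810007"))
# Hypothetical relative link with a title slug and query string
print(extract_book_id("/book/show/12345-some-title?from_search=true"))
print(extract_book_id("/author/show/1"))
```

The ID could then be handed to a scraper that works from book IDs, or used to build the book page URL for our own scraping.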
Check data quality
Next steps:
Send to @saurabh-khanna
|
ISBN-10 to ISBN-13 code:
def isbn10_to_isbn13(isbn10):
    """
    Convert an ISBN-10 to ISBN-13.

    Args:
        isbn10 (str): ISBN-10 number (with or without hyphens)
    Returns:
        str: Corresponding ISBN-13 number, without hyphens
    Raises:
        ValueError: If the input is not in ISBN-10 format
    """
    # Remove hyphens and spaces
    isbn10 = isbn10.replace('-', '').replace(' ', '')
    # Validate ISBN-10 format (length and characters only; the ISBN-10 check digit is not verified)
    if len(isbn10) != 10 or not isbn10[:-1].isdigit() or (isbn10[-1] not in '0123456789X'):
        raise ValueError("Invalid ISBN-10 format")
    # Prefix 978 plus the first nine digits; the old ISBN-10 check digit is dropped
    prefix = '978' + isbn10[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... across the 12 digits
    total = sum((3 if i % 2 else 1) * int(digit) for i, digit in enumerate(prefix))
    check_digit = (10 - (total % 10)) % 10
    # Construct ISBN-13
    isbn13 = prefix + str(check_digit)
    return isbn13


# Example usage
print(isbn10_to_isbn13('0-306-40615-2'))  # Prints 9780306406157
try:
    print(isbn10_to_isbn13('007-6092012X'))
except ValueError as err:
    print(err)  # Prints "Invalid ISBN-10 format": this input has 11 characters, not 10
|
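The conversion above checks only the length and characters of its input, so a mistyped ISBN-10 would still be converted. A small sketch of the standard ISBN-10 check (weights 10 down to 1, sum divisible by 11, with `X` standing for 10 in the last position) that could run first. `is_valid_isbn10` is a hypothetical helper name, not part of any library.

```python
def is_valid_isbn10(isbn10):
    """Return True if the string is a well-formed ISBN-10 with a correct check digit."""
    isbn10 = isbn10.replace("-", "").replace(" ", "").upper()
    # Nine digits followed by a digit or 'X'
    if len(isbn10) != 10 or not isbn10[:9].isdigit():
        return False
    if not (isbn10[9].isdigit() or isbn10[9] == "X"):
        return False
    # Weighted sum: first character times 10, second times 9, ..., last times 1
    total = 0
    for position, char in enumerate(isbn10):
        value = 10 if char == "X" else int(char)
        total += (10 - position) * value
    return total % 11 == 0


print(is_valid_isbn10("0-306-40615-2"))
print(is_valid_isbn10("0-306-40615-3"))
print(is_valid_isbn10("007-6092012X"))
```

Running this before the conversion would let us flag bad ISBN-10s in the datasets instead of silently turning them into valid-looking ISBN-13s.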
Let's start looking at different book datasets, and add: