Confused by the import results #1742

steinarb · 2021-12-31T16:32:10Z

steinarb
Dec 31, 2021

When I first looked at bookwyrm I couldn't figure out how to add my own books.

So I asked around and was told that I could do a CSV export from goodreads and import that into bookwyrm.

However, I didn't have any goodreads books to export (I have an account there, but I haven't ever used it for anything).

So I googled and found this page describing the goodreads CSV format: https://zief0002.github.io/epsy-8251/codebooks/goodreads.html

I figured "how hard can a book database be...?" and wrote a reactjs web app with PostgreSQL storage, and a database schema made to store the data needed to generate the above CSV format: https://github.com/steinarb/bokbase

However the first imports went badly: none of the books were imported.

So I started looking at the bookwyrm import tests and running them on my generated CSV files. First using the goodreads import test, and later switching to the generic CSV import (since the columns in that format seem to be the ones actually used).

I rewrote the database schema a little (changed publication time from year to date, and added a "finished read date" field, and added an ISBN13 field) and used the generic.csv column names.

And then I tried importing (unfortunately using a CSV generated from my dummy test database instead of the actual PostgreSQL database) and I am confused by the results: https://bookwyrm.social/import/1078

3 real books were imported
One book was not imported because it couldn't be matched
One bogus book in the CSV was actually imported and seems to have been matched with a book with the same title as this book's bogus ISBN number

Also, when I look at the successfully imported real books, they seem to contain lots of stuff not in the generic CSV import, and (as far as I could tell) not in the fields used in the goodreads CSV imports (but values found in the above URL describing the goodreads import, such as series and publisher).

So I am confused.

Could someone who understands what goes on perhaps explain why things turned out as they did?

And also what I should do to create CSV files that are more acceptable to bookwyrm? (because that's what I want to do...)

Thanks in advance!

And happy new year!
generatedBy_react-csv.csv

mouse-reeve · 2021-12-31T17:10:56Z

mouse-reeve
Dec 31, 2021
Maintainer

I really appreciate how much effort and thought you're putting into this! Hopefully I can lend some clarity to what happens inside BookWyrm.

Here's how import works:

1. The headers in the csv are matched up with a set list of headers that represent fields like ISBN, title, author, and your reading metadata. This allows it to understand a small variety of variations on common headers, for example it can use "isbn", "isbn13", or "isbn 13" as the ISBN column.
2. For each row in the file, it tries to find an entry in the book database that matches the book's identifiers provided in the csv file. First it will look up by ISBN, and if there's no match found or no ISBN in the csv, it will search by title and author. It looks first in the instance's book database, and if it doesn't find it there, it will search external databases like other BookWyrm instances, OpenLibrary, and Inventaire.
3. When a book is matched, it links the csv row to the book database entry and creates any reading metadata, like dates read or ratings and reviews, and associates them with the book.
4. When a book has ambiguous results from a title/author search, it will ask you to manually approve or reject the best-guess book that it found.
5. When no match is found, you have the option to re-try those items. Sometimes this is because none of the databases had a match for the book, sometimes it's because of a transient error (like OpenLibrary was too slow to respond), and sometimes it's because of a bug in BookWyrm.
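
The header matching and lookup order described above can be sketched roughly as follows. This is only an illustration, not BookWyrm's actual code: `HEADER_ALIASES`, `map_headers`, and `find_book` are hypothetical names, only the three ISBN header variants come from the description, and each "source" is assumed to be a plain callable (local database first, then external ones) that returns a match or `None`. Whether the ISBN lookup really tries every database before falling back to title and author is also an assumption.

```python
import csv
import io

# Hypothetical alias table; only the ISBN variants are taken from the description.
HEADER_ALIASES = {
    "isbn": ["isbn", "isbn13", "isbn 13"],
    "title": ["title"],
    "author": ["author"],
}


def map_headers(fieldnames):
    """Map each known field to the first csv header matching one of its aliases."""
    normalized = {name.strip().lower(): name for name in fieldnames}
    mapping = {}
    for field, aliases in HEADER_ALIASES.items():
        for alias in aliases:
            if alias in normalized:
                mapping[field] = normalized[alias]
                break
    return mapping


def find_book(row, mapping, sources):
    """Look up by ISBN in every source, then fall back to "<title> <author>"."""

    def value(field):
        header = mapping.get(field)
        return (row.get(header) or "").strip() if header else ""

    queries = [value("isbn"), f"{value('title')} {value('author')}".strip()]
    for query in queries:
        if not query:
            continue  # no ISBN (or no title/author) in this row
        for source in sources:  # local database first, then external ones
            match = source(query)
            if match:
                return match
    return None  # corresponds to "Could not find a match for book"


# Usage with two fake sources backed by dictionaries.
local = {"9780140449136": "The Odyssey (local entry)"}.get
remote = {"Dune Frank Herbert": "Dune (remote entry)"}.get
data = "Title,Author,ISBN13\nThe Odyssey,Homer,9780140449136\nDune,Frank Herbert,\n"
reader = csv.DictReader(io.StringIO(data))
mapping = map_headers(reader.fieldnames)
for row in reader:
    print(row["Title"], "->", find_book(row, mapping, [local, remote]))
```

Note that in this sketch the book is never built from the csv row itself; the row only supplies the search keys.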

It sounds like 3 of your books were matched, and the reason they have metadata that wasn't present in the csv is that the book isn't being created from the csv, but rather looked up in a database based on the ISBN in the csv. The book with a dummy ISBN would have fallen back on a title/author search, which is why it got a match. Did the match have the correct title and author that corresponded to the csv data?

And the book that didn't match should show you a message about what went wrong -- either "Could not find a match for book" or "Error loading book" -- which will hint at what happened.

Can you tell me more specifics about what outcome you expected from the CSV vs the outcome you got? I'm not able to view the link to the import on BookWyrm, since it's only visible to you, so if you can post screenshots where it's relevant, that would be helpful.

(The blog post you found appears to be out of date or possibly a homebrew system of storing book data, as those csv headers don't reflect the columns that Goodreads uses in any export I've encountered recently.)

0 replies

anonyth · 2022-01-04T06:48:29Z

anonyth
Jan 4, 2022

I recently exported a TSV from LibraryThing and tried importing to BookWyrm on https://bookwyrm.social.

The import job has been running for ~23 hours without update.
There is no ability in the UI to kill an existing job -- and I had tried running it again, so there are 2 hanging jobs, alas.

Is it possible to kill those imports, and can you advise on LibraryThing import gotchas? I don't see any related bugs/problems under /issues.

2 replies

mouse-reeve Jan 7, 2022
Maintainer

Is it still hanging? After a certain period of time it should give you the option to re-try an import item. That said, imports are designed to be a very low-priority task, so it's normal that they can take quite a while.

anonyth Jan 10, 2022

Looks like it. "6 days, 18 hours ago" for me and still says Pending. Perhaps ~450 entries in the source file.

steinarb's import took 6 days 23 hours, so perhaps I'm in the zone.

steinarb · 2022-01-07T15:37:18Z

steinarb
Jan 7, 2022
Author

Mouse Reeve wrote:

> I really appreciate how much effort and thought you're putting into this! Hopefully I can lend some clarity to what happens inside BookWyrm.
>
> Here's how import works:
>
> 1. The headers in the csv are matched up with a set list of headers that represent fields like ISBN, title, author, and your reading metadata. This allows it to understand a small variety of variations on common headers, for example it can use "isbn" "isbn13" or "isbn 13" as the ISBN column.

Yup, I found the mapping in the python code, and ended up with a minimal CSV containing only the fields mapped to. That's the one I was partially successful in importing.

> 2. For each row in the file, it tries to find an entry in the book database that matches the book's identifiers provided in the csv file. First it will look up by ISBN, and if there's no match found or no ISBN in the csv, it will search by title and author. It looks first in the instance's book database, and if it doesn't find it there, it will search external databases like other BookWyrm instances, OpenLibrary, and Inventaire.

Is there any detailed info to be found about this book database? A way to browse it? A REST API that can be used to query it?

> 3. When a book is matched, it links the csv row to the book database entry and creates any reading metadata, like dates read or ratings and reviews and associates them with the book.

Right!

> 4. When a book has ambiguous results from a title/author search, it will ask you to manually approve or reject the best-guess book that it found.
> 5. When no match is found, you have the option to re-try those items. Sometimes this is because none of the databases had a match for the book, sometimes it's because of a transient error (like OpenLibrary was too slow to respond), and sometimes it's because of a bug in BookWyrm.

Ok.

> It sounds like 3 of your books were matched, and the reason they have metadata that wasn't present in the csv is that the book isn't being created from the csv, but rather looked up in a database based on the ISBN in the csv. The book with a dummy ISBN would have fallen back on a title/author search, which is why it got a match. Did the match have the correct title and author that corresponded to the csv data?

Nope! A completely different title and author. :-)

> And the book that didn't match should show you a message about what went wrong -- either "Could not find a match for book" or "Error loading book" -- which will hint at what happened.

It was "Could not find a match for book".

> Can you tell me more specifics about what outcome you expected from the CSV vs the outcome you got? I'm not able to view the link to the import on BookWyrm, since it's only visible to you, so if you can post screenshots where it's relevant, that would be helpful.

I expected to be able to fill my bookshelves with books from the CSV. I expected having to provide all of the metadata for the books. I expected to be able to add new books to bookwyrm. I have attached a screenshot of the import to this email message. If it is stripped off I'll upload the screenshot to the thread using the web GUI.

> (The blog post you found appears to be out of date or possibly a homebrew system of storing book data, as those csv headers don't reflect the columns that Goodreads uses in any export I've encountered recently.)

Ok! Won't look at that anymore! :-) Thanks!

2 replies

steinarb Jan 7, 2022
Author

I forgot to attach the screenshot to the email, so I don't know whether attaching would have worked, but anyway, here's the screenshot of the partially successful import.

mouse-reeve Jan 7, 2022
Maintainer

> Are there any detailed info to be found about this book database? A way to browse it? A REST API that can be used to query it?

http://openlibrary.org/ -- JSON search: https://openlibrary.org/search.json?q=<query>
https://inventaire.io/ -- JSON search: https://inventaire.io/api/search?types=works&types=works&search=<query>
http://bookwyrm.social/ -- JSON search: http://bookwyrm.social/search.json?q=<query>
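
As a rough illustration, the three search URLs above can be built for a given query like this. It is a sketch only: `search_urls` is a hypothetical helper, the `types=works` parameter is written once rather than twice, and nothing is assumed here about the shape of the JSON each service returns.

```python
from urllib.parse import urlencode


def search_urls(query):
    """Build the JSON search URL for each of the three services listed above."""
    return {
        "openlibrary": "https://openlibrary.org/search.json?"
        + urlencode({"q": query}),
        "inventaire": "https://inventaire.io/api/search?"
        + urlencode({"types": "works", "search": query}),
        "bookwyrm": "http://bookwyrm.social/search.json?"
        + urlencode({"q": query}),
    }


# The query is either an ISBN or free text such as "<title> <author>".
for name, url in search_urls("Dune Frank Herbert").items():
    print(name, url)
```

`urlencode` takes care of escaping spaces and other special characters in the query, so free-text titles can be passed in as-is.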

> Nope! A completely different title and author. :-)

The search endpoint checks if a query looks like an ISBN, and if it doesn't (like 1234), it searches it as a free text query. This is a classic case of "garbage in, garbage out" -- for the purpose of import, it would help if it enforced ISBN search, but it's not a high priority issue.
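
For anyone generating CSV files, one way to avoid this is to validate ISBNs before export. The sketch below uses the standard ISBN-10 and ISBN-13 check-digit rules; `looks_like_isbn` is a hypothetical helper, and the check BookWyrm itself performs may well be different (for example, it might only look at length and digits).

```python
def looks_like_isbn(query):
    """Return True if the query is a checksum-valid ISBN-10 or ISBN-13."""
    text = query.replace("-", "").replace(" ", "").upper()
    if not text.isascii():
        return False
    if len(text) == 10 and text[:9].isdigit() and (text[9].isdigit() or text[9] == "X"):
        # ISBN-10: weights 10..1, an "X" check digit counts as 10, sum divisible by 11.
        digits = [int(c) for c in text[:9]] + [10 if text[9] == "X" else int(text[9])]
        return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0
    if len(text) == 13 and text.isdigit():
        # ISBN-13: alternating weights 1 and 3, sum divisible by 10.
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(text))
        return total % 10 == 0
    return False


print(looks_like_isbn("1234"))               # a bogus value like the one in the import
print(looks_like_isbn("978-0-306-40615-7"))  # a well-formed ISBN-13
```

A row whose ISBN fails this check could be exported with the ISBN column left empty, so that the import falls back on the row's title and author instead of a free text search on the bogus number.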

> I expected to be able to fill my bookshelves with books from the CSV. I expected having to provide all of the metadata for the books. I expected to be able to add new books to bookwyrm.

I see! It sounds like you understood the goal of the feature differently than was intended. It does populate your shelves and add books to BookWyrm, but it doesn't create them from the provided CSV metadata.

steinarb · 2022-01-10T16:27:44Z

steinarb
Jan 10, 2022
Author

Mouse Reeve wrote:

> Is it still hanging?

I don't think the import is hanging, exactly...? It just failed on one book (and mis-imported another).

> After a certain period of time it should give you the option to re-try an import item. That said, imports are designed to be a very low-priority task, so it's normal that they can take quite a while.

Ok.

0 replies

steinarb · 2022-01-10T16:32:30Z

steinarb
Jan 10, 2022
Author

Mouse Reeve wrote:

> > Are there any detailed info to be found about this book database? A way to browse it? A REST API that can be used to query it?
>
> http://openlibrary.org/ -- JSON search: `https://openlibrary.org/search.json?q=<query>`
> https://inventaire.io/ -- JSON search: `https://inventaire.io/api/search?types=works&types=works&search=<query>`
> http://bookwyrm.social/ -- JSON search: `http://bookwyrm.social/search.json?q=<query>`

Thanks! I will investigate.

> > Nope! A completely different title and author. :-)
>
> The search endpoint checks if a query looks like an ISBN, and if it doesn't (like 1234), it searches it as a free text query. This is a classic case of "garbage in, garbage out" -- for the purpose of import, it would help if it enforced ISBN search, but it's not a high priority issue

Ok. :-)

> > I expected to be able to fill my bookshelves with books from the CSV. I expected having to provide all of the metadata for the books. I expected to be able to add new books to bookwyrm.
>
> I see! It sounds like you understood the goal of the feature differently than was intended. It does populate your shelves and add books to BookWyrm, but it doesn't create them from the provided CSV metadata.

Yep! So I'm not sure my book database has a mission anymore...? :-) Unless you have a different way to contribute books to your database? If so the application may still have a mission...:-) In fact, it may have two:

1. Provide a way to edit and contribute book metadata
2. Provide a way to set up bookshelves to import into bookwyrm

So maybe not wasted work after all...?

0 replies

steinarb · 2022-01-10T16:59:13Z

steinarb
Jan 10, 2022
Author

Steinar Bang wrote:

> Mouse Reeve wrote:
>
> > > Are there any detailed info to be found about this book database? A way to browse it? A REST API that can be used to query it?
> >
> > http://openlibrary.org/ -- JSON search: `https://openlibrary.org/search.json?q=<query>`
> > https://inventaire.io/ -- JSON search: `https://inventaire.io/api/search?types=works&types=works&search=<query>`
> > http://bookwyrm.social/ -- JSON search: `http://bookwyrm.social/search.json?q=<query>`
>
> Thanks! I will investigate.

You wouldn't happen to have a reference to the syntax of the queries as well...? :-) (a link to the code in bookwyrm setting up such queries would suffice)

3 replies

mouse-reeve Jan 10, 2022
Maintainer

I included that in the quoted text

steinarb Jan 10, 2022
Author

Oh, so basically query is either an ISBN or free text?

That's the bookwyrm search presumably? Is it the same for the other two?

mouse-reeve Jan 10, 2022
Maintainer

Yes, the query is the ISBN or `<title> <author>` in all the examples.

silexy · 2022-02-12T19:56:22Z

silexy
Feb 12, 2022

@mouse-reeve, is it possible to use worldcat.org to look up the imported books?
I get a lot of unmatched books, because I am reading Dutch books or perhaps because they are less popular books. I can get a search hit for these unmatched books in worldcat.org.

If that isn't possible, it would be nice to have the option to add books manually from this import list, because most information is correctly imported (from a Calibre catalogue), while the only problem seems to be that BookWyrm cannot get a match for the book in OpenLibrary, Inventaire, and BookWyrm.

2 replies

mouse-reeve Feb 13, 2022
Maintainer

Unfortunately, WorldCat does not have an API we're able to use -- I would very much like to use that data as well! There is a github issue available for creating books from import item: #1820

silexy Feb 13, 2022

Thanks!
Hopefully it is not too difficult to make it possible creating books from import items. Would be a great help.

steinarb · 2022-02-13T15:13:06Z

steinarb
Feb 13, 2022
Author

Peter wrote:

> Thanks! Hopefully it is not too difficult to make it possible creating books from import items. Would be a great help.

Heh! :-) If it becomes possible to create books from import items, then my https://github.com/steinarb/bokbase app will actually have acquired a mission...;-)

0 replies
