All work links to Open Library are broken due to wrong casing #6739
Comments
Thanks for your thoughts @alexshpilkin. I am struggling to find the specification for Open Library IDs. Do you have a link?
@wjrsimpson That’s a fair question. You can tell by simply poking at them that there are broken Open Library links on the ORCID website and that the Open Library website, as it is now, is case sensitive. But I don’t actually know that Open Library IDs are supposed to be case-insensitive; that was just me trusting your implementation and my own experience. So maybe I shouldn’t have said that with such confidence. I’ve looked around the OL developer and librarian docs and, surprisingly, I can’t find much about how OLIDs are supposed to work. The best I’ve seen is a brief mention in the “Understanding Identifiers” section of the librarians-in-training guide. There are some schemas and schema-adjacent things in the Open Library code, though. First, the official client library has a JSON schema for the API, which contains a (case-sensitive) regular expression for Finally, the Wikidata definition for this identifier type says That’s all I could find, unfortunately, but if you want an official word on this I guess asking the Open Library maintainers is also an option.
@alexshpilkin Thanks for the additional info. @TomDemeranville Do you happen to know?
I think we can pretty easily fix this to not alter the case we have in the database for OL identifiers. We still won't be able to guarantee they're correct, but we will be able to preserve the case. I've raised a bug here: https://trello.com/c/z3efGnyn/781-preserve-case-when-normalising-open-library-identifiers
@TomDemeranville From my outside point of view that sounds like a boring but workable solution. One thing I’m concerned about is existing links: am I right that they are persisted in the database as normalized (currently lowercased) links? And if so, do you plan to fix up the links stored before the normalization is fixed?
Open Library IDs are case-insensitive in that their casing does not bear information, but the server requires the ID to be passed in uppercase: https://openlibrary.org/b/OL38581116M is a book, while https://openlibrary.org/b/ol38581116m is a 404. However, all (?) Open Library URLs that appear on ORCID web pages seem to be in lowercase, thus 404: see e.g. https://orcid.org/0000-0003-1199-7080.
This seems to happen because when the URL is generated from an ol-type work ID by the resolver service, it is passed (among other things) through org.orcid.core.utils.v3.identifier.normalizers.CaseSensitiveNormalizer, which is under the impression that case-insensitive identifiers entail that the URL can be harmlessly lowercased:
ORCID-Source/orcid-core/src/main/java/org/orcid/core/utils/v3/identifiers/normalizers/CaseSensitiveNormalizer.java, lines 26 to 30 in 70964ce
Open Library has a different opinion of what the normal form of the ID (and therefore URL) should be.
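To make the mismatch concrete, here is a minimal sketch in Java. It is not the ORCID resolver code: the class and method names are hypothetical, and the URL prefix is simply taken from the example links above. It only shows that lowercasing the ID produces the URL form that currently returns a 404, while uppercasing it produces the form the server accepts.

```java
import java.util.Locale;

// Hypothetical demo class; none of these names exist in ORCID-Source.
final class OlidCasingDemo {
    // Prefix copied from the example URLs in this issue.
    private static final String BOOK_PREFIX = "https://openlibrary.org/b/";

    // Mimics a "case-insensitive means lowercase" normalization step.
    static String lowercasedUrl(String olid) {
        return BOOK_PREFIX + olid.toLowerCase(Locale.ROOT);
    }

    // Builds the URL in the casing the Open Library server currently expects.
    static String uppercasedUrl(String olid) {
        return BOOK_PREFIX + olid.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String id = "OL38581116M";
        System.out.println(lowercasedUrl(id)); // the form reported above as a 404
        System.out.println(uppercasedUrl(id)); // the form reported above as a book page
    }
}
```

Locale.ROOT is used so that the result does not depend on the default locale of the machine running the code.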
As far as possible solutions are concerned, either the case sensitivity flag needs to become a tristate (uppercase, lowercase, preserve); a further normalization step that fixes the casing for OL links needs to be added; or Open Library IDs need to be declared case sensitive. All of these seem a bit meh. None of them deals with the fact that there are plenty of wrong URLs already stored in the database (they are stored, right?).
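For illustration, the tristate option could look roughly like the sketch below. This is an assumption about shape only: CasePolicy and its apply method are hypothetical names, not part of the existing ORCID normalizer API, and how such a policy would be attached to an identifier type is left open.

```java
import java.util.Locale;

// Hypothetical tristate replacement for a boolean case-sensitivity flag.
enum CasePolicy {
    UPPERCASE, LOWERCASE, PRESERVE;

    // Applies the casing rule to an identifier value.
    String apply(String id) {
        switch (this) {
            case UPPERCASE:
                return id.toUpperCase(Locale.ROOT);
            case LOWERCASE:
                return id.toLowerCase(Locale.ROOT);
            default:
                // PRESERVE: leave the stored casing untouched.
                return id;
        }
    }
}
```

Under this sketch, Open Library identifiers would use UPPERCASE (or PRESERVE, if the stored value is trusted), while identifier types that are safe to lowercase would keep LOWERCASE.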