Allow more types (schemes) of URLs (URIs) in metadata fields #7117

Closed
poikilotherm opened this issue Jul 23, 2020 · 1 comment
Labels
Feature: Metadata; Type: Suggestion (an idea); User Role: Depositor (creates datasets, uploads data, etc.)

poikilotherm commented Jul 23, 2020

Disclaimer: I'm well aware of #6030, but we can't wait for it. If IQSS is unhappy with this, it can live in our mini-fork.

tl;dr: Add a new metadata field type uri_no representing absolute, non-opaque URIs, which is the correct term for "URL". The existing url type is not sufficient, as it is effectively limited to HTTP/S.

Context:
For Jülich DATA, we want our contributors to provide URLs to the data, or at least to documentation of its whereabouts, when they don't or can't upload the data itself. (This is our major use case.)

Lots of our data references will reside on Windows network shares (smb://foo.bar/share/folder) and in other obscure places (think rsync://..., ipfs://..., s3://, gpfs://..., git+xxx://, http://, ftp://..., ...). Thus we need broader support for practically any kind of URL, whether or not a browser understands it.

Our use case is also described at https://jugit.fz-juelich.de/fdm/schemas/-/issues/2 and will be documented in depth in our guide.

Technical Background:
Please keep in mind that a "URL" (uniform resource locator) is only a colloquial term. It's a common practice to use it, but strictly speaking, URLs are a subset of URIs (uniform resource identifiers), defined in RFC 3986.

URIs become URLs by adding an "authority" part, which usually denotes a network resource. It is whatever comes after the scheme (e.g. http:), like //dataverse.org, and before the "path", like /login.xhtml.

URIs without an authority are "opaque" (leaving out some other cases), whereas URLs are always non-opaque.
Some good examples of commonly known opaque URIs: tel:+13930303, isbn:1292-92219-1212, mailto:[email protected].
These can be formalised even further as URNs (uniform resource names): urn:isbn:1292-92219-1212
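
To make the distinction concrete, here is a minimal sketch using java.net.URI, whose isOpaque() method uses the same terminology (an absolute URI whose scheme-specific part does not begin with a slash). The class name UriKindDemo and the classify helper are made up for illustration and are not part of Dataverse; file:/tmp/data.csv is an invented example of one of the "other cases" (non-opaque, but no authority).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only, not Dataverse code.
class UriKindDemo {

    /** Classifies a string according to java.net.URI's own categories. */
    static String classify(String value) {
        final URI uri;
        try {
            uri = new URI(value);
        } catch (URISyntaxException e) {
            return "invalid";
        }
        if (!uri.isAbsolute()) {
            return "relative reference"; // no scheme at all
        }
        if (uri.isOpaque()) {
            return "opaque"; // e.g. tel:, isbn:, urn:
        }
        return uri.getRawAuthority() != null
                ? "hierarchical with authority"
                : "hierarchical without authority";
    }

    public static void main(String[] args) {
        String[] samples = {
            "tel:+13930303",
            "isbn:1292-92219-1212",
            "urn:isbn:1292-92219-1212",
            "http://dataverse.org/login.xhtml",
            "smb://foo.bar/share/folder",
            "file:/tmp/data.csv" // invented example: a scheme and a path, but no authority
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + classify(sample));
        }
    }
}
```

Only the "hierarchical with authority" group corresponds to what is called a URL here.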

In Java both concepts exist as java.net.URL and java.net.URI. The key difference is that a URL object in Java always has to be backed by a scheme handler, as the API promises you can open a stream for it.
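
The handler requirement can be observed directly. Below is a minimal sketch, assuming a stock JVM with no additional protocol handlers installed; UrlHandlerDemo and hasUrlHandler are hypothetical names used only for this example. It parses the string as a java.net.URI first and then asks for a java.net.URL, which fails with a MalformedURLException when no handler is registered for the scheme.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical demo class, not part of Dataverse.
class UrlHandlerDemo {

    /**
     * Returns true if this JVM can represent the value as a java.net.URL,
     * i.e. the value is an absolute URI and a protocol handler exists for its scheme.
     */
    static boolean hasUrlHandler(String value) {
        try {
            URL url = URI.create(value).toURL();
            return url != null;
        } catch (MalformedURLException e) {
            // Syntactically fine as a URI, but no handler for the scheme.
            return false;
        } catch (IllegalArgumentException e) {
            // Not a valid URI, or not absolute.
            return false;
        }
    }

    public static void main(String[] args) {
        // Results depend on which handlers the running JVM provides.
        for (String sample : new String[] {
                "https://dataverse.org/login.xhtml",
                "smb://foo.bar/share/folder" }) {
            System.out.println(sample + " -> URL handler available: " + hasUrlHandler(sample));
        }
    }
}
```

On a default JVM the smb:// example is a perfectly valid URI, yet it cannot become a URL object, which is why validation based on java.net.URL rejects it.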

Problem:
Currently, a url typed metadata field in a (custom) metadata block only supports URLs with the http, https, file and jar schemes. (See the topic of protocol handlers in the Java docs.)

Also, existing uses of the url field type may implicitly rely on it being HTTP/S only. In many places in the upstream metadata blocks, placeholders tell people they should provide a "full URL, starting with http://".

Suggestion:
Instead of changing the current url type and any logic that depends on it, I propose adding a new type uri_no, meaning "non-opaque URI", as an alias for URLs. This excludes URNs and URIs without an authority part, leaving any kind of URL as allowed usage for the field.
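
As a rough illustration of what such a check could look like (this is an assumption on my part, not the code from the PR), the rule "absolute, non-opaque, with an authority" maps onto three java.net.URI calls. NonOpaqueUriValidator and isValid are hypothetical names, and git+ssh://example.org/repo.git is an invented sample value.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical validator sketch for a "uri_no" style field; not the actual Dataverse implementation.
class NonOpaqueUriValidator {

    /** Accepts absolute, non-opaque URIs that carry an authority part; rejects everything else. */
    static boolean isValid(String value) {
        if (value == null || value.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(value.trim());
            return uri.isAbsolute()                   // has a scheme
                    && !uri.isOpaque()                // excludes tel:, isbn:, urn:...
                    && uri.getRawAuthority() != null; // excludes e.g. file:/path
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "smb://foo.bar/share/folder",
            "http://dataverse.org/login.xhtml",
            "git+ssh://example.org/repo.git", // invented sample value
            "urn:isbn:1292-92219-1212",
            "tel:+13930303",
            "/login.xhtml"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + (isValid(sample) ? "accepted" : "rejected"));
        }
    }
}
```

Because this never creates a java.net.URL, it does not depend on which protocol handlers the JVM ships with, so schemes a browser cannot open are still accepted.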

This is a surprisingly small change; a PR is forthcoming. It has little UI impact and mostly affects the metadata side of things.

poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Jul 23, 2020
…ed tests, preparing for adding more test for uri_no. IQSS#7117

poikilotherm commented Aug 10, 2020

Thank you @djbrooke and @jggautier for discussing this in our call today.

As you folks don't see Dataverse as a metadata registry only (which is one of our main use cases), I'm going to close this issue. The code will live on in our fork. If anyone is interested in it for their own use case, feel free to reach out.

Curious people might want to take a look at our extensive guide on data storage linking: https://data.fz-juelich.de/guide/juelich/data-linking.html
