diff --git a/README.md b/README.md index 2edbb1b0c..66536cc3a 100644 --- a/README.md +++ b/README.md @@ -30,11 +30,12 @@ For more information see our HTML documentation links in the table below. | **Branch** | **Reference Documentation** | **[OpenAPI YAML description](openapi/data_repository_service.swagger.yaml)** | | --- | --- | --- | -| **master**: the current release | [HTML](https://ga4gh.github.io/data-repository-service-schemas/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/swagger-ui/#/DataRepositoryService/) | +| **master**: The current release | [HTML](https://ga4gh.github.io/data-repository-service-schemas/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/swagger-ui/#/DataRepositoryService/) | | **develop**: the stable development branch, into which feature branches are merged | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/develop/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/develop/swagger-ui/#/DataRepositoryService/) | -| **release 1.0.0**: the 1.0.0 release of DRS that we are submitting to GA4GH for standards approval | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.0.0/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.0.0/swagger-ui/#/DataRepositoryService/) | -| **release 0.1**: simplifying DRS to core functionality | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-0.1.0/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-0.1.0/swagger-ui/#/DataRepositoryService/) | -| **release 0.0.1**: the initial DRS after the rename from DOS | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/release/0.0.1/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/release/0.0.1/swagger-ui/#/DataRepositoryService/) | +| **release 1.1.0**: The 
1.1.0 release of DRS, which includes *no* API changes, only documentation changes. This release introduces a new URI convention using compact identifiers, along with clear directions on how to use identifiers.org/n2t.net to resolve them. | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.1.0/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.1.0/swagger-ui/#/DataRepositoryService/) | +| **release 1.0.0**: The 1.0.0 release of DRS that is now an approved GA4GH standard | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.0.0/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.0.0/swagger-ui/#/DataRepositoryService/) | +| **release 0.1**: Simplifying DRS to core functionality | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-0.1.0/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-0.1.0/swagger-ui/#/DataRepositoryService/) | +| **release 0.0.1**: The initial DRS after the rename from DOS | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/release/0.0.1/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/release/0.0.1/swagger-ui/#/DataRepositoryService/) | To monitor development work on various branches, add 'preview/\' to the master URLs above (e.g., 'https://ga4gh.github.io/data-repository-service-schemas/preview/\/docs'). 
diff --git a/build.gradle b/build.gradle index af804d0b7..26956fa81 100644 --- a/build.gradle +++ b/build.gradle @@ -61,7 +61,7 @@ asciidoctor { sourceDir asciiDocDir outputDir file("docs") sources { - include 'index.adoc' + include 'index.adoc', 'more_background_on_compact_identifiers.adoc' } backends = ['html5', 'pdf'] attributes = [ diff --git a/docs/asciidoc/back_matter.adoc b/docs/asciidoc/back_matter.adoc index a0fc26948..882871c51 100644 --- a/docs/asciidoc/back_matter.adoc +++ b/docs/asciidoc/back_matter.adoc @@ -1,5 +1,3 @@ - - == Appendix: Motivation [cols="40a,60a"] @@ -39,3 +37,88 @@ This spec defines a standard **Data Repository Service (DRS) API** (“the yello The world's biomedical data is controlled by groups with very different policies and restrictions on where their data lives and how it can be accessed. A primary purpose of DRS is to support unified access to disparate and distributed data. (As opposed to the alternative centralized model of "let's just bring all the data into one single data repository”, which would be technically easier but is no more realistic than “let’s just bring all the websites into one single web host”.) In a DRS-enabled world, tool builders don’t have to worry about where the data their tools operate on lives -- they can count on DRS to give them access. And tool users only need to know which DRS server is managing the data they need, and whether they have permission to access it; they don’t have to worry about how to physically get access to, or (worse) make a copy of the data. For example, if I have appropriate permissions, I can run a pooled analysis where I run a single tool across data managed by different DRS servers, potentially in different locations. 
+ +== Appendix: Background Notes on DRS URIs + +=== Design Motivation + +DRS URIs are aligned with the https://www.nature.com/articles/sdata201618[FAIR data principles] and the https://doi.org/10.1038/sdata.2018.2[Joint Declaration of Data Citation Principles] -- both hostname-based and compact identifier-based URIs provide globally unique, machine-resolvable, persistent identifiers for data. + +* We require all URIs to begin with `drs://` as a signal to humans and systems consuming these URIs that the response they will ultimately receive, after transforming the URI to a fetchable URL, will be a DRS JSON packet. This signal differentiates DRS URIs from the wide variety of other entities (HTML documents, PDFs, ontology notes, etc.) that can be represented by compact identifiers. +* We support hostname-based URIs because of their simplicity and efficiency for server and client implementers. +* We support compact identifier-based URIs, and the meta-resolver services of identifiers.org and n2t.net (Name-to-Thing), because of the wide adoption of compact identifiers in the research community, as detailed by https://doi.org/10.1038/sdata.2018.29[Wimalaratne et al (2018)] in "Uniform resolution of compact identifiers for biomedical data." + +== Appendix: Compact Identifier-Based URIs + +.Note: Identifiers.org/n2t.net API Changes +**** +The examples below show the current API interactions with https://n2t.net/e/compact_ids.html[n2t.net] and https://docs.identifiers.org/[identifiers.org], which may change over time. Please refer to the documentation from each site for the most up-to-date information. We will make best efforts to keep the DRS specification current, but DRS clients MUST maintain their ability to use either the identifiers.org or n2t.net APIs to resolve compact identifier-based DRS URIs. 
+**** + +=== Registering a DRS Server on a Meta-Resolver + +See the documentation on the https://n2t.net/e/compact_ids.html[n2t.net] and https://docs.identifiers.org/[identifiers.org] meta-resolvers for adding your own compact identifier type and registering your DRS server as a resolver. You can register new prefixes (or mirrors by adding resource provider codes) for free using a simple online form. For more information see link:more_background_on_compact_identifiers[More Background on Compact Identifiers]. + +=== Calling Meta-Resolver APIs for Compact Identifier-Based DRS URIs + +Clients resolving Compact Identifier-based URIs need to convert a prefix (e.g. “drs.42”) into a URL pattern. They can do so by calling either the identifiers.org or the n2t.net API, since the two meta-resolvers keep their mapping databases in sync. + +==== Calling the identifiers.org API as a Client + +It takes two API calls to get the URL pattern. + +(i) The client makes a GET request to identifiers.org to find information about the prefix: + + GET https://registry.api.identifiers.org/restApi/namespaces/search/findByPrefix?prefix=drs.42 + +This request returns a JSON structure including various URLs containing an embedded namespace id, such as: + + "namespace" : { + "href":"https://registry.api.identifiers.org/restApi/namespaces/1234" + } + +(ii) The client extracts the namespace id (in this example 1234), and uses it to make a second GET request to identifiers.org to find information about the namespace: + + GET https://registry.api.identifiers.org/restApi/resources/search/findAllByNamespaceId?id=1234 + +This request returns a JSON structure including a `urlPattern` field, whose value is a URL pattern containing a `{$id}` parameter, such as: + + "urlPattern" : "https://drs.myexample.org/ga4gh/drs/v1/objects/{$id}" + +==== Calling the n2t.net API as a Client + +It takes one API call to get the URL pattern. 
+ +The client makes a GET request to n2t.net to find information about the namespace. (Note the trailing colon.) + + GET https://n2t.net/drs.42: + +This request returns a text structure including a redirect field, whose value is a URL pattern containing a `$id` parameter, such as: + + redirect: https://drs.myexample.org/ga4gh/drs/v1/objects/$id + +=== Caching with Compact Identifiers + +Identifiers.org/n2t.net compact identifier resolver records do not change frequently. This stability makes it practical to cache resolver records and their URL patterns for performance reasons. Builders of systems that use compact identifier-based DRS URIs should cache prefix resolver records from identifiers.org/n2t.net and occasionally refresh the records (such as every 24 hours). This approach will reduce the burden on these community services since we anticipate many DRS URIs will be regularly resolved in workflow systems. Alternatively, system builders may decide to directly mirror the registries themselves; instructions are provided on the identifiers.org/n2t.net websites. + +=== Security with Compact Identifiers + +As mentioned earlier, identifiers.org/n2t.net perform some basic verification of new prefixes and provider code mirror registrations on their sites. However, builders of systems that consume and resolve DRS URIs may have certain security compliance requirements and regulations that prohibit relying on an external site for resolving compact identifiers. In this case, systems under these security and compliance constraints may wish to whitelist certain compact identifier resolvers and/or vet records from identifiers.org/n2t.net before enabling them in their systems. + +=== Accession Encoding to Valid DRS IDs + +The compact identifier format used by identifiers.org/n2t.net does not percent-encode reserved URI characters but, instead, relies on the first ":" character to separate prefix from accession. 
Since these accessions can contain any characters, and characters like "/" will interfere with DRS API calls, you _must_ percent-encode the accessions extracted from DRS compact identifier-based URIs when using them as DRS IDs in subsequent DRS GET requests. An easy way for a DRS client to handle this is to get the initial DRS object JSON response from whatever redirects the compact identifier resolves to, then look for the `self_uri` in the JSON, which will give you the correctly percent-encoded DRS ID for subsequent DRS API calls such as the `access` method. + +=== Additional Examples + +For additional examples, see the document link:more_background_on_compact_identifiers[More Background on Compact Identifiers]. + +== Appendix: Hostname-Based URIs + +=== Encoding DRS IDs + +In hostname-based DRS URIs, the ID is always percent-encoded to ensure special characters do not interfere with subsequent DRS endpoint calls. As such, ":" is not allowed in the URI, which is a convenient way of differentiating it from a compact identifier-based DRS URI. Also, if a given DRS service implementation uses compact identifier accessions as its DRS IDs, they must be percent-encoded before using them as DRS IDs in hostname-based DRS URIs and subsequent GET requests to a DRS service endpoint. + +=== Future DRS Versions and Service Registry/Info + +In the future, as new major versions of DRS are released, a DRS server might support multiple API versions on different URL paths. At that point we expect to add support for https://github.com/ga4gh-discovery/ga4gh-service-registry[service-registry] and https://github.com/ga4gh-discovery/ga4gh-service-info[service-info] endpoints to the API, and to update the URI resolution logic to describe how to use those endpoints when translating hostname-based DRS URIs to URLs. 
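The percent-encoding rule from the two encoding sections above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function names are hypothetical, and the sketch assumes that every character outside the RFC 3986 unreserved set should be encoded, which is what `urllib.parse.quote` does when `safe` is empty.

```python
from urllib.parse import quote


def accession_to_drs_id(accession: str) -> str:
    """Percent-encode a compact identifier accession for use as a DRS ID.

    Hypothetical helper: with safe="" everything except the RFC 3986
    unreserved characters (letters, digits, "-", ".", "_", "~") is encoded.
    """
    return quote(accession, safe="")


def drs_object_url(hostname: str, accession: str) -> str:
    """Build a DRS object URL using the /ga4gh/drs/v1/objects/ path from the spec."""
    return f"https://{hostname}/ga4gh/drs/v1/objects/{accession_to_drs_id(accession)}"


# A DOI-style accession contains "/", which must not leak into the URL path.
print(drs_object_url("drs.example.org", "10.5072/FK2805660V"))
```

Where a server already returns a `self_uri`, a client can rely on that value instead of encoding the accession itself, as described above.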
diff --git a/docs/asciidoc/front_matter.adoc b/docs/asciidoc/front_matter.adoc index 5c4dd2101..492772213 100644 --- a/docs/asciidoc/front_matter.adoc +++ b/docs/asciidoc/front_matter.adoc @@ -1,8 +1,8 @@ == Introduction -The Data Repository Service (DRS) API provides a generic interface to data repositories so data consumers, including workflow systems, can access data in a single, standard way regardless of where it's stored and how it's managed. This document describes the DRS API and provides details on the specific endpoints, request formats, and responses. It is intended for developers of DRS-compatible services and of clients that will call these DRS services. +The Data Repository Service (DRS) API provides a generic interface to data repositories so data consumers, including workflow systems, can access data objects in a single, standard way regardless of where they are stored and how they are managed. The primary functionality of DRS is to map a logical ID to a means for physically retrieving the data represented by the ID. The sections below describe the characteristics of those IDs, the types of data supported, how they can be pointed to using URIs, and how clients can use these URIs to ultimately make successful DRS API requests. This document also describes the DRS API in detail and provides information on the specific endpoints, request formats, and responses. This specification is intended for developers of DRS-compatible services and of clients that will call these DRS services. -The primary functionality of DRS is to map a logical ID to a means for physically retrieving the data represented by the ID. The sections below describe the characteristics of those IDs, the types of data supported, and how the mapping works. +The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in https://tools.ietf.org/html/rfc2119[RFC 2119]. 
== DRS API Principles @@ -11,16 +11,99 @@ The primary functionality of DRS is to map a logical ID to a means for physicall Each implementation of DRS can choose its own id scheme, as long as it follows these guidelines: * DRS IDs are strings made up of uppercase and lowercase letters, decimal digits, hyphen, period, underscore and tilde [A-Za-z0-9._~-]. See https://tools.ietf.org/html/rfc3986#section-2.3[RFC 3986 § 2.3]. -* Note to server implementors: internal IDs can contain other characters, but they MUST be encoded into valid DRS IDs whenever exposed by the API. +* DRS IDs can contain other characters, but they MUST be encoded into valid DRS IDs whenever they are used in API calls. This is because non-encoded IDs may interfere with the interpretation of the `objects/{id}/access` endpoint. To overcome this limitation, percent-encode the ID; see https://tools.ietf.org/html/rfc3986#section-2.4[RFC 3986 § 2.4]. * One DRS ID MUST always return the same object data (or, in the case of a collection, the same set of objects). This constraint aids with reproducibility. -* DRS v1 does NOT support semantics around multiple versions of an object. (For example, there’s no notion of “get latest version” or “list all versions”.) Individual implementation MAY choose an ID scheme that includes version hints. * DRS implementations MAY have more than one ID that maps to the same object. +* DRS version 1.x does NOT support semantics around multiple versions of an object. (For example, there’s no notion of “get latest version” or “list all versions”.) Individual implementations MAY choose an ID scheme that includes version hints. + === DRS URIs -For convenience, including when passing content references to a WES server, we define a URI syntax for DRS-accessible content. Strings of the form `drs://<hostname>/<id>` mean _“you can fetch the content with DRS id `<id>` from the DRS server at `<hostname>`”_. 
+For convenience, including when passing content references to a https://github.com/ga4gh/workflow-execution-service-schemas[WES server], we define a https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Generic_syntax[URI scheme] for DRS-accessible content. This section documents the syntax of DRS URIs, and the rules clients follow for translating a DRS URI into a URL that they use for making the DRS API calls described in this spec. + +There are two styles of DRS URIs, Hostname-based and Compact Identifier-based, both using the `drs://` URI scheme. DRS servers may choose either style when exposing references to their content. DRS clients MUST support resolving both styles. + +TIP: See <<_appendix_background_notes_on_drs_uris>> for more information on our design motivations for DRS URIs. + +==== Hostname-based DRS URIs + +Hostname-based DRS URIs are simpler than compact identifier-based URIs. They contain the DRS server name and the DRS ID only and can be converted directly into a fetchable URL based on a simple rule. They take the form: + + drs://<hostname>/<id> + +DRS URIs of this form mean _"you can fetch the content with DRS id <id> from the DRS server at <hostname>"_. +For example, here are the client resolution steps if the URI is: + + drs://drs.example.org/314159 + +1) The client parses the string to extract the hostname of “drs.example.org” and the id of “314159”. +2) The client makes a GET request to the DRS server, using the standard DRS URL syntax: + + GET https://drs.example.org/ga4gh/drs/v1/objects/314159 + +The protocol is always https and the port is always the standard 443 SSL port. It is invalid to include a different port in a DRS hostname-based URI. + +TIP: See <<_appendix_hostname_based_uris>> for information on how hostname-based DRS URI resolution to URLs is likely to change in the future, when the DRS v2 major release happens. 
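The two hostname-based resolution steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name is hypothetical, and rejecting any colon after `drs://` is an assumption of this sketch, based on the rule that a port may not be included and on IDs being percent-encoded.

```python
def hostname_drs_uri_to_url(drs_uri: str) -> str:
    """Translate a hostname-based DRS URI into the URL of the DRS object.

    Hypothetical helper: performs step 1 (parse) and builds the URL for step 2.
    """
    scheme = "drs://"
    if not drs_uri.startswith(scheme):
        raise ValueError(f"not a DRS URI: {drs_uri!r}")
    rest = drs_uri[len(scheme):]
    # Assumption: a colon means an explicit port or a compact identifier,
    # neither of which is a valid hostname-based DRS URI.
    if ":" in rest:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    hostname, slash, object_id = rest.partition("/")
    if not (hostname and slash and object_id):
        raise ValueError(f"expected a hostname and an id in {drs_uri!r}")
    # The protocol is always https on the default port.
    return f"https://{hostname}/ga4gh/drs/v1/objects/{object_id}"


print(hostname_drs_uri_to_url("drs://drs.example.org/314159"))
```

No network request is needed to compute the URL; the client only contacts the DRS server when it issues the GET.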
+ +==== Compact Identifier-based DRS URIs + +Compact Identifier-based DRS URIs use resolver registry services (specifically, https://identifiers.org/[identifiers.org] and https://n2t.net/[n2t.net (Name-To-Thing)]) to provide a layer of indirection between the DRS URI and the DRS server name -- the actual DNS name of the DRS server isn’t present in the URI. This approach is based on the Joint Declaration of Data Citation Principles as detailed by https://doi.org/10.1038/sdata.2018.29[Wimalaratne et al (2018)]. + +For more information, see the document link:more_background_on_compact_identifiers[More Background on Compact Identifiers]. + +Compact Identifiers take the form: + + drs://[provider_code/]namespace:accession + +Together, the provider code and the namespace are referred to as the _prefix_. The provider code is optional and is used by identifiers.org/n2t.net for compact identifier resolver mirrors. Neither the `provider_code` nor the `namespace` may contain spaces or other punctuation; only lowercase alphanumerical characters, underscores and dots are allowed (i.e. [a-z0-9._]). + +TIP: See the <<_appendix_compact_identifier_based_uris>> for more background on Compact Identifiers and resolver registry services like identifiers.org/n2t.net (aka meta-resolvers), how to register prefixes, possible caching strategies, and security considerations. + +===== For DRS Servers + +If your DRS implementation will issue DRS URIs based on _your own_ compact identifiers, you MUST first register a new prefix with identifiers.org (which is automatically mirrored to n2t.net). You will also need to include a provider resolver resource in this registration which links the prefix to your DRS server, so that DRS clients can get sufficient information to make a successful DRS GET request. For clarity, we recommend you choose a namespace beginning with `drs.`. 
+ +===== For DRS Clients + +A DRS client parses the DRS URI compact identifier components to extract the prefix and the accession, and then uses meta-resolver APIs to locate the actual DRS server. For example, here are the client resolution steps if the URI is: + + drs://drs.42:314159 + +1) The client parses the string to extract the prefix of `drs.42` and the accession of `314159`, using the first occurrence of a colon (":") character after the initial `drs://` as a delimiter. (The colon character is not allowed in a Hostname-based DRS URI, making it easy to tell them apart.) + +2) The client makes API calls to a meta-resolver to look up the URL pattern for the namespace. (See <<_calling_meta_resolver_apis_for_compact_identifier_based_drs_uris>> for details.) The URL pattern is a string containing a `{$id}` parameter, such as: + + https://drs.myexample.org/ga4gh/drs/v1/objects/{$id} + +3) The client generates a DRS URL from the URL template by replacing {$id} with the accession it extracted in step 1. It then makes a GET request to the DRS server: + + GET https://drs.myexample.org/ga4gh/drs/v1/objects/314159 + +4) The client follows any HTTP redirects returned in step 3, in case the resolver goes through an extra layer of redirection. + +For performance reasons, DRS clients SHOULD cache the URL pattern returned in step 2, with a suggested 24 hour cache life. + +==== Choosing a URI Style + +DRS servers can choose to issue either hostname-based or compact identifier-based DRS URIs, and can be confident that compliant DRS clients will support both. DRS clients *must* be able to accommodate both URI types. Tradeoffs that DRS server builders, and third parties who need to cite DRS objects in datasets, workflows or elsewhere, may want to consider include: + +.Table Choosing a URI Style +|=== +| |Hostname-based |Compact Identifier-based + +|URI Durability +|URIs are valid for as long as the server operator maintains ownership of the published DNS address. 
(They can of course point that address at different physical serving infrastructure as often as they’d like.) +|URIs are valid for as long as the server operator maintains ownership of the published compact identifier resolver namespace. (They also depend on the meta-resolvers like identifiers.org/n2t.net remaining operational, which is intended to be essentially forever.) + +|Client Efficiency +|URIs require minimal client logic, and no network requests, to resolve. +|URIs require a small amount of client logic, and 1-2 cacheable network requests, to resolve. + +|Security +|Servers have full control over their own security practices. +|Server operators, in addition to maintaining their own security practices, should confirm they are comfortable with the resolver registry security practices, including protection against denial-of-service and namespace-hijacking attacks. (See the <<_appendix_compact_identifier_based_uris>> for more information on resolver registry security.) -For example, if a WES server was asked to process `drs://drs.example.org/314159`, it would know that it could issue a GET request to `https://drs.example.org/ga4gh/drs/v1/objects/314159` to learn how to fetch that object. +|=== === DRS Datatypes @@ -35,17 +118,17 @@ DRS v1 is a read-only API. We expect that each implementation will define its ow === Standards -The DRS API specification is written in OpenAPI and embodies a RESTful service philosophy. It uses JSON in requests and responses and standard HTTPS for information transport. +The DRS API specification is written in OpenAPI and embodies a RESTful service philosophy. It uses JSON in requests and responses and standard HTTPS on port 443 for information transport. == Authorization & Authentication === Making DRS Requests -The DRS implementation is responsible for defining and enforcing an authorization policy that determines which users are allowed to make which requests. 
GA4GH recommends that DRS implementations use an OAuth 2.0 https://oauth.net/2/bearer-tokens/[bearer token], although they can choose other mechanisms if appropriate. +The DRS implementation is responsible for defining and enforcing an authorization policy that determines which users are allowed to make which requests. GA4GH recommends that DRS implementations use an OAuth 2.0 https://oauth.net/2/bearer-tokens/[bearer token], although they can choose other mechanisms if appropriate. === Fetching DRS Objects -The DRS API allows implementers to support a variety of different content access policies, depending on what `AccessMethod` s they return: +The DRS API allows implementers to support a variety of different content access policies, depending on what `AccessMethod` records they return: * public content: ** server provides an `access_url` with a `url` and no `headers` diff --git a/docs/asciidoc/more_background_on_compact_identifiers.adoc b/docs/asciidoc/more_background_on_compact_identifiers.adoc new file mode 100644 index 000000000..1eb2bdcf1 --- /dev/null +++ b/docs/asciidoc/more_background_on_compact_identifiers.adoc @@ -0,0 +1 @@ +include::more_background_on_compact_identifiers_content.adoc[] diff --git a/docs/asciidoc/more_background_on_compact_identifiers_content.adoc b/docs/asciidoc/more_background_on_compact_identifiers_content.adoc new file mode 100644 index 000000000..2d29c72a1 --- /dev/null +++ b/docs/asciidoc/more_background_on_compact_identifiers_content.adoc @@ -0,0 +1,128 @@ +== About + +This document contains more examples of resolving compact identifier-based DRS URIs than we could fit in the DRS specification or appendix. It's provided here for your reference as a supplement to the specification. + +== Background on Compact Identifier-Based URIs + +Compact identifiers refer to locally-unique persistent identifiers that have been namespaced to provide global uniqueness. 
See https://www.biorxiv.org/content/10.1101/101279v3["Uniform resolution of compact identifiers for biomedical data"] for an excellent introduction to this topic. By using compact identifiers in DRS URIs, along with a resolver registry (identifiers.org/n2t.net), systems can identify the current resolver when they need to translate a DRS URI into a fetchable URL. This allows a project to issue compact identifiers in DRS URIs and not be concerned if the project name or DRS hostname changes in the future, because the current resolver can always be found through the identifiers.org/n2t.net registries. Together the identifiers.org/n2t.net systems support the resolver lookup for over 700 compact identifier formats used in the research community, making it possible for a DRS server to use any of these as DRS IDs (or to register a new compact identifier type and resolver service of their own). + +We use a DRS URI scheme rather than https://en.wikipedia.org/wiki/CURIE[Compact URIs (CURIEs)] directly since we feel that systems consuming DRS objects will be able to better differentiate a DRS URI. CURIEs are widely used in the research community and we feel the fact that they can point to a wide variety of entities (HTML documents, PDFs, identities in data models, etc) makes it more difficult for systems to unambiguously identify entities as DRS objects. + +Still, to make compact identifiers work in DRS URIs we leverage the CURIE format used by identifiers.org/n2t.net. Compact identifiers have the form: + + prefix:accession + +The prefix can be divided into a `provider_code` (optional) and `namespace`. The `accession` here is an ARK, DOI, Data GUID, or another issuer's local ID for the object being pointed to: + + [provider_code/]namespace:accession + +Neither the `provider_code` nor the `namespace` may contain spaces or other punctuation; only lowercase alphanumerical characters, underscores and dots are allowed. 
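To make the `[provider_code/]namespace:accession` layout concrete, here is a small Python sketch that splits a compact identifier-based DRS URI into its parts. The names `CompactId` and `parse_compact_drs_uri` are hypothetical, and the character check simply mirrors the rule stated above (lowercase alphanumerics, underscores and dots); it is an illustration under those assumptions, not a normative parser.

```python
import re
from typing import NamedTuple, Optional


class CompactId(NamedTuple):
    provider_code: Optional[str]  # None when no provider code is present
    namespace: str
    accession: str


# Assumption: the allowed characters for provider_code and namespace.
_PREFIX_PART = re.compile(r"[a-z0-9._]+")


def parse_compact_drs_uri(drs_uri: str) -> CompactId:
    """Split drs://[provider_code/]namespace:accession into its components."""
    scheme = "drs://"
    if not drs_uri.startswith(scheme):
        raise ValueError(f"not a DRS URI: {drs_uri!r}")
    # The first ":" after the scheme separates the prefix from the accession.
    prefix, colon, accession = drs_uri[len(scheme):].partition(":")
    if not colon or not accession:
        raise ValueError(f"missing ':' or accession in {drs_uri!r}")
    provider_code, slash, namespace = prefix.rpartition("/")
    parts = [provider_code, namespace] if slash else [namespace]
    if not all(_PREFIX_PART.fullmatch(part) for part in parts):
        raise ValueError(f"invalid prefix in {drs_uri!r}")
    return CompactId(provider_code if slash else None, namespace, accession)


print(parse_compact_drs_uri("drs://doi:10.5072/FK2805660V"))
```

Note that the accession is returned exactly as written, including any "/" characters, because compact identifiers do not percent-encode it.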
+ +https://n2t.net/e/compact_ids.html[Examples] include (from n2t.net): + + PDB:2gc4 + Taxon:9606 + DOI:10.5281/ZENODO.1289856 + ark:/47881/m6g15z54 + IGSN:SSH000SUA + +TIP: DRS URIs using compact identifiers with resolvers registered in identifiers.org/n2t.net can be distinguished from the hostname-based DRS URIs below based on the required ":", which is not allowed in hostname-based URIs. + + +See the documentation on https://n2t.net/e/compact_ids.html[n2t.net] and https://docs.identifiers.org/[identifiers.org] for much more information on the compact identifiers used there and details about the resolution process. + +== Registering a DRS Server on a Meta-Resolver + +See the documentation on the https://n2t.net/e/compact_ids.html[n2t.net] and https://docs.identifiers.org/[identifiers.org] meta-resolvers for adding your own compact identifier type and registering your DRS server as a resolver. You can register new prefixes (or mirrors by adding resource provider codes) for free using a simple online form. + +Keep in mind that, while anyone can register prefixes, the identifiers.org/n2t.net sites do basic hand curation to verify new prefix and resource (provider code) requests. See those sites for more details on their security practices. For more information see + +Starting with the prefix for our new compact identifier, let's register the namespace `mydrsprefix` on identifiers.org/n2t.net and use 5-digit numeric IDs as our accessions. We will then link this to the DRS server at `https://mydrs.server.org/ga4gh/drs/v1/` by filling in the provider details. 
Here's what the registration for our new namespace looks like on https://registry.identifiers.org/prefixregistrationrequest[identifiers.org]: + +image::prefix_register_1.png[] + +image::prefix_register_2.png[] + +== Example DRS Client Compact Identifier-Based URI Resolution Process - Existing Compact Identifier Provider + +A DRS client identifies the components of a compact identifier-based DRS URI using the first occurrence of the "/" (optional) and ":" characters. These are not allowed inside the provider_code (optional) or the namespace. The ":" character is not allowed in a Hostname-based DRS URI, providing a convenient mechanism to differentiate them. Once the provider_code (optional) and namespace are extracted from a DRS compact identifier-based URI, a client can use services on identifiers.org to identify available resolvers. + +_Let's look at a specific example DRS compact identifier-based URI that uses DOIs, a popular compact identifier, and walk through the process that a client would use to resolve it. Keep in mind, the resolution process is the same from the client perspective whether a given DRS server is using an existing compact identifier type (DOIs, ARKs, Data GUIDs) or creating its own compact identifier type and registering it on identifiers.org/n2t.net._ + +Starting with the DRS URI: + +[source,bash] +---- +drs://doi:10.5072/FK2805660V +---- + +with a namespace of "doi", the following GET request will return information about the namespace: + + GET https://registry.api.identifiers.org/restApi/namespaces/search/findByPrefix?prefix=doi + +This information then points to resolvers for the "doi" namespace. This "doi" namespace was assigned a namespace ID of 75 by identifiers.org. This "id" has nothing to do with compact identifier accessions (which are used in the URL pattern as `{$id}` below) or DRS IDs. 
This namespace ID (75 below) is purely an identifiers.org internal ID for use with their APIs: + + GET https://registry.api.identifiers.org/restApi/resources/search/findAllByNamespaceId?id=75 + +This returns enough information to, ultimately, identify one or more resolvers, each of which has a URL pattern that, for DRS-supporting systems, provides a URL template for making a successful DRS GET request. For example, the DOI urlPattern is: + + urlPattern: "https://doi.org/{$id}" + +And the `{$id}` here refers to the accession from the compact identifier (in this example the accession is `10.5072/FK2805660V`). If applicable, a provider code can be supplied in the above requests to specify a particular mirror if there are multiple resolvers for this namespace. In the case of DOIs, you only get a single resolver. + +Given this information you now know you can make a GET on the URL: + + GET https://doi.org/10.5072/FK2805660V + +_The URL above is valid for a DOI object but it is not actually a DRS server! Instead, it redirects to a DRS server through a series of HTTPS redirects. This is likely to be common when working with existing compact identifiers like DOIs or ARKs. Regardless, the redirect should eventually lead to a DRS URL that percent-encodes the accession as a DRS ID in a DRS object API call. For a **hypothetical** example, here's what a redirect to a DRS API URL might ultimately look like. A client doesn't have to do anything other than follow the HTTPS redirects. The link between the DOI resolver on doi.org and the DRS server URL below is the result of the DRS server registering their data objects with a DOI issuer._ + + GET https://drs.example.org/ga4gh/drs/v1/objects/10.5072%2FFK2805660V + +IDs in DRS hostname-based URIs/URLs are always percent-encoded to eliminate ambiguity even though the DRS compact identifier-based URIs and the identifiers.org API do not percent-encode accessions. 
This was done in order to 1) follow the CURIE conventions of identifiers.org/n2t.net for compact identifier-based DRS URIs and 2) aid readability for users who understand they are working with compact identifiers. **As a general rule of thumb, when using a compact identifier accession as a DRS ID in a DRS API call, make sure to percent-encode it. An easy way for a DRS client to handle this is to get the initial DRS object JSON response from whatever redirects the compact identifier resolves to, then look for the `self_uri` in the JSON, which will give you the correctly percent-encoded DRS ID for subsequent DRS API calls such as the `access` method.** + + +== Example DRS Client Compact Identifier-Based URI Resolution Process - Registering a New Compact Identifier for Your DRS Server + +See the documentation on https://n2t.net/e/compact_ids.html[n2t.net] and https://docs.identifiers.org/[identifiers.org] for adding your own compact identifier type and registering your DRS server as a resolver. We document this in more detail in the link:index.html#_registering_a_drs_server_on_a_meta_resolver[main specification document]. + +Now the question is: how does a client resolve your newly registered compact identifier for your DRS server? _It turns out, whether specific to a DRS implementation or using existing compact identifiers like ARKs or DOIs, the DRS client resolution process for compact identifier-based URIs is exactly the same._ We briefly run through the process below for a new compact identifier as an example but, again, a client will not need to do anything different from the resolution process documented in "DRS Client Compact Identifier-Based URI Resolution Process - Existing Compact Identifier Provider". 
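The client-side steps of that resolution process can be sketched in Python. This is a minimal sketch and not part of the specification: the names `CompactId`, `parse_compact_drs_uri`, `fill_url_pattern`, and `encode_drs_id` are hypothetical, and the code assumes the `drs://[provider_code/]namespace:accession` layout described in this document. It makes no network calls; the `urlPattern` value is assumed to have been obtained from the identifiers.org lookups.

```python
from typing import NamedTuple, Optional
from urllib.parse import quote


class CompactId(NamedTuple):
    provider_code: Optional[str]  # optional mirror/provider selector
    namespace: str                # e.g. "doi"
    accession: str                # e.g. "10.5072/FK2805660V"


def parse_compact_drs_uri(uri: str) -> Optional[CompactId]:
    """Split a compact identifier-based DRS URI into its parts.

    Returns None for a hostname-based DRS URI, which contains no ":"
    after the scheme.
    """
    scheme = "drs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a DRS URI: {uri!r}")
    rest = uri[len(scheme):]
    if ":" not in rest:
        return None  # hostname-based URI
    # The first ":" ends the namespace; the accession may itself contain "/" or ":".
    prefix, accession = rest.split(":", 1)
    # An optional provider_code precedes the namespace, separated by the first "/".
    provider_code, slash, namespace = prefix.partition("/")
    if not slash:
        provider_code, namespace = None, prefix
    return CompactId(provider_code, namespace, accession)


def fill_url_pattern(url_pattern: str, accession: str) -> str:
    """Substitute the accession, unencoded, into a resolver urlPattern."""
    return url_pattern.replace("{$id}", accession)


def encode_drs_id(accession: str) -> str:
    """Percent-encode an accession for use as a DRS ID in a DRS API path."""
    return quote(accession, safe="")
```

For `drs://doi:10.5072/FK2805660V` this yields the namespace `doi` and the accession `10.5072/FK2805660V`; filling the DOI `urlPattern` gives `https://doi.org/10.5072/FK2805660V`, and `encode_drs_id` gives `10.5072%2FFK2805660V`. A real client would still follow HTTPS redirects and prefer the `self_uri` from the DRS object response for later calls.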
+ +Now we can issue DRS URIs for our data objects like: + +[source,bash] +---- +drs://mydrsprefix:12345 +---- + +This is a little simpler than working with DOIs or other existing compact identifier issuers since we can create our own IDs and do not have to allocate them through a third-party service (see "Issuing Existing Compact Identifiers for Use with Your DRS Server" below). + +With a namespace of "mydrsprefix", the following GET request will return information about the namespace: + + GET https://registry.api.identifiers.org/restApi/namespaces/search/findByPrefix?prefix=mydrsprefix + +_Of course, this is a hypothetical example so the actual API call won't work, but you can see the GET request has the same form as the one in "DRS Client Compact Identifier-Based URI Resolution Process - Existing Compact Identifier Provider"._ + +This information then points to resolvers for the "mydrsprefix" namespace. Hypothetically, this "mydrsprefix" namespace was assigned a namespace ID of 1829 by identifiers.org. This "id" has nothing to do with compact identifier accessions (which are used in the URL pattern as `{$id}` below) or DRS IDs. This namespace ID (1829 below) is purely an identifiers.org internal ID for use with their APIs: + + GET https://registry.api.identifiers.org/restApi/resources/search/findAllByNamespaceId?id=1829 + +_Like the previous GET request, this URL won't work, but you can see the GET request has the same form as the one in "DRS Client Compact Identifier-Based URI Resolution Process - Existing Compact Identifier Provider"._ + +This returns enough information to ultimately identify one or more resolvers, each of which has a URL pattern that, for DRS-supporting systems, provides a URL template for making a successful DRS GET request. For example, the "mydrsprefix" urlPattern is: + + urlPattern: "https://mydrs.server.org/ga4gh/drs/v1/objects/{$id}" + +And the `{$id}` here refers to the accession from the compact identifier (in this example the accession is `12345`). 
If applicable, a provider code can be supplied in the above requests to specify a particular mirror if there are multiple resolvers for this namespace. + +Given this information you now know you can make a GET on the URL: + + GET https://mydrs.server.org/ga4gh/drs/v1/objects/12345 + +So, compared to using a third-party service like DOIs and ARKs, this would be a direct pointer to a DRS server. However, just as with "DRS Client Compact Identifier-Based URI Resolution Process - Existing Compact Identifier Provider", the client should always be prepared to follow HTTPS redirects. + +_To summarize, a client resolving a custom compact identifier registered for a single DRS server follows the same process as one resolving a third-party compact identifier, like an ARK or DOI, that points to a DRS server; just make sure to follow redirects in all cases._ + +.Note: Issuing Existing Compact Identifiers for Use with Your DRS Server +**** +See the documentation on https://n2t.net/e/compact_ids.html[n2t.net] and https://docs.identifiers.org/[identifiers.org] for information about all the compact identifiers that are supported. You can choose to use an existing compact identifier provider for your DRS server, as we did in the example above using DOIs ("DRS Client Compact Identifier-Based URI Resolution Process - Existing Compact Identifier Provider"). Just keep in mind, each provider will have its own approach for generating compact identifiers and associating them with a DRS data object URL. Some compact identifier providers, like DOIs, provide a method whereby you can register in their network and get your own prefix, allowing you to mint your own accessions. Other services, like the University of California's https://ezid.cdlib.org/[EZID] service, provide accounts and a mechanism to mint accessions centrally for each of your data objects. 
For experimentation we recommend you take a look at the EZID website that allows you to create DOIs and ARKs and associate them with your data object URLs on your DRS server for testing purposes. +**** diff --git a/docs/asciidoc/prefix_register_1.png b/docs/asciidoc/prefix_register_1.png new file mode 100644 index 000000000..ac4902b9d Binary files /dev/null and b/docs/asciidoc/prefix_register_1.png differ diff --git a/docs/asciidoc/prefix_register_2.png b/docs/asciidoc/prefix_register_2.png new file mode 100644 index 000000000..ec6a2e3a6 Binary files /dev/null and b/docs/asciidoc/prefix_register_2.png differ diff --git a/docs/prefix_register_1.png b/docs/prefix_register_1.png new file mode 100644 index 000000000..ac4902b9d Binary files /dev/null and b/docs/prefix_register_1.png differ diff --git a/docs/prefix_register_2.png b/docs/prefix_register_2.png new file mode 100644 index 000000000..ec6a2e3a6 Binary files /dev/null and b/docs/prefix_register_2.png differ diff --git a/openapi/data_repository_service.swagger.yaml b/openapi/data_repository_service.swagger.yaml index ad646e57c..2e023d37b 100644 --- a/openapi/data_repository_service.swagger.yaml +++ b/openapi/data_repository_service.swagger.yaml @@ -2,7 +2,7 @@ swagger: '2.0' basePath: '/ga4gh/drs/v1' info: title: Data Repository Service - version: 1.0.0 + version: 1.1.0 description: 'https://github.com/ga4gh/data-repository-service-schemas' termsOfService: 'https://www.ga4gh.org/terms-and-conditions/' contact: @@ -81,7 +81,7 @@ paths: contains only those objects directly contained in the bundle. That is, if the bundle contains other bundles, those other bundles are not recursively included in the result. - + If true and the object_id refers to a bundle, then the entire set of objects in the bundle is expanded. That is, if the bundle contains aother bundles, then those other bundles are recursively expanded and included in the result. 
@@ -203,8 +203,8 @@ definitions: self_uri: type: string description: |- - A drs:// URI, as defined in the DRS documentation, that tells clients how to access this object. - The intent of this field is to make DRS objects self-contained, and therefore easier for clients to store and pass around. + A drs:// hostname-based URI, as defined in the DRS documentation, that tells clients how to access this object. + The intent of this field is to make DRS objects self-contained, and therefore easier for clients to store and pass around. For example, if you arrive at this DRS JSON by resolving a compact identifier-based DRS URI, the `self_uri` presents you with a hostname and properly encoded DRS ID for use in subsequent `access` endpoint calls. example: drs://drs.example.org/314159 size: @@ -218,7 +218,7 @@ definitions: format: date-time description: |- Timestamp of content creation in RFC3339. - (This is the creation time of the underlying content, not of the JSON object.) + (This is the creation time of the underlying content, not of the JSON object.) updated_time: type: string format: date-time @@ -373,7 +373,7 @@ definitions: A name declared by the bundle author that must be used when materialising this object, overriding any name directly associated with the object itself. - The name must be unique with the containing bundle. + The name must be unique within the containing bundle. This string is made up of uppercase and lowercase letters, decimal digits, hyphen, period, and underscore [A-Za-z0-9.-_]. See http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282[portable filenames]. id: type: string @@ -399,7 +399,7 @@ definitions: describe the objects within the nested bundle. 
items: $ref: '#/definitions/ContentsObject' - + required: - name tags: diff --git a/package.json b/package.json index b8873c585..dabfe3c46 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "Data-Repository-Service-openapi-spec", - "version": "1.0.0", + "version": "1.1.0", "dependencies": { "bower": "^1.7.7", "connect": "^3.4.1", diff --git a/scripts/stagepages.sh b/scripts/stagepages.sh index 72f495458..ce103e6bc 100644 --- a/scripts/stagepages.sh +++ b/scripts/stagepages.sh @@ -11,7 +11,9 @@ if [ "$TRAVIS_BRANCH" != "gh-pages" ]; then echo $branchpath mkdir -p "$branchpath/docs" cp docs/html5/index.html "$branchpath/docs/" + cp docs/html5/more_background_on_compact_identifiers.html "$branchpath/docs/" cp docs/pdf/index.pdf "$branchpath/docs/" + cp docs/pdf/more_background_on_compact_identifiers.pdf "$branchpath/docs/" cp docs/asciidoc/*.png "$branchpath/docs/" cp openapi/data_repository_service.swagger.yaml "$branchpath/swagger.yaml" cp -R web_deploy/* "$branchpath/"