Registries for extension discovery vs. machine-readable descriptions #5

Open
tedepstein opened this issue Oct 1, 2017 · 29 comments

@tedepstein
Contributor

No description provided.

@tedepstein
Contributor Author

tedepstein commented Oct 1, 2017

Continuing from this thread, @dret wrote:

On 2017-10-01 19:37, Ted Epstein wrote:

Today's OpenAPI editors /tolerate/ these extensions, but don't usually
provide specific code assist, tool tips, or validation for them.
Specification extensions are kind of a second-class feature of the
language, where most editors are concerned, unless we go to the trouble
of supporting specific extensions that we think are important, in a
completely proprietary way.

not having an... extension description model does not turn them
into "second-class" in a negative sense, but of course they are not part
of the spec itself. not having extension discovery is a different
matter, and in my mind is the more important first step:

https://tools.ietf.org/html/draft-wilde-registries

afaict, there are far more open standards where the discovery/registry
model proved helpful and successful, than there are open standards where
a universal description model had the same success. the usual problem is
that extensions have different motivations, and it's hard to predict these.

but: i think that machine-readable registries alone would be a step
forward, so that at least there is some possible automation that can be
built around how to handle unknown extensions. but what to expect when
then looking at them, that is an entirely different matter.

(To clarify, a machine-readable metadata standard is not a registry, nor
a pre-requisite to creating a useful registry. It's also not in scope as
a feature of the OpenAPI specification itself. But I'm posting the link
here because it's related, and hopefully of interest to you if you're
following this issue. )

to me, such a standard could be one that extensions use when you start
to explore them. but the first question always will be: how do you even
find a place to go when encountering an extension?

@tedepstein
Contributor Author

tedepstein commented Oct 1, 2017

@dret, as I explained on the original comment thread, Semoasa isn't trying to compete with a human-readable list or table of OpenAPI specification extensions, intended for extension discovery. That kind of document could optionally refer to a Semoasa document, where available, for some of those extensions. But I would only suggest adding that information as and when there is significant adoption of Semoasa by tool vendors and extension providers.

I don't think Semoasa is an elaborate format. That's certainly not the intent. If you think there are areas that are unnecessarily complex, given the use cases that Semoasa is trying to support, please comment specifically on those areas.

but: i think that machine-readable registries alone would be a step
forward, so that at least there is some possible automation that can be
built around how to handle unknown extensions. but what to expect when
then looking at them, that is an entirely different matter.

Do you think a registry like the one proposed in the OAS specification repo should be machine-readable?

One possibility is to use a format like Semoasa, using only the purely descriptive metadata, omitting the schema and oas2/oas3 usage contexts. If we anticipate that kind of usage, we'll want to make sure that the schema and usage context properties are optional.

@dret

dret commented Jan 11, 2019

a machine-readable registry definitely would be a good idea, if only for the reason that then it's easy to harvest and use the data without brittle screen-scraping. i think by now we should all have learned from the IANA registry that it can be inconvenient to not be able to process registry data.
a different question is whether descriptions of registry entries should be machine-readable. that's what semoasa does, right? i think there the bar is much higher, and it may be hard to define a format that works for all potential extension authors.
but i can imagine a model where there is a small snippet (not really the definition of extensions, but a small subset of their definition that is supposed to show up in the registry) that must be submitted in a machine-readable format for all extensions, which then would serve as a foundation and make it easier to manage registry entries, and publish the registry in a machine-readable way.

@tedepstein
Contributor Author

tedepstein commented Jan 11, 2019

@dret, Again, Semoasa does not replace or compete with the official OpenAPI registry; it serves a different purpose.

There is now a draft page for the OpenAPI registry of extensions, formats, alternative schema types, etc. The current specification extension registry is very simple. It's backed by an API for machine readability. But the data it provides is really only suitable for human consumption, either directly through the web page, or indirectly through the API, to be presented in some user-facing context.

Semoasa is intended to solve a different problem: tool support for specification extensions, especially validation and code assist. It is indeed a "higher bar," and having a Semoasa document is not a prerequisite to the inclusion of a specification extension in the OAI registry.

The core of Semoasa is a Schema Object, which is the machine-readable metadata that an editor can use to provide code assist and validation support. This is taken directly from OpenAPI, and based on JSON Schema, which is proven to provide good-enough support for many use cases, in many editors.

For example, early versions of KaiZen OpenAPI Editor provided code assist and validation of OpenAPI documents using a JSON Schema description of the OpenAPI specification. It left some gaps, and we later added advanced validations, code assist for $ref properties, etc. to pick up where the schema left off. But even those early releases of KZOE were much better than typing into a standard YAML editor that doesn't know anything about OpenAPI.

Likewise, specification extensions may have semantic constraints that can't be captured in a schema. That's OK. There's still a lot of value to be gained by facilitating editing and providing basic schema-driven validation, even if that validation can't check all of the logical rules defined in a human-readable specification. Having basic editor support for specification extensions is a win for end users and extension providers.

@dret

dret commented Jan 11, 2019 via email

@tedepstein
Contributor Author

@darrelmiller just pointed me to
http://spec.openapis.org/registry/extension/
(https://twitter.com/darrel_miller/status/1083771084506304513) so i am
guessing this draft page and repo now have been incorporated into the
main repo?

Yes, thanks for that update. I think this got published to OpenAPIs.org just after this week's TSC call.

my thought was that it would be good to have a "mini-schema" that actually
would be required for a submission to the registry. it would be what
would generate the registry entry, if the submission is accepted.
semoasa then could be used to generate that "mini-schema" from it,
right? but i guess that's a pointless discussion because right now it is
not defined what a PR or an issue needs to include in order to request a
registry update.

I see. So there would be some kind of JSON object that represents an OAI registry listing, and a "mini-schema" that describes that JSON format. Is that right?

If that's the idea, and if it's natural for Semoasa to contain a superset of that OAI registry listing, then it should be easy to generate.

On the TSC call, @MikeRalphson also mentioned the possibility of using APIs.json as the metadata format for registry listings. APIs.json would probably need to be extended to include metadata for things other than APIs. (@kinlane, interested to know if you think this kind of use case would be in scope...)

If nothing else, APIs.json encapsulates a lot of good thinking about how to capture the details that make for a robust catalog/registry ecosystem. Definitely worth a look as we're evolving Semoasa and the OAI Registry.

@dret

dret commented Jan 12, 2019 via email

@MikeRalphson
Contributor

@tedepstein did I mention APIs.json for this? Was it this week's call or a previous one?

My intention is that the extensions registry MUST include a schema in the metadata (YAML front-matter) of each extension entry. This would be codified by an (as yet nonexistent) CONTRIBUTING.md guidance/policy document. The YAML front-matter is available via the registry API, though the template used to generate the API could be revised to make the output smaller and more easily machine-readable (this would include the schema object).

@dret

dret commented Jan 14, 2019 via email

@MikeRalphson
Contributor

An example of the metadata (+markdown) for an extension within the current registry is: https://raw.githubusercontent.com/OAI/OpenAPI-Specification/523ad5747c7be6e523bef8768fa13a8a37044ea0/registries/_extension/x-twitter.md

There you can see we plan on holding:

owner: MikeRalphson
issue: XXX
description: Used to hold a reference to the API provider's Twitter account.
schema:
  type: string
objects: [ "contactObject" ]

The owner would be the point of contact for the registry entry, and the issue (or PR) number would capture originating author, date and discussion prior to acceptance.

There is no real complexity in adding an API feed which includes all (typed) registry entries.

@dret

dret commented Jan 14, 2019 via email

@tedepstein
Contributor Author

tedepstein commented Jan 14, 2019

Hi @dret, @MikeRalphson, just catching up.

@tedepstein did I mention APIs.json for this? Was it this week's call or a previous one?

I thought it came up on this week's call, and thought it was mentioned in the context of the registry. I could be mistaken. In any case, I think Erik is right that APIs.json is a good example to follow in the general sense, but probably not directly applicable to the OAI registry.

My intention is that the extensions registry MUST include a schema in the metadata (YAML front-matter) of each extension entry.

FYI, I'm finding this pretty confusing.

First, the term "YAML front-matter" is new to me. The YAML specification doesn't define "front-matter," but it says this:

YAML uses three dashes (“---”) to separate directives from document content. This also serves to signal the start of a document if no directives are present.

Since the only YAML directives are YAML and tag, I assume we're talking about the latter case - a compound YAML stream consisting of a metadata document, followed by a content document. And I see that "YAML front matter" seems to be a convention adopted by Jekyll, assemble.io, and maybe others, following this pattern of prepending the main content with some YAML metadata, offset by the --- delimiter into its own YAML document.
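As a rough illustration of the front-matter convention described above, the following Python sketch splits a file into its leading YAML metadata block and the markdown body that follows. It is a minimal sketch under stated assumptions, not Jekyll's actual parser: the function name `split_front_matter` is hypothetical, and the sample text is adapted from the x-twitter entry discussed in this thread.

```python
def split_front_matter(text):
    """Split text into (front_matter, body).

    Assumes the Jekyll-style convention: the file starts with a '---' line,
    and the metadata ends at the next '---' line. Returns (None, text) when
    no front matter is present. Hypothetical helper, not Jekyll's own code.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return front, body
    # Opening delimiter without a closing one: treat everything as content.
    return None, text


sample = """---
owner: MikeRalphson
description: Used to hold a reference to the API provider's Twitter account.
schema:
  type: string
objects: [ "contactObject" ]
---
## x-twitter
Markdown body goes here.
"""

front_matter, body = split_front_matter(sample)
print(front_matter)
print(body)
```

The metadata string returned here would still need to be handed to a YAML parser; that step is left out so the sketch stays within the standard library.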

Fair enough. Except...

  • Is there a plan for our registry API to allow submission of new registry listings? Would you want that API to be described in OpenAPI? Consider that OpenAPI doesn't (AFAIK) have any way to specify that a message payload should be a composite, with the first document conforming to a specified YAML schema, and the second document being markdown.
  • Overall, what I'm seeing in your example looks like it's optimized for easy ingest by a CMS or static site generator, not optimized to capture descriptive registry data in a uniform and structured way. Assuming the markdown content is specified in the registry submission, this seems to allow a lot of freedom for the listing to control its own presentation. Is this the intent? Or should the listing just provide its relevant metadata, and allow the registry to present it in a uniform way, using templates?
  • The metadata looks incomplete. I don't see a property for extension property name, though I can see from the content that this should be x-twitter.

Also, I think this discussion may have gotten pretty severely muddled in its use of the word "schema." A Semoasa description contains a schema for each defined extension property. That schema is intended for use by tools to validate correct usage of the extension property. So for example, the x-twitter extension property has to have a string value, maybe with some other constraints.
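To make the tool-facing sense of "schema" concrete, here is a minimal Python sketch of how an editor might check an extension property value against its schema. It handles only the `type` keyword for a few types; a real tool would use a full JSON Schema validator. The names `EXTENSION_SCHEMAS` and `check_extension_value` are hypothetical and are not part of Semoasa or the OAI registry.

```python
# Maps a JSON Schema "type" keyword to the Python types that satisfy it.
# Only a small subset of JSON Schema is covered here.
JSON_TYPES = {
    "string": (str,),
    "boolean": (bool,),
    "object": (dict,),
    "array": (list,),
}

# Hypothetical in-memory view of extension schemas, keyed by property name.
EXTENSION_SCHEMAS = {
    "x-twitter": {"type": "string"},
}


def check_extension_value(name, value):
    """Return a list of problems found for one extension property value."""
    schema = EXTENSION_SCHEMAS.get(name)
    if schema is None:
        return [f"{name}: no schema known for this extension"]
    expected = schema.get("type")
    if expected in JSON_TYPES and not isinstance(value, JSON_TYPES[expected]):
        return [f"{name}: expected {expected}, got {type(value).__name__}"]
    return []


print(check_extension_value("x-twitter", "APIs-guru"))
print(check_extension_value("x-twitter", 42))
```

An empty list means the value passed the (very limited) check; the second call reports a type mismatch.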

@dret, what started this discussion was my posting a link to Semoasa in the Extension Registry issue in OAI/OpenAPI-Specification. Condensing the discussion considerably:

  • You said machine-readable descriptions of specification extensions were too high a bar, and shouldn't be a required part of a specification extension registry.
  • I agreed, and said that's why Semoasa is a separate specification.
  • After some back-and-forth, we seemed to be in heated agreement that the OAI registry should be lightweight, built for human consumption; and Semoasa should be separate, providing a deeper level of machine-readability, intended for tool support.
  • As a minor point of convenience, Semoasa could be a superset of an OAI extension registry listing, in which case the latter can be generated from the former.

You also thought a "mini-schema" would be useful in the context of the OAI extension registry, but after some discussion, I thought we had established that you were talking about a completely different kind of schema. Specifically, you meant a schema that describes a valid extension registry listing. So anyone submitting a registry listing knows what data is required and what format it should be in; and anyone wanting to read extension registry listings through the API knows what to expect.

This schema usage is exactly what we see in OpenAPI documents. It's intended to describe and validate extension registry listings, not intended to describe or validate usage of an extension property.

@dret, just to be sure, did I understand that correctly?

@MikeRalphson, looking at your example, I see that it includes these bits:

schema:
  type: string
objects: [ "contactObject" ]

That seems like it's going in the opposite direction from what we've been discussing, now integrating the metadata required for machine-readability and schema validation. That would make Semoasa redundant.

If it's really going to be one registry for extensions, covering both human-focused use cases and machine-readability for tools integration, I actually do not have a strong objection (though @dret might feel more strongly about this than I do).

But in that case, there are some structural features of Semoasa that we might want to consider in the registry design:

  • Grouping of related extension properties under common control within a namespace.
  • Support for catalog patterns, so organizations can create internal registries and integrate them seamlessly into supporting tools.
  • Support for unrestricted extension properties that can be used wherever specification extensions are allowed.
  • Applicability of an extension property to different OpenAPI versions, each with its own set of allowed objects.
  • Ability to mark an extension property as deprecated
  • Link to externalDocs

I would be happy to see Semoasa and the OAI registry as separate things, with optional linkages between the two, and an eventual path to convergence as and when it makes sense.

I would be happy to see the Semoasa use cases fully supported in the OAI specification extension registry, so there's no need for Semoasa. But I think it might be premature to do that.

I would not be too happy to see the OAI specification extension registry supporting its own "mini-schema" approach to machine-readability, without fully addressing the needs of tool providers and adopting organizations. I think it will make it harder for Semoasa to get buy-in, and we won't get the opportunity to find out what's really going to work best for the ecosystem.

@tedepstein
Contributor Author

tedepstein commented Jan 14, 2019

@MikeRalphson, I may have misunderstood your example. I assumed that this is what an extension registry listing would look like, as submitted by the owner, and as retrieved through the API.

But maybe you meant that only the YAML front-matter is what's submitted and managed as the extension listing record; while the markdown that follows is a template, generated from that metadata and used only to render the extension listing page on the website.

Is that right? If so, please disregard these two bullet points:

  • Is there a plan for our registry API to allow submission of new registry listings? ...
  • Overall, what I'm seeing in your example looks like it's optimized for easy ingest by a CMS...

Why I thought the markdown was part of the extension listing data:

# <a href="..">{{ page.collection }}</a>

## {{ page.slug }} - {{ page.description }}

The `x-twitter` extension is used to hold a reference to the API provider's Twitter account. It can appear as a property in the following objects: `{{page.objects|jsonify}}`.

Mixing of template placeholders with some text that looks specific to this extension listing. Maybe that last line is pre-generated from page.description, prior to template expansion...? Or maybe you pasted it in just to give us a sense of the final content...

### Example
```yaml
openapi: 3.0.0
info:
  title: My API
  version: 1.0.0
  contact:
    x-twitter: APIs-guru
```

Is the example auto-generated? It looks too realistic to be auto-generated, unless "APIs-guru" is Jekyll's default value for string. ;-)

Assuming it's not auto-generated, shouldn't there be an example property that is managed by the API, and therefore included in the YAML front-matter?

Used by: (informational)

* APIs.guru

Also missing from the YAML front-matter.

So I'm not sure what I'm looking at when I see the example document. It seems to be a mix of:

  • generic static text
  • template placeholders
  • extension-specific content that might be derived from YAML metadata
  • extension-specific content that clearly is not included in the metadata

Kindly clarify. Thanks!

@tedepstein
Contributor Author

tedepstein commented Jan 14, 2019

@dret wrote:

my advice would be to not tightly couple this with github. have a mandatory github identity, but have a mandatory email, too.
...
same here: maybe have a mandatory "submission URI" which for now is the
issue URI. but capture essential metadata such as the timestamp in the
metadata as well.

Agreed.

description: Used to hold a reference to the API provider's Twitter account.

I may not understand the data model here: what API? shouldn't that be
clear text that can be shown in an overview of all extensions?

I think this is best understood in the context of OpenAPI, specifically the Contact Object that is being extended.

An OpenAPI document describes an API. The Contact Object identifies the party associated with the API, usually the API provider. The x-twitter extension property is intended to specify a Twitter account associated with the contact.

schema:
  type: string

if the schema is to be understood, it would need to be more than that,
right?

type: string is actually a complete and legal JSON Schema describing a string value. Even in this very simple case, it could be made more precise by adding a regular expression constraint:

schema:
  type: string
  pattern: '^@?(\w){1,15}$'
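To see what this pattern accepts, here is a small Python sketch. Python's `re` module is used as a stand-in for the ECMA 262 regular expression dialect that JSON Schema refers to, with `re.ASCII` restricting `\w` to ASCII word characters as an approximation. The helper name `is_valid_handle` and the candidate values are made up for illustration. Note that a hyphenated value such as `APIs-guru` is rejected, because `\w` does not match a hyphen.

```python
import re

# Pattern from the schema above. re.ASCII restricts \w to [A-Za-z0-9_],
# which approximates the ECMA 262 dialect that JSON Schema refers to.
HANDLE = re.compile(r"^@?(\w){1,15}$", re.ASCII)


def is_valid_handle(value):
    """Return True when value satisfies the pattern constraint."""
    # JSON Schema patterns are not implicitly anchored, so search() is used;
    # the explicit ^ and $ in the pattern do the anchoring.
    return HANDLE.search(value) is not None


for candidate in ["@APIs_guru", "APIs_guru", "APIs-guru", "a" * 16]:
    print(candidate, is_valid_handle(candidate))
```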

An extension property that expects an object value will be more complex, having subschemas for each property, specifying required properties, etc.

i could easily imagine:

  • usage: list of types where the extension is allowed to appear.
  • schema: some machine/human-readable schema. or maybe just an
    explanation of what the extension captures and the actual schema is left
    to the external documentation?

The objects[...] array in Mike's example is what you're referring to as usage.

schema appears to be a machine-readable JSON Schema, provided in YAML format. (A JSON-formatted JSON Schema would also work, because valid JSON is also valid YAML.)

As I said in my earlier comment, I think there's a question of scope: should this information go into the OAI registry, or should it go into a separate project like Semoasa, giving tool providers a space to experiment and figure out how best to support the evolving landscape of specification extensions?

If we are going to provide "an explanation of what the extension captures," that's fine, but please don't call it a "schema!" This information should go into the existing description property. Maybe we need a summary for the short version, and description for a more complete explanation.

for the extension schema, you might want to make explicit that the
schema needs to have an extension model as well. i.e., if the extension
is a structured object (i.e., not just a value), it needs to be
documented whether that may ever be changed, and if so, how (and
probably only backwards-compatible changes should be allowed here). you
could ask for this to be verbalized in another submission field.

I think schema evolution is one of several concerns that comprise a change control policy. And I think this policy should be specified within the domain of the registry itself. The registry should publish guidelines for versioning and/or deprecation of specification extensions. I don't think we want to provide dynamic metadata for extensions to describe their own change management policies. That would be too confusing, and put too much of a burden on registry consumers.

@dret

dret commented Jan 14, 2019 via email

@MikeRalphson
Contributor

Both - may I please make a plea for brevity and succinctness in GitHub issues? While it is a strength to be able to write at length and dig into detail, it makes things incredibly hard to follow if someone has to respond to tens of points in a reply. Things can and will get lost.

This is one reason I asked @tedepstein and @earth2marsh to help start work on a CONTRIBUTING.md and/or Pull Request template, specifically to capture the way the registry should work and be used separate from what has been done in the accepted PRs/commits so far i.e. how the registry is implemented.

@dret:

my [advice] would be to not tightly couple this with github

The TSC deliberately took a different view, to leverage the GitHub workflow and community. Not everyone who chooses to interact with the OAS repository publishes a public email address, but they must have a GitHub user id.

but capture essential metadata such as the timestamp in the metadata as well.

I do not understand what benefit is derived by duplicating this information with a copy and paste. The registry entry is either in the registry or it is not. Snapshots and deprecations notwithstanding.

@tedepstein addresses your points re: the description, schema and object properties of the metadata (aka the YAML frontmatter).

for the extension schema, you might want to make explicit that the schema needs to have an extension model as well. i.e.

I think for the avoidance of confusion we should refer to this as the registry metaschema. See in the draft examples how there is a common base of properties, but each registry extends this with properties such as schema, objects and base_type (the latter in the format registry). If such a thing is deemed necessary, I'm sure a jekyll hook could be used to validate the metadata meets the registry metaschema constraints, but for now I strongly believe a GitHub PR template will suffice.

maybe also a flag in the entry if there are various possible entry states (pending, accepted, deprecated, historical, ...)?

The only such state I foresee is deprecated. All other states are either handled by the GitHub PR not yet being merged, or the registry entry existing and not being deprecated. Following your own guidelines, registry entries would never be expired, removed or historical.

@dret

dret commented Jan 14, 2019 via email

@dret

dret commented Jan 14, 2019 via email

@MikeRalphson
Contributor

@tedepstein

re: apis.json

I could be mistaken. In any case, I think Erik is right that APIs.json is a good example to follow in the general sense, but probably not directly applicable to the OAI registry.

I'm sorry but I think you did misunderstand my comment about the JSON formatted API feeds from the registry as referring to apis.json, which I think is a great idea, but has basically no adoption that I know of in the real world to date.

First, the term "YAML front-matter" is new to me

There is precious little point in me badly summarising how the Jekyll static site generator works, when the documentation serves this purpose well: https://jekyllrb.com/

Is there a plan for our registry API to allow submission of new registry listings?

No. The submission mechanism is GitHub PRs. I won't comment much on your "heated agreement" except to say that I don't believe the registry should be focussed solely on being human-readable. Machine readability is a key requirement.

Grouping of related extension properties under common control within a namespace.

Like it or not, OAS specification extensions are not namespaced, except by contained prefixes dictated by their creators (e.g. x-ms-).

Support for catalog patterns, so organizations can create internal registries and integrate them seamlessly into supporting tools.

That sounds like a job for Semoasa, not for this registry.

Support for unrestricted extension properties that can be used wherever specification extensions are allowed.

An empty objects array might have this meaning.
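One way to read that suggestion, sketched in Python. The function name `is_allowed_in` and the `x-anywhere` entry are hypothetical, and the rule that an empty list means "unrestricted" is only the possibility raised above, not a settled registry convention.

```python
def is_allowed_in(entry, object_name):
    """Return True if the extension described by entry may appear in object_name.

    Assumption: an empty or missing 'objects' list means "allowed anywhere
    specification extensions are allowed". This is one possible convention,
    not a rule defined by the registry.
    """
    allowed = entry.get("objects") or []
    return not allowed or object_name in allowed


x_twitter = {"schema": {"type": "string"}, "objects": ["contactObject"]}
x_anywhere = {"schema": {"type": "string"}, "objects": []}  # made-up entry

print(is_allowed_in(x_twitter, "contactObject"))
print(is_allowed_in(x_twitter, "infoObject"))
print(is_allowed_in(x_anywhere, "infoObject"))
```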

Applicability of an extension property to different OpenAPI versions, each with its own set of allowed objects.

Again, Semoasa has this capability, but the need for it has not, as far as I know, been borne out by any real-world usage. At a pinch oneOf handles this.

Ability to mark an extension property as deprecated

This keeps coming up. Obviously we will do this. Maybe I should have marked one of the draft registry entries as deprecated: true or with a date.

Link to externalDocs

I am against this one for the fragility argument mentioned before. The rest of the markdown template should cover all pertinent material referring to the registry entry.

I would be happy to see the Semoasa use cases fully supported in the OAI specification extension registry, so there's no need for Semoasa. But I think it might be premature to do that.

"Heated agreement".

I would not be too happy to see the OAI specification extension registry supporting its own "mini-schema" approach to machine-readability, without fully addressing the needs of tool providers and adopting organizations. I think it will make it harder for Semoasa to get buy-in, and we won't get the opportunity to find out what's really going to work best for the ecosystem.

As you know, I think I'm the world's second most prominent supporter of Semoasa, but in many months I have seen literally no other people talking about it. I would not want it to go the same way as apis.json. A standard everyone references, but no-one actually uses.

@MikeRalphson
Contributor

@dret

if i use the API to read the registry content, do i get this metadata? or do i now have to use the github API to crawl it from the github implementation of the registry?

My argument is that the date a registry entry was added to the registry is not essential metadata, because it is of no actual use in and of itself. If someone wants it, they can indeed fetch it from GitHub.

@dret

dret commented Jan 14, 2019 via email

@dret

dret commented Jan 14, 2019 via email

@dret

dret commented Jan 14, 2019 via email

@MikeRalphson
Contributor

@tedepstein

I think this got published to OpenAPIs.org just after this week's TSC call.

The draft registry was actually committed as PR OAI/OpenAPI-Specification#1762 on 28 Nov 2018.

Overall, what I'm seeing in your example looks like it's optimized for easy ingest by a CMS or static site generator, not optimized to capture descriptive registry data in a uniform and structured way.

Again I think we're confusing the representation within the static site (the YAML front-matter + markdown template) with the submission mechanism, which is GitHub PRs. I envision people copying an existing registry entry and tailoring it as necessary before submitting their new registry entry candidate.

The metadata looks incomplete. I don't see a property for extension property name, though I can see from the content that this should be x-twitter.

This is one of the things Jekyll does magically for you (some good, some not so). The filename of the .md file is used as the unique identifier within the 'collection' (here, a registry). This value is persisted as page.slug and does not need to be explicitly set again, unless you need to override it for some reason.
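The filename-to-identifier step can be approximated in a few lines of Python. This is a simplified sketch: it only strips the directory and extension, whereas Jekyll's own slug handling may normalize names further. The function name `slug_from_path` is hypothetical; the path is taken from the x-twitter entry linked earlier in this thread.

```python
from pathlib import PurePosixPath


def slug_from_path(path):
    """Derive an entry identifier from a registry file path.

    Simplification: takes the filename without its extension. Jekyll's own
    slug handling may normalize names further.
    """
    return PurePosixPath(path).stem


print(slug_from_path("registries/_extension/x-twitter.md"))
```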

@dret

dret commented Jan 16, 2019 via email

@MikeRalphson
Contributor

i'd recommend to couple it less tightly with the implementation and focus on designing an API that captures all necessary data, independent of your implementation choice. it's not that much work, but would be different from an API that depends on github.

This is one of the things Jekyll does magically for you (some good, some not so). The filename of the .md file is used as the unique identifier within the 'collection' (here, a registry). This value is persisted as page.slug

it's good for your implementation effort that github/jekyll does that. but from the design and management perspective, it's not so good in scenarios where you want all relevant registration data

Again, some confusion is happening between the representation of the static-site YAML markdown + template and the API as provided. See https://mikeralphson.github.io/OpenAPI-Specification/api/extension.json - the value of the slug property is present and correct and no further API calls (to GitHub or anywhere else) are required.

@dret

dret commented Jan 17, 2019 via email

@MikeRalphson
Contributor

still not in there, though, are registration metadata such as who registered (if non-github identity/info should be supported) and when.

As above, a GitHub user id will be required, and will be the only id required. And I have queried the value of capturing the registration date. I can't think of an actual use for it in a world without time-travel.

@dret

dret commented Jan 17, 2019 via email
