Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

catalogd metas https endpoint proposal #1749

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

grokspawn
Copy link

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 31, 2025
Copy link
Contributor

openshift-ci bot commented Jan 31, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

openshift-ci bot commented Jan 31, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mandre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

redhat-marketplace-index: 9MB
-->

Requiring clients to retrieve the full catalog can also result in a 4-10 second delay _per catalog_ even under optimal network conditions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the surface, 4-10s seems like a suspiciously long time to process 21MB of data. Are we talking about in-cluster clients here or clients outside the cluster that may have slower connections?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, does this include client processing of the data, or simply clients downloading raw bytes as quickly as possible?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that timeframe the POC performs:

  1. download of the full FBC
  2. decomposition of JSONL objects to JSON objects (which I've been told is a pain)
  3. loading essential objects to the content delivery framework to fulfill catalog package listing
  4. rendering the catalog listing in the web UI

I'd have to get breakdown details from @TheRealJon to be more specific. For now, I've backed off the language so it doesn't read as specific condemnation of download speeds.


## Proposal

This proposal introduces an additional HTTPS endpoint to an existing catalogd API. The existing HTTPS "all" endpoint will remain as a default option; the user will be able to enable this new capability via a feature gate.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we talk about how we plan to deprecate "all" once the new endpoint is GA?

}
```
Query parameters will be logically ANDed and used to restrict response scope.
This API will be conditionally enabled by an upstream `APIV1QueryEndpoint` feature gate as part of a downstream `NewOLM{suffix}` style OCP TP feature gate, and will be disabled by default.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This API will be conditionally enabled by an upstream `APIV1QueryEndpoint` feature gate as part of a downstream `NewOLM{suffix}` style OCP TP feature gate, and will be disabled by default.
This API will be conditionally enabled by an upstream `APIV1QueryEndpoint` feature gate as part of a downstream `NewOLM{suffix}` style OCP feature gate.

We can talk more about TP vs GA in the graduation criteria section.


This option would require clients to query the entirety of the data (~21 MB for operatorhubio catalog) and parse the response to retrieve relevant information every time the client needs the data. Even if clients’ implement some form of caching, the first query the client does to catalogd server is still the dealbreaker. In a highly resource constrained environment (e.g. clusters in Edge devices), this basically translates to a chokepoint for the clients to get started.

- A “path hierarchy” based construction of API endpoints to expose filtered FBC metadata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are worried about the fact that query endpoint responses will almost always be incomplete, a middle ground might be an endpoint that returns all of the FBC metadata for a specific package, but I'm not sure that endpoint would provide the necessary latency requirements we're shooting for.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comes down to whether we require the new endpoint to always provide valid FBC.
When we start revising FBC schemas I think we're going to have to juggle this.
For example, if we revise olm.package.v2 which uses its package field self-referentially, then we also get package-scoped valid FBC without a change to this endpoint.
But how does a client request this? Does it have to request the v2 schema specifically?

@grokspawn grokspawn changed the title catalogd query https endpoint proposal catalogd metas https endpoint proposal Feb 4, 2025
@grokspawn grokspawn force-pushed the catalogd-web-api branch 3 times, most recently from 1e97427 to 41f501b Compare February 11, 2025 15:05
@grokspawn grokspawn marked this pull request as ready for review February 11, 2025 15:05
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 11, 2025
Copy link
Contributor

openshift-ci bot commented Feb 11, 2025

@grokspawn: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


### Non-Goals

* Redesigning FBC schema to facilitate additional efficiencies.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very curious what @spadgett and @TheRealJon think about whether this enhancement, on its own (i.e. without further FBC schema evolution), will be a significant enough improvement for Console's use cases, that it is reasonable to keep schema evolution out-of-scope.

I ask because we may need to include the schema evoluation in scope in order to reference it in a graduation criteria for taking the combined OLM and Console changes for OLMv1 support to GA.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A fair take. All of this originated from our feature RFC, and it was concerned solely with what our team was going to implement in this iteration.
This doesn't seem aligned with this process' scope maybe, and we can adjust as makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to know for sure whether this will make significant improvements to the network latency issues. We still require almost all of the FBC data in order to populate the catalog view as designed. We hope that making more strategic asynchronous requests will help. The data model issues still remain.

Comment on lines +378 to +380
For example, response for `metas?schema=olm.package&name=foo` and `metas?schema=olm.package&name=bar`
represent disjoint sets, and the server would have to either ignore `If-Modified-Since` directives
in client requests or rely on their veracity uncritically.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I commented this somewhere, so if this is a repeat comment, I apologize:

RFC 7234, Section 2 states that the cache key is the http method and the full URI, which would include the URL parameters used in the request.

If clients request metas?schema=olm.package&name=foo and get back a LastModified response header, they are not supposed to use that value in a subsequent If-Modified-Since request header when requesting metas?schema=olm.package&name=bar, because that is a different URI.

redhat-marketplace-index: 9MB
-->

Requiring OCP console to make the full catalog available to users incurs a 4-10 second delay _per catalog_ even
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, it's more like a 4-10 second delay total, given the size of the 4 default Red Hat catalogs

Suggested change
Requiring OCP console to make the full catalog available to users incurs a 4-10 second delay _per catalog_ even
Requiring OCP console to make the full catalog available to users incurs a 4-10 second delay even


### Non-Goals

* Redesigning FBC schema to facilitate additional efficiencies.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to know for sure whether this will make significant improvements to the network latency issues. We still require almost all of the FBC data in order to populate the catalog view as designed. We hope that making more strategic asynchronous requests will help. The data model issues still remain.

Copy link
Member

@spadgett spadgett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @grokspawn. Really appreciate all that you guys are doing to help the console team.

Name string
}
```
Query parameters will be logically ANDed and used to restrict response scope.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example of the API usage would help here.

Checking my understanding: This API will allow clients to limit the number of items returned, but there's no way to limit what properties are returned for each item, correct? You get the full item representation?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spadgett that is correct. You essentially get the list of fbc blobs that match the query criteria, in their entirety.

Comment on lines +378 to +380
For example, response for `metas?schema=olm.package&name=foo` and `metas?schema=olm.package&name=bar`
represent disjoint sets, and the server would have to either ignore `If-Modified-Since` directives
in client requests or rely on their veracity uncritically.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, @grokspawn. I'm a little unclear what this is saying since earlier the enhancement says cache control headers are supported. Last-Modified and If-Modified-Since should work, correct? Is this about keeping a cache on the server for performance?

As @joelanford points out, the browser cache keys on the full URL, including query parameters. Each response for different query parameters is considered different.

Comment on lines +386 to +389
The existing `all` endpoint also incentivizes clients to conserve resources via local cache to avoid making
many (potentially duplicate) requests. However, the OCP console proof of concept
required what was deemed an unsupportable amount of code and complexity to cache, decompose, and render the
complete FBC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On duplicate requests: If using cache control headers, this is presumably a quick check if the catalog has been modified and a 304 response. If the catalog has changed, we would want to refresh the cache anyway, so the extra request is actually desirable?

In practice, I think console will either be fetching the entire catalog or just one item. Generally, users work with extensions they've already installed much more often than they'd install a new extension, so we'll usually be getting one item. It's a steep cost to download everything for a single item, even if we only do it once per session. And depending how often the catalog updates, it could be many times per session if we refresh the cache.

I think this is more about performance than code complexity (although complexity is a consideration as well).

Name string
}
```
Query parameters will be logically ANDed and used to restrict response scope.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spadgett that is correct. You essentially get the list of fbc blobs that match the query criteria, in their entirety.


### Caching Considerations

This proposal indends to provide RFC7234 caching compliance through support for `Last-Modified` and
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Link to RFC7234 here would be helpful.


This proposal indends to provide RFC7234 caching compliance through support for `Last-Modified` and
`If-Modified-Since` directives. Clients can use these headers to avoid re-downloading unchanged data.
The server will respond with 304 Not Changed if the catalog metadata is unchanged.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It's 304 statusNotModified

-->
> 1. If a query comes in with `/api/v1/metas?package=foo`, should we include the blob with schema: `olm.package` and name: `foo`?

We feel that it is incorrect for the metas service endpoint to mutate the data model (specifically, to create a synthetic package attribute for the `olm.package` schema). To access all the data modeled for an installable package, separate queries need to be made for the package-level metadata (`schema=olm.package&name=foo`) versus the channel/bundle-level metadata (`package=foo`).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section could probably use a little more explanation as to why this is even a question, ie include the context that olm.package objects do not contain packageName


#### Completeness
The previous `all` endpoint always returns valid FBC. The new service cannot make that promise,
so clients could make incorrect assumptions about the suitability of results. See Open Questions.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a collection of valid FBC blobs not constitute valid FBC?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants