
Export pull_layer & auth API #9

Merged (2 commits) on Apr 4, 2022

Conversation

@arronwy (Contributor) commented Dec 8, 2021

Export pull_blob API
1. Container images share layers; the API should support scenarios that don't need to pull all the layers.
2. Container image sizes can vary from megabytes to gigabytes; exporting a pull_layer API lets the user run the subsequent layer decompress/unpack/store operations in parallel.

Export auth API
For container image services that support on-demand layer pull, such as:
* stargz https://github.com/containerd/stargz-snapshotter
* Nydus Image Service https://github.com/dragonflyoss/image-service
exporting the auth API is required when the token has expired.

@thomastaylor312 (Contributor)

I seem to remember there being a reason why these methods were not public. @bacongobbler or @radu-matei was there a reason either of you remember why these were private?

@bacongobbler (Contributor)

Yes - some of that conversation can be found here: krustlet/krustlet#564

@bacongobbler (Contributor) commented Dec 8, 2021

Basically the user should not be given any control over auth - the client knows what endpoints do and do not need authentication, so the call to auth should be hidden behind methods like pull. It is very possible for users to expose their credentials to API endpoints that do not require auth. For example, a pull from DockerHub fetches image layers from index.docker.io, and those calls don't need DockerHub credentials - just auth tokens from the repository scope's auth endpoint.

As for exposing pull_layer... I don't see how this is useful unless you're trying to write an abstraction over the existing Client. It doesn't make a ton of sense because you still have no way to push or pull manifests, and push_layer is still hidden. We need to decide whether we allow others to write their own clients on top of oci-distribution, or we are the ones publishing an OCI client, exposing only the high-level concepts like client.pull() and client.push() (which is the current design today).
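
For reference, a minimal sketch of the caller-facing flow described above, assuming the crate's high-level Client::pull entry point; the accepted media type string and the exact pull signature are assumptions here, not taken from this thread. Credentials are handed to pull once, and the client decides internally which requests actually need a token.

use oci_distribution::{client::Client, secrets::RegistryAuth, Reference};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut client = Client::default();
    let reference = Reference::try_from("docker.io/library/hello-world:latest")?;
    // The caller never talks to the token endpoint directly; pull performs the
    // token exchange for the repository scope and reuses the token for blob fetches.
    let image = client
        .pull(
            &reference,
            &RegistryAuth::Anonymous,
            vec!["application/vnd.docker.image.rootfs.diff.tar.gzip"],
        )
        .await?;
    println!("pulled {} layer(s)", image.layers.len());
    Ok(())
}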

@bacongobbler (Contributor) commented Dec 8, 2021

can allow the user to do the following layer decompress/unpack/store operations in parallel.

Is there a compromise we can make here? I think that should be something pull can handle. It's already async anyways. Parallelizing the layer pull/unpack/store operations within pull should accomplish the same thing as what's requested here.

@shanesveller

If it helps contextualize the OP's desire, one of my hopeful use-cases for this crate is a flavor/variant of https://oras.land/ - specifically aimed at smarter CI caching. This means I may be running my caching utility in memory-constrained contexts while also dealing with payloads of up to multiple gigabytes when considering the "image" as a whole. One possible optimization under those constraints would be the ability to stream individual layers to decompress and write to disk during the logical pull operation, so that I never need to buffer an entire layer payload in memory. I have an analogous desire during pushing due to the same memory constraints, where I wouldn't want to hold entire layers in memory all at once. I'd have to know the checksum ahead of time for the push per the registry API, but that doesn't directly require me to hold the payload in memory.

Most of these ideas appear incompatible with this crate's implementation details today, which I understand have been mostly informed by krustlet's use case and its much smaller WASM artifacts.

(I recognize that my goals are not inherently this project's goals, and I may need to find my own way if the examples I've offered aren't compelling enough.)
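
A rough sketch of the streaming idea under an exported pull_layer, assuming the pull_manifest_and_config and pull_layer signatures sketched later in this thread (a digest plus any AsyncWrite + Unpin sink); tokio's File is the sink here, so a layer is never buffered whole in memory.

use oci_distribution::{client::Client, secrets::RegistryAuth, Reference};
use tokio::fs::File;

// Pull the manifest, then stream each layer blob straight to a file on disk.
async fn stream_layers_to_disk(
    client: &mut Client,
    reference: &Reference,
    dir: &std::path::Path,
) -> anyhow::Result<()> {
    let (manifest, _digest, _config) = client
        .pull_manifest_and_config(reference, &RegistryAuth::Anonymous)
        .await?;
    for layer in &manifest.layers {
        // One file per layer, named after its digest (':' is not filename-friendly)
        let mut file = File::create(dir.join(layer.digest.replace(':', "_"))).await?;
        // The blob bytes are written into the AsyncWrite sink as they arrive
        client.pull_layer(reference, &layer.digest, &mut file).await?;
    }
    Ok(())
}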

@bacongobbler (Contributor) commented Dec 9, 2021

one of my hopeful use-cases when consuming this crate is a flavor/variant of https://oras.land/ - specifically with the goal of usage for smarter CI caching.

We've discussed the idea with a few of the ORAS maintainers. They were interested in oci-distribution being the basis of a Rust client for ORAS. oras-rs imported krustlet (including oci-distribution) as a subtree project, but hasn't seen any activity since that point. I assume the goal was to copy oci-distribution as a starting point for a Rust client. If you're looking for an oras-go-alike client written in Rust, I'd ask them about their plans with that repository.

As oras-rs matures, I could see much of oci-distribution being ported over to oras-rs. Implementing the entire OCI distribution spec is one of our stated goals, and that goal aligns with some parts of ORAS as well.

One of the possible optimizations under those constraints would be for me to be able to stream individual layers to decompress and write to disk during the logical pull operation, so that I never need to buffer an entire layer payload in-memory. I have an analogous desire during pushing as well due to the same memory constraints, where I wouldn't wish to hold those entire layers in-memory all at once. I'd have to know the checksum ahead of time for the push per the registry API, but that doesn't directly require me to hold the payload in memory.

I don't see how exposing methods like pull_image_layer and auth could help you in that regard unless you're embedding Client within another Client, calling methods like auth to fetch credentials and pass that back to the exterior client. That just seems wonky. But perhaps we can decouple these methods away from the internal logic of the Client and into its own module. Kinda like how oras-go has its own standalone Copy that isn't tied to a Client struct. That might help you re-use some of oci-distribution's client logic.

We could also abstract some of the Client's methods into different Traits which would give you the high-level constraints like pull and push, then it'd be up to you to determine the underlying behaviour. That way the existing Client doesn't have to leak implementation details like pull_manifest and auth back to the caller. I'd imagine we would want to have those as separate traits so users can implement a read-only client.
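
A hypothetical sketch of that trait split; the trait and method names here are illustrative only and not part of oci-distribution. The read-only side is its own trait, so a caller can implement a pull-only client without touching push.

use oci_distribution::{client::ImageData, secrets::RegistryAuth, Reference};

// Read-only operations: enough to build a pull-only client.
#[async_trait::async_trait]
pub trait ImagePull {
    async fn pull(&mut self, image: &Reference, auth: &RegistryAuth) -> anyhow::Result<ImageData>;
}

// Write operations, kept in a separate trait so they can be left unimplemented.
#[async_trait::async_trait]
pub trait ImagePush {
    async fn push(&mut self, image: &Reference, auth: &RegistryAuth, data: &ImageData) -> anyhow::Result<()>;
}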

@arronwy (Contributor, Author) commented Dec 9, 2021

can allow the user to do the following layer decompress/unpack/store operations in parallel.

Is there a compromise we can make here? I think that should be something pull can handle. It's already async anyways. Parallelizing the layer pull/unpack/store operations within pull should accomplish the same thing as what's requested here.

Yes, the current pull API is already async, but we have to wait until all the layers are pulled before the subsequent per-layer data operations can start. Many container images support encrypted layers; decryption and decompression are time-consuming, and these operations depend on other crates that different users may choose differently.

Another reason we want to export the pull_layer API is that many container stacks support on-demand pull (like stargz-snapshotter): we don't pull all the layers up front, but pull layers on demand instead.

@bacongobbler (Contributor) commented Dec 9, 2021

I think we're in agreement here. I want to re-think the design approach though.

I don't see how exposing methods like pull_image_layer and auth could help you unless you're embedding Client within another Client, calling methods like auth to fetch credentials and pass that back to the exterior client. That just seems wonky from a design perspective. But perhaps we can decouple these methods away from the internal logic of the Client and into its own module.

Would you mind weighing in on this? Do you have an example how you plan to use auth and pull_image_layer in your project? Perhaps that may help clarify your use case.

@arronwy (Contributor, Author) commented Dec 10, 2021

I think we're in agreement here. I want to re-think the design approach though.

I don't see how exposing methods like pull_image_layer and auth could help you unless you're embedding Client within another Client, calling methods like auth to fetch credentials and pass that back to the exterior client. That just seems wonky from a design perspective. But perhaps we can decouple these methods away from the internal logic of the Client and into its own module.

Would you mind weighing in on this? Do you have an example how you plan to use auth and pull_image_layer in your project? Perhaps that may help clarify your use case.

For parallel image-layer data processing we may not need the auth API, but the pull_layer API's self parameter cannot be mutable the way it is in the current implementation. We would do the work as below:

let mut client = Client::default();
// Authentication happens once, inside pull_manifest_and_config
let (manifest, digest, config) = client
    .pull_manifest_and_config(&reference, &RegistryAuth::Anonymous)
    .await?;

// Build one future per layer; each pulls the blob and then post-processes it
let layers = manifest.layers.into_iter().map(|layer| {
    let this = &client;
    let reference = &reference;
    async move {
        let mut out = Vec::new();
        this.pull_layer(reference, &layer.digest, &mut out).await?;
        // Placeholder per-layer processing steps
        decrypt_layer(&out)?;
        decompress_layer(&out)?;
        unpack_layer(&out)?;
        Ok::<(), anyhow::Error>(())
    }
});
// Drive the per-layer futures concurrently
futures::future::try_join_all(layers).await?;

For on-demand pull, we may need to re-auth when the token has expired:

// On-demand pull, e.g. when the token has expired
let op = RegistryOperation::Pull;
client
    .auth(&reference, &RegistryAuth::Anonymous, op)
    .await?;
client.pull_layer(&reference, &layer.digest, &mut out).await?;

We could also hide the auth inside the pull_layer API as below, but then self has to be mutable since the token may be updated, and such a pull_layer API could no longer be used in the first scenario where we want to pull in parallel. Any suggestions for supporting both? I found that exporting the auth and pull_layer APIs does the job, but I'm not sure whether it's the right way:

    pub async fn pull_layer<T: AsyncWrite + Unpin>(
        &mut self,
        image: &Reference,
        auth: &RegistryAuth,
        digest: &str,
        mut out: T,
    ) -> anyhow::Result<()> {
        let op = RegistryOperation::Pull;
        if !self.tokens.contains_key(image, op) {
            self.auth(image, auth, op).await?;
        }

        self._pull_layer(image, digest, out)
            .await?;

        Ok(())
    }

    async fn _pull_layer<T: AsyncWrite + Unpin>(
         &self,

src/client.rs Outdated
@@ -233,10 +230,7 @@ impl Client {
        image_manifest: Option<OciManifest>,
    ) -> anyhow::Result<String> {
        debug!("Pushing image: {:?}", image_ref);
        let op = RegistryOperation::Push;
        if !self.tokens.contains_key(image_ref, op) {
Review comment (Contributor):

Why are we forcing an auth even if we have a token? Isn't this a bit excessive?

src/client.rs Outdated
        &mut self,
        image: &Reference,
        authentication: &RegistryAuth,
        operation: RegistryOperation,
    ) -> anyhow::Result<()> {
        debug!("Authorizing for image: {:?}", image);
        if self.tokens.contains_key(image, operation) {
Review comment (Contributor):

Hmm... I see the reasoning behind this change. I think we designed it the other way so the caller can force a token refresh. It makes the code a little less DRY, but it does give the caller a bit more flexibility.

The other thing we need to address at some point is whether the token has expired (which I noticed in your review). Not required for this PR, but it's worth keeping in mind.

Reply (Contributor, Author):

Should we remove this commit, or replace it with a public TokenCache so the user can do the token-expiration check?

Reply (Contributor):

oci-distribution should be the one responsible for checking and refreshing auth tokens. I don't think that should be the caller's responsibility.

That being said, this change actively prevents the user from refreshing the auth token, so I think this is a breaking change that should be reverted. Users should be able to manually refresh tokens by calling client.auth at-will.

@bacongobbler (Contributor)

For on demand pull, we may need auth when token is expired

Can't we just address that in the calling code by checking the token's expiration date? That would mean you can just call pull without having to embed auth/pull_layer yourself.

@bacongobbler (Contributor) commented Dec 16, 2021

For parallel image layer data processing,

I still don't understand why this can't be handled in oci-distribution. Why does this have to be orchestrated from another library? Why can't a pull fetch multiple layers in parallel? Why does this have to be done at a higher level?

We could just implement some form of middleware pattern so that pull can call a function on each layer. That way you can still call decrypt/decompress/unpack on each layer, and it'd all be performed in parallel. Would that solve your issue?

https://doc.rust-lang.org/book/ch19-05-advanced-functions-and-closures.html
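
A rough sketch of that middleware shape, written against the pull_manifest_and_config and pull_layer calls already shown in this thread; the pull_with helper itself is hypothetical and kept sequential for clarity (a parallel version would join the per-layer futures).

use oci_distribution::{client::Client, secrets::RegistryAuth, Reference};

// Hypothetical helper: fetch every layer and hand the raw bytes to a
// caller-supplied callback, where decrypt/decompress/unpack would live.
async fn pull_with<F>(
    client: &mut Client,
    image: &Reference,
    auth: &RegistryAuth,
    on_layer: F,
) -> anyhow::Result<()>
where
    F: Fn(&str, Vec<u8>) -> anyhow::Result<()>,
{
    let (manifest, _digest, _config) = client.pull_manifest_and_config(image, auth).await?;
    for layer in &manifest.layers {
        let mut out = Vec::new();
        client.pull_layer(image, &layer.digest, &mut out).await?;
        // The callback receives the layer digest and its raw bytes
        on_layer(&layer.digest, out)?;
    }
    Ok(())
}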

@arronwy (Contributor, Author) commented Dec 17, 2021

For on demand pull, we may need auth when token is expired

Can't we just address that in the calling code by checking the token's expiration date? That would mean you can just call pull without having to embed auth/pull_layer yourself.

Yes, we can do it that way, but the current TokenCache in the client module is not public, and TokenCache itself is only visible within the crate by design:

pub(crate) struct TokenCache {

@arronwy (Contributor, Author) commented Dec 17, 2021

For parallel image layer data processing,

I still don't understand why this can't be handled in oci-distribution. Why does this have to be orchestrated from another library? Why can't a pull fetch multiple layers in parallel? Why does this have to be done at a higher level?

We could just implement some form of middleware pattern so that pull can call a function on each layer. That way you can still call decrypt/decompress/unpack on each layer, and it'd all be performed in parallel. Would that solve your issue?

https://doc.rust-lang.org/book/ch19-05-advanced-functions-and-closures.html

Thanks for your suggestions. Yes, we can pass functions to the current pull API, but we have two concerns: first, it modifies the interface of a key public API; second, after the layer data is processed, the pull API's return value would also need to change based on the user's needs.

@arronwy (Contributor, Author) commented Jan 7, 2022

@bacongobbler Another concern is that container image layers are shared: after we pull the image manifest, we also need to check whether the host already has layers that were pulled for other containers, and we only need to pull the layers that are missing.

Image services and runtimes operate at the image-layer level, so oci-distribution may also need to export the layer-related APIs.
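
For illustration, a small sketch of that dedup check, assuming the manifest's layer descriptors expose a digest field (as used elsewhere in this thread) and a hypothetical set of digests already present on the host.

use std::collections::HashSet;

use oci_distribution::manifest::OciDescriptor;

// Keep only the layers whose digests are not already in the local store.
fn layers_to_pull<'a>(
    manifest_layers: &'a [OciDescriptor],
    local_digests: &HashSet<String>,
) -> Vec<&'a OciDescriptor> {
    manifest_layers
        .iter()
        .filter(|layer| !local_digests.contains(&layer.digest))
        .collect()
}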

@arronwy requested a review from @flavio as a code owner on February 10, 2022 at 01:56
@arronwy (Contributor, Author) commented Feb 10, 2022

Hi @bacongobbler @thomastaylor312 @flavio I rebased the PR and updated the commit message, please review.

@bacongobbler (Contributor) left a review:

I still disagree with this change in relation to oci-distribution's current API, but I don't really have the time right now to make contributions for further improvements. I'm fine with this going through for now. We can make changes to this API in future iterations since we haven't hit 1.0 yet, so there's plenty of time to refactor if necessary.

I see one code regression that I'd like to see changed. Otherwise this looks good to go.

@thomastaylor312 and @flavio do you have any ideas/concerns about this change?

@thomastaylor312 (Contributor)

Yeah, I think this is fine for now. We should just be careful as we approach 1.0, when we decide whether or not the pull_layer function should be exported.

1. Container images share layers; the API should support the
   scenario where we don't need to pull all the layers.

2. Container image sizes can vary from megabytes to gigabytes;
   exporting the pull_blob API allows the user to run the subsequent
   layer decompress/unpack/store operations in parallel.

Signed-off-by: Arron Wang <[email protected]>
For container image services that support on-demand layer pull, such as:
    * stargz https://github.com/containerd/stargz-snapshotter
    * Nydus Image Service https://github.com/dragonflyoss/image-service

exporting the auth API is required when the token has expired.

Signed-off-by: Arron Wang <[email protected]>
@arronwy (Contributor, Author) commented Feb 16, 2022

I still disagree with this change in relation to oci-distribution's current API, but I don't really have the time right now to make contributions for further improvements. I'm fine with this going through for now. We can make changes to this API in future iterations since we haven't hit 1.0 yet, so there's plenty of time to refactor if necessary.

I see one code regression that I'd like to see changed. Otherwise this looks good to go.

Thanks Matt, I fully agree with keeping auth() clean as you requested; I've just updated the PR.

@thomastaylor312 and @flavio do you have any ideas/concerns about this change?

@flavio (Contributor) left a review:

I think we could find a better solution for this problem, as @bacongobbler suggested. Maybe approving this change will allow us to understand better how things could be changed.

I'm fine with this PR, but this is something we will need to discuss as we approach the 1.0 release

@thomastaylor312 (Contributor)

@bacongobbler This good to go from your end?

@bacongobbler bacongobbler merged commit 1ba0d94 into oras-project:main Apr 4, 2022
@arronwy (Contributor, Author) commented Apr 5, 2022

@bacongobbler @thomastaylor312 @flavio Thanks, much appreciated!

thomastaylor312 pushed a commit that referenced this pull request Mar 2, 2023