
Export pull_layer & auth API #9

Merged (2 commits) on Apr 4, 2022

Conversation

@arronwy (Contributor) commented Dec 8, 2021

Export pull_blob API
1. Container images share layers; the API should support scenarios that don't need to pull all the layers.
2. Container image sizes can vary from megabytes to gigabytes; exporting a pull_layer API lets the user run the subsequent layer decompress/unpack/store operations in parallel.

Export auth API
For container image services that support on-demand layer pull, such as:
* stargz https://github.com/containerd/stargz-snapshotter
* Nydus Image Service https://github.com/dragonflyoss/image-service
exporting the auth API is required when the token has expired.

@thomastaylor312 (Contributor)

I seem to remember there being a reason why these methods were not public. @bacongobbler or @radu-matei was there a reason either of you remember why these were private?

@bacongobbler (Contributor)

Yes - some of that conversation can be found here: krustlet/krustlet#564

@bacongobbler (Contributor) commented Dec 8, 2021

Basically the user should not be given any control over auth - the client knows what endpoints do and do not need authentication, so the call to auth should be hidden behind methods like pull. It is very possible for users to expose their credentials to API endpoints that do not require auth. For example, a pull from DockerHub fetches image layers from index.docker.io, and those calls don't need DockerHub credentials - just auth tokens from the repository scope's auth endpoint.

As for exposing pull_layer... I don't see how this is useful unless you're trying to write an abstraction over the existing Client. It doesn't make a ton of sense because you still have no way to push or pull manifests, and push_layer is still hidden. We need to decide whether we allow others to write their own clients on top of oci-distribution, or we are the ones publishing an OCI client, exposing only the high-level concepts like client.pull() and client.push() (which is the current design today).
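
For reference, a minimal sketch of the caller-facing flow described above, assuming the crate's high-level Client::pull entry point; the accepted media type string and the exact pull signature are assumptions here, not taken from this thread. Credentials are handed to pull once, and the client decides internally which requests actually need a token.

use oci_distribution::{client::Client, secrets::RegistryAuth, Reference};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut client = Client::default();
    let reference = Reference::try_from("docker.io/library/hello-world:latest")?;
    // The caller never talks to the token endpoint directly; pull performs the
    // token exchange for the repository scope and reuses the token for blob fetches.
    let image = client
        .pull(
            &reference,
            &RegistryAuth::Anonymous,
            vec!["application/vnd.docker.image.rootfs.diff.tar.gzip"],
        )
        .await?;
    println!("pulled {} layer(s)", image.layers.len());
    Ok(())
}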

@bacongobbler (Contributor) commented Dec 8, 2021

can allow the user to do the following layer decompress/unpack/store operations in parallel.

Is there a compromise we can make here? I think that should be something pull can handle. It's already async anyways. Parallelizing the layer pull/unpack/store operations within pull should accomplish the same thing as what's requested here.

@shanesveller

If it helps contextualize the OP's desire, one of my hopeful use-cases for this crate is a flavor/variant of https://oras.land/ - specifically aimed at smarter CI caching. This means I may be running my caching utility in memory-constrained contexts while also dealing with payloads of up to multiple gigabytes when considering the "image" as a whole. One possible optimization under those constraints would be the ability to stream individual layers to decompress and write to disk during the logical pull operation, so that I never need to buffer an entire layer payload in memory. I have an analogous desire during pushing due to the same memory constraints, where I wouldn't want to hold entire layers in memory all at once. I'd have to know the checksum ahead of time for the push per the registry API, but that doesn't directly require me to hold the payload in memory.

Most of these ideas appear incompatible with this crate's implementation details today, which I understand have been mostly informed by krustlet's use case and its much smaller WASM artifacts.

(I recognize that my goals are not inherently this project's goals, and I may need to find my own way if the examples I've offered aren't compelling enough.)
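
A rough sketch of the streaming idea under an exported pull_layer, assuming the pull_manifest_and_config and pull_layer signatures sketched later in this thread (a digest plus any AsyncWrite + Unpin sink); tokio's File is the sink here, so a layer is never buffered whole in memory.

use oci_distribution::{client::Client, secrets::RegistryAuth, Reference};
use tokio::fs::File;

// Pull the manifest, then stream each layer blob straight to a file on disk.
async fn stream_layers_to_disk(
    client: &mut Client,
    reference: &Reference,
    dir: &std::path::Path,
) -> anyhow::Result<()> {
    let (manifest, _digest, _config) = client
        .pull_manifest_and_config(reference, &RegistryAuth::Anonymous)
        .await?;
    for layer in &manifest.layers {
        // One file per layer, named after its digest (':' is not filename-friendly)
        let mut file = File::create(dir.join(layer.digest.replace(':', "_"))).await?;
        // The blob bytes are written into the AsyncWrite sink as they arrive
        client.pull_layer(reference, &layer.digest, &mut file).await?;
    }
    Ok(())
}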

@bacongobbler (Contributor) commented Dec 9, 2021

one of my hopeful use-cases when consuming this crate is a flavor/variant of https://oras.land/ - specifically with the goal of usage for smarter CI caching.

We've discussed the idea with a few of the ORAS maintainers. They were interested in oci-distribution being the basis of a Rust client for ORAS. oras-rs imported krustlet (including oci-distribution) as a subtree project, but hasn't seen any activity since that point. I assume the goal was to copy oci-distribution as a starting point for a Rust client. If you're looking for an oras-go-alike client written in Rust, I'd ask them about their plans with that repository.

As oras-rs matures, I could see much of oci-distribution being ported over to oras-rs. Implementing the entire OCI distribution spec is one of our stated goals, and that goal aligns with some parts of ORAS as well.

One of the possible optimizations under those constraints would be for me to be able to stream individual layers to decompress and write to disk during the logical pull operation, so that I never need to buffer an entire layer payload in-memory. I have an analogous desire during pushing as well due to the same memory constraints, where I wouldn't wish to hold those entire layers in-memory all at once. I'd have to know the checksum ahead of time for the push per the registry API, but that doesn't directly require me to hold the payload in memory.

I don't see how exposing methods like pull_image_layer and auth could help you in that regard unless you're embedding Client within another Client, calling methods like auth to fetch credentials and pass that back to the exterior client. That just seems wonky. But perhaps we can decouple these methods away from the internal logic of the Client and into its own module. Kinda like how oras-go has its own standalone Copy that isn't tied to a Client struct. That might help you re-use some of oci-distribution's client logic.

We could also abstract some of the Client's methods into different Traits which would give you the high-level constraints like pull and push, then it'd be up to you to determine the underlying behaviour. That way the existing Client doesn't have to leak implementation details like pull_manifest and auth back to the caller. I'd imagine we would want to have those as separate traits so users can implement a read-only client.
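
A hypothetical sketch of that trait split; the trait and method names here are illustrative only and not part of oci-distribution. The read-only side is its own trait, so a caller can implement a pull-only client without touching push.

use oci_distribution::{client::ImageData, secrets::RegistryAuth, Reference};

// Read-only operations: enough to build a pull-only client.
#[async_trait::async_trait]
pub trait ImagePull {
    async fn pull(&mut self, image: &Reference, auth: &RegistryAuth) -> anyhow::Result<ImageData>;
}

// Write operations, kept in a separate trait so they can be left unimplemented.
#[async_trait::async_trait]
pub trait ImagePush {
    async fn push(&mut self, image: &Reference, auth: &RegistryAuth, data: &ImageData) -> anyhow::Result<()>;
}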

@arronwy (Contributor, Author) commented Dec 9, 2021

can allow the user to do the following layer decompress/unpack/store operations in parallel.

Is there a compromise we can make here? I think that should be something pull can handle. It's already async anyways. Parallelizing the layer pull/unpack/store operations within pull should accomplish the same thing as what's requested here.

Yes, the current pull API is already async, but we have to wait until all the layers are pulled before the subsequent per-layer data operations can start. Many container images support encrypted layers; decryption and decompression are time-consuming, and these operations depend on other crates that different users may choose differently.

Another reason we want to export the pull_layer API is that many container stacks support on-demand pull (like stargz-snapshotter): we don't pull all the layers up front, but pull layers on demand instead.

@bacongobbler (Contributor) commented Dec 9, 2021

I think we're in agreement here. I want to re-think the design approach though.

I don't see how exposing methods like pull_image_layer and auth could help you unless you're embedding Client within another Client, calling methods like auth to fetch credentials and pass that back to the exterior client. That just seems wonky from a design perspective. But perhaps we can decouple these methods away from the internal logic of the Client and into its own module.

Would you mind weighing in on this? Do you have an example how you plan to use auth and pull_image_layer in your project? Perhaps that may help clarify your use case.

@arronwy (Contributor, Author) commented Dec 10, 2021

I think we're in agreement here. I want to re-think the design approach though.

I don't see how exposing methods like pull_image_layer and auth could help you unless you're embedding Client within another Client, calling methods like auth to fetch credentials and pass that back to the exterior client. That just seems wonky from a design perspective. But perhaps we can decouple these methods away from the internal logic of the Client and into its own module.

Would you mind weighing in on this? Do you have an example how you plan to use auth and pull_image_layer in your project? Perhaps that may help clarify your use case.

For parallel image-layer data processing we may not need the auth API, but the pull_layer API's self parameter cannot be mutable the way it is in the current implementation. We would do the work as below:

let mut client = Client::default();
// Authentication happens once, inside pull_manifest_and_config
let (manifest, digest, config) = client
    .pull_manifest_and_config(&reference, &RegistryAuth::Anonymous)
    .await?;

// Build one future per layer; each pulls the blob and then post-processes it
let layers = manifest.layers.into_iter().map(|layer| {
    let this = &client;
    let reference = &reference;
    async move {
        let mut out = Vec::new();
        this.pull_layer(reference, &layer.digest, &mut out).await?;
        // Placeholder per-layer processing steps
        decrypt_layer(&out)?;
        decompress_layer(&out)?;
        unpack_layer(&out)?;
        Ok::<(), anyhow::Error>(())
    }
});
// Drive the per-layer futures concurrently
futures::future::try_join_all(layers).await?;

For on-demand pull, we may need to re-auth when the token has expired:

// On-demand pull, e.g. when the token has expired
let op = RegistryOperation::Pull;
client
    .auth(&reference, &RegistryAuth::Anonymous, op)
    .await?;
client.pull_layer(&reference, &layer.digest, &mut out).await?;

We could also hide the auth inside the pull_layer API as below, but then self has to be mutable since the token may be updated, and such a pull_layer API could no longer be used in the first scenario where we want to pull in parallel. Any suggestions for supporting both? I found that exporting the auth and pull_layer APIs does the job, but I'm not sure whether it's the right way:

    pub async fn pull_layer<T: AsyncWrite + Unpin>(
        &mut self,
        image: &Reference,
        auth: &RegistryAuth,
        digest: &str,
        mut out: T,
    ) -> anyhow::Result<()> {
        let op = RegistryOperation::Pull;
        if !self.tokens.contains_key(image, op) {
            self.auth(image, auth, op).await?;
        }

        self._pull_layer(image, digest, out)
            .await?;

        Ok(())
    }

    async fn _pull_layer<T: AsyncWrite + Unpin>(
         &self,

src/client.rs Outdated
@@ -233,10 +230,7 @@ impl Client {
        image_manifest: Option<OciManifest>,
    ) -> anyhow::Result<String> {
        debug!("Pushing image: {:?}", image_ref);
        let op = RegistryOperation::Push;
        if !self.tokens.contains_key(image_ref, op) {
Review comment (Contributor):

Why are we forcing an auth even if we have a token? Isn't this a bit excessive?

src/client.rs Outdated
        &mut self,
        image: &Reference,
        authentication: &RegistryAuth,
        operation: RegistryOperation,
    ) -> anyhow::Result<()> {
        debug!("Authorizing for image: {:?}", image);
        if self.tokens.contains_key(image, operation) {
Review comment (Contributor):

Hmm... I see the reasoning behind this change. I think we designed it the other way so the caller can force a token refresh. It makes the code a little less DRY, but it does give the caller a bit more flexibility.

The other thing we need to address at some point is whether the token has expired (which I noticed in your review). Not required for this PR, but it's worth keeping in mind.

Reply (Contributor, Author):

Should we remove this commit, or replace it with a public TokenCache so the user can do the token-expiration check?

Reply (Contributor):

oci-distribution should be the one responsible for checking and refreshing auth tokens. I don't think that should be the caller's responsibility.

That being said, this change actively prevents the user from refreshing the auth token, so I think this is a breaking change that should be reverted. Users should be able to manually refresh tokens by calling client.auth at-will.

@bacongobbler (Contributor)

For on demand pull, we may need auth when token is expired

Can't we just address that in the calling code by checking the token's expiration date? That would mean you can just call pull without having to embed auth/pull_layer yourself.

@bacongobbler (Contributor) commented Dec 16, 2021

For parallel image layer data processing,

I still don't understand why this can't be handled in oci-distribution. Why does this have to be orchestrated from another library? Why can't a pull fetch multiple layers in parallel? Why does this have to be done at a higher level?

We could just implement some form of middleware pattern so that pull can call a function on each layer. That way you can still call decrypt/decompress/unpack on each layer, and it'd all be performed in parallel. Would that solve your issue?

https://doc.rust-lang.org/book/ch19-05-advanced-functions-and-closures.html
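
A rough sketch of that middleware shape, written against the pull_manifest_and_config and pull_layer calls already shown in this thread; the pull_with helper itself is hypothetical and kept sequential for clarity (a parallel version would join the per-layer futures).

use oci_distribution::{client::Client, secrets::RegistryAuth, Reference};

// Hypothetical helper: fetch every layer and hand the raw bytes to a
// caller-supplied callback, where decrypt/decompress/unpack would live.
async fn pull_with<F>(
    client: &mut Client,
    image: &Reference,
    auth: &RegistryAuth,
    on_layer: F,
) -> anyhow::Result<()>
where
    F: Fn(&str, Vec<u8>) -> anyhow::Result<()>,
{
    let (manifest, _digest, _config) = client.pull_manifest_and_config(image, auth).await?;
    for layer in &manifest.layers {
        let mut out = Vec::new();
        client.pull_layer(image, &layer.digest, &mut out).await?;
        // The callback receives the layer digest and its raw bytes
        on_layer(&layer.digest, out)?;
    }
    Ok(())
}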

@arronwy (Contributor, Author) commented Dec 17, 2021

For on demand pull, we may need auth when token is expired

Can't we just address that in the calling code by checking the token's expiration date? That would mean you can just call pull without having to embed auth/pull_layer yourself.

Yes, we can do it that way, but the current TokenCache in the client module is not public, and TokenCache itself is only visible within the crate by design:

pub(crate) struct TokenCache {

@arronwy (Contributor, Author) commented Dec 17, 2021

For parallel image layer data processing,

I still don't understand why this can't be handled in oci-distribution. Why does this have to be orchestrated from another library? Why can't a pull fetch multiple layers in parallel? Why does this have to be done at a higher level?

We could just implement some form of middleware pattern so that pull can call a function on each layer. That way you can still call decrypt/decompress/unpack on each layer, and it'd all be performed in parallel. Would that solve your issue?

https://doc.rust-lang.org/book/ch19-05-advanced-functions-and-closures.html

Thanks for your suggestions. Yes, we can pass functions to the current pull API, but we have two concerns: first, it modifies the interface of a key public API; second, after the layer data is processed, the pull API's return value would also need to change based on the user's needs.

@arronwy (Contributor, Author) commented Jan 7, 2022

@bacongobbler Another concern is that container image layers are shared: after we pull the image manifest, we also need to check whether the host already has layers that were pulled for other containers, and we only need to pull the layers that are missing.

Image services and runtimes operate at the image-layer level, so oci-distribution may also need to export the layer-related APIs.
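
For illustration, a small sketch of that dedup check, assuming the manifest's layer descriptors expose a digest field (as used elsewhere in this thread) and a hypothetical set of digests already present on the host.

use std::collections::HashSet;

use oci_distribution::manifest::OciDescriptor;

// Keep only the layers whose digests are not already in the local store.
fn layers_to_pull<'a>(
    manifest_layers: &'a [OciDescriptor],
    local_digests: &HashSet<String>,
) -> Vec<&'a OciDescriptor> {
    manifest_layers
        .iter()
        .filter(|layer| !local_digests.contains(&layer.digest))
        .collect()
}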

@arronwy requested a review from @flavio as a code owner on February 10, 2022 at 01:56
@arronwy (Contributor, Author) commented Feb 10, 2022

Hi @bacongobbler @thomastaylor312 @flavio I rebased the PR and updated the commit message, please review.

@bacongobbler (Contributor) left a review:

I still disagree with this change in relation to oci-distribution's current API, but I don't really have the time right now to make contributions for further improvements. I'm fine with this going through for now. We can make changes to this API in future iterations since we haven't hit 1.0 yet, so there's plenty of time to refactor if necessary.

I see one code regression that I'd like to see changed. Otherwise this looks good to go.

@thomastaylor312 and @flavio do you have any ideas/concerns about this change?

@thomastaylor312 (Contributor)

Yeah, I think this is fine for now. We should just be careful as we approach 1.0, when we decide whether or not the pull_layer function should be exported.

1. Container images share layers; the API should support the
   scenario where we don't need to pull all the layers.

2. Container image sizes can vary from megabytes to gigabytes;
   exporting the pull_blob API allows the user to run the subsequent
   layer decompress/unpack/store operations in parallel.

Signed-off-by: Arron Wang <[email protected]>
For container image services that support on-demand layer pull, such as:
    * stargz https://github.com/containerd/stargz-snapshotter
    * Nydus Image Service https://github.com/dragonflyoss/image-service

exporting the auth API is required when the token has expired.

Signed-off-by: Arron Wang <[email protected]>
@arronwy (Contributor, Author) commented Feb 16, 2022

I still disagree with this change in relation to oci-distribution's current API, but I don't really have the time right now to make contributions for further improvements. I'm fine with this going through for now. We can make changes to this API in future iterations since we haven't hit 1.0 yet, so there's plenty of time to refactor if necessary.

I see one code regression that I'd like to see changed. Otherwise this looks good to go.

Thanks Matt, I fully agree with keeping auth() clean as you requested; I've just updated the PR.

@thomastaylor312 and @flavio do you have any ideas/concerns about this change?

@flavio (Contributor) left a review:

I think we could find a better solution for this problem, as @bacongobbler suggested. Maybe approving this change will allow us to understand better how things could be changed.

I'm fine with this PR, but this is something we will need to discuss as we approach the 1.0 release

@thomastaylor312 (Contributor)

@bacongobbler This good to go from your end?

@bacongobbler bacongobbler merged commit 1ba0d94 into oras-project:main Apr 4, 2022
@arronwy (Contributor, Author) commented Apr 5, 2022

@bacongobbler @thomastaylor312 @flavio Thanks, much appreciated!

thomastaylor312 pushed a commit that referenced this pull request Mar 2, 2023