Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AC: Treat Directories as blobs #34

Closed
wants to merge 1 commit into from

Conversation

tomcoldrick-ct
Copy link
Collaborator

We currently do some cleverness to pop Directory objects into the OutputDirectories field of the ActionResult for semantic purposes. Unfortunately doing this requires jumping over many hurdles, partially of our own making and partially from REAPI.

To detect whether the object is a Directory, we fetch it from CAS and attempt to unmarshal it into a Directory message. We then must convert it to a Tree to store in the ActionResult, which also requires the raw proto.

The catch is that things that are not Directory messages may successfully unmarshal into one -- as a motivating example, BuildStream's Source proto is such a message. When this happens, we're very likely to fail the request as we attempt to use the corrupted Directory we've created. In the BuildStream case, we hit an error with "Unknown Digest Function", as a Digest proto is loaded as nil.

To solve this, we could pass through data on whether the asset is supposed to be a directory or not, however doing so is inelegant and has another subtle problem -- if the client decides to use PushDirectory to push something that isn't a Directory (which isn't explicitly banned by the spec), then we will error.

As the Action and ActionResults we generate are intended only to be used by us, we can instead break the semantics somewhat and treat the Directory as an opaque file, as we do for blobs.

Note: this intentionally breaks the sharing of Actions should the RemoteExecutionFetcher be used. This feels more correct to me, as we were overwriting the one we actually ran anyway. Under this usage we instead get multiple Actions pointing to the same ActionResult -- one for the actually executed Action, and another for the asset reference.

Fixes #33

We currently do some cleverness to pop Directory objects into the
OutputDirectories field of the ActionResult for semantic purposes.
Unfortunately doing this requires jumping over many hurdles, partially of our
own making and partially from REAPI.

To detect whether the object is a Directory, we fetch it from CAS and attempt to
unmarshal it into a Directory message.  We then must convert it to a Tree to
store in the ActionResult, which also requires the raw proto.

The catch is that things that are not Directory messages may successfully
unmarshal into one -- as a motivating example, BuildStream's Source proto is
such a message.  When this happens, we're very likely to fail the request as we
attempt to use the corrupted Directory we've created.

To solve this, we could pass through data on whether the asset is supposed to be
a directory or not, however doing so is inelegant and has another subtle problem
-- if the client decides to use PushDirectory to push something that isn't a
Directory (which isn't explicitly banned by the spec), then we will error.

As the Action and ActionResults we generate are intended only to be used by us,
we can instead break the semantics somewhat and treat the Directory as an opaque
file, as we do for Blobs.

Note: this intentionally breaks the sharing of Actions should the
RemoteExecutionFetcher be used.  This feels more correct to me, as we were
overwriting the one we actually ran anyway.  Under this usage we instead get
multiple Actions pointing to the same ActionResult -- one for the actually
executed Action, and another for the asset reference.
@tomcoldrick-ct tomcoldrick-ct force-pushed the coldtom/ac-dir-semantics branch from 2826044 to 3e5ace2 Compare January 26, 2024 16:18
@EdSchouten
Copy link
Member

EdSchouten commented Jan 27, 2024

Hey! Before merging any of this, take a look at this PR:

bazelbuild/remote-apis#258

bb-storage already implements this:

buildbarn/bb-storage@068d214

What you need to do is write Action Cache entries with OutputDirectories that have both tree_digest and root_directory_digest set. So upload directories both as a Tree and separate Directory messages. Once you do that, CompletenessCheckingBlobAccess is capable of using the Tree message to ensure the separate Directory messages also remain present.

In other words, you won't need to abuse OutputFiles for this. Hope that helps!

@harrysarson
Copy link

I am taking over this work. What I would like to do is

  1. Add a field to the remote-asset internal Asset proto to indicate whether this is a "blob" or a "directory". This field would control whether the ActionResult uses output_files or output_directories to store the asset's digest (instead of our current heuristic of loading the blob from the CAS and seeing what it looks like).
    • This field could also live on the AssetReference proto, would need to play about a bit with this.
  2. In the action-cache asset store Put
    • Check to Asset to determine if this is "blob" or "directory"
    • Set output_directory_format = DIRECTORY_ONLY in the Command we create (for directory assets).
    • Set output_directories.root_directory_digest (instead of setting output_directories.tree_digest) in the ActionResult we create (for directory assets).
  3. In the action-cache asset store Get
    • Use the output_directories.root_directory_digest from the ActionResult as the digest to return to the client.
    • If the output_directories digest is empty then this is a "blob" asset.

This uses the output_directory_format added in bazelbuild/remote-apis#258.

Comment on lines -207 to +162
if commandGenerator, err := qualifier.QualifiersToCommand(ref.Qualifiers); err != nil || len(ref.Uris) > 1 {
// Create the action with the qualifier directory as the input root
action, command, err = assetReferenceToAction(ref, directoryDigest)
if err != nil {
return err
}
} else {
command = commandGenerator(ref.Uris[0])
commandDigest, err := ProtoToDigest(command)
if err != nil {
return err
}
action = &remoteexecution.Action{
CommandDigest: commandDigest,
InputRootDigest: EmptyDigest,
}
// Create the action with the qualifier directory as the input root
action, command, err = assetReferenceToAction(ref, directoryDigest)
if err != nil {
return err

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes assets with the special qualifiers we check for in QualifiersToCommand to unfetchable from the asset server.

Either we need to revert this bit or apply the corresponding change (i.e dropping QualifiersToCommand) from Get.

@harrysarson
Copy link

What you need to do is write Action Cache entries with OutputDirectories that have both tree_digest and root_directory_digest set.

I do not believe its feasible to set the tree_digest when writing the Action Cache entry. When creating the entry all we know about the asset is its digest (we also know the uris and qualifiers but they don't help us). Getting a tree_digest requires doing "assume digest refers to a directory and look it up in the CAS" which is what we are trying to avoid doing.

So I think we need to write Action Cache entries with OutputDirectories that only have root_directory_digest set.

@tomcoldrick-ct
Copy link
Collaborator Author

What you need to do is write Action Cache entries with OutputDirectories that have both tree_digest and root_directory_digest set.

I do not believe its feasible to set the tree_digest when writing the Action Cache entry. When creating the entry all we know about the asset is its digest (we also know the uris and qualifiers but they don't help us). Getting a tree_digest requires doing "assume digest refers to a directory and look it up in the CAS" which is what we are trying to avoid doing.

So I think we need to write Action Cache entries with OutputDirectories that only have root_directory_digest set.

If we're told that the asset is supposed to be a Directory, by the endpoint being called being {Push,Fetch}Directory, then it's perfectly valid to attempt to pull and form a Tree from it. If the thing pointed at isn't a valid directory, then that is an error for the client. If we store whether the pointed-to-thing is a Directory or not on the Asset, then I think it's reasonable to return an error if we try to fetch something that's not a Directory with FetchDirectory.

@harrysarson
Copy link

I think it's reasonable to return an error if we try to fetch something that's not a Directory with FetchDirectory.

In #33 you wrote:

Note that what we're currently doing isn't quite in line with the spec anyway, as there's not actually a requirement that PushDirectory references a Directory proto.

Have we changed our mind on this?


Either way, calculating the tree_digest requires recursively reading from the CAS which feels like something that assetStore.Put should avoid doing if it can.

@EdSchouten
Copy link
Member

EdSchouten commented Jul 12, 2024

  • Set output_directory_format = DIRECTORY_ONLY in the Command we create (for directory assets).
  • Set output_directories.root_directory_digest (instead of setting output_directories.tree_digest) in the ActionResult we create (for directory assets).

Note that this is something that bb-storage's CompletenessCheckingBlobAccess does not support. You must set tree_digest. root_directory_digest may be set in addition to that. The reason being that bb-storage uses Tree objects to efficiently obtain a list of all recursively referenced Directory objects, so that their presence can be checked.

https://github.com/buildbarn/bb-storage/blob/master/pkg/blobstore/completenesschecking/completeness_checking_blob_access.go#L178-L185

harrysarson added a commit to harrysarson/bb-remote-asset that referenced this pull request Jul 12, 2024
Which means we avoid needing to calculate TreeDigest`s whilst still
creating Actions/ActionResults that encode that an asset is a directory.

Note: the bb-remote-asset server does not validate that the
RootDirectoryDigest actually references a Directory proto because
bb-remote-asset treats the digest as an opaque digest.

See buildbarn#34 (comment)
harrysarson added a commit to harrysarson/bb-remote-asset that referenced this pull request Jul 12, 2024
Which means we avoid needing to calculate TreeDigest`s whilst still
creating Actions/ActionResults that encode that an asset is a directory.

Note: the bb-remote-asset server does not validate that the
RootDirectoryDigest actually references a Directory proto because
bb-remote-asset treats the digest as an opaque digest.

See buildbarn#34 (comment)
harrysarson added a commit to harrysarson/bb-remote-asset that referenced this pull request Jul 12, 2024
Which means we avoid needing to calculate TreeDigest`s whilst still
creating Actions/ActionResults that encode that an asset is a directory.

Note: the bb-remote-asset server does not validate that the
RootDirectoryDigest actually references a Directory proto because
bb-remote-asset treats the digest as an opaque digest.

See buildbarn#34 (comment)
harrysarson added a commit to harrysarson/bb-remote-asset that referenced this pull request Jul 12, 2024
Which means we avoid needing to calculate TreeDigest`s whilst still
creating Actions/ActionResults that encode that an asset is a directory.

Note: the bb-remote-asset server does not validate that the
RootDirectoryDigest actually references a Directory proto because
bb-remote-asset treats the digest as an opaque digest.

See buildbarn#34 (comment)
harrysarson added a commit to harrysarson/bb-remote-asset that referenced this pull request Jul 15, 2024
Which means we avoid needing to calculate TreeDigest`s whilst still
creating Actions/ActionResults that encode that an asset is a directory.

Note: the bb-remote-asset server does not validate that the
RootDirectoryDigest actually references a Directory proto because
bb-remote-asset treats the digest as an opaque digest.

See buildbarn#34 (comment)
harrysarson pushed a commit to harrysarson/bb-remote-asset that referenced this pull request Jul 23, 2024
bb-remote-asset does some cleverness to pop `Directory` objects into the
OutputDirectories field of the ActionResult for semantic purposes. This
commit cleans up that cleverness to allow blobs that look a bit like
directories to be uploaded as asset.

Prior to this commit we detected whether an object is a `Directory` by
fetching it from CAS and attempting to unmarshal it into a `Directory`
message. This commit requires all assets uploaded via `PushDirectory`
are `Directories` (and treats assets uploaded via `PushBlob` as blobs).

We still must convert Directories to a Tree to generate the `TreeDigest`
needed by the `ActionResult`, so the action cache asset store will fetch
from the CAS when putting a new directory. (The ActionResult needs a
`TreeDigest` to satisfy bb-storage's completeness checking.)

If the client decides to use `PushDirectory` to push something that
isn't a `Directory` (which isn't explicitly banned by the spec), then we
will
error. The client will need to use `PushBlob` for these assets instead.

Fixes buildbarn#33
Closes buildbarn#34
harrysarson pushed a commit to harrysarson/bb-remote-asset that referenced this pull request Jul 24, 2024
bb-remote-asset does some cleverness to pop `Directory` objects into the
OutputDirectories field of the ActionResult for semantic purposes. This
commit cleans up that cleverness to allow blobs that look a bit like
directories to be uploaded as asset.

Prior to this commit we detected whether an object is a `Directory` by
fetching it from CAS and attempting to unmarshal it into a `Directory`
message. This commit requires all assets uploaded via `PushDirectory`
are `Directories` (and treats assets uploaded via `PushBlob` as blobs).

We still must convert Directories to a Tree to generate the `TreeDigest`
needed by the `ActionResult`, so the action cache asset store will fetch
from the CAS when putting a new directory. (The ActionResult needs a
`TreeDigest` to satisfy bb-storage's completeness checking.)

If the client decides to use `PushDirectory` to push something that
isn't a `Directory` (which isn't explicitly banned by the spec), then we
will
error. The client will need to use `PushBlob` for these assets instead.

Fixes buildbarn#33
Closes buildbarn#34
harrysarson pushed a commit to harrysarson/bb-remote-asset that referenced this pull request Jul 25, 2024
bb-remote-asset does some cleverness to pop `Directory` objects into the
OutputDirectories field of the ActionResult for semantic purposes. This
commit cleans up that cleverness to allow blobs that look a bit like
directories to be uploaded as asset.

Prior to this commit we detected whether an object is a `Directory` by
fetching it from CAS and attempting to unmarshal it into a `Directory`
message. This commit requires all assets uploaded via `PushDirectory`
are `Directories` (and treats assets uploaded via `PushBlob` as blobs).

We still must convert Directories to a Tree to generate the `TreeDigest`
needed by the `ActionResult`, so the action cache asset store will fetch
from the CAS when putting a new directory. (The ActionResult needs a
`TreeDigest` to satisfy bb-storage's completeness checking.)

If the client decides to use `PushDirectory` to push something that
isn't a `Directory` (which isn't explicitly banned by the spec), then we
will
error. The client will need to use `PushBlob` for these assets instead.

Fixes buildbarn#33
Closes buildbarn#34
@tomcoldrick-ct
Copy link
Collaborator Author

Superseded by #40

harrysarson added a commit to harrysarson/bb-remote-asset that referenced this pull request Jul 29, 2024
bb-remote-asset does some cleverness to pop `Directory` objects into the
OutputDirectories field of the ActionResult for semantic purposes. This
commit cleans up that cleverness to allow blobs that look a bit like
directories to be uploaded as asset.

Prior to this commit we detected whether an object is a `Directory` by
fetching it from CAS and attempting to unmarshal it into a `Directory`
message. This commit requires all assets uploaded via `PushDirectory`
are `Directories` (and treats assets uploaded via `PushBlob` as blobs).

We still must convert Directories to a Tree to generate the `TreeDigest`
needed by the `ActionResult`, so the action cache asset store will fetch
from the CAS when putting a new directory. (The ActionResult needs a
`TreeDigest` to satisfy bb-storage's completeness checking.)

If the client decides to use `PushDirectory` to push something that
isn't a `Directory` (which isn't explicitly banned by the spec), then we
will
error. The client will need to use `PushBlob` for these assets instead.

Fixes buildbarn#33
Closes buildbarn#34
tomcoldrick-ct pushed a commit that referenced this pull request Jul 29, 2024
bb-remote-asset does some cleverness to pop `Directory` objects into the
OutputDirectories field of the ActionResult for semantic purposes. This
commit cleans up that cleverness to allow blobs that look a bit like
directories to be uploaded as asset.

Prior to this commit we detected whether an object is a `Directory` by
fetching it from CAS and attempting to unmarshal it into a `Directory`
message. This commit requires all assets uploaded via `PushDirectory`
are `Directories` (and treats assets uploaded via `PushBlob` as blobs).

We still must convert Directories to a Tree to generate the `TreeDigest`
needed by the `ActionResult`, so the action cache asset store will fetch
from the CAS when putting a new directory. (The ActionResult needs a
`TreeDigest` to satisfy bb-storage's completeness checking.)

If the client decides to use `PushDirectory` to push something that
isn't a `Directory` (which isn't explicitly banned by the spec), then we
will
error. The client will need to use `PushBlob` for these assets instead.

Fixes #33
Closes #34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PushBlob broken on blobs that unmarshal as Directorys using the Action Cache asset store
3 participants