Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure chunked TOC and tar-split metadata are consistent #2035

Merged
merged 4 commits into from
Jul 19, 2024

Conversation

mtrmac
Copy link
Collaborator

@mtrmac mtrmac commented Jul 16, 2024

This should resolve #2014. Basically untested, filing now for early design review.

We simply enforce that the TOC and tar-split have exactly the same contents. To the extent the zstd:chunked format is only being produced by this package, we can expect an equal match (right now? ), but that may become harder as the format evolves (e.g. the recent timeIfNotZero change).

@giuseppe PTAL. Cc: @cgwalters .

Comment on lines +36 to +37
// This is horrible, but we don’t know how much padding to skip. (It can be computed from the previous hdr.Size for non-sparse
// files, but for sparse files that is set to the logical size.)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty horrible hack.

It should be possible for tar-split/archive/tar to expose the expected padding size in Header.

(Alternatively, the OCI spec suggests that sparse files should not be used, but I don’t expect that can be relied upon.)

@mtrmac
Copy link
Collaborator Author

mtrmac commented Jul 16, 2024

tar-split and TOC data is inconsistent: reading tar-split entries: invalid character 'T' looking for beginning of value

The test failure is on unrealistic data in a different test in this package, I’ll update that tomorrow.

Copy link
Contributor

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a relatively superficial skim but generally LGTM.

pendingFiles := map[string]*internal.FileMetadata{} // Name -> an entry in toc.Entries
for i := range toc.Entries {
e := &toc.Entries[i]
if e.Type != internal.TypeChunk {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor but I paused on this for a bit, uncertain; perhaps:

// Chunks are just part of files, they won't appear explicitly
// in the tar stream, so we don't validate them.
if e.Type == internal.TypeChunk {
    continue
}

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be best documented somewhere around pkg/chunked/internal/compression.go, this is “just” straightforwardly consuming the structure as designed.

#1939 did add the basic documentation of TypeChunk, although the full semantics of Offset/ChunkOffset etc. is not written down anywhere.

Copy link
Collaborator Author

@mtrmac mtrmac Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be a good citizen, I have added a commit documenting the format (to the extent I can reverse-engineer it from the code) to the documentation of internal.FileMetadata.


// iterateTarSplit calls handler for each tar header in tarSplit
func iterateTarSplit(tarSplit []byte, handler func(hdr *tar.Header) error) error {
// This, strictly speaking, hard-codes undocumented assumptions about how github.com/vbatts/tar-split/tar/asm.NewInputTarStream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory this API could land as a PR to that repo?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would certainly be better. The timing will probably force us to carry this in c/storage at for a few days/weeks, but I should definitely prepare a tar-split PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have recorded the task to file a tar-split PR as an item on containers/image#2189 .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed as vbatts/tar-split#71 .

"github.com/vbatts/tar-split/tar/storage"
)

func testTarheader(index int, typeFlag byte, size int64) tar.Header {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name made me thing this is itself a test, how about createTestTarheader?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

@mtrmac mtrmac force-pushed the wip branch 2 times, most recently from 983b14b to 886c158 Compare July 17, 2024 16:43
@mtrmac mtrmac force-pushed the wip branch 2 times, most recently from 28222ae to 85d7be5 Compare July 17, 2024 19:01
mtrmac added a commit to mtrmac/libpod that referenced this pull request Jul 17, 2024
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mtrmac
Copy link
Collaborator Author

mtrmac commented Jul 18, 2024

Podman tests passed in containers/podman#23307 .

@cgwalters
Copy link
Contributor

/lgtm

In addition to the existing use when creating a TOC from tar data,
we will also need it when parsing TOC and tar-split data.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
We are going to be checking its consistency with the TOC.

Signed-off-by: Miloslav Trmač <[email protected]>
@openshift-ci openshift-ci bot removed the lgtm label Jul 18, 2024
@mtrmac mtrmac changed the title WIP: Ensure chunked TOC and tar-split metadata are consistent Ensure chunked TOC and tar-split metadata are consistent Jul 18, 2024
@mtrmac mtrmac marked this pull request as ready for review July 18, 2024 23:25
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

openshift-ci bot commented Jul 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, giuseppe, mtrmac

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 8d26ede into containers:main Jul 19, 2024
18 checks passed
@mtrmac mtrmac deleted the wip branch July 19, 2024 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

zstd:chunked metadata ambiguity
3 participants