Full stacks make recommendations on minimal manifest file content #9

Open
mellybelly opened this issue Mar 14, 2018 · 6 comments
Labels
help wanted Extra attention is needed

Comments

@mellybelly

For example: versioning information, schema location and version, schema documentation or a UML diagram, etc.

AGR provided the attached file as an example.
alliance_file_manifest.txt

It would be great to have other examples too.

@owhite added the "help wanted" label on Mar 14, 2018
@owhite removed the "Examples" label on Mar 14, 2018
@cmungall

Note to consumers of the manifest

This URL has to be prepended to all files referenced: https://s3.amazonaws.com/mod-datadumps/

E.g. for FB_1.0.4_4.tar.gz, this is the URL:
https://s3.amazonaws.com/mod-datadumps/FB_1.0.4_4.tar.gz
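A minimal sketch of that rule for consumers, assuming the manifest has already been parsed into plain file names. The function name `resolve_url` is hypothetical and not part of any Alliance tooling; only the base URL and the example file name come from this thread.

```python
from urllib.parse import quote

# Base URL quoted in the comment above.
BASE_URL = "https://s3.amazonaws.com/mod-datadumps/"


def resolve_url(file_name: str, base_url: str = BASE_URL) -> str:
    """Prepend the bucket URL to a file name taken from the manifest.

    Hypothetical helper: it joins with exactly one slash and
    percent-encodes characters that are not safe in a URL path.
    """
    return base_url.rstrip("/") + "/" + quote(file_name.lstrip("/"))


if __name__ == "__main__":
    print(resolve_url("FB_1.0.4_4.tar.gz"))
    # https://s3.amazonaws.com/mod-datadumps/FB_1.0.4_4.tar.gz
```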

In future this can be made more explicit by using a standard BDBag distribution or similar.

@mikedarcy

The file SGD_1.0.4_1.tar.gz is listed in the example manifest but does not exist at https://s3.amazonaws.com/mod-datadumps/SGD_1.0.4_1.tar.gz; a 404 error is returned when trying to retrieve it.

In addition, no MD5 (or SHA-256, etc.) checksums are associated with any of the files listed in the manifest, nor with the S3 objects themselves, which makes it impossible to verify the data integrity of these files. The authoritative issuer of these files should provide this information, either in the manifest or (preferably) attached directly to the S3 objects as MD5 checksums.
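To illustrate what a consumer could do once checksums are published, here is a rough sketch that verifies a downloaded file against an expected hex digest. It assumes the manifest would carry one hex digest per file; the manifest shown in this issue has no such field, and the names `file_digest` and `verify` are hypothetical.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with the value published by the issuer.

    Assumption: the issuer publishes a hex-encoded digest for each file.
    """
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

The same functions work for MD5 by passing `algorithm="md5"`, which would match checksums attached to the S3 objects.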

For S3 object uploads, the Content-MD5 request header should be set in the file PUT request. The value of this header is the base64-encoded 128-bit MD5 digest of the file data. For more information, see https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html and https://aws.amazon.com/premiumsupport/knowledge-center/data-integrity-s3/.
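As a sketch of how that header value can be computed with the Python standard library alone: the helper name `content_md5` is hypothetical, and the example only builds the header. It does not sign or send the PUT request.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Return the base64-encoded 128-bit MD5 digest of the given bytes."""
    digest = hashlib.md5(data).digest()  # 16 raw bytes, not the hex string
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Header to include in the PUT request (request signing not shown).
    headers = {"Content-MD5": content_md5(b"hello")}
    print(headers)  # {'Content-MD5': 'XUFAKrxLKna5cZ2REBfFkg=='}
```

Note that the header carries the base64 form of the raw digest bytes, not the familiar 32-character hex string.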

@christabone

christabone commented Mar 16, 2018

@cmungall This is not necessarily correct: the files that were submitted do not always resolve to the S3 bucket. A few (~4-5) of the files were renamed for consistency and clarity before submission (the content of the files is the same). This will be fixed with the new data submission system for the next release.

@mikedarcy Including checksums is an excellent idea. We will look into this for our next release.

@ianfoster
Contributor

Here is the BDBag that we created for the manifest: http://n2t.net/minid:b94x3r.

(An earlier BDBag, http://n2t.net/minid:b9j69h, is marked as obsoleted because the contents of one of the files referenced in the manifest were updated. This demonstrates the importance of using BDBags to capture the specific versions of files that we are working with.)

@jmcherry-zz

We are working to automate addition to AWS, then copy to GCP. BDBag is part of that, I believe.

@christabone

This is correct. The data submission system is nearing completion and we've started discussions regarding the use of BDBags as well. @ianfoster Very nice work, I'll pass on the link to the rest of the Alliance group.


7 participants