Upload MGI upstream / "silver" to mirror.geneontology.io, with new filename, and point metadata to it #369

Closed
4 tasks done
kltm opened this issue Apr 4, 2024 · 9 comments

kltm commented Apr 4, 2024

Currently, the build process depends on quirks of skyhook. To make this generally usable, we want to upload the MGI upstream file we produce to a stable location (mirror.geneontology.io), with the new filename, and point metadata to it.

From original geneontology/gopreprocess#65

  • change the name to mgi-p2go-homology.gaf
  • push to the bucket - only have to worry about updates, no deletes necessary. file is not versioned.
  • rename silver-issue-325-gopreprocess pipeline to something more intuitive
  • automate the upload to the S3 bucket

Look at the go-copy-to-mirror pipeline branch to find the S3 bucket.

kltm commented Apr 9, 2024

From @sierra-moxon

this is the current "upstream" for MGI: http://skyhook.berkeleybop.org/silver-issue-325-gopreprocess/products/upstream_and_raw_data/preprocess_raw_files/mgi-merged.gaf

kltm commented Apr 9, 2024

kltm added a commit to geneontology/go-site that referenced this issue Apr 9, 2024

kltm commented Apr 9, 2024

go-site metadata updated in mgi.yaml.

sierra-moxon commented Apr 9, 2024

I made a new branch off of the silver-issue-325-gopreprocess pipeline branch called p2go-homology-upstream-file-generator. This new branch adds a step that copies two new subdirectories and the final GAF file from the upstreams code base to s3://go-mirror/:

  • p2go-homology-upstream-file-generator/preprocess_raw_files/
  • p2go-homology-upstream-file-generator/preprocessed_GAF_output/
  • at the root level, s3://go-mirror/mgi-p2go-homology.gaf.gz is added/overwritten on every successful run of this pipeline branch. This is the MGI upstream now. Seth already changed the go-site metadata to reflect this new name/path.

These capture the incremental output of the upstreams code as well as the final GAF file. Each command in the new pipeline branch overwrites the last run's files in the paths above. I looked a tiny bit into versioning; @kltm - do we need to keep versions of this file or the pipeline outputs?
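The layout described above can be sketched as a small mapping from local pipeline outputs to S3 keys. This is only an illustration, not the actual pipeline step: the local directory layout, the function names `plan_uploads` and `upload_all`, and the use of boto3 are assumptions. The bucket name, the two prefixes, and the root-level file name are taken from the comment above.

```python
from pathlib import Path

BUCKET = "go-mirror"  # bucket name from the comment above
PREFIX = "p2go-homology-upstream-file-generator"
SUBDIRS = ("preprocess_raw_files", "preprocessed_GAF_output")
FINAL_GAF = "mgi-p2go-homology.gaf.gz"


def plan_uploads(local_root):
    """Return (local_path, s3_key) pairs for one pipeline run.

    Assumes (hypothetically) that local_root holds the two subdirectories
    and the final compressed GAF at its top level.
    """
    root = Path(local_root)
    plan = []
    for sub in SUBDIRS:
        subdir = root / sub
        if not subdir.is_dir():
            continue  # nothing produced for this stage
        for path in sorted(subdir.rglob("*")):
            if path.is_file():
                rel = path.relative_to(root).as_posix()
                plan.append((path, f"{PREFIX}/{rel}"))
    final = root / FINAL_GAF
    if final.is_file():
        # The final GAF sits at the bucket root and is overwritten on each run.
        plan.append((final, FINAL_GAF))
    return plan


def upload_all(plan, bucket=BUCKET):
    """Upload every planned file, overwriting existing keys (no deletes)."""
    import boto3  # third-party; imported lazily so planning works without it

    s3 = boto3.client("s3")
    for path, key in plan:
        s3.upload_file(str(path), bucket, key)
```

Keeping the planning step separate from the upload makes it possible to print or review the key layout without AWS credentials. Because each run simply overwrites the same keys, no version bookkeeping is needed unless bucket versioning is turned on.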

I pushed this branch, and it will try to run on the next repository scan.

kltm commented Apr 9, 2024

@sierra-moxon A quick note that we need the compressed version of the file.

sierra-moxon commented

Fixed to use the .gz version of the file.
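For reference, producing the compressed file needs nothing beyond the Python standard library. This is a minimal sketch, not the code the pipeline actually runs; the helper name `gzip_file` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, dest=None):
    """Write a gzip-compressed copy of src and return the destination path."""
    src = Path(src)
    # Default destination: same directory, original name plus ".gz".
    dest = Path(dest) if dest is not None else src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream, so large GAFs are not held in memory
    return dest
```

Calling `gzip_file("mgi-p2go-homology.gaf")` would write `mgi-p2go-homology.gaf.gz` next to the input and leave the uncompressed file in place.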

@sierra-moxon sierra-moxon moved this from In Progress to Done in MGI upstreams and GPAD/GPI 2.0 release Apr 30, 2024
@kltm kltm moved this from Done to In Progress in MGI upstreams and GPAD/GPI 2.0 release May 1, 2024

kltm commented May 1, 2024

@sierra-moxon Sorry to ask, but I don't think the current production metadata points to this yet? Perhaps we should add an item to the top, just so this can be tracked?

kltm commented May 1, 2024

Or maybe that's geneontology/go-site#2285 ...in which case I'll put things back the way you had them :)

sierra-moxon commented

Yes, geneontology/go-site#2285 is the one we should use to merge metadata changes in. I have the MGI metadata changes in this branch (where we point to the mirror version of the gopreprocess MGI GAF file, etc.). This branch also has a lot of hacking in it to make my pipeline go fast, so I will cherry-pick the changes into a new branch for merging into master/main.
