Consider (and attempt) blessing snapshot runs to "release" status #352

kltm · 2024-01-17T01:05:53Z

Look at blessing snapshots to release, to:

help with things like "Explore getting the pipeline completing again by adjusting settings and runtime parameters" (#349)
prevent conflicting run timings when a snapshot or release is not going well

No new libraries or technologies. The only "interesting" additions would likely be:

an extension of the current Zenodo scripts to assemble an upload package from a given snapshot and push it
giving snapshots an associated date and location
figuring out what to do about testing

kltm · 2024-03-04T22:34:50Z

Noting that we have a week-long holding pen for snapshots already built in for debugging, during the "Publish" step. If I switch over to having these autoclean by bucket policy, these would give us a clean jump-off point to perform the manual publication that we're already doing because of the Zenodo instability. This holding pen could be arbitrarily extended from a week to however long we want.

While this very much falls short of a full after-the-fact "blessing" system, it is actually very in line with current practices and I believe that with the change of a few lines of the current manual release SOP, we could bring up a successful snapshot.

@pgaudet What are the minimum indicators you need before knowing if a snapshot is worthwhile? Would you be able to look at the stats and, if it looks okay, let me know and I could put it out on the experimental AmiGO so you could take a closer look? How would letting you know work? Could I just sign you up for all success snapshot run emails and you get back to me when the timing feels right? If this kind of thing might work for you, I think I have a fairly quick way forward:

add better hygiene to the snapshot holding pen (i.e. go-data-products-daily), so that only intended files are kept
create release SOP to move held daily to Zenodo, publish, and deploy
pause/remove release code
add bits to warn you of snapshots passing
remove release amigo-exp deployment, create manual SOP that aims at specified daily bucket

kltm · 2024-03-06T00:21:09Z

7-day existence rule added; we should see results very soon.

kltm · 2024-03-06T22:10:28Z

The dailies now auto-clean. Moving forward, we can use these as a clean base, within a week, to create a release.
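
As a rough illustration of the "within a week" constraint, the sketch below filters a list of daily snapshot dates down to the ones a 7-day rule would still be holding. It is a minimal Python sketch, not pipeline code: `promotable_snapshots` and `RETENTION_DAYS` are hypothetical names, and it assumes a snapshot is gone once it is seven or more days old.

```python
from datetime import date, timedelta

# Assumption: mirrors the 7-day existence rule mentioned above.
RETENTION_DAYS = 7


def promotable_snapshots(snapshot_dates, today, retention_days=RETENTION_DAYS):
    """Return snapshot dates still inside the retention window, newest first."""
    cutoff = today - timedelta(days=retention_days)
    # Keep only snapshots newer than the cutoff and not dated in the future.
    return sorted((d for d in snapshot_dates if cutoff < d <= today), reverse=True)
```

For example, calling it on 2024-03-11 with snapshots from March 1, 5, and 10 would keep only the March 10 and March 5 snapshots as release candidates.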

pgaudet · 2024-03-11T12:32:13Z

@kltm

> What are the minimum indicators you need before knowing if a snapshot is worthwhile? Would you be able to look at the stats and, if it looks okay, let me know and I could put it out on the experimental AmiGO so you could take a closer look? How would letting you know work? Could I just sign you up for all success snapshot run emails and you get back to me when the timing feels right?

The same procedure as we have now for the release seems appropriate:

I get a notification that a release/snapshot is ready to be checked. Note that having the data on some experimental AmiGO is required for the checks to be carried out.
I look at the stats, and if all is OK, I notify you. Right now this communication is by email; we can change that if needed.

Does that answer all the questions?

Thanks, Pascale

kltm · 2024-03-11T21:31:24Z

Talking to @pgaudet this morning, until we've run through this a couple of times to work out the kinks (or have a machine that gets us back to where we were), we'll:

set up Pascale to get snapshot success emails
Pascale will look at reports from the snapshot when she feels the timing is right for a release
within a week, Pascale will let Seth know that things are looking okay
Seth will put the candidate onto amigo-exp
if it gets a thumbs-up from Pascale, Seth will promote the snapshot to a release

kltm · 2024-03-16T00:27:44Z

Okay, after a little consideration, I think I may have some "easy" ways forward, although any one might take a day or so to put together. Essentially, the issue is with a bad docker/jenkins interaction. I can now see a few ways to bypass this:

1. Break the pipeline into two pieces, pre-index and post-index, and do the middle part (essentially) manually. While labor-intensive, this is nearly guaranteed to be tractable.
2. Set a pipeline (snapshot) to use a single standing Docker instance to build the index. A possible issue here is that "remote controlling" Docker may be a big PITA, but we bypass the interaction bug and we still have full automation.
3. Break the Solr load into smaller pieces that should individually not have the footprint to stop things. I think this would likely work, but would be slow to test.
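
To make the third option concrete, here is a minimal Python sketch of splitting one large load into fixed-size batches. It only shows the chunking logic: `in_batches` and `load_in_batches` are hypothetical helpers, and the `load` callable stands in for whatever actually sends documents to Solr, which is not specified here.

```python
from itertools import islice


def in_batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def load_in_batches(docs, size, load):
    """Call `load` once per batch; return the number of batches sent."""
    count = 0
    for chunk in in_batches(docs, size):
        load(chunk)  # hypothetical loader, e.g. one smaller Solr load per chunk
        count += 1
    return count
```

The batch size would need tuning by experiment, which fits the concern that this option would be slow to test.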

kltm · 2024-03-19T21:36:30Z

Actually, poking around in this, I think I'm going to try something else first:
4. "catch" the error, wait, and then continue; going to take a look at the Jenkins docs but, IIRC, this is supported
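
The "catch, wait, and continue" idea can be sketched in Python as follows. This only illustrates the control flow and is not Jenkins syntax; `run_with_retry` and its parameters are hypothetical names, and the real pipeline would rely on whatever Jenkins actually supports.

```python
import time


def run_with_retry(step, attempts=3, wait_seconds=60.0, sleep=time.sleep):
    """Run `step`; on an exception, wait and try again, up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as err:  # "catch" the error
            last_error = err
            if attempt < attempts:
                sleep(wait_seconds)  # wait before continuing
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

The `sleep` parameter is injectable so the waiting behavior can be checked without actually pausing.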

kltm · 2024-03-19T21:51:07Z

Also, clarifying for "3": to make this work, the whole image would have to be dropped and stood back up. If going that way, there will be some temporary repetition and we may have to introduce template functions to bypass the string limit we will almost immediately smack into.

kltm · 2024-03-19T23:52:34Z

Looking at the failure messages, and understanding how this is happening at a stage level (not a step level), I think I can change tack a little.
I've created a new pipeline, snapshot-post-fail; it has the following properties:

all stages through to the mega-make have been removed
blanket replacement of "$BRANCH_NAME" with "snapshot"
remove initialize() and watchdog()
in script conditionals (if/else), if there is a 'snapshot', add a 'snapshot-post-fail'

I believe what this should allow me to do is "hijack" the snapshot run with the new pipeline, picking up where the failed (but data-wise sound) run terminated.
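
Two of the edits listed above (pinning the branch name and widening the conditionals) can be pictured with a small Python sketch. This is an illustration only, not the actual change to the Jenkinsfile: `pin_branch`, `is_snapshot_like`, and `SNAPSHOT_LIKE` are hypothetical names.

```python
# Assumption: the two branch names that should be treated alike.
SNAPSHOT_LIKE = {"snapshot", "snapshot-post-fail"}


def pin_branch(jenkinsfile_text, branch="snapshot"):
    """Blanket-replace the literal "$BRANCH_NAME" with a fixed branch name."""
    return jenkinsfile_text.replace("$BRANCH_NAME", branch)


def is_snapshot_like(branch_name):
    """True where a conditional that tested for 'snapshot' should also fire."""
    return branch_name in SNAPSHOT_LIKE
```

Pinning the name is what would let the new pipeline read and write the same locations as the failed snapshot run, while the widened conditional keeps snapshot-only logic active under the new branch name.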

kltm · 2024-03-19T23:58:45Z

https://github.com/geneontology/pipeline/blob/snapshot-post-fail/Jenkinsfile

kltm · 2024-03-20T23:44:53Z

Cheers to @dustine32 for helping me out with a code review. Issues that I'll fix before proceeding:

match metadata to snapshot, specifically TARGET_BUCKET
re-add watchdog, cause I screwed up above
change "when" variables in "Publish" to let in snapshot-post-fail

kltm · 2024-03-26T14:57:02Z

@pgaudet I believe a snapshot has now gone through, using the modified pipeline. Would you be able to briefly review it? If it seems solid, we can either 1) attempt to do the new "promotion" procedure, where we try and take a snapshot and make it a release or 2) do the same thing we did here for release, giving us a very very high probability of success.

kltm · 2024-04-04T00:28:17Z

Noting that I'm now working towards something between the two above.
Essentially, I will be taking the release pipeline, removing the first part of it, and replacing it with a "copy from snapshot". We can refine this model and timing, but it is a huge improvement over what we have now (nothing).
(@dustine32 I'll be hunting after you in the next day or so for a review of that change and as a sanity check.)

kltm · 2024-11-04T19:47:09Z

Talking to @pgaudet, I agree that this is probably closed.

kltm added the enhancement label Jan 17, 2024

kltm self-assigned this Jan 17, 2024

kltm added a commit that referenced this issue Mar 11, 2024

switch snapshot onto manual to get more cycles; work on #352

8cbf1ad

kltm mentioned this issue Mar 16, 2024

Generate rolling builds of the ontology geneontology/go-ontology#27296

Closed

kltm added a commit that referenced this issue Mar 19, 2024

trying to make a post-failure extension of snapshot that can be run manually to completion; draft work on #352

b980b6b

kltm added a commit that referenced this issue Mar 20, 2024

bring better inline with snapshot metadata; re-add watchdog; #352

9e0e025

kltm added a commit that referenced this issue Mar 20, 2024

let snapshot-post-fail into ublish; #352

ece8750

kltm added a commit that referenced this issue Mar 22, 2024

snapshot-post-fail watchdog fix; #352

b815be1

kltm added a commit that referenced this issue Mar 25, 2024

data variable recovery; #352

8605099

kltm added a commit that referenced this issue Apr 5, 2024

final form for proposed theft of snapshot work for #352

9f81dd6

kltm added a commit that referenced this issue Apr 5, 2024

try and make a safe test for snapshot-to-release copy; for #352

dd4ff2c

kltm added a commit that referenced this issue Apr 5, 2024

better cleaning and copy (and hopefully correct syntax); for #352

d0b7c49

kltm added a commit that referenced this issue Apr 5, 2024

make sure manual-only; for #352

831b237

kltm added a commit that referenced this issue Apr 5, 2024

annoying typo; for #352

c7025cd

kltm added a commit that referenced this issue Apr 5, 2024

tune rsync; for #352

a798b46

kltm added a commit that referenced this issue Apr 5, 2024

re-add all of the finalizing operations to release; for #352

32a6195

kltm added a commit that referenced this issue Apr 10, 2024

missing variable reconstruction for START_DAY; for #352

9fe164d

kltm added a commit that referenced this issue Apr 12, 2024

variable whoops: START_DOW; for #352

425afd5

kltm added a commit that referenced this issue Aug 8, 2024

add announcement for snapshots getting through the first stage; for #352

2702d62

kltm closed this as completed Nov 4, 2024
