Improve flexibility for publishing options #2964
Conversation
Codecov Report
Attention:

Additional details and impacted files

@@           Coverage Diff            @@
##              dev   #2964     +/-   ##
=========================================
- Coverage    88.7%   88.6%    -0.2%
=========================================
  Files          89      89
  Lines       11023   11040      +17
=========================================
+ Hits         9785    9788       +3
- Misses       1238    1252      +14

☔ View full report in Codecov by Sentry.
Nice and clean :) I just wonder if we want to keep the upload in the general cleanup function vs. trigger it on build success.
Please take another look; I have retained the original behavior of copying to GCS after the ETL finishes, regardless of the outcome.
I have merged
@@ -109,6 +116,9 @@ if [[ $ETL_SUCCESS == 0 ]]; then
ETL_SUCCESS=${PIPESTATUS[0]}
Probably want to remove the success file here prior to distribution.
If we replace the success file with, say, a JSON file containing some runtime statistics about how the data was generated, would we want to include it in the distributed files, or not?
I think a little build metadata file would be worth including. We're distributing the pudl-etl.log file; this would kind of be a structured accessory to that output.
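A build metadata file like the one discussed could be written by the runner script as a small JSON blob. This is only a sketch: the filename, the field names, and the timestamp field are all hypothetical, not anything the PUDL scripts actually produce.

```shell
#!/usr/bin/env bash
# Hypothetical replacement for the bare "success" sentinel file:
# a small JSON file with runtime statistics. All field names and
# the output filename are illustrative only.
ETL_SUCCESS=0  # stand-in for the real exit status captured via PIPESTATUS

cat > build-metadata.json <<EOF
{
  "etl_success": $([[ $ETL_SUCCESS == 0 ]] && echo true || echo false),
  "finished_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
EOF
echo "wrote build-metadata.json"
```

Because it is JSON rather than an empty sentinel, the same file could answer both questions above: whether the build succeeded, and when/how it was produced.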
I merged dev in and added a comment to explain the bash conditional environment variable assignment magic.
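The "conditional environment variable assignment magic" being referred to is most likely bash's default-expansion syntax. A minimal sketch (the fallback bucket path here is made up, not the script's real default):

```shell
#!/usr/bin/env bash
# ${VAR:-default} expands to a default when VAR is unset or empty,
# without modifying VAR; ${VAR:=default} would also assign it back.
unset PUDL_GCS_OUTPUT

# Fall back to a default destination unless the caller overrides it.
PUDL_GCS_OUTPUT="${PUDL_GCS_OUTPUT:-gs://example-bucket/nightly}"  # fallback path is hypothetical
echo "publishing to: ${PUDL_GCS_OUTPUT}"
```

Setting `PUDL_GCS_OUTPUT` in the environment before invoking the script would then override the destination without any code changes.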
@@ -83,6 +85,11 @@ jobs:
      - name: Set up Cloud SDK
        uses: google-github-actions/setup-gcloud@v1

      - name: Determine commit information
        run: |-
TIL about the block chomping operator. Does GHA complain when there's an extra newline at the end here?
@@ -109,6 +120,9 @@ if [[ $ETL_SUCCESS == 0 ]]; then
ETL_SUCCESS=${PIPESTATUS[0]}

# Dump outputs to s3 bucket if branch is dev or build was triggered by a tag
# TODO: this behavior should be controlled by on/off switch here and this logic
Agree - having the action just pass in a switch would be nice. I think at some point we should replace this whole nightly build harness with a more robust Python script; that would be a nice time to fix this too.
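The switch-based approach suggested here could look roughly like the following. `PUDL_PUBLISH_OUTPUTS` is an invented flag name for illustration; the actual workflow would export whatever name the team settles on.

```shell
#!/usr/bin/env bash
# Sketch: the GitHub Action exports an explicit on/off switch,
# instead of the script inferring intent from branch/tag names.
# PUDL_PUBLISH_OUTPUTS is a hypothetical variable name.
maybe_publish() {
    if [[ "${PUDL_PUBLISH_OUTPUTS:-false}" == "true" ]]; then
        echo "publishing outputs"
        # gsutil -m cp -r "$PUDL_OUTPUT" "$PUDL_GCS_OUTPUT"  # real copy would go here
    else
        echo "skipping distribution"
    fi
}
maybe_publish
```

This keeps the branch/tag policy in the workflow file, where it is visible next to the triggers, and leaves the script itself policy-free.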
This change slightly improves the flexibility of publishing outputs to remote destinations:

1. A PUDL_GCS_OUTPUT env variable is introduced to the gcp_pudl_etl.sh runner script to optionally override the destination used to publish outputs.
2. A PUDL_PUBLISH_DESTINATIONS env variable to immediately ship outputs after the ETL finishes.

Option (2) should eventually replace the logic contained within (1), once we integrate data validation into the ETL, or decouple it and run it later on.
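Option (2) could be sketched as a loop over a list of destinations, run right after the ETL step. The argument-list format and the copy command shown are assumptions about the implementation, not what the PR actually does.

```shell
#!/usr/bin/env bash
# Ship outputs to each destination immediately after the ETL finishes.
# Destinations are passed as arguments here; in the real script they
# would presumably come from the PUDL_PUBLISH_DESTINATIONS env variable.
publish_outputs() {
    local dest
    for dest in "$@"; do
        echo "copying outputs to ${dest}"
        # gsutil -m cp -r "${PUDL_OUTPUT}/*" "${dest}"  # actual copy, elided
    done
}

publish_outputs gs://bucket-a s3://bucket-b  # bucket names are made up
```

Shipping immediately after the ETL means a later validation failure can no longer block publication, which is why the description notes that (2) depends on integrating or decoupling data validation first.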