Implement fully parallel upload processing #658
Conversation
Codecov Report

✅ All tests successful. No failed tests found.

@@           Coverage Diff           @@
##             main     #658   +/-   ##
=======================================
  Coverage   98.02%   98.02%
=======================================
  Files         437      438     +1
  Lines       36313    36389    +76
=======================================
+ Hits        35597    35672    +75
- Misses        716      717     +1

Flags with carried forward coverage won't be shown.
Force-pushed from 7635506 to b9f675a
Force-pushed from b9f675a to f1ab443
Just a few comments
i think this is somewhat cleaner than the existing harness but to be honest i still feel a little uncomfortable with all the if/else branches peppered around to copy something here and skip writing something there. it feels too easy to accidentally break real processing or leave side-effects that real users will be able to see
the approach i imagine would be simpler would be a separate task that either runs nightly and chooses a batch of N commits, or is scheduled as a followup after X% of finisher tasks. this task would fetch completed report JSONs and use the sessions list from them to reconstruct UploadTask
arguments but with dummy commits/repos owned by Codecov plugged in. one dummy repo would be overridden into the expt and the other overridden out of it. we run the identical task arguments for each repo and compare the results
with that approach, any and all copying/staging we need to do for verification can happen in one place, and there's little to no risk of our test procedure accidentally breaking things for production users or accidentally leaving side-effects that they can see. there's nothing to clean up when transitioning from validation to running the actual experiment, it's just a `Feature` with a `test` and `control` group. it doesn't faithfully reproduce carryforward inheritance, but CFF is all settled before anything changes for parallel processing anyway. i think the main downside is having to suppress GitHub API errors because our dummy repos probably won't have unique authentic commits/PRs for each batch of tasks we want to test
out of steam for the day but will see your thoughts tomorrow
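The "two dummy repos, one overridden into the experiment and one out of it" idea above can be sketched roughly as follows. This is an illustrative sketch only: the `Feature`-style flag with per-repo overrides mirrors how `PARALLEL_UPLOAD_PROCESSING_BY_REPO.check_value(...)` is used elsewhere in this PR, but `DUMMY_TEST_REPO_ID`, `DUMMY_CONTROL_REPO_ID`, and `run_upload_task` are hypothetical names, not from the codebase.

```python
from dataclasses import dataclass

# hypothetical dummy repos owned by Codecov, purely for verification runs
DUMMY_TEST_REPO_ID = 1      # overridden *into* the experiment
DUMMY_CONTROL_REPO_ID = 2   # overridden *out of* the experiment


@dataclass
class Feature:
    """Minimal stand-in for a rollout flag with per-repo overrides."""

    overrides: dict  # repo_id -> forced on/off

    def check_value(self, repo_id: int, default: bool = False) -> bool:
        return self.overrides.get(repo_id, default)


PARALLEL_UPLOAD_PROCESSING_BY_REPO = Feature(
    overrides={DUMMY_TEST_REPO_ID: True, DUMMY_CONTROL_REPO_ID: False}
)


def run_upload_task(repo_id: int, sessions: list) -> str:
    # The real task would reconstruct UploadTask arguments from a completed
    # report JSON's sessions list; here we only show the branching that
    # sends one dummy repo down the parallel path and one down the serial path.
    if PARALLEL_UPLOAD_PROCESSING_BY_REPO.check_value(repo_id):
        return f"parallel:{len(sessions)}"
    return f"serial:{len(sessions)}"


# run identical task arguments against both repos, then compare the results
sessions = ["s1", "s2"]
test_result = run_upload_task(DUMMY_TEST_REPO_ID, sessions)
control_result = run_upload_task(DUMMY_CONTROL_REPO_ID, sessions)
```

The point of the design is that both runs receive byte-identical arguments, so any divergence between `test_result` and `control_result` is attributable to the parallel pipeline itself.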
# this should be enabled for the actual rollout of parallel upload processing.
# if PARALLEL_UPLOAD_PROCESSING_BY_REPO.check_value(
#     "this should be the repo id"
# ):
#     upload_obj.state_id = UploadState.PARALLEL_PROCESSED.db_id
#     upload_obj.state = "parallel_processed"
# else:
if you haven't found it, this enum value is what this commented-out block is about. a state of `PROCESSED` implies the upload's data will be found if you call `get_existing_report_for_commit()`. a state of `PARALLEL_PROCESSED` indicates `UploadProcessorTask` has finished but `UploadFinisherTask` has not gotten to it yet. don't remember if the distinction mattered
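For readers without the codebase at hand, the state distinction described above might look something like this sketch. The enum member names and the `db_id`/`state` pairing follow the commented-out block and the discussion; the numeric ids and the tuple layout are illustrative assumptions, not the real model.

```python
from enum import Enum


class UploadState(Enum):
    # (db_id, state string stored on the Upload row) -- ids are made up here
    UPLOADED = (1, "uploaded")
    # PROCESSED: the upload's data is retrievable via
    # get_existing_report_for_commit()
    PROCESSED = (2, "processed")
    # PARALLEL_PROCESSED: UploadProcessorTask has finished, but
    # UploadFinisherTask has not merged/persisted the result yet
    PARALLEL_PROCESSED = (3, "parallel_processed")

    @property
    def db_id(self) -> int:
        return self.value[0]

    @property
    def state(self) -> str:
        return self.value[1]
```

Under this sketch, the commented-out rollout block would set `upload_obj.state_id = UploadState.PARALLEL_PROCESSED.db_id` and `upload_obj.state = UploadState.PARALLEL_PROCESSED.state`.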
fully forgot about this bit
I totally agree with this. I’m tempted to just create a new task for parallel processing which removes all the code related to handling multiple uploads in one chunk, and I have ideas for further simplification ahead of time.
i think some of the brittleness is inherent to the "kick off parallel tasks but copy all the inputs and then skip saving the outputs" approach to verification, but i'd be happy to be proved wrong haha. my suggested alternative requires us to handle any GH request failure non-fatally, which may be easier said than done; there's a lot in there. i should have said this in my initial comment but: i can't see any specific problems in the PR apart from the edge case with IDs which only matters for comparison with serial results, and that was already there. i think this is all logically correct, and less fragile than it was before. i am excited to see this PR and for this project to get some momentum
Force-pushed from 739b6e5 to ad04519
I updated this PR yet again, with the following changes:
To be quite honest, I think just keeping the various

One thing that I would still have to take care of is the migration path. Rolling out the feature flag currently has a direct effect on already scheduled tasks, which should be avoided.
Generally LGTM. I think I would also wait for @matt-codecov 's review/approval as he has more context on this code.
Force-pushed from ad04519 to 7660ffc
if parallel_feature is ParallelFeature.EXPERIMENT and delete_archive_setting(
    commit_yaml
):
    parallel_feature = ParallelFeature.SERIAL
do you happen to know why this setting should disable the experiment? is it a problem for the fully parallel mode?
I changed the relevant code to avoid creating a copy of the upload, in favor of just using the upload as it exists, provided it does exist and is not being deleted :-)
Parallel processing does not have that problem, as it only has a single task processing (and deleting) a raw upload.
tasks/upload_finisher.py (outdated)
# When we are fully parallel, we need to update the `Upload` in the database
# with the final session_id (aka `order_number`).
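A minimal sketch of what that comment describes, i.e. writing the final session ids back to the uploads after the parallel merge. The `order_number` field name comes from the comment itself; the `Upload` dataclass and the `assign_final_session_ids` helper are hypothetical stand-ins for the real ORM model and finisher code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Upload:
    """Hypothetical stand-in for the Upload database row."""

    id: int
    order_number: Optional[int] = None  # final session_id, assigned at merge time


def assign_final_session_ids(
    uploads: List[Upload], final_ids: Dict[int, int]
) -> None:
    # final_ids maps upload.id -> the session_id chosen when the fully
    # parallel UploadFinisher merges the intermediate reports into the
    # final report; only then is the ordering known, so only then can it
    # be persisted on the Upload rows.
    for upload in uploads:
        upload.order_number = final_ids[upload.id]
```

The design point being illustrated: in fully parallel mode, session ids cannot be assigned during processing (the processors run concurrently), so the finisher owns that write.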
is this taking the place of the `PARALLEL_PROCESSED` upload state daniel had?
not quite. I will discuss these various states a bit more and figure out a good way to go there.
But good that you called this out, I found another bug related to the new code from #745 not being ported to this PR yet, which I now did.
Force-pushed from 7660ffc to 9169799
This adds another variant to the `PARALLEL_PROCESSING` feature/rollout flag which prefers the parallel upload processing pipeline instead of running it as an experiment.

Upload Processing can run in essentially 4 modes:

- Completely serial processing
- Serial processing, but running "experiment" code (`EXPERIMENT_SERIAL`):
  - In this mode, the final (`is_final`) `UploadProcessor` task saves a copy of the final report for later verification.
- Parallel processing, but running "experiment" code (`EXPERIMENT_PARALLEL`):
  - In this mode, another parallel set of `UploadProcessor` tasks runs *after* the main set of tasks.
  - These tasks are not persisting any of their results in the database; instead the final `UploadFinisher` task will launch the `ParallelVerification` task.
- Fully parallel processing (`PARALLEL`):
  - In this mode, the final `UploadFinisher` task is responsible for merging the final report and persisting it.

An example task chain might look like this, in "experiment" mode:

- Upload
- UploadProcessor
- UploadProcessor
- UploadProcessor (`EXPERIMENT_SERIAL`, the final one)
- UploadFinisher
- UploadProcessor (`EXPERIMENT_PARALLEL`)
- UploadProcessor (`EXPERIMENT_PARALLEL`)
- UploadProcessor (`EXPERIMENT_PARALLEL`)
- UploadFinisher (`EXPERIMENT_PARALLEL`)
- ParallelVerification

The `PARALLEL` mode looks like this:

- Upload
- UploadProcessor (`PARALLEL`)
- UploadProcessor (`PARALLEL`)
- UploadProcessor (`PARALLEL`)
- UploadFinisher (`PARALLEL`)
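The mode/chain structure described above can be sketched as a small helper. The `ParallelFeature` variant names and task names come from the description; the `build_chain` function and its string-based chain representation are illustrative assumptions, not the actual Celery wiring.

```python
from enum import Enum, auto
from typing import List


class ParallelFeature(Enum):
    SERIAL = auto()
    EXPERIMENT_SERIAL = auto()
    EXPERIMENT_PARALLEL = auto()
    PARALLEL = auto()


def build_chain(mode: ParallelFeature, n_uploads: int) -> List[str]:
    """Return the task names that would run, in order, for one upload batch."""
    chain = ["Upload"]
    # the main processing pipeline always runs: one processor per chunk,
    # then a finisher
    chain += ["UploadProcessor"] * n_uploads
    chain.append("UploadFinisher")
    if mode is ParallelFeature.EXPERIMENT_PARALLEL:
        # the experiment re-runs processing in parallel *after* the main
        # set of tasks; those runs persist nothing, and the parallel
        # finisher launches verification against the saved serial copy
        chain += ["UploadProcessor (parallel)"] * n_uploads
        chain.append("UploadFinisher (parallel)")
        chain.append("ParallelVerification")
    return chain
```

In `PARALLEL` mode the same three-stage shape applies, but with no duplicated experiment tail: the single `UploadFinisher` does the merging and persisting itself.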
Force-pushed from 9169799 to 3ef30d7