-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add nightly exchange fuzzer. #8862
Conversation
✅ Deploy Preview for meta-velox canceled.
|
@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgpai Thank you for adding this job. Will it be running as experimental job? If not, why? Have we ran this as experimental job earlier and determined that it is stable enough?
Please, update the PR description to clarify for how long ExchangeFuzzer runs and how many iterations it goes through in that time.
.github/workflows/scheduled.yml
Outdated
- name: "Install dependencies" | ||
run: source ./scripts/setup-ubuntu.sh | ||
|
||
- name: Download join fuzzer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
join -> Exchange
.github/workflows/scheduled.yml
Outdated
with: | ||
name: exchange | ||
|
||
- name: "Run Join Fuzzer" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Join -> Exchange
Why are some step names are in double quotes and others are not?
- name: Download join fuzzer
vs.
- name: "Run Join Fuzzer"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dont think it really matters : https://docs.github.com/en/actions/creating-actions/metadata-syntax-for-github-actions , but we can remove the quotes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Would be nice to pick one way and use it consistently (in a follow-up PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yaml can do some fun things with unquoted strings so it's usually good practice to quote all strings. But of course normal step names are normally not affected by these.
.github/workflows/scheduled.yml
Outdated
@@ -113,6 +113,14 @@ jobs: | |||
name: join | |||
path: velox/_build/debug/velox/exec/tests/velox_join_fuzzer_test | |||
|
|||
- name: Upload Exchange fuzzer | |||
uses: actions/upload-artifact@v3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does @V3 mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its version 3 of the upload-artifact action : https://github.com/actions/upload-artifact?tab=readme-ov-file#v4---whats-new
.github/workflows/scheduled.yml
Outdated
with: | ||
name: exchange-fuzzer-failure-artifacts | ||
path: | | ||
/tmp/exchange_fuzzer_repro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's in that directory? I tried to look up documentation for this fuzzer, but couldn't. Does it exist? If not, it would be nice to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is to save a potential reproduction for a failure: https://github.com/facebookincubator/velox/blob/main/velox/exec/tests/ExchangeFuzzer.cpp#L314
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add documentation in a subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
b977c2f
to
2cfe5cc
Compare
.github/workflows/scheduled.yml
Outdated
&& echo -e "\n\Exchange fuzzer run finished successfully." | ||
|
||
- name: Archive Exchange production artifacts | ||
if: always() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this statement? Isn't it always true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because typically the artifact mechanism is used when the job produces some artifact at the end of the run (think you are building a package etc) that you want to upload . When the job fails most jobs dont want to upload half built packages etc, however in our case its the opposite as when we fail we want to save the repro and thus the always flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thank you for explaining.
@mbasmanova I ran it locally for 20 - 30 minutes and did not see any failure (more than 100 iterations), my understanding was that this was mostly stable. However I see that on CI it seems to be failing , and have created an issue : #8876 . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Note: I think its better to not land this till we fix issue : #8876. |
2cfe5cc
to
ed58863
Compare
The mmapAllocator mmaps address space equal to allocatorCapacity for all page size classes (there are total of 9 classes) Since it runs fine everywhere my best guess is that the container in github actions seems to not allow large mmaps. Since we dont have direct ssh access to the container (which would have helped me poke the kernel or play around with allocatorCapacity's value), I am instead changing the PR and reducing the allocatorCapacity to see if the job now succeeds. |
This will need to be rebased on #8734 as well. |
a9471d2
to
8a2d18f
Compare
@bikramSingh91 Setting exchange fuzzer to 8 gb seems to work. |
@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
This is pending #8734 |
4903e7f
to
cf8884e
Compare
cf8884e
to
e2244d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments :)
e2244d6
to
1df832a
Compare
1df832a
to
1c1c17f
Compare
@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary: This PR adds a nightly fuzzer run for the exchange fuzzer. It runs along with the other scheduled nightly jobs and the failure repros are saved to artifacts. The job runs for 30 minutes and locally I saw 100 odd iterations. Pull Request resolved: facebookincubator#8862 Reviewed By: mbasmanova Differential Revision: D54230287 Pulled By: kgpai fbshipit-source-id: 31543026678227038b85fc632beb8475fa42db1e
This PR adds a nightly fuzzer run for the exchange fuzzer. It runs along with the other scheduled nightly jobs and the failure repros are saved to artifacts. The job runs for 30 minutes and locally I saw 100 odd iterations.