-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dask] run Dask tests on aarch64 architecture #3996
Conversation
seems like they worked ok! I can see in the logs that the Dask tests definitely ran
|
wait I take that back. Was looking at the wrong test. A lot of them were xfailed.
full log
I don't see anything in the logs explaining why so many were xfailed. |
This code is on the code path of the Don't know why we don't see anything in the logs though... |
Yeah, me too!.. Also, build took ~1.5h with only few passed tests. Guess all xfailed ones were aborted before completing.
If that's true, then it is very dangerous to use this fixture because we may not spot actual errors with our main Dask builds as the whole build will be marked green. |
I agree :/ I'll have to investigate more |
I just updated this to latest |
Looks better than #3996 (comment).
Not in terms of number of failed tests but in terms that they are |
Interesting! I looked quickly at the logs from https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=9916&view=logs&j=c2f9361f-3c13-57db-3206-cee89820d5e3, seems most of the errors are some form of this:
I can do some research and see if similar errors have been reported to |
Not sure if it's relevant, but it does look like the aarch64 builds are getting slightly newer versions of the key dependencies.
I mention that because the error we're getting seems to be happening at the level where Dask is doing sum |
Just noticed that it's all-green now! 😱 I'd like to make any random Dask test to fail and check QEMU reaction. |
oh wow, great! Yes, I'm curious to see what happens if you add a |
OK, corrupted Dask test on QEMU failed as it was expected:
Reverting test commit now. |
This reverts commit c43c985.
Were all previous test failures on QEMU due to environment compatibility issues or did the latest Dask (or its deps) include a real fix? If latter, I think it will be useful to identify that release and document it as minimum required version for aarch64 platform. How do you think @jameslamb ? |
I'm going to push some empty commits to check the tests stability and measure their time consumption. |
Well, the latest 5 jobs were passed and Dask tests add no more than 35min to the overall QEMU job time. Now its' duration either ~50min or ~105min (this binarization is due to that sometimes QEMU job is x2 faster than other times). |
Oh interesting! I never noticed that big differences between different job runtimes before.
I am ok with this setup for now. The level of activity on the project hasn't been high enough that that would be add significant friction to the development process. |
Me too. Merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's merge! Thanks for doing this. I'm glad to see more coverage of the Dask package.
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Refer to #3948 (comment).
Now that QEMU is setup and proved to be stable on our CI, let's run Dask tests on aarch64 architecture and see what happens.