Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix the race condition of bicgstab #1676

Merged
merged 2 commits into from
Sep 10, 2024
Merged

fix the race condition of bicgstab #1676

merged 2 commits into from
Sep 10, 2024

Conversation

yhmtsai
Copy link
Member

@yhmtsai yhmtsai commented Sep 5, 2024

This pr fixes the race condition of bicgstab.
Originally, we put the finalize and update together.
That means that some threads might get the finalized value from the other thread and the thread will not update the value.

We need to separate them to ensure update the stop_status after update values.

It should solve #1563

@yhmtsai yhmtsai added 1:ST:ready-for-review This PR is ready for review is:bugfix This fixes a bug labels Sep 5, 2024
@yhmtsai yhmtsai requested a review from a team September 5, 2024 16:22
@yhmtsai yhmtsai self-assigned this Sep 5, 2024
@ginkgo-bot ginkgo-bot added mod:cuda This is related to the CUDA module. type:solver This is related to the solvers mod:hip This is related to the HIP module. labels Sep 5, 2024
@yhmtsai yhmtsai mentioned this pull request Sep 5, 2024
@yhmtsai yhmtsai force-pushed the fix_bicgstab branch 2 times, most recently from 2657f8c to 17672ea Compare September 5, 2024 22:08
@MarcelKoch MarcelKoch requested a review from a team September 9, 2024 07:05
@upsj
Copy link
Member

upsj commented Sep 9, 2024

Would it be possible to construct a test where this fails more reliably? E.g. many rhs, few rows

@yhmtsai
Copy link
Member Author

yhmtsai commented Sep 9, 2024

It needs more rows such that some have updated the stop_status, but the rest does not read the stop_status yet after the cache value is updated.
The failed test might be easier on the cpu side but not gpu side which relies on when threads can see the updated value from the cache/memory.
I have added a test with some description

@yhmtsai
Copy link
Member Author

yhmtsai commented Sep 9, 2024

@upsj @MarcelKoch please feel free to take a look at the test. I will wait for the test failed on our CI and then push the fix back again. If everything works correctly, I will merge this pr

@yhmtsai
Copy link
Member Author

yhmtsai commented Sep 9, 2024

The failed tests are in https://gitlab.com/ginkgo-project/ginkgo-public-ci/-/pipelines/1446060008
except for the amd due to CI issue, others faced the issue

@yhmtsai yhmtsai added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Sep 9, 2024
@yhmtsai yhmtsai added this to the Ginkgo 1.9.0 milestone Sep 9, 2024
@yhmtsai yhmtsai merged commit 6db98d6 into develop Sep 10, 2024
10 of 14 checks passed
@yhmtsai yhmtsai deleted the fix_bicgstab branch September 10, 2024 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-to-merge This PR is ready to merge. is:bugfix This fixes a bug mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. type:solver This is related to the solvers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants