Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add a lock to CuArray for improved thread safety #2026

Closed
wants to merge 2 commits into from

Conversation

maleadt
Copy link
Member

@maleadt maleadt commented Aug 12, 2023

This PR adds a lock to CuArray so that we can protect against operations that mutate the array object. Note that this doesn't need to extend to the contained ArrayStorage, as that should already be threadsafe, as that's an immutable struct with the refcount being handled atomically.

@ToucheSir encountered a case where multithreaded use of unsafe_free!, both manually and from finalizers, resulted in a UndefRefError accessing the context. It's not entirely clear to me whether that's an error accessing the thread-local context, or the array context, so @ToucheSir please post the full backtrace.

Currently, unsafe_free! assumes that it doesn't need a lock, because (1) finalizers don't run concurrently to regular code, and (2) unsafe_free! doesn't make sense to be called from multiple threads. It may make sense to protect the operation though, because technically it's legal to call unsafe_free! multiple times.

In addition, I suspect we'll need to protect other operations too (like resize!) so it probably doesn't hurt adding a lock anyway.

cc @vchuravy

@maleadt maleadt added cuda array Stuff about CuArray. bugfix This gets something working again. labels Aug 12, 2023
@codecov
Copy link

codecov bot commented Aug 12, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (38fb707) 62.31% compared to head (4120270) 62.31%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2026   +/-   ##
=======================================
  Coverage   62.31%   62.31%           
=======================================
  Files         151      151           
  Lines       12842    12843    +1     
=======================================
+ Hits         8002     8003    +1     
  Misses       4840     4840           
Files Changed Coverage Δ
src/array.jl 85.07% <100.00%> (+0.03%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ToucheSir
Copy link
Contributor

ToucheSir commented Aug 12, 2023

@maleadt the only backtrace I have is the non-minimal one from JuliaML/MLUtils.jl#161. Have not managed to create a MWE which replicates this. You can see though that the finalizer is called in response to the allocation of an array on CPU (which I assume means the GC is involved?) and that it managed to pass the context! context manager (which I assume should've thrown if the array context was already destroyed at that point).

@maleadt
Copy link
Member Author

maleadt commented Aug 14, 2023

Thinking about this some more, I don't think we can make CuArray realistically thread-safe. Every operation would need to take a lock, so we should punt this on the user (much like how Base datastructures aren't thread-safe either).

The actual issue this attempted to solve should be taken care of by #2029. I don't think that unsafe_free! could even cause this.

@maleadt maleadt closed this Aug 14, 2023
@vchuravy vchuravy deleted the tb/array_lock branch August 14, 2023 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This gets something working again. cuda array Stuff about CuArray.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants