-
Error messages are often incomprehensible throughout scientific Python (not to highlight or single out Awkward by any means: NumPy, pandas, and Matplotlib all suffer from this at least as much), so I tend to doubt this would be much worse (without having tried it, of course).
-
My 2c: this is still better than CuPy's behavior. If we can reach deeper (I don't know how deeply Awkward plans to interface with CUDA), you can certainly check for exceptions during a non-blocking synchronization. (ref)
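For instance, a sketch assuming the call in question is a stream-completion query, which CuPy exposes as `Stream.done`:

```python
import cupy as cp

stream = cp.cuda.Stream(non_blocking=True)
with stream:
    y = cp.sum(cp.arange(10**7))  # kernels are enqueued, not yet finished

# Non-blocking query (cudaStreamQuery under the hood): reports whether the
# enqueued work has completed, without waiting for it.
if stream.done:
    print("finished:", y)
else:
    print("still running; errors would surface on a later sync")
```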
-
It sounds like people are in favor of error messages, and against following CuPy's lead in pretending they don't exist. I can get behind that. Our plans for CUDA are to have Awkward Arrays be manually copyable between main memory and GPU global memory, with all the high-level operations (slicing, `ak.*` functions, NumPy ufuncs, etc.) working the same on both backends. A script that works on the CPU could put `source = ak.to_backend(source, "cuda")` at the beginning and `sink = ak.to_backend(sink, "cpu")` at the end, and it would work the same.
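For example, with the `ak.to_backend` calls mentioned above (illustrative data):

```python
import awkward as ak

source = ak.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
source = ak.to_backend(source, "cuda")  # copy to GPU global memory
result = source[source > 2.0]           # same high-level operations as on CPU
sink = ak.to_backend(result, "cpu")     # copy back to main memory at the end
```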
In addition to any CUDA errors ("GPU is unplugged! Plug it back in!"), the errors we're interested in are Awkward indexing errors, like the out-of-bounds slices above.

Oh! I just got what you're saying: there's a CUDA call (different from a blocking synchronize) that can check for errors without waiting for the queued kernels to finish. That certainly looks desirable, though it means that the error feedback could come even later. When step "C" is requested from the CPU and returns control to Python, step "B" (which has an error) might not have finished yet. The error flag could prevent "C" from doing any work: every device thread could start by checking the error flag and refusing to work if it's set, so steps "D", "E", and "F" would all quickly skip their work and the error would be reported only at the end.
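A sketch of that device-side flag check (not Awkward's actual kernels; the kernel and flag layout here are hypothetical), written as a CuPy `RawKernel`:

```python
import cupy as cp

step = cp.RawKernel(r'''
extern "C" __global__
void step(const double* in, double* out, int n, int* err) {
    if (*err != 0) return;  // an earlier step failed: refuse to do any work
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        if (in[i] < 0.0) {  // a data-dependent error discovered on the device
            atomicExch(err, 1);
            return;
        }
        out[i] = 2.0 * in[i];
    }
}
''', 'step')

n = 1024
data = cp.random.random(n)
out = cp.empty(n)
err = cp.zeros(1, dtype=cp.int32)            # shared error flag
step((n // 256,), (256,), (data, out, cp.int32(n), err))  # err stays 0 unless data is bad
```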
Revised question to everybody: what if errors are only raised when `sink = ak.to_backend(sink, "cpu")` is called? We can insert enough forensic information into the error-state structure to say which high-level operation (slicing, an `ak.*` function, a ufunc, etc.) went wrong, and possibly include a line number in the source code calling it.

In this model, error handling is strictly global, but it looks like CuPy has only one CUDA context, so we may be limited to that by external constraints (we use CuPy). It may be that multi-GPU handling is off the table, too: that would have to be implemented with multiple Python processes (such as through Dask). One thing that I like about raising GPU data-dependent errors in `ak.to_backend(sink, "cpu")` is that it gives users a single, well-defined place to expect them.
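A hypothetical shape for that forensic error state (the names are illustrative, not an actual Awkward structure):

```python
from dataclasses import dataclass

@dataclass
class ErrorState:
    failed: bool = False
    operation: str = ""   # e.g. "ak.flatten" or "slice by integer array"
    filename: str = ""    # source file of the calling line, if recorded
    lineno: int = 0

def check_on_copy_back(state: ErrorState) -> None:
    """In this sketch, called inside ak.to_backend(sink, "cpu")."""
    if state.failed:
        raise ValueError(
            f"GPU error in {state.operation} "
            f"(called at {state.filename}:{state.lineno})"
        )
```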
-
I meant to answer this question, too, but got distracted by other things. There are three plans for Awkward-CUDA integration:
-
@swishdiff has started developing infrastructure to perform Awkward Array calculations on GPUs. In doing this, we're facing some questions that would have implications for users. One of these deals with concurrency.
When you launch a CUDA kernel in C++, the kernel runs asynchronously, potentially returning control to the driving C++ program before the calculation is complete. But Awkward Array operations are eager: they finish calculating before returning control to Python. (dask-awkward is another story.) Since the whole point of a GPU backend is better utilization of resources, we may want to adopt the asynchronous model.
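A small CuPy illustration of that asynchrony (timings are machine-dependent):

```python
import cupy as cp

x = cp.random.random((4096, 4096))
y = x @ x                        # launches a GEMM kernel and returns at once
print("back in Python already")  # typically prints before the GPU finishes
cp.cuda.Device().synchronize()   # block until all queued GPU work completes
```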
Each Awkward operation (`ak.*` functions, NumPy ufuncs, slicing, etc.) assumes that the values in an array are valid, so at minimum there needs to be a `cudaSynchronize` between each operation (and also between the intermediate steps that comprise a high-level operation). But if we put the `cudaSynchronize` at the beginning of each Awkward operation, we can get strictly greater utilization than if we put the `cudaSynchronize` at the end of each operation: the "CPU" can be returning to Python and doing Python stuff while the "GPU" is doing numerical calculations.
However, there's a consequence: if control is returned to Python before the numerical calculation completes, then any errors deriving from data in the arrays can't be raised as Python exceptions. If, for instance, you're trying to broadcast together two arrays that have different lengths, you'll get an error about that (the length is represented in Python, on the CPU), but if they have the same outer length and incompatible internal lengths, you won't get an error. That includes things like taking the first muon in each event (something like `events.muons[:, 0]`) when some events might have zero muons.
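For instance (illustrative data; the CPU backend raises here):

```python
import awkward as ak

events = ak.Array([{"muons": [25.3, 17.9]}, {"muons": []}])
events.muons[:, 0]  # raises: the second event has no muon at index 0
```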
CuPy also returns control to Python before the GPU calculation completes, so I wondered how CuPy handles this. Here's an example of an operation that can only raise an error if you look at the data in the array: slicing by an array of integer indexes, some of which are out of bounds.
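A minimal reconstruction of that kind of example (values chosen so that 5.5 is the first element, matching the result quoted below):

```python
import numpy as np

a = np.array([5.5, 6.6, 7.7, 8.8, 9.9])
idx = np.array([0, 2, 4, 10])  # 10 is out of bounds for an array of length 5
a[idx]  # IndexError: index 10 is out of bounds for axis 0 with size 5
```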
Okay, NumPy raises an error if the slice is wrong. What does CuPy do?
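The same slice with CuPy (a sketch; the wrap-around is CuPy's documented out-of-bounds behavior):

```python
import cupy as cp

a = cp.array([5.5, 6.6, 7.7, 8.8, 9.9])
idx = cp.array([0, 2, 4, 10])
a[idx]  # array([5.5, 7.7, 9.9, 5.5]): index 10 wrapped around to 10 % 5 == 0
```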
It does not raise an error! The last index is beyond the bounds of the array, but it evidently "wrapped around," returning 5.5, a value from the beginning. This is on purpose and documented, presumably because of exactly this trade-off: CuPy can either return control to Python before it finishes processing or it can detect the error, but not both.
We could adopt a similar policy, but there are more ways that Awkward operations can encounter errors in the midst of processing. How do people feel about the possibility of running calculations that suppress errors, doing some arbitrary thing like CuPy's wrap-around where the CPU-based calculation would raise an error?
Another possibility that @swishdiff and I discussed is to set a flag and raise a Python exception when you try to do the next operation. That is, if you compute A, B, and C, and B is invalid, you get the error message when C begins. It sounds like that could make debugging difficult, but we could put the name of operation B (e.g. `ak.this` or `ak.that`) in the error message. A minimal sketch of that idea is below.

What does everyone think?
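A pure-Python sketch of that flag-and-raise-later control flow (no real GPU calls; `gpu_op` is a stand-in):

```python
error_flag = {"op": None}  # global error state shared by all operations

def gpu_op(name, ok=True):
    # First, report any error left behind by an earlier operation.
    if error_flag["op"] is not None:
        failed, error_flag["op"] = error_flag["op"], None
        raise RuntimeError(f"error detected in earlier operation {failed!r}")
    # Pretend to launch kernels asynchronously; a data-dependent failure
    # only sets the flag, because control has already returned to Python.
    if not ok:
        error_flag["op"] = name

gpu_op("A")
gpu_op("B", ok=False)  # B is invalid, but no exception is raised yet
gpu_op("C")            # raises: error detected in earlier operation 'B'
```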