-
Hi @xacond00, I would recommend reading the control-flow section of the Dr.Jit documentation to get a good idea of the differences between symbolic and evaluated modes. I would also suggest, at least as a first pass and if possible, implementing your algorithm in Python first, so that you can leverage the […]. But in short, symbolic loops should at the very least reduce tracing time, because you are no longer unrolling your loop >8 * >8 times, so this could be an implementation issue on your part. That's why I think writing it in Python first would help narrow down any issues.
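To make the unrolling point concrete, here is a minimal sketch in plain Python. The `Trace` class and both functions are hypothetical stand-ins for a JIT trace, not the Dr.Jit API. They only count how many operations end up recorded when a nested scalar loop runs during tracing, compared with a loop whose body is captured once.

```python
class Trace:
    """Toy operation recorder; a stand-in for a JIT trace, not the Dr.Jit API."""

    def __init__(self):
        self.ops = []

    def record(self, name):
        self.ops.append(name)


def trace_unrolled(n_sensors):
    # A host-language loop with a scalar counter executes while tracing,
    # so the body is recorded once per (i, j) pair.
    trace = Trace()
    for i in range(n_sensors):
        for j in range(n_sensors):
            trace.record(f"mis_term[{i},{j}]")
    return trace


def trace_symbolic(n_sensors):
    # A symbolic loop records its body a single time; the trip count
    # (n_sensors) only matters when the compiled kernel runs.
    trace = Trace()
    trace.record("outer_loop_begin")
    trace.record("inner_loop_begin")
    trace.record("mis_term[i,j]")
    trace.record("inner_loop_end")
    trace.record("outer_loop_end")
    return trace


for n in (8, 16):
    print(n, len(trace_unrolled(n).ops), len(trace_symbolic(n).ops))
```

In this toy model the unrolled trace grows quadratically with the number of sensors, while the symbolic trace stays the same size, which is the effect on tracing time described above.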
-
In case anyone is wondering: these kinds of situations cannot even be optimized into a packed vector for a single computation instead of a loop if the types used are Dr.Jit structs (Intersections3f etc.), because dr::tile doesn't work on them. The previously suggested local memory is very poorly documented and also very inefficient for this use case, considering that this shouldn't be thread-local data at all; it rather requires some broadcasting capability, or saving multiple (but predictable) JIT variables during JIT traversal. This is a huge bottleneck that should have been remedied in the JIT backend. I really do hope this problem becomes solvable one way or another once the number of "useful" operations in the drjit library grows. The few operations available so far prevent the implementation of many less straightforward parallel PT algorithms.
-
To update on this 'issue': the main problem was calling the expensive virtual function brdf->pdf() inside the nested loop. Additionally, avoiding the virtual pdf computation altogether by using a simplified fixed function resulted in a total runtime of just 220 s. (It works, but it is not a correct solution for all microfacet BRDFs.)
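One way to keep an expensive pdf call out of a nested loop is to evaluate it once per sensor beforehand and only read the stored values inside the loop. The sketch below is plain Python and rests on assumptions: `CountingPdf` is a hypothetical stand-in for brdf->pdf() (a cosine-weighted pdf is used purely as a placeholder), and the weight formula is a simple normalisation, not necessarily the one used in the actual integrator.

```python
import math


class CountingPdf:
    """Hypothetical stand-in for an expensive virtual brdf->pdf() call."""

    def __init__(self):
        self.calls = 0

    def __call__(self, cos_theta):
        self.calls += 1
        # Placeholder model: cosine-weighted hemisphere pdf.
        return max(cos_theta, 0.0) / math.pi


def weights_naive(cosines, pdf):
    # pdf() is evaluated inside the nested loop: N * N calls, plus N more.
    out = []
    for ci in cosines:
        denom = 0.0
        for cj in cosines:
            denom += pdf(cj)
        out.append(pdf(ci) / denom)
    return out


def weights_hoisted(cosines, pdf):
    # pdf() is evaluated once per sensor (N calls) and stored;
    # the nested loop only reads the stored values.
    stored = [pdf(c) for c in cosines]
    out = []
    for pi in stored:
        denom = 0.0
        for pj in stored:
            denom += pj
        out.append(pi / denom)
    return out


cosines = [0.2, 0.5, 0.9, 1.0]
naive_pdf, hoisted_pdf = CountingPdf(), CountingPdf()
w_naive = weights_naive(cosines, naive_pdf)
w_hoisted = weights_hoisted(cosines, hoisted_pdf)
print(naive_pdf.calls, hoisted_pdf.calls)
```

Both versions return the same weights; only the number of pdf evaluations differs, which matters most when each evaluation is a virtual call that has to be traced.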
-
I'm writing an algorithm that requires storing the sampling data of all sensors in a multi-sensor setup in vectorized form. In a nutshell, it takes samples from one sensor and reprojects them to all the other sensors in the scene, plus some additional weights.
I.e.:
The sample data is allocated as a plain array inside an integrator using new SampleData[] and passed into a custom function that works as follows (for simplicity, only the single-sensor case is given; in reality, all sensors are sampled at once in the first step):
The sample() algorithm:
a) Finds the intersection with the scene from the selected camera.
b) For all the other sensors, runs samples_direction and checks visibility, filling the Data array in a single scalar loop.
c) Based on the stored pdf and other factors, computes a weight for each sample using MIS in a scalar NESTED loop.
d) Samples direct emitters and irradiance at the point.
e) Stores the computed radiance in all sensors.
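The shape of step c) can be sketched in plain Python as follows. This is an illustration under assumptions, not the actual integrator code: it assumes the balance heuristic is used, and a hypothetical layout where `pdfs[i][j]` holds the stored pdf with which sensor j would have generated the sample that came from sensor i.

```python
def balance_heuristic_weights(pdfs):
    """MIS weight per sample: w_i = pdfs[i][i] / sum_j pdfs[i][j]."""
    n = len(pdfs)
    weights = []
    for i in range(n):          # one sample per sensor
        denom = 0.0
        for j in range(n):      # every sensor that could have produced it
            denom += pdfs[i][j]
        weights.append(pdfs[i][i] / denom if denom > 0.0 else 0.0)
    return weights


# Two sensors: sample 0 is equally likely from both, sample 1 favours sensor 1.
print(balance_heuristic_weights([[1.0, 1.0], [1.0, 3.0]]))
```

The body of the inner loop runs once per pair of sensors, so the work in this step grows quadratically with the sensor count.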
The problem is that with a larger number of sensors (> 8), the algorithm takes FOREVER to trace and compile (upwards of 10 minutes), and the main culprit is the nested loop in step c).
Here is a simplified version of what it is trying to do:
As I said, this takes up the majority of the tracing and compilation time, and I have no idea how to fix it.
a) I tried turning the loop into a dr::while_loop; however, I later realised that using a scalar index doesn't do anything and doesn't affect the times in the slightest. Also, nesting two dr::loops kills the performance even more :/.
b) When running the while loop with a vector index, I cannot keep a scalar index alongside it to index into the array, as that gives a runtime error. Why aren't vectorized/symbolic loops with a single scalar counter, rather than a vector one, possible?
c) Using a vector index might be possible, but I would have to use gather/scatter, and I'm not sure how to make that work with an array of vector structures. I've only ever seen it used inside BatchSensor to gather a simple dynamic array of sensor pointers into a SensorPtr vector... a totally different use case.
d) What even is the difference between a normal C++ loop and a dr::while_loop that wouldn't diverge at all? In the documentation I've seen something about running in symbolic mode, but aren't normal loops with vector code also running in symbolic mode?
e) From my POV, this loop is absolutely unavoidable, as the rest of the data structures are already vectorized as much as they can be.
Thanks for any ideas!
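Regarding point c), one common way to gather from an "array of vector structures" is to flatten each field into its own buffer and gather field by field with a computed flat index. The sketch below is plain Python: `gather`, `SampleDataSoA` and its fields are hypothetical stand-ins and not the Dr.Jit API, and the layout (entry for sensor s and lane l stored at s * width + l) is an assumption made for illustration.

```python
def gather(source, index):
    """Plain-Python stand-in for a gather: out[k] = source[index[k]]."""
    return [source[i] for i in index]


class SampleDataSoA:
    """Hypothetical flattened layout: one buffer per field,
    holding n_sensors * width entries each."""

    def __init__(self, n_sensors, width):
        self.n_sensors = n_sensors
        self.width = width
        self.pdf = [0.0] * (n_sensors * width)
        self.weight = [0.0] * (n_sensors * width)

    def flat_index(self, sensor_index):
        # sensor_index holds one sensor id per lane.
        return [s * self.width + lane for lane, s in enumerate(sensor_index)]

    def load(self, sensor_index):
        # Gather every field with the same flat index.
        idx = self.flat_index(sensor_index)
        return {"pdf": gather(self.pdf, idx),
                "weight": gather(self.weight, idx)}


data = SampleDataSoA(n_sensors=3, width=4)
data.pdf = [float(k) for k in range(12)]
loaded = data.load([2, 0, 1, 2])  # each lane reads from a different sensor
print(loaded["pdf"])
```

With this layout the sensor index can itself be a per-lane (vector) value, so a loop counter no longer needs to be a scalar used to index a host-side array.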