SpriteList without spatial hashing is slow for multiple collision tests when GPU transform is used #2363

eschan145 · 2024-09-11T00:47:59Z

Bug Report

When profiling my code, I found out that arcade.gl.query.__exit__ was taking up the vast majority of my performance time, more than four times than anything else. It doesn't seem right that an __exit__ function should take this long, and my project is running slowly because of this. I'm surprised this hasn't been brought up before, and it may be a problem with my code.

         5219230 function calls (5174575 primitive calls) in 49.182 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    35634   15.204    0.000   15.364    0.000 C:\Users\Esamu\AppData\Roaming\Python\Python312\site-packages\arcade\gl\query.py:132(__exit__)
     2899    3.393    0.001    4.267    0.001 {built-in method time.sleep}
    35634    3.323    0.000    3.456    0.000 C:\Users\Esamu\AppData\Roaming\Python\Python312\site-packages\arcade\gl\vertex_array.py:342(transform_interleaved)
     4332    1.925    0.000    1.985    0.000 {built-in method nt._getfinalpathname}

System Info

Arcade 3.0.0.dev35
------------------
vendor: Intel
renderer: Intel(R) UHD Graphics
version: (3, 3)
python: 3.12.2 (tags/v3.12.2:6abddd9, Feb  6 2024, 21:26:36) [MSC v.1937 64 bit (AMD64)]
platform: win32
pyglet version: 2.1.dev5
PIL version: 10.2.0

Steps to reproduce/example code:

My project is located at https://github.com/eschan145/SpaceShooter.

The text was updated successfully, but these errors were encountered:

DragonMoffon · 2024-09-24T04:27:15Z

Thanks for bringing this to our attention. The link to your repository doesn't work at the mo. Secondly, I wonder if you could also provide the top functions by the highest average time rather than the highest total time given that the method was called 30000+ times, which means each call takes around 0.5 ms to complete (admittedly not insignificant but it will be interesting to see the slowest methods aswell)

Cleptomania · 2024-09-24T04:36:54Z

I strongly suspect this isn’t anything Arcade can really do about it. My guess is this is just the result of the GL querying being slow, querying blocks the pipeline in GL and is generally a pretty expensive operation to perform. Would need more in depth profiling to determine the exact function call eating the bulk of the time, but I strongly suspect that it’s one of the GL query operations.

If it is that, the performance can vary a decent bit between different hardware/drivers as well.

DragonMoffon · 2024-09-24T04:38:43Z

It's unsurprising that this particular exit method takes a while. multiple gl calls are made per property queried. Given that these are getting information directly from the GPU, they are probably blocking methods, which will cause your program to hang until the GPU responds. If you are querying large or complex gl operations such as data transfers, then you will have to wait for that whole process to finish rather than continuing immediately and letting it complete in the background.

From my investigation, no internal arcade process uses the Query object, so if it isn't critical to your applications functionality, it might be better to disable it when not debugging.

einarf · 2024-09-24T12:30:49Z

It likely needs to wait for the rendering queue to process the call you are measuring to obtain the query info. If there are many things already in the queue it also needs to wait for those to complete.

eschan145 · 2024-09-24T14:54:56Z

Thanks for your responses.
I've fixed the repo link, so it should work now. Run profiler.py to see the profiling results.

Secondly, I wonder if you could also provide the top functions by the highest average time rather than the highest total time given that the method was called 30000+ times, which means each call takes around 0.5 ms to complete (admittedly not insignificant but it will be interesting to see the slowest methods aswell)

cProfile doesn't support sorting by time percall. And most of the ones that take up the most time aren't really revelant anyways, because they are only called once. However, the update function of the main game takes about 0.026 seconds to complete and is called 163 times, but that's really all that's significant.

From my investigation, no internal arcade process uses the Query object, so if it isn't critical to your applications functionality, it might be better to disable it when not debugging.

I tried commenting it out, but collision checking stopped working. Are you referring to something else?
output.txt

eschan145 · 2024-09-27T22:30:40Z

It seems it's being called when checking for collisions. I've added custom collision detection using AABBs and it has replaced arcade.gl.query as the slowest method.

DragonMoffon · 2024-09-28T03:09:29Z

You are correct sorry, the reference finder didn't pick that up. If you are using GPU collisions for sprites then it does have a query object. That shouldn't be running by default I'll look into it.

DragonMoffon · 2024-09-28T03:10:51Z

def _get_nearby_sprites(
    sprite: BasicSprite, sprite_list: SpriteList[SpriteType]
) -> List[SpriteType]:
    sprite_count = len(sprite_list)
    if sprite_count == 0:
        return []

    # Update the position and size to check
    ctx = get_window().ctx
    sprite_list._write_sprite_buffers_to_gpu()

    ctx.collision_detection_program["check_pos"] = sprite.position
    ctx.collision_detection_program["check_size"] = sprite.width, sprite.height

    # Ensure the result buffer can fit all the sprites (worst case)
    buffer = ctx.collision_buffer
    if buffer.size < sprite_count * 4:
        buffer.orphan(size=sprite_count * 4)

    # Run the transform shader emitting sprites close to the configured position and size.
    # This runs in a query so we can measure the number of sprites emitted.
    with ctx.collision_query:
        sprite_list._geometry.transform(  # type: ignore
            ctx.collision_detection_program,
            buffer,
            vertices=sprite_count,
        )

    # Store the number of sprites emitted
    emit_count = ctx.collision_query.primitives_generated
    # print(
    #     emit_count,
    #     ctx.collision_query.time_elapsed,
    #     ctx.collision_query.time_elapsed / 1_000_000_000,
    # )

    # If no sprites emitted we can just return an empty list
    if emit_count == 0:
        return []

    # # Debug block for transform data to keep around
    # print("emit_count", emit_count)
    # data = buffer.read(size=emit_count * 4)
    # print("bytes", data)
    # print("data", struct.unpack(f'{emit_count}i', data))

    # .. otherwise build and return a list of the sprites selected by the transform
    return [
        sprite_list[i] for i in struct.unpack(f"{emit_count}i", buffer.read(size=emit_count * 4))
    ]

DragonMoffon · 2024-09-28T03:11:33Z

Yikes... so query is needed hahaha

einarf · 2024-09-29T08:24:09Z

We really need to to a pass on collision in the future. Remember that you can also specify the collision method to override the default behavior. IF you try to do collision in very large lists without spatial hash the gpu version will trigger. That's of course not always what you want but it's still a useful option.

einarf · 2024-12-09T09:18:30Z

Renamed this issue. It happens when a SpriteList don't have a spatial hash and the number of sprites surpass a certain amount. It's actually blazingly fast for a few collision checks on very large spritelists but with lots of collision checks you end up overfilling the render queue to such a degree that this option is counter-productive.

The collision checker will auto pick the collision mode unless you pass it a specific mode. We need to re-evaluate how this works in 3.x.x.

eschan145 · 2024-12-10T01:04:02Z

I was able to solve my problem of collision checks by using a spatial hash that updated periodically and using aabbs with C++ extension modules.

einarf · 2024-12-10T13:16:56Z

I was able to solve my problem of collision checks by using a spatial hash that updated periodically and using aabbs with C++ extension modules.

Great to hear. I'm suspecting the gpu collision checking was slower than using a spartial hash or simply specifying the collision method in the call to a pure python version? it would depend on the number of checks.

Still.. I agree your original findings was surprising and might not be a suitabe default in the library. The early confusion about what this issue was really about makes it clear.

eschan145 · 2024-12-10T17:08:05Z

I actually used a custom implementation of collision checking. I also used a timer to update the spatial hash map for my constantly moving entities.

Normally the spatial hash map is updated each time an entity moves -- however, because I know how much an entity can move each frame at most, I can set the spatial hash update rate to a suitable value to maximize performance and accuracy, in my case, 10 times each second.

Another thing I added to my custom collision checking is a limit parameter. If a user only needs to check with a few collisions with a sprite list, or only one, it is much cheaper to break out early if the limit for collision checking is reached without having to check with all of the other sprites in the list. Of course this also depends on the index of the sprite being checked, but it can reduce the amount of checks quite significantly if specified.

einarf changed the title ~~arcade.gl.query.__exit__ extremely slow~~ SpriteList without spatial hashing is slow for multiple collision tests when GPU transform is used Dec 9, 2024

einarf modified the milestones: 3.0.x, 3.1 Dec 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SpriteList without spatial hashing is slow for multiple collision tests when GPU transform is used #2363

SpriteList without spatial hashing is slow for multiple collision tests when GPU transform is used #2363

eschan145 commented Sep 11, 2024 •

edited

Loading

DragonMoffon commented Sep 24, 2024

Cleptomania commented Sep 24, 2024 •

edited

Loading

DragonMoffon commented Sep 24, 2024

einarf commented Sep 24, 2024

eschan145 commented Sep 24, 2024 •

edited

Loading

eschan145 commented Sep 27, 2024

DragonMoffon commented Sep 28, 2024

DragonMoffon commented Sep 28, 2024

DragonMoffon commented Sep 28, 2024

einarf commented Sep 29, 2024

einarf commented Dec 9, 2024

eschan145 commented Dec 10, 2024

einarf commented Dec 10, 2024

eschan145 commented Dec 10, 2024

SpriteList without spatial hashing is slow for multiple collision tests when GPU transform is used #2363

SpriteList without spatial hashing is slow for multiple collision tests when GPU transform is used #2363

Comments

eschan145 commented Sep 11, 2024 • edited Loading

Bug Report

System Info

Steps to reproduce/example code:

DragonMoffon commented Sep 24, 2024

Cleptomania commented Sep 24, 2024 • edited Loading

DragonMoffon commented Sep 24, 2024

einarf commented Sep 24, 2024

eschan145 commented Sep 24, 2024 • edited Loading

eschan145 commented Sep 27, 2024

DragonMoffon commented Sep 28, 2024

DragonMoffon commented Sep 28, 2024

DragonMoffon commented Sep 28, 2024

einarf commented Sep 29, 2024

einarf commented Dec 9, 2024

eschan145 commented Dec 10, 2024

einarf commented Dec 10, 2024

eschan145 commented Dec 10, 2024

eschan145 commented Sep 11, 2024 •

edited

Loading

Cleptomania commented Sep 24, 2024 •

edited

Loading

eschan145 commented Sep 24, 2024 •

edited

Loading