diff --git a/docs/assets/images/FullFTRvsCPF.png b/docs/assets/images/FullFTRvsCPF.png new file mode 100644 index 0000000..0b64735 Binary files /dev/null and b/docs/assets/images/FullFTRvsCPF.png differ diff --git a/docs/assets/images/FullFTvsCPF.png b/docs/assets/images/FullFTvsCPF.png new file mode 100644 index 0000000..17a0a88 Binary files /dev/null and b/docs/assets/images/FullFTvsCPF.png differ diff --git a/docs/assets/images/ZoomedFTRvsCPF.png b/docs/assets/images/ZoomedFTRvsCPF.png new file mode 100644 index 0000000..d1ca924 Binary files /dev/null and b/docs/assets/images/ZoomedFTRvsCPF.png differ diff --git a/docs/assets/images/ZoomedFTvsCPF.png b/docs/assets/images/ZoomedFTvsCPF.png new file mode 100644 index 0000000..0bb6a4f Binary files /dev/null and b/docs/assets/images/ZoomedFTvsCPF.png differ diff --git a/docs/index.markdown b/docs/index.markdown index 9dbddeb..fec782d 100644 --- a/docs/index.markdown +++ b/docs/index.markdown @@ -13,15 +13,17 @@ title: "Massive Scale Collision Detection with Bevy" # What is this? -An example codebase showing implementation of GPU accelerated collision detection (narrow-phase only) and standard CPU collision detection. Designed to realistically test the performance differences between each technique. You can run the code yourself and see the difference, or read this article where I discuss the results of my comparison testing. +1. An example codebase implementing GPU-accelerated collision detection (narrow-phase only) and standard CPU collision detection, designed to realistically test the performance differences between the two techniques. You can run the code yourself to see the difference. Here is the repo: + [![Repo](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Sheldonfrith/gpu_accelerated_collision_detection) -## See the code: - -- link to github repo here with github icon +2. 
A report on the results of comparison testing using the above codebase. Here is the report... # TLDR -GPU acceleration is not hard to implement and can provide major performance increases over standard CPU collision detection. Significant improvements start at around 15k collisions per "frame" ("frame" = "iteration"/"step" if you are running a simulation and not a game) and reaching up to a 50% performance improvement for between 40k and 200k collisions per frame, with the improvements plateauing after that point. Current version of the code shows slight decreases in improvements for larger scales (25% at 10 million collisions per frame), but there are optimizations that can be done to the algorithm which would likely eliminate that decline with scale. +- GPU acceleration can provide major performance increases over CPU-based collision detection. + +- Significant improvements start at around 15k collisions per frame/iteration/step and reach up to a 50% reduction in frame time between 40k and 200k collisions per frame, plateauing after that point. +- The current version of the code shows smaller improvements at larger scales (25% at 10 million collisions per frame), but further optimizations to the algorithm would likely eliminate that decline with scale. ## Who this is useful for @@ -33,35 +35,52 @@ Especially if you are already using the [Bevy engine](https://bevyengine.org/), ## Rationale for Creating -I needed performant collision detection for a much larger scale than normal (hundreds of thousands of simultaneously colliding entities at least). I tried popular existing collision detection solutions (like Avian, Rapier) but my brief testing indicated they probably weren't built for massive simulations like I was working with (although they work great for most game applications). And I only needed collision detection, not a physics engine. 
+I needed performant collision detection for a much larger scale than normal: hundreds of thousands of simultaneously colliding entities, at least. I tried popular existing collision detection solutions, but my brief testing indicated that their performance was unacceptable for the scales I required. + +# Performance Results: + +### Raw Frame Time: + +![Full Frame Time vs Collisions per Frame Comparison Graph](/assets/images/FullFTvsCPF.png) + +And here is a zoomed version to show the critical point where GPU acceleration becomes valuable: +![Zoomed Frame Time vs Collisions per Frame Comparison Graph](/assets/images/ZoomedFTvsCPF.png) + +### % Frame Time Reduction using GPU: + +![Full Frame Time Reduction vs Collisions per Frame Graph](/assets/images/FullFTRvsCPF.png) +_Note the logarithmic scale of the x-axis in the graph above._ + +And here is a zoomed version to show in more detail the point where GPU acceleration becomes valuable (NOT log scale): +![Zoomed Frame Time Reduction vs Collisions per Frame Graph](/assets/images/ZoomedFTRvsCPF.png) -# Caveats +# Caveats: - This technique will probably not provide benefits for web-based applications that do not have low level GPU access. -- If you are using this for a videogame that already has very intensive graphics, there might not be enough extra capacity on the GPU to handle this method. +- If you are using this for a videogame that already has _very_ intensive graphics, there might not be enough extra capacity on the GPU to handle this method. However, for most games the extra GPU usage shouldn't be an issue. Collision detection for 160k collisions per frame, for example, used only about 7% of my GPU capacity (RTX 3070 laptop version). -# Narrow vs Broad Phase +## Narrow vs Broad Phase -As mentioned above this code is only for NARROW-PHASE of collision detection. 
If performance is an issue, you should first prioritize implementing a performant broad-phase to your collision detection as this is an easier way of making big performance gains. And for truly massive simulations a broad phase is required, because of the practical limits of narrow-phase collosion detection. +See [narrow vs broad phase](https://developer.mozilla.org/en-US/docs/Games/Techniques/2D_collision_detection#collision_performance). -#### Limits of Narrow-Only collision detection +This technique is only for the **narrow phase** of collision detection. If performance is an issue, you should first prioritize adding a performant broad phase to your collision detection, as this is an easier way to make big performance gains. For truly massive simulations a broad phase is required anyway, because of the practical limits of narrow-phase collision detection. -Using this program to test we can see that if you are using collision detection in a game if you are getting about 500k collisions per frame performance drops to between 10-20 fps under ideal conditions. So if you're game needs to handle that many collisions you have to implement some sort of broad-phase. +## Practical Upper Limits on Collisions/Frame -Even if you are running a scientific simulation and dont care about fps, you will still benefit greatly from implementing a broad-phase collision detection pass. +- Around 200-300k collisions per frame for videogame applications. Total collisions can be much higher if you also implement broad-phase filtering (see above). +- Tens or hundreds of millions of collisions per frame for simulations that don't need real-time frame rates, limited only by the RAM available for storing all of the collision pairs. 
+ -## Discussion +# Further Performance Improvements -- Main reason GPU acceleration doesn't work as collision detection service is that number of collisions is unknown, but we have to pre-allocate memory when working with the GPU, leading to a lot of waste and slowdown -- Vs the CPU where we do not have to preallocate memory, so we only end up using the amount of memory necessary to hold the correct number of collisions +- The main waste in GPU-based collision detection is pre-allocating a lot of memory that never gets used, because we don't know ahead of time how many collisions will be detected. Testing indicates that anything that helps pre-estimate the number of collisions yields large performance improvements (this is the purpose of the `max_detectable_collisions_scale` variable in the GPU code, but that part of the code still has plenty of room for improvement). +- A major bottleneck is the render device's maximum storage buffer size. If there is a way to safely increase this limit, performance could be dramatically improved. (This may be hardware-limited; I haven't had time to look into it yet.) +- Switch to integer positions and integer math instead of floating point. This would require client code to use integer positions, which is less convenient; that is why the code currently uses floating-point positions. +- For simulations, running batches in parallel may be a way to utilize more of the GPU. 
-## How to improve GPU accelerated collision performance: +## Inlining -- If the maximum storage buffer size was much larger, this could be improved significantly -- Switch to integer positions and integer math instead of floating point -- If most or all of the simulation logic (movements and reactions to collisions) were moved to the GPU performance would improve and buffer size would not be a bottleneck -- run batches in parallel, since generally GPU is very underutilized, however this method requires some work to avoid stack overflows and memory shortages +You may notice that simply combining collision processing with collision detection can speed up the CPU algorithm so much that it becomes faster than the GPU method. However, keep in mind that **the same thing can be said for the GPU algorithm**: if we also move collision processing directly onto the GPU, it gains dramatic performance improvements too. -## How to improve CPU accelerated collision performance: +If you are trying to get the absolute best possible performance in your application, you will probably have to use this strategy; otherwise, avoid it, because it creates highly coupled, difficult-to-maintain code. -- In the same way that moving more logic onto the GPU would improve performance by decreasing data transfer and memory allocation costs, inlining logic on the CPU side has the same benefits. This requires prior knowledge of the entire simulation, and leads to tightly coupled, non-reusable code. But the performance gains are very significant. +I have not done this for either the CPU or the GPU because I want this test to be representative of the general case, where we don't know exactly what the client is going to do with the detected collisions. 
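The narrow phase benchmarked in this PR is, at its core, an exact overlap test between sensor circles and body circles. The "Further Performance Improvements" notes hinge on one difference between the two backends: a GPU output buffer has to be allocated before the number of collisions is known. The plain-Rust sketch below illustrates both ideas without Bevy or wgpu. It is not the repo's implementation. The `Circle` type, the function names, the brute-force sensor-versus-body loop, the two-`u32` pair layout, and the sizing rule in `estimate_capacity` are all hypothetical. `scale` is only loosely modelled on `max_detectable_collisions_scale`, whose exact semantics are not shown in this diff.

```rust
// Hypothetical stand-in for Bevy's `BoundingCircle`: a centre point plus a radius.
#[derive(Clone, Copy, Debug)]
pub struct Circle {
    pub x: f32,
    pub y: f32,
    pub radius: f32,
}

/// Two circles overlap when the distance between their centres is at most the
/// sum of their radii. Comparing squared values avoids a square root.
pub fn circles_overlap(a: &Circle, b: &Circle) -> bool {
    let dx = a.x - b.x;
    let dy = a.y - b.y;
    let r = a.radius + b.radius;
    dx * dx + dy * dy <= r * r
}

/// Brute-force narrow phase on the CPU: test every sensor against every body and
/// record each overlapping (sensor_index, body_index) pair. The `Vec` only grows
/// as large as the number of collisions actually found.
pub fn narrow_phase_cpu(sensors: &[Circle], bodies: &[Circle]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for (si, s) in sensors.iter().enumerate() {
        for (bi, b) in bodies.iter().enumerate() {
            if circles_overlap(s, b) {
                pairs.push((si, bi));
            }
        }
    }
    pairs
}

/// Hypothetical sizing rule for a GPU output buffer, which must be allocated
/// before the number of collisions is known. `scale` is the assumed fraction of
/// all sensor/body pairs expected to collide. The result is clamped to the worst
/// case and to what fits in one storage buffer of `max_buffer_bytes`, assuming
/// each pair is stored as two `u32` indices.
pub fn estimate_capacity(num_sensors: usize, num_bodies: usize, scale: f64, max_buffer_bytes: usize) -> usize {
    let pair_bytes = std::mem::size_of::<[u32; 2]>();
    let worst_case = num_sensors.saturating_mul(num_bodies);
    let guess = (worst_case as f64 * scale).ceil() as usize;
    guess.min(worst_case).min(max_buffer_bytes / pair_bytes)
}

fn main() {
    // Radii taken from run_config.json (sensor_radius 21, body_radius 3).
    let sensors = [Circle { x: 0.0, y: 0.0, radius: 21.0 }];
    let bodies = [
        Circle { x: 10.0, y: 0.0, radius: 3.0 },
        Circle { x: 30.0, y: 0.0, radius: 3.0 },
    ];
    // Only the first body lies within 21 + 3 units of the sensor.
    println!("{:?}", narrow_phase_cpu(&sensors, &bodies));
    // 1000 x 1000 pairs at an assumed 50% hit rate, with a 128 MiB buffer limit.
    println!("{}", estimate_capacity(1000, 1000, 0.5, 128 << 20));
}
```

On the CPU the `Vec` ends up exactly as large as the number of pairs found, while a GPU buffer in a scheme like this must reserve its capacity up front. An overestimate wastes memory and transfer time, and an underestimate would lose collisions or force a retry, which is why a better pre-estimate pays off and why the storage buffer limit becomes the ceiling.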
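The "% Frame Time Reduction" graphs and the percentages in the TLDR come from the notebook's formula `-1 * ((gpu_val - cpu_val) / cpu_val) * 100`. Restated as a small Rust function (the function name and millisecond units are illustrative, not from the repo), a 50% figure means the GPU frame took half as long as the CPU frame. The 25% at 10 million collisions per frame means it took three quarters as long.

```rust
/// Percentage reduction in frame time from switching CPU -> GPU, mirroring the
/// notebook's `-1 * ((gpu_val - cpu_val) / cpu_val) * 100`.
/// Positive means the GPU frame was faster; negative means it was slower.
pub fn pct_frame_time_reduction(cpu_frame_ms: f64, gpu_frame_ms: f64) -> f64 {
    assert!(cpu_frame_ms > 0.0, "CPU frame time must be positive");
    -((gpu_frame_ms - cpu_frame_ms) / cpu_frame_ms) * 100.0
}

fn main() {
    // A 10 ms CPU frame vs a 5 ms GPU frame: the GPU frame takes half as long (50% reduction).
    println!("{}", pct_frame_time_reduction(10.0, 5.0));
    // A GPU frame slower than the CPU frame yields a negative reduction (-25% here).
    println!("{}", pct_frame_time_reduction(10.0, 12.5));
}
```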
diff --git a/output.json b/output.json index fa02592..c2214b3 100644 --- a/output.json +++ b/output.json @@ -571,4 +571,4 @@ "total_frames": 2500, "entities_spawned": 392 } -] \ No newline at end of file +] diff --git a/run_config.json b/run_config.json index c0eda2b..924a4ba 100644 --- a/run_config.json +++ b/run_config.json @@ -1,12 +1,12 @@ { - "bottom_left_x": -1, - "bottom_left_y": -1, - "top_right_x": 1, - "top_right_y": 1, - "sensor_radius": 21, - "body_radius": 3, - "rng_seed": 1, - "num_frames_to_test": 2500, - "use_gpu": false, - "path_to_output_json": "./output.json" - } \ No newline at end of file + "bottom_left_x": -10, + "bottom_left_y": -10, + "top_right_x": 10, + "top_right_y": 10, + "sensor_radius": 21, + "body_radius": 3, + "rng_seed": 1, + "num_frames_to_test": 1000, + "use_gpu": true, + "path_to_output_json": "./output.json" +} diff --git a/script.ipynb b/script.ipynb index be37460..a198b37 100644 --- a/script.ipynb +++ b/script.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 2, + "execution_count": 3, "metadata": {}, "outputs": [], "source": [ @@ -26,208 +26,202 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# test cases\n", "\n", "test_cases = [\n", - " # TestCase(\n", - " # width=3,\n", - " # height=3,\n", - " # frames=1000,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=12,\n", - " # height=12,\n", - " # frames=400,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=24,\n", - " # height=24,\n", - " # frames=200,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=40,\n", - " # height=40,\n", - " # frames=100,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=60,\n", - " # height=60,\n", - " # frames=20,\n", - " # sensor_radius=21,\n", - " 
# body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=80,\n", - " # height=80,\n", - " # frames=6,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=100,\n", - " # height=100,\n", - " # frames=4,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=120,\n", - " # height=120,\n", - " # frames=3,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=160,\n", - " # height=160,\n", - " # frames=3,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=18,\n", - " # height=18,\n", - " # frames=1000,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=13,\n", - " # height=13,\n", - " # frames=1500,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=15,\n", - " # height=15,\n", - " # frames=1400,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=16,\n", - " # height=16,\n", - " # frames=1700,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=17,\n", - " # height=17,\n", - " # frames=300,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=8,\n", - " # height=8,\n", - " # frames=3000,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=15,\n", - " # height=15,\n", - " # frames=300,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=16,\n", - " # height=16,\n", - " # frames=300,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=15,\n", - " # height=15,\n", - " # frames=5000,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=16,\n", 
- " # height=16,\n", - " # frames=5000,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=13,\n", - " # height=13,\n", - " # frames=4000,\n", - " # sensor_radius=5,\n", - " # body_radius=40\n", - " # ),\n", - " # TestCase(\n", - " # width=18,\n", - " # height=18,\n", - " # frames=2000,\n", - " # sensor_radius=5,\n", - " # body_radius=40\n", - " # ),\n", - " # TestCase(\n", - " # width=40,\n", - " # height=40,\n", - " # frames=5,\n", - " # sensor_radius=5,\n", - " # body_radius=40\n", - " # ),\n", - " # TestCase(\n", - " # width=9,\n", - " # height=9,\n", - " # frames=4000,\n", - " # sensor_radius=5,\n", - " # body_radius=40\n", - " # ),\n", - " # TestCase(\n", - " # width=10,\n", - " # height=10,\n", - " # frames=3500,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=11,\n", - " # height=11,\n", - " # frames=3000,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " # TestCase(\n", - " # width=14,\n", - " # height=14,\n", - " # frames=2500,\n", - " # sensor_radius=21,\n", - " # body_radius=3\n", - " # ),\n", - " TestCase(\n", - " width=2,\n", - " height=2,\n", + " TestCase(\n", + " width=3,\n", + " height=3,\n", + " frames=1000,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=12,\n", + " height=12,\n", + " frames=400,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=24,\n", + " height=24,\n", + " frames=200,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=40,\n", + " height=40,\n", + " frames=100,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=60,\n", + " height=60,\n", + " frames=20,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=80,\n", + " height=80,\n", + " frames=6,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", 
+ " TestCase(\n", + " width=100,\n", + " height=100,\n", + " frames=4,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=120,\n", + " height=120,\n", + " frames=3,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=160,\n", + " height=160,\n", + " frames=3,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=18,\n", + " height=18,\n", + " frames=1000,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=13,\n", + " height=13,\n", + " frames=1500,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=15,\n", + " height=15,\n", + " frames=1400,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=16,\n", + " height=16,\n", + " frames=1700,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=17,\n", + " height=17,\n", + " frames=300,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=8,\n", + " height=8,\n", + " frames=3000,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=15,\n", + " height=15,\n", + " frames=300,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=16,\n", + " height=16,\n", + " frames=300,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=15,\n", + " height=15,\n", + " frames=5000,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=16,\n", + " height=16,\n", + " frames=5000,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=13,\n", + " height=13,\n", + " frames=4000,\n", + " sensor_radius=5,\n", + " body_radius=40\n", + " ),\n", + " TestCase(\n", + " width=18,\n", + " height=18,\n", + " frames=2000,\n", + " sensor_radius=5,\n", + " 
body_radius=40\n", + " ),\n", + " TestCase(\n", + " width=40,\n", + " height=40,\n", + " frames=5,\n", + " sensor_radius=5,\n", + " body_radius=40\n", + " ),\n", + " TestCase(\n", + " width=9,\n", + " height=9,\n", + " frames=4000,\n", + " sensor_radius=5,\n", + " body_radius=40\n", + " ),\n", + " TestCase(\n", + " width=10,\n", + " height=10,\n", + " frames=3500,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=11,\n", + " height=11,\n", + " frames=3000,\n", + " sensor_radius=21,\n", + " body_radius=3\n", + " ),\n", + " TestCase(\n", + " width=14,\n", + " height=14,\n", " frames=2500,\n", " sensor_radius=21,\n", " body_radius=3\n", " ),\n", + " \n", "]" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ @@ -283,7 +277,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -292,20 +286,6 @@ "text": [ "Running test case 1 of 1\n" ] - }, - { - "ename": "CalledProcessError", - "evalue": "Command '['cargo', 'run', '--release']' returned non-zero exit status 101.", - "output_type": "error", - "traceback": [ - "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[1;31mCalledProcessError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[1;32mIn[16], line 2\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[38;5;66;03m# run\u001b[39;00m\n\u001b[1;32m----> 2\u001b[0m \u001b[43mrun_tests\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m./output.json\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n", - "Cell \u001b[1;32mIn[4], line 22\u001b[0m, in \u001b[0;36mrun_tests\u001b[1;34m(rng_seed, path_to_output)\u001b[0m\n\u001b[0;32m 20\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m i, test_case \u001b[38;5;129;01min\u001b[39;00m 
\u001b[38;5;28menumerate\u001b[39m(test_cases):\n\u001b[0;32m 21\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mRunning test case \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mi\u001b[38;5;241m+\u001b[39m\u001b[38;5;241m1\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m of \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mlen\u001b[39m(test_cases)\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m---> 22\u001b[0m \u001b[43mrun_test_one_side\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mrng_seed\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mpath_to_output\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtest_case\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 23\u001b[0m run_test_one_side(\u001b[38;5;28;01mTrue\u001b[39;00m, rng_seed, path_to_output, test_case)\n\u001b[0;32m 24\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mTest case \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mi\u001b[38;5;241m+\u001b[39m\u001b[38;5;241m1\u001b[39m\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m done\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n", - "Cell \u001b[1;32mIn[4], line 46\u001b[0m, in \u001b[0;36mrun_test_one_side\u001b[1;34m(use_gpu, rng_seed, path_to_output, test_case)\u001b[0m\n\u001b[0;32m 33\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[38;5;28mopen\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrun_config.json\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mw\u001b[39m\u001b[38;5;124m\"\u001b[39m) \u001b[38;5;28;01mas\u001b[39;00m f:\n\u001b[0;32m 34\u001b[0m f\u001b[38;5;241m.\u001b[39mwrite(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\"\"\u001b[39m\u001b[38;5;130;01m{{\u001b[39;00m\n\u001b[0;32m 35\u001b[0m \u001b[38;5;124m 
\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mbottom_left_x\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m: -\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mhalf_width\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m,\u001b[39m\n\u001b[0;32m 36\u001b[0m \u001b[38;5;124m \u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mbottom_left_y\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m: -\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mhalf_height\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m,\u001b[39m\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 44\u001b[0m \u001b[38;5;124m \u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mpath_to_output_json\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m: \u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mpath_to_output\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m 45\u001b[0m \u001b[38;5;124m \u001b[39m\u001b[38;5;130;01m}}\u001b[39;00m\u001b[38;5;124m\"\"\"\u001b[39m)\n\u001b[1;32m---> 46\u001b[0m \u001b[43msubprocess\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mcargo\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mrun\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43m--release\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcheck\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m)\u001b[49m\n", - "File \u001b[1;32mC:\\Program Files\\WindowsApps\\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\\Lib\\subprocess.py:571\u001b[0m, in \u001b[0;36mrun\u001b[1;34m(input, capture_output, timeout, check, *popenargs, 
**kwargs)\u001b[0m\n\u001b[0;32m 569\u001b[0m retcode \u001b[38;5;241m=\u001b[39m process\u001b[38;5;241m.\u001b[39mpoll()\n\u001b[0;32m 570\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m check \u001b[38;5;129;01mand\u001b[39;00m retcode:\n\u001b[1;32m--> 571\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m CalledProcessError(retcode, process\u001b[38;5;241m.\u001b[39margs,\n\u001b[0;32m 572\u001b[0m output\u001b[38;5;241m=\u001b[39mstdout, stderr\u001b[38;5;241m=\u001b[39mstderr)\n\u001b[0;32m 573\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m CompletedProcess(process\u001b[38;5;241m.\u001b[39margs, retcode, stdout, stderr)\n", - "\u001b[1;31mCalledProcessError\u001b[0m: Command '['cargo', 'run', '--release']' returned non-zero exit status 101." - ] } ], "source": [ @@ -315,14 +295,14 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "C:\\Users\\sheld\\AppData\\Local\\Temp\\ipykernel_44744\\244706675.py:34: SettingWithCopyWarning: \n", + "C:\\Users\\sheld\\AppData\\Local\\Temp\\ipykernel_14440\\244706675.py:34: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", @@ -364,17 +344,65 @@ " # Calculate percentage difference\n", " pct_diff = -1* ((gpu_val - cpu_val) / cpu_val) * 100\n", " # add to df\n", - " df_gpu[\"pct_diff\"].iloc[i] = pct_diff" + " df_gpu[\"pct_diff\"].iloc[i] = pct_diff\n", + " \n", + " \n" ] }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "def deduplicate_clusters(df, cluster_width=5000):\n", + " \"\"\"\n", + " Remove all but one entry from each cluster, keeping the entry with the highest 'frames' value.\n", + " Clusters are defined as groups of entries where the 
difference in collisions_per_frame from the cluster's first (smallest) value is at most cluster_width.\n", + " \n", + " Parameters:\n", + " df (pandas.DataFrame): DataFrame containing 'collisions_per_frame' and 'frames' columns\n", + " cluster_width (float): Maximum distance in collisions_per_frame from a cluster's first value for an entry to join that cluster\n", + " \n", + " Returns:\n", + " pandas.DataFrame: DataFrame with only one entry per cluster\n", + " \"\"\"\n", + " df_sorted = df.sort_values('collisions_per_frame')\n", + " \n", + " # Initialize cluster labels\n", + " current_cluster = 0\n", + " cluster_labels = []\n", + " current_cluster_start = df_sorted['collisions_per_frame'].iloc[0]\n", + " \n", + " # Assign cluster labels\n", + " for value in df_sorted['collisions_per_frame']:\n", + " if value - current_cluster_start > cluster_width:\n", + " current_cluster += 1\n", + " current_cluster_start = value\n", + " cluster_labels.append(current_cluster)\n", + " \n", + " df_sorted['cluster'] = cluster_labels\n", + " result = df_sorted.loc[df_sorted.groupby('cluster')['frames'].idxmax()]\n", + " result = result.drop('cluster', axis=1).sort_index()\n", + " \n", + " return result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 30, "metadata": {}, "outputs": [ { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -385,16 +413,23 @@ ], "source": [ "# plot % difference\n", - "n = 1000000000 # for example\n", + "n = 100000 # for example\n", "\n", "# Filter the DataFrame for collisions_per_frame < n\n", "df_gpu_filtered = df_gpu[df_gpu[\"collisions_per_frame\"] < n]\n", + "\n", + "df_gpu_filtered_deduped = deduplicate_clusters(df_gpu_filtered, cluster_width=500)\n", + "# reindex\n", + "df_gpu_filtered_deduped = df_gpu_filtered_deduped.reset_index(drop=True)\n", + "# sort again\n", + "df_gpu_filtered_deduped = df_gpu_filtered_deduped.sort_values('collisions_per_frame')\n", "plt.figure()\n", - "plt.scatter(df_gpu_filtered[\"collisions_per_frame\"], df_gpu_filtered[\"pct_diff\"], label=\"Gpu\")\n", + "plt.plot(df_gpu_filtered_deduped[\"collisions_per_frame\"], df_gpu_filtered_deduped[\"pct_diff\"], label=\"Gpu\")\n", "# plt.xscale('log')\n", "plt.xlabel(\"Collisions per frame\")\n", - "plt.ylabel(\"% Difference in frame time\")\n", - "plt.legend()\n", + "plt.ylabel(\"% Reduction in frame time\")\n", + "plt.title(\"Reduction in frame time vs. 
Collisions per frame\")\n", + "# plt.legend()\n", "plt.show()\n", "\n" ] @@ -451,6 +486,9 @@ "\n", "# Your existing data loading code...\n", "\n", + "# deduplicate_clusters\n", + "df_gpu_deduped = deduplicate_clusters(df_gpu, cluster_width=500)\n", + "\n", "plt.figure(figsize=(12, 6)) # Make the figure a bit wider to accommodate annotations\n", "\n", "# Plot the original lines\n", diff --git a/src/collision_detection_performance_test.rs b/src/collision_detection_performance_test.rs index 17161d2..f10f138 100644 --- a/src/collision_detection_performance_test.rs +++ b/src/collision_detection_performance_test.rs @@ -18,6 +18,7 @@ use crate::{ entity_movement::{move_entities_deterministic, setup_position_cache}, entity_spawning::spawn_entities, graphics::plugin::GraphicsPlugin, + headless_entity_spawning::spawn_entities_headless, performance::{PerformanceMetrics, track_performance_and_exit}, }; diff --git a/src/headless_entity_spawning.rs b/src/headless_entity_spawning.rs new file mode 100644 index 0000000..e4416a2 --- /dev/null +++ b/src/headless_entity_spawning.rs @@ -0,0 +1,48 @@ +use bevy::{ + asset::{Assets, RenderAssetUsages}, + log, + math::{Vec2, Vec3, bounding::BoundingCircle}, + prelude::{Commands, Mesh, Mesh2d, Res, ResMut, Transform}, + sprite::{ColorMaterial, MeshMaterial2d}, + utils::default, +}; + +use crate::{ + components_and_resources::{BoundingCircleComponent, EntitiesSpawned, Sensor}, + config::RunConfig, + graphics::colors_and_handles::{AvailableColor, ColorHandles}, +}; + +pub fn spawn_entities_headless(mut commands: Commands, run_config: Res<RunConfig>) { + let mut count = 0; + for x in run_config.bottom_left_x..run_config.top_right_x { + for y in run_config.bottom_left_y..run_config.top_right_y { + spawn_body_headless(x as f32, y as f32, run_config.body_radius, &mut commands); + spawn_sensor_headless(x as f32, y as f32, run_config.sensor_radius, &mut commands); + count += 2; + } + } + log::info!("total of {} entities spawned", count); 
commands.insert_resource(EntitiesSpawned(count)); +} + +fn spawn_body_headless(x: f32, y: f32, radius: f32, commands: &mut Commands) { + commands.spawn(( + Transform { + translation: Vec3::new(x, y, 0.0), + ..default() + }, + BoundingCircleComponent(BoundingCircle::new(Vec2::new(x, y), radius)), + )); +} + +fn spawn_sensor_headless(x: f32, y: f32, radius: f32, commands: &mut Commands) { + commands.spawn(( + Sensor {}, + Transform { + translation: Vec3::new(x, y, 0.0), + ..default() + }, + BoundingCircleComponent(BoundingCircle::new(Vec2::new(x, y), radius)), + )); +} diff --git a/src/main.rs b/src/main.rs index 17fd6bb..dc3bc6f 100644 --- a/src/main.rs +++ b/src/main.rs @@ -14,8 +14,10 @@ pub mod entity_movement; pub mod entity_spawning; pub mod gpu_collision_detection; pub mod graphics; +pub mod headless_entity_spawning; pub mod helpers; pub mod performance; + fn main() { let path_to_run_config_json = "./run_config.json"; let run_config = serde_json::from_str::(