
Map improve remove speed for large map and high percentage of element removed #608

Draft
wants to merge 5 commits into master

Conversation

@yuxiaomao (Collaborator) commented Jul 19, 2023

This PR addresses the high execution time observed when removing a large number of entries from a large map. Performance is still not ideal, but this is at least some improvement.

The cause of the problem, as far as I understand it: when a large number of entries are removed, the map's freelist (the list of all empty indexes) grows, and manipulating this list becomes costly once the map's memory is heavily fragmented.

The proposed solution defragments the map when the size of the freelist exceeds a threshold.
Defragmentation works by moving elements within the map to compact its storage.
The threshold was chosen experimentally so as not to increase execution time when removing 10% and/or 100% of elements. It is a very high threshold and only affects performance once the map size reaches around 500,000.
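
As a rough illustration, here is the shape of that trigger in Haxe; the names and the placeholder threshold are hypothetical, not the PR's actual constants (the real change lives in the map's C runtime, and its threshold was tuned experimentally):

	// A minimal sketch of the trigger described above. Names and the
	// placeholder threshold are hypothetical.
	class DefragSketch {
		static inline var FREELIST_THRESHOLD = 1 << 16; // placeholder value

		var freelistSize = 0; // number of empty cells waiting to be reused

		function removeEntry() {
			// ... unlink the entry and push its cell index onto the freelist ...
			freelistSize++;
			// Once the freelist gets large, the storage is fragmented enough
			// that maintaining the list dominates remove time: compact instead.
			if (freelistSize > FREELIST_THRESHOLD)
				defragment();
		}

		function defragment() {
			// Move live entries into the lowest cell indexes, fix up the links
			// that point at moved cells, then the freelist can be emptied.
			freelistSize = 0;
		}
	}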

Performance comparison (before => after):

  • Nearly no change below N=100,000. Times for N=100,000:
    • 10% remove: 0.0068 sec
    • 100% remove: 0.2151 sec
  • N=500,000
    • 10% remove: 0.1685 sec => 0.0727 sec
    • 100% remove: 4.3643 sec => 0.7718 sec
  • N=1,000,000
    • 10% remove: 0.6612 sec => 0.1569 sec
    • 100% remove: 19.4425 sec => 1.200 sec
  • N=5,000,000
    • 10% remove: (unknown) => 1.9806 sec
    • 100% remove: (unknown) => 19.0631 sec

Test code:

	static function mapTest(N : Int, repeat : Int = 10) {
		trace("N", N, "repeat", repeat);
		var timeSet : Float = 0;
		var timeGet : Float = 0;
		var timeRemove1 : Float = 0;
		var timeAdd1 : Float = 0;
		var timeValid : Float = 0;
		var timeRemoveAll : Float = 0;

		for (r in 0...repeat) {
			// Prepare key array
			var arr : Array<String> = [];
			for (i in 0...N) {
				arr.push(i + String.fromCharCode(65+Std.random(26)));
			}

			// Prepare map
			var map:Map<String, Int> = [];

			var t0 = haxe.Timer.stamp();

			// Set
			for (i in 0...N) {
				map.set(arr[i], i);
			}
			var t1 = haxe.Timer.stamp();

			// Get
			var keys = map.keys();
			for (key in keys) {
				var a = map.get(key);
				a++;
			}
			var t2 = haxe.Timer.stamp();

			// Remove 1/10
			var N10 : Int = Std.int(N/10);
			var N5 : Int = Std.int(N/5);
			for (i in N10...N5) {
				map.remove(arr[i]);
			}
			var t3 = haxe.Timer.stamp();

			// Add 1/10
			for (i in N10...N5) {
				map.set(arr[i], i);
			}
			var t4 = haxe.Timer.stamp();

			// Validate
			for (i in 0...N){
				var a = map.get(arr[i]);
				if (a != i) trace("assert", a, i);
			}
			var t5 = haxe.Timer.stamp();

			// Remove all
			for (i in 0...N){
				var b = map.remove(arr[i]);
				if (!b) trace("assert remove do not exist", arr[i], i);
			}
			var t6 = haxe.Timer.stamp();

			timeSet += t1 - t0;
			timeGet += t2 - t1;
			timeRemove1 += t3 - t2;
			timeAdd1 += t4 - t3;
			timeValid += t5 - t4;
			timeRemoveAll += t6 - t5;
		}

		trace("timeSet      :", timeSet / repeat);
		trace("timeGet      :", timeGet / repeat);
		trace("timeRemove1  :", timeRemove1 / repeat);
		trace("timeAdd1     :", timeAdd1 / repeat);
		trace("timeValid    :", timeValid / repeat);
		trace("timeRemoveAll:", timeRemoveAll / repeat);
	}

This commit resizes the map down when n < max*0.75, reducing remove execution time for a map that is large but mostly empty.
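
For illustration, a hedged sketch of what such a shrink rule can look like; MIN_CAPACITY and the halving loop are assumptions for illustration, not the commit's actual code:

	class ShrinkSketch {
		static inline var MIN_CAPACITY = 8; // assumed lower bound

		// Returns the capacity the cell storage could shrink to once fewer
		// than 75% of the allocated cells hold live entries.
		static function shrunkCapacity(n : Int, max : Int) : Int {
			if (n >= Std.int(max * 0.75)) return max; // still dense enough
			var newMax = max;
			while (newMax > MIN_CAPACITY && n < (newMax >> 1))
				newMax >>= 1; // halve while the entries still fit comfortably
			return newMax;
		}
	}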
@yuxiaomao yuxiaomao marked this pull request as draft July 20, 2023 06:24
@yuxiaomao yuxiaomao changed the title Map add resize down when nentries becomes small Map improve remove speed for large map and high percentage of element removed Jul 28, 2023
@yuxiaomao yuxiaomao marked this pull request as ready for review July 31, 2023 06:30
@yuxiaomao (Collaborator, Author)

I just implemented Robin Hood hashing for the hashmap (references: https://www.youtube.com/watch?v=4cTKiTQtS_k&t=893s , https://programming.guide/robin-hood-hashing.html); a sketch of the algorithm follows the numbers below.
Tested with the previously described script, it seems to greatly reduce remove time without sacrificing set/get time; however, it occupies about 50% more memory.

  • Removing all elements one by one for N=500,000 takes only 0.029 sec on my machine (compared to 0.041 sec on JS, 4.38 sec with the existing implementation, and 0.772 sec with the previous optimization).
  • Removing all elements one by one for N=5,000,000 takes only 0.453 sec.
  • Get/set time seems to be reduced by 5% to 20% in most cases.

The overall performance is now comparable to the JavaScript target, and even faster in some situations.
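
For readers unfamiliar with the technique, here is a small self-contained Robin Hood table over Int keys, following the two references above; it is an illustration of the algorithm, not the PR's C implementation:

	// On insert, a probing key steals the slot of any resident whose own
	// probe distance is shorter ("rich" yields to "poor"); on remove, later
	// entries are shifted back one slot until a hole or a distance-0 entry
	// is reached, so no tombstones or freelist are needed. No resizing here.
	class RobinHoodSketch {
		static inline var EMPTY = -1; // sentinel: keys must be >= 0 here

		var keys : haxe.ds.Vector<Int>;
		var size : Int;

		public function new(capacity : Int) {
			keys = new haxe.ds.Vector(capacity);
			for (i in 0...capacity) keys[i] = EMPTY;
			size = 0;
		}

		inline function home(k : Int) : Int
			return k % keys.length; // ideal slot for key k

		inline function dist(k : Int, slot : Int) : Int {
			var d = slot - home(k);
			return d >= 0 ? d : d + keys.length; // probe distance, wrapped
		}

		public function insert(k : Int) {
			if (size + 1 >= keys.length) throw "table full (no resizing in this sketch)";
			var slot = home(k), d = 0;
			while (true) {
				var cur = keys[slot];
				if (cur == EMPTY) { keys[slot] = k; size++; return; }
				if (cur == k) return; // already present
				if (dist(cur, slot) < d) {
					// Robin Hood rule: displace the closer-to-home resident
					keys[slot] = k; k = cur; d = dist(k, slot);
				}
				slot = (slot + 1) % keys.length; d++;
			}
		}

		public function remove(k : Int) : Bool {
			var slot = home(k);
			while (keys[slot] != k) {
				// the invariant bounds how far k can be from its home slot
				if (keys[slot] == EMPTY || dist(keys[slot], slot) < dist(k, slot))
					return false;
				slot = (slot + 1) % keys.length;
			}
			// backward-shift deletion: pull later entries one slot closer
			var next = (slot + 1) % keys.length;
			while (keys[next] != EMPTY && dist(keys[next], next) > 0) {
				keys[slot] = keys[next];
				slot = next;
				next = (next + 1) % keys.length;
			}
			keys[slot] = EMPTY;
			size--;
			return true;
		}
	}

Backward-shift deletion is what eliminates the freelist bookkeeping entirely, which matches the remove-time improvement reported above.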

@yuxiaomao yuxiaomao marked this pull request as draft February 21, 2024 14:32
@yuxiaomao (Collaborator, Author)

TODO: once we're sure of this change, we'll need memdump + debugger support.
