Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved intersection #13

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Improved intersection #13

wants to merge 3 commits into from

Conversation

psy-man
Copy link

@psy-man psy-man commented Jan 16, 2018

Hi @bcherny.

There is a little issue in intersection solution: if the right array has any duplicates, they will appear in the result.
intersection([1, 5, 4, 2], [8, 91, 4, 1, 3, 8, 91]); // [ 4, 1, 8, 91 ]

I rewrote this function to es6 using Set.

@bcherny
Copy link
Owner

bcherny commented Jan 18, 2018

@psy-man thanks for the contribution! Please add a test that would not have passed before, but does pass now.

@bcherny
Copy link
Owner

bcherny commented Jan 21, 2018

Thanks for updating the test. The bug you found is a real bug, but the solution could be improved.

If we're taking the intersection of 2 arrays of lengths M and N, the current solution runs in O(M+N) time. Your solution is substantially slower, and runs in O(M+M*N²) time. The reason is that Set#has and Set#set run in linear time (see https://www.ecma-international.org/ecma-262/6.0/#sec-set.prototype.has).

A more efficient solution might avoid Sets for their poor performance. If you want to update, I'll be happy to merge.

@psy-man
Copy link
Author

psy-man commented Jan 21, 2018

Hi @bcherny

I've created a few possible solutions and tested with 2 arrays: 4646 and 1103 lengths.
Could you check please Benchmarks

intersectionNew - fix for current solution
intersectionNewWithSet - solution with Set
intersectionNewSimple - solution without reduce and concat

For me, the solution with Sets was the most faster.
We also can improve performance via comparing the lengths of arrays and start with the smaller one.

I'll add a commit after your decision.

@DarkPurple141
Copy link

Is this a case of the average case actually nullifying the worse case big O?

I did read the link you provided but if it truly was the case that Set was outperformed on its most standard ops by {} it's hard to understand why ECMA would have introduced Map and Set at all.

From ECMA also:

Set objects must be implemented using either hash tables or other mechanisms that, on average, provide access times that are sublinear on the number of elements in the collection. The data structures used in this Set objects specification is only intended to describe the required observable semantics of Set objects. It is not intended to be a viable implementation model.

Here's a benchmark from three years ago: https://github.com/anvaka/set-vs-object

It seems that most browsers have optimised them beyond the performance of old school key in Object checks.

I ran across across a few different browsers, mostly seems like with the length optimisation (ie choose the smaller array to iterate through), Set performs best, with the IntersectionNewSimple second best (but not by much).

@givehug
Copy link

givehug commented Jul 26, 2019

You don't need second table/set

function intersection(left, right) {
	let seen = left.reduce((seen, item) => {
		seen[item] = true
		return seen
	}, {})

	return right.filter(item => {
		const include = item in seen
		delete seen[item]
		return include
	})
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants