OAK-11602 : removed usage of Guava's ImmutableSet.copyOf with LinkedSet #2178
Commit-Check ✔️ |
I think this replacement is too dangerous. It significantly changes the semantics and performance characteristics.
- Changes mutability: LinkedHashSet is not immutable.
- Changes performance: the sets created by ImmutableSet.copyOf() are highly optimized. There are special implementations for empty and one-element sets. The implementation for general-sized sets is also optimized, taking advantage of the set being immutable: it stores the elements in an array, which is more compact and faster to access than a LinkedHashSet.
This might have unintended consequences in performance and even in correctness.
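The mutability point can be shown with JDK classes alone. This is a minimal sketch, not code from the PR: the class `MutabilitySketch` and its helper `acceptsAdd` are made up for illustration, and `Collections.unmodifiableSet` stands in for an immutable set so that Guava is not needed to run it.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of Oak.
class MutabilitySketch {

    // Returns true if the set accepts a new element, false if it rejects the write.
    static boolean acceptsAdd(Set<String> set) {
        try {
            set.add("extra");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> source = List.of("a", "b", "c");

        // A plain LinkedHashSet can be modified by any caller that gets hold of it.
        Set<String> linked = new LinkedHashSet<>(source);
        // Wrapping a private copy rejects writes through the returned reference.
        Set<String> wrapped = Collections.unmodifiableSet(new LinkedHashSet<>(source));

        System.out.println("LinkedHashSet accepts add: " + acceptsAdd(linked));
        System.out.println("unmodifiable wrapper accepts add: " + acceptsAdd(wrapped));
    }
}
```

Code that previously relied on the set rejecting writes would silently start accepting them after a switch to a bare LinkedHashSet.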
The tricky questions remain: should we review the code to check what semantics are really needed (a bit dangerous, does not scale well, but may clarify the code)? Should we check whether performance really matters? FWIW, when I replaced other collection utilities, I did exactly that, and apparently did not break anything. See, for instance, https://issues.apache.org/jira/browse/OAK-11278. Note that in this case, I did the zero/one/more variants separately (evidently zero or one arguments are easier to argue about). |
Regarding mutability: there are many cases where it's trivial to see that it does not matter. For instance, when the set is just used once in a chain of collection transformations. |
As these changes are intended to be purely mechanical, I don't think we should be looking at how the objects are used. That does not scale and it is very error-prone, as we may miss some hidden assumptions about the behaviour of the APIs that are being changed. The safest way would be to preserve the semantics and performance characteristics as much as possible. So in this case, the Set should still be immutable and should have similar performance and memory usage. Replacing the Guava sets by the sets created by
This PR replaces |
I believe that these comments are given on the assumption that all the changes are mechanical which is not true. I checked the usage before making this change. I also checked the previous PR, where we removed the Second, about the performance, we are only using the In my implementation, I am simply creating a Regarding Immutability, there is nothing in JDK or Apache's We could overcome this (case by case basis) with either But just to remove any doubt, I would again do a thorough review of all the cases and wrap it inside an unmodifiable wrapper in case of any doubt. |
|
I would prefer a solution that:
- Exactly matches the behavior of ImmutableSet.copyOf (especially being read-only, the handling of null entries, and concurrency).
- Has similar performance (measurably). Here a simple micro-benchmark would help; it doesn't need to be a unit test.
- Has similar memory usage (measured). I don't think we need a unit test for that; instead, we could just test with a fixed set of e.g. 100 million entries, and see if it runs into an OOME at the same memory setting.
Also, I would probably not change all 35 files at the same time, but maybe half of them (so in two steps) with a one-week delay. That way, if we break something, we know more precisely where.
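A rough sketch of a JDK-only replacement that covers the first requirement in part: it is read-only, rejects null elements with a NullPointerException, and keeps insertion order, as ImmutableSet.copyOf does. The class `ImmutableCopySketch` and the method `copyOfPreservingOrder` are hypothetical names, not Oak's SetUtils. Performance and memory usage are those of a LinkedHashSet, so they would still need to be measured.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper for illustration; not the implementation used in the PR.
class ImmutableCopySketch {

    // Copies the elements into a private LinkedHashSet and returns a read-only view of it.
    static <T> Set<T> copyOfPreservingOrder(Iterable<? extends T> elements) {
        Objects.requireNonNull(elements, "elements");
        Set<T> copy = new LinkedHashSet<>();
        for (T element : elements) {
            // Reject null entries at creation time instead of storing them.
            copy.add(Objects.requireNonNull(element, "null element"));
        }
        // The backing set never escapes, so nobody can modify it after this point.
        return Collections.unmodifiableSet(copy);
    }
}
```

One remaining difference: the wrapper answers `contains(null)` with false, which is also what Guava's immutable sets do, whereas `Set.copyOf` results throw on OpenJDK.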
@nfsantos @thomasmueller thanks for the review. I did try to incorporate some of your review comments in 9876048. However, I agree with @thomasmueller that we should even split this PR into smaller chunks. @thomasmueller does the following sound reasonable to you:
|
…e sending/receiving it to outer world
@thomasmueller @nfsantos Please find the benchmark below: Each method creates a Set from Iterables and iterates over it.
|
3746b8b
to
94e7383
@@ -86,7 +87,7 @@ protected void beforeSuite() throws Exception {
         createForEachPrincipal(principal, acMgr, allPrivileges);

         adminSession.save();
-        nodeSet = ImmutableSet.copyOf(nodePaths);
+        nodeSet = SetUtils.toLinkedSet(nodePaths);
It is a test case, so there is no need for immutability here.
Thanks for the benchmarks. Could you do similar benchmarks for lookups? Fast lookups are one of the main reasons to use a set instead of a list, so I would imagine that most Sets created by Oak will be heavily used for lookups. Therefore I think lookup performance is more important than traversal performance. |
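A lookup benchmark along these lines could look roughly like the sketch below. It is not the benchmark from this PR: the class `LookupBenchmarkSketch`, the key format and the sizes are made up, `Set.copyOf` is only one possible comparison candidate, and plain `System.nanoTime()` timing is much less reliable than a harness such as JMH. Treat the numbers as a rough indication only.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical micro-benchmark sketch; class name, sizes and key format are illustrative.
class LookupBenchmarkSketch {

    // Looks up every key once per round and returns the total number of hits.
    static long countHits(Set<String> set, String[] keys, int rounds) {
        long hits = 0;
        for (int r = 0; r < rounds; r++) {
            for (String key : keys) {
                if (set.contains(key)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    // Runs one warm-up pass, then returns the time of a second pass in milliseconds.
    static long timeMillis(Set<String> set, String[] keys, int rounds) {
        countHits(set, keys, rounds);
        long start = System.nanoTime();
        long hits = countHits(set, keys, rounds);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        if (hits < 0) {
            throw new IllegalStateException("unreachable; keeps the result in use");
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        int size = 100_000;
        Set<String> linked = new LinkedHashSet<>();
        String[] keys = new String[size * 2];
        for (int i = 0; i < size; i++) {
            linked.add("/path/" + i);
            keys[2 * i] = "/path/" + i;        // present in the set
            keys[2 * i + 1] = "/missing/" + i; // absent from the set
        }
        Set<String> jdkImmutable = Set.copyOf(linked);

        System.out.println("LinkedHashSet lookups: " + timeMillis(linked, keys, 50) + " ms");
        System.out.println("Set.copyOf lookups:    " + timeMillis(jdkImmutable, keys, 50) + " ms");
    }
}
```

Mixing present and absent keys matters here, because hits and misses can have different costs depending on the implementation.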
@@ -34,7 +35,7 @@ public PathElementComparator() {
     }

     public PathElementComparator(Iterable<String> preferredPathElements) {
-        this.preferred = ImmutableSet.copyOf(preferredPathElements);
+        this.preferred = Collections.unmodifiableSet(SetUtils.toLinkedSet(preferredPathElements));
Here we do not need to preserve insertion order. Sets.of() is enough.
The issue with Sets.of is that we need to make sure we never pass null to Set.contains().
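To illustrate the null concern with JDK classes only: on OpenJDK, the sets returned by `Set.of` throw a NullPointerException from `contains(null)`, while a LinkedHashSet simply returns false. The class `NullLookupSketch` and its helper are hypothetical names used for this sketch.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of Oak.
class NullLookupSketch {

    // Describes what contains(null) does on the given set.
    static String containsNull(Set<String> set) {
        try {
            return String.valueOf(set.contains(null));
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        Set<String> linked = new LinkedHashSet<>(List.of("a", "b"));
        Set<String> jdkImmutable = Set.of("a", "b");

        System.out.println("LinkedHashSet.contains(null): " + containsNull(linked));
        System.out.println("Set.of(...).contains(null):   " + containsNull(jdkImmutable));
    }
}
```

A call site is therefore only safe to migrate to `Set.of` or `Set.copyOf` if none of its lookups can receive null.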
In this case, we don't pass null (this is indexing code, I am familiar with it).
Set.of() doesn't accept Iterable and Set.copyOf only accepts Collection.
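One way around that limitation is to buffer the Iterable into a Collection first. This is only a sketch under that assumption; the class `IterableToSetSketch` and the method `toUnmodifiableSet` are made-up names, not an existing Oak or JDK API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical adapter for illustration only.
class IterableToSetSketch {

    // Buffers the Iterable into a List so that Set.copyOf (Java 10+) can be used.
    static <T> Set<T> toUnmodifiableSet(Iterable<? extends T> elements) {
        List<T> buffer = new ArrayList<>();
        for (T element : elements) {
            buffer.add(element);
        }
        // Set.copyOf drops duplicates, rejects null elements with a
        // NullPointerException, and does not guarantee any iteration order.
        return Set.copyOf(buffer);
    }
}
```

The extra buffer costs one additional pass and a temporary list, which would be worth including in any benchmark comparison.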
|
Nope. That's not what we planned. The focus was on replacing Guava methods with standard JDK calls wherever possible. |
@nfsantos benchmark for creating set from Iterable + iterating it + calling
|