Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Unimplement Map from CharSequenceMap to obey contract #11704

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

findepi
Copy link
Member

@findepi findepi commented Dec 5, 2024

CharSequenceMap did not follow Map contract fully

  • Map.keySet is supposed to be a view, but CharSequenceMap returned a copy. Fixing this one can be hard given how CharSequenceMap is implemented internally.
  • Map.equals should compare maps by value. I.e. different map implementations with same key/value pairs are supposed to be equal. CharSequenceMap could certainly implement this correctly, but it does not today. The benefit of doing so is unclear and time necessary to do so correctly is non-negligible (we would need to define semantics what to when CharSequenceMap is compared to a map with more keys, but such that each key compared as char sequences is equal to some key in the CharSequenceMap).

Also, having a Map implementation that does not compare the keys by equality is error-prone:

  • it's likely that new default Map interface methods assume equality-based comparison semantics. JDK maintains non-equality based Map implementation (Comparator-based and Identity-based), and it surely doesn't go without maintenance burden (and not without bugs either, e.g. https://bugs.openjdk.org/browse/JDK-8178355)
  • it's not unlikely someone tries to make a defensive copy of a map with ImmutableMap.copyOf or Map.copyOf, resulting in a map with different semantics.

Instead of fixing the CharSequenceMap to obey the Map contract, it seems more pragmatic to just stop implementing the Map interface, which this commit does, resulting in a simple to understand class contract and behavior.
This PR provides a copy of CharSequenceMap, PathMap, with Map interface removed.

Additional positive side-effects of the change

  • less code and easier to understand contracts
  • strongly typed get and containsKey methods (the need to accept Object in these was forced by Map interface).

CharSequenceSet has similar issues, to be addressed in a follow-up.

cc @anuragmantri @aokolnychyi @amogh-jahagirdar @nastra

@findepi findepi force-pushed the findepi/core-unimplement-map-from-charsequencemap-to-obey-contract-491472 branch from 2c5fe4c to 02cb792 Compare December 5, 2024 08:12
@findepi
Copy link
Member Author

findepi commented Dec 5, 2024

BTW Guava has a concept of Equivalence which is a more generic version of our purpose-built CharSequenceWrapper. Guava lacks collections that use Equivalence transparently. There must be good reasons for this. I didn't find a longer explanation, but here is a short note: google/guava#576 (comment)

.palantir/revapi.yml Outdated Show resolved Hide resolved
@github-actions github-actions bot added the data label Dec 7, 2024
.palantir/revapi.yml Outdated Show resolved Hide resolved
@findepi findepi force-pushed the findepi/core-unimplement-map-from-charsequencemap-to-obey-contract-491472 branch from 53b446f to 1c03a14 Compare December 10, 2024 09:17
@findepi findepi requested review from nastra and Fokko December 10, 2024 09:19
@findepi findepi added this to the Iceberg 1.8.0 milestone Dec 10, 2024
@findepi findepi force-pushed the findepi/core-unimplement-map-from-charsequencemap-to-obey-contract-491472 branch from 1c03a14 to fca920d Compare December 10, 2024 21:55
`CharSequenceMap` did not follow `Map` contract fully

- `Map.keySet` is supposed to be a view, but `CharSequenceMap` returned
  a copy. Fixing this one can be hard given how `CharSequenceMap` is
  implemented internally.
- `Map.equals` should compare maps by value. I.e. different map
  implementations with same key/value pairs are supposed to be equal.
  `CharSequenceMap` could certainly implement this correctly, but it
  does not today. The benefit of doing so is unclear and time necessary
  to do so correctly is non-negligible (we would need to define
  semantics what to when `CharSequenceMap` is compared to a map with
  more keys, but such that each key compared as char sequences is equal
  to some key in the `CharSequenceMap`).

Also, having a Map implementation that does not compare the keys by
equality is error-prone:

- it's likely that new default Map interface methods assume
  equality-based comparison semantics. JDK maintains non-equality based
  Map implementation (Comparator-based and Identity-based), and it
  surely doesn't go without maintenance burden (and not without bugs
  either, e.g. https://bugs.openjdk.org/browse/JDK-8178355)
- it's not unlikely someone tries to make a defensive copy of a map with
  `ImmutableMap.copyOf` or `Map.copyOf`, resulting in a map with
  different semantics.

Instead of fixing the `CharSequenceMap` to obey the `Map` contract, it
seems more pragmatic to just stop implementing the `Map` interface,
which this commit does, resulting in a simple to understand class
contract and behavior. This commit provides a copy of `CharSequenceMap`,
`PathMap`, with `Map` interface removed.

Additional positive side-effects of the change

- less code and easier to understand contracts
- strongly typed `get` and `containsKey` methods (the need to accept
  `Object` in these was forced by `Map` interface).

`CharSequenceSet` has similar issues, to be addressed in a follow-up.

Rename CharSequenceMap to PathMap

Restore original CharSequenceMap

Deprecate CharSequenceMap
@findepi findepi force-pushed the findepi/core-unimplement-map-from-charsequencemap-to-obey-contract-491472 branch from fca920d to 6e99c84 Compare December 11, 2024 10:19
@findepi
Copy link
Member Author

findepi commented Dec 12, 2024

@nastra PTAL. after this lands i'd like to fix the CharSequenceSet class too.

@nastra
Copy link
Contributor

nastra commented Dec 12, 2024

@nastra PTAL. after this lands i'd like to fix the CharSequenceSet class too.

Sorry for the delay. I'm a bit overloaded currently with reviews, so this might me take a while to get back to. Hopefully someone else can find some time to review this

@findepi
Copy link
Member Author

findepi commented Dec 16, 2024

@nastra @Fokko can you please take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants