Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ShareCommon.Object.hash assigns mismatching hashes to equal objects #5831

Open
3 tasks done
eric-wieser opened this issue Oct 25, 2024 · 2 comments · May be fixed by #5840
Open
3 tasks done

ShareCommon.Object.hash assigns mismatching hashes to equal objects #5831

eric-wieser opened this issue Oct 25, 2024 · 2 comments · May be fixed by #5840
Labels
bug Something isn't working P-low We are not planning to work on this issue

Comments

@eric-wieser
Copy link
Contributor

Prerequisites

Please put an X between the brackets as you perform the following steps:

Description

ShareCommon.Object.hash can assign mismatching hashes to objects considered equal.

Context

[Broader context that the issue occurred in. If there was any prior discussion on the Lean Zulip, link it here as well.]

Steps to Reproduce

import Lean

unsafe def badHash (s : String) : UInt64 := ShareCommon.Object.hash <| unsafeCast s

def test : IO Unit := do
  let s2 := "tes"
  let mut s := "test"
  s := s.extract ⟨0⟩ ⟨3⟩
  unless s = s2 do throw <| .userError "Not equal"
  unless unsafe (badHash s == badHash s2) do throw <| .userError "Differing hashes"

#eval test

Expected behavior: no errors
Actual behavior: hashes do not match

Versions

4.12.0-nightly-2024-10-24

Additional Information

I may have a patch for this tomorrow.

Impact

Add 👍 to issues you consider important. If others are impacted by this issue, please ask them to add 👍 to it.

@eric-wieser eric-wieser added the bug Something isn't working label Oct 25, 2024
@digama0
Copy link
Collaborator

digama0 commented Oct 25, 2024

I think this needs more context, it seems easy to define this as "not a bug" given that you have plainly used an unsafe function and also this works exactly as I would expect from something that hashes pointers.

@tkoeppe
Copy link
Contributor

tkoeppe commented Oct 25, 2024

I think it's worth pointing out both the cause and a deeper problem here: the hashing currently uses the (potentially uninitialized) capacity range of the string, instead of the (correct) initialized size range that contains the actual string data.

case LeanString: return lean_string_byte_size(o);

lean_string_byte_size is the amount of allocated memory, rather than the size of the actual string.

If the storage is indeed uninitialized, these reads actually cause undefined behaviour. But even if the behaviour is well-defined, you're hashing the semantically wrong thing. (Loosely speaking, you're hashing the object representation, not the value representation.)

The example is meant to demonstrate the case quickly, but I suppose one can imagine other, fully legitimate ways of creating two equal strings with different or uninitilized spare capacity. (Indeed, if the unused capacity's content was always entirely determined by the proper string data itself, one might wonder why a separate capacity exists in the first place.)

@leanprover-bot leanprover-bot added the P-low We are not planning to work on this issue label Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P-low We are not planning to work on this issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants