Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: rename std.AutoHashMap to std.HashMap #22155

Closed
robbielyman opened this issue Dec 5, 2024 · 5 comments
Closed

proposal: rename std.AutoHashMap to std.HashMap #22155

robbielyman opened this issue Dec 5, 2024 · 5 comments
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@robbielyman
Copy link

robbielyman commented Dec 5, 2024

This proposal stems from the meat of a blog post I wrote while reflecting on what I learned from Rust and from Zig doing Advent of Code problems. Everything relevant to the Zig project is reproduced below, but I wanted to flag the duplication.

One thing I love about Zig is that it has great default implementations of all of the standard container types.
They're just named poorly. One instance of poor name choices is already being corrected for Zig 0.14. I discuss that at the end of the blog post. Here I'd like to argue for another change towards better defaults.

In Zig, here is the signature of std.HashMap:

pub fn HashMap(comptime K: type, 
	comptime V: type, 
	comptime Context: type, 
	comptime max_load_percentage: u64,
) type { ... }

The Context type is responsible for actually hashing keys of type K and determining when two keys are the same.
You can write a Context type yourself without crying too much; for Advent of Code 2023, I did—for the simple and sad reason that this type has the best name, while the type that actually deserves the best name does not.

As of Zig 0.13, I will confidently assert that the type you want maybe 90% of the time you reach for a std.HashMap
is actually one of std.AutoHashMapUnmanaged(K, V), std.AutoArrayHashMapUnmanaged(K, V),
or in the particular case of string keys, std.StringHashMapUnmanaged or std.StringArrayHashMapUnmanaged.

The Unmanaged issue, as I alluded to earlier, is being fixed already. If the issue of the Auto prefix is not already being changed, I'd like to propose that it should be.

Here Auto means "use a best guess at a default hashing function". (Array means store keys and values contiguously so that iterating over them is fast and ergonomic. I read somewhere that even iterating once is enough to justify reaching for this type.) It's great that Zig allows you to swap out the default (or even std-provided) methods for hashing with your own. The "ground" types HashMapUnmanaged and ArrayHashMapUnmanaged are well thought through in terms of API design. My issue with them is that for a beginner, they shadow the correct type to reach for first.

I propose that current std.HashMap should become std.HashMapWithContext and that std.AutoHashMap should become std.HashMap. I'm happy to defer to the core team's expertise on how to manage the transition period.

Now, to be sure, the core of this proposal could be summarily dismissed as a "skill issue". On the one hand, reading documentation or the std library code quickly reveals the existence of the Auto... types. And on the other, learning to write reasonably useful hashing functions is a skill that will serve a budding programmer well.

I think this dismissal is a mistake. In my personal experience, Zig's extremely legible standard library code and thoughtful approach to default implementations has helped me grow as a programmer in ways I never anticipated. By providing friction in the right places, the language has steered me towards better design and more correct code. In this particular instance, I think juice, while worth an eventual squeeze, might come at an off-putting price for a beginner. To wit, I didn't learn to comfortably reach for hash maps from Zig; I learned it from Rust.

This is more of an addendum, but I'd like to also propose the addition of std.HashSet(T) as well, as in #6919. My reason for doing so, though, is different. For example, I would be happy to see a simple implementation like

pub fn std.HashSet(K) type {
    return std.ArrayAutoHashMapUnmanaged(K, void);
}

The reason I would like to ask for this addition to the standard library is not functional but pedagogical: again, realizing that you can implement a HashSet type by creating a HashMap type with a zero-sized value type is a beautiful realization that I don't want to deny a beginner Zig programmer; the challenge is that coming to that realization on one's own is nontrivial.

@marler8997
Copy link
Contributor

marler8997 commented Dec 5, 2024

I like this direction of using short names for wrappers that provide reasonable defaults while still providing access to the underlying configurable base type. I recall a recent comment from Andrew about the benefits of using "one word suffixes". So instead of HashMapWithContext, maybe something like HashMapBase?

C++ does something similar with it's std::string type, which is a wrapper around std::basic_string<...>, however the word "base" seems a more accurate descriptor to me.

@mlugg mlugg added standard library This issue involves writing Zig code for the standard library. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. labels Dec 5, 2024
@mlugg mlugg added this to the unplanned milestone Dec 5, 2024
@mlugg mlugg added the breaking Implementing this issue could cause existing code to no longer compile or have different behavior. label Dec 5, 2024
@andrewrk
Copy link
Member

andrewrk commented Dec 5, 2024

@mlugg it's a bit of a contradiction to have it be proposal, unplanned, and open all at the same time, isn't it? If it's an unplanned proposal that means that we will release 1.0 before evaluating that proposal, which means it is Rejected.

@mlugg
Copy link
Member

mlugg commented Dec 5, 2024

Hm, I think we interpreted that milestone a little differently to each other; I took it to mean something along the lines of "we aren't putting a date on when we'll look at this". I'll let you adjust this milestone as you see fit.

To be fair, though, you've done the same thing :^)

@andrewrk
Copy link
Member

andrewrk commented Dec 5, 2024

"we aren't putting a date on when we'll look at this"

Yes that's a fair description - and my point is that if we don't look at it before 1.0 and it's breaking, then it's effectively rejected.

Thanks for the links - it's easy to make hasty mistakes when triaging.

Edit: actually I don't think I made a mistake in those links since those 3 things could be enhancements added after 1.0.

@mlugg mlugg modified the milestones: unplanned, 1.0.0 Dec 5, 2024
@andrewrk
Copy link
Member

andrewrk commented Dec 5, 2024

Anyway to get back on topic, thanks for the proposal @robbielyman - I have read it in full and can already confidently say "no" with the following reasoning:

It comes down to this:

"use a best guess at a default hashing function"

This means that AutoHashMap may choose an inappropriate hash or equality function for your data type. The "Auto" exists as a subtle reminder of that fact. The teenist bit of friction that you feel when you first look at HashMap and confronted with the fact that you do have to care about how your keys are hashed, is by design. The programmer must acknowledge that they are using an automatic, heuristic-based hashing function to avoid that work.

Finally, regarding HashSet - your suggestion would not work as-is because methods such as "put" would then require a void value extra argument which makes no sense if the type was constructed with that API but if it is constructed with void as value type then it makes sense. So overall we end up with less boilerplate.

@andrewrk andrewrk closed this as not planned Won't fix, can't repro, duplicate, stale Dec 5, 2024
@andrewrk andrewrk modified the milestones: 1.0.0, 0.14.0 Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

4 participants