Skip to content

Conversation

@imlvts
Copy link
Collaborator

@imlvts imlvts commented Sep 25, 2025

As previously discussed, this moves towards Zipper API to allow "blind" zippers, where the user can track the path.
Changes:
ZipperPath: New trait to return path as a single continuous slice.

  • fn path moved out of ZipperMoving to ZipperPath
  • fn move_to_path moved out of ZipperMoving to ZipperPath. it really requires having an existing path.

ZipperMoving:

  • descend_indexed_byte, descend_indexed_branch, descend_first_byte, to_next_sibling_byte, to_prev_sibling_byte now return Option<u8> indicating which byte we just descended to.
  • descend_until now accepts a buffer Option<&mut Vec<u8>> to push the newly descended path. Could have been Option<&mut impl Write>, but &mut Vec is just simpler.
  • ascend now returns the number of remaining steps as a Result<(), usize>. Maybe not the prettiest API, but forces the user to handle ascending above root. Maybe it's a good idea to separate it into a different function, and keep the original ascend the same.
  • ascend_until and ascend_until_branch now returns Option<usize>, the number of bytes ascended.
  • changing to_next_step to allow the caller to track path might require more thought, so it is left with the default implementation.

@luketpeterson
Copy link
Collaborator

A few questions / comments. Please don't take my perspective as authority; It's likely there is a good reason the code is the way it is that I'm just not seeing.

  • Is there any reason that ascend_until, and ascend_until_branch, etc. need the Option<usize>? Is seems that 0 is the same as None in this case.

  • ascend returning a Result is a hit to the usability of the API because you now need to call “unwrap” or assign it to _ to avoid the warning. Plus it strikes me as a little odd to return the number of steps “remaining” as opposed to the number of steps ascended, like the other ascend methods do. I think, if it were me, I would have ascend return the number of steps ascended, and then it would be possible to check success as if z.ascend(steps) == steps. But there might be something I’m not considering.

  • I don’t really like the name TrackPath for the zipper object. How about PathZipper? I don’t love PathZipper either, but it seems like TrackPath describes how the zipper is implemented internally, rather than its purpose in the API. Also it seems like a verb as opposed to an object. I’m open to other ideas?

@imlvts
Copy link
Collaborator Author

imlvts commented Oct 9, 2025

  • Is there any reason that ascend_until, and ascend_until_branch, etc. need the Option<usize>? Is seems that 0 is the same as None in this case.

Yes, it's possible to make it simpler. The point is to make the API harder to mis-use.
This is a general pattern, when you have different outcomes (failed to ascend / ascended), you return different enum variants. This makes it very explicit to the user on what happened, and forces the user to handle different outcomes.

  • Plus it strikes me as a little odd to return the number of steps “remaining” as opposed to the number of steps ascended

Yeah, I think changing it to the number of steps ascended is a good idea. I don't even remember why I chose this way, and I did consider returning the "number of steps ascended". Regardless of other changes, I think returning the number of bytes ascended is the right choice here.

I think, if it were me, I would have ascend return the number of steps ascended, and then it would be possible to check success as if z.ascend(steps) == steps

Yes, this would make a more slim API. the reason I used Result is similar to the reason I used Option previously. To make the user handle different cases.

Here's my understanding of tradeoffs:

  • The API for these functions in the current state is much harder to mis-use.
    Making both return usize puts this responsibility on the user. I think this isn't much of a drawback, since there are other APIs which are easy to use "wrong".
  • Making both functions return usize makes a conceptually more simple and consistent API.
  • Making both functions return usize is more likely to optimize better (one register return).

Overall, because of consistency, simplicity and better optimizations, using usize would be better.
It's just my default instinct to make APIs harder to use "wrong", even if the API is harder to use as a result. Too many times I've been bitten by "0 means success" and "0 means failure".

I don’t really like the name TrackPath for the zipper object. How about PathZipper?

The purpose of this struct is to wrap a zipper and add ZipperPath implementation. If you want a noun, here's some ideas:

  • PathTracker (my personal choice)
  • [Zipper]PathWrapper
  • [Zipper]PathImpl
  • ZipperWithPath
    I really don't mind any choice, doesn't seem a central problem for me.

@Adam-Vandervorst
Copy link
Owner

Adam-Vandervorst commented Oct 9, 2025

> Option
I believe we want either to use 0 as the base case, or use NonZero for performance reasons.

> Remaining vs ascended
Either one is good to me. Remaining would be equal to path length? An ascended==0, like the ==steps check is also only useful in the ascended case, so perhaps prefer that.

> Result
I do generally prefer a lighter API, and this is the most performance critical part of API. That said, we should not lose any information, and integrate a small example of how to use it in the docstring.

> Path naming
Located? PathWrapper is OK. ZipperParh is slightly confusing to me.

@luketpeterson
Copy link
Collaborator

Yes, it's possible to make it simpler. The point is to make the API harder to mis-use.
This is a general pattern, when you have different outcomes (failed to ascend / ascended), you return different enum variants. This makes it very explicit to the user on what happened, and forces the user to handle different outcomes.

In general, I totally agree with this philosophy. But in this particular case, I think “failed to ascend” and “ascended 0 bytes” are actually exactly the same thing. Not just in effect but also in concept. The only way you could ascend zero bytes when calling one of these functions is if ascend failed, and the failure conditions are clearly defined.

Too many times I've been bitten by "0 means success" and "0 means failure".

Haha. That’s the worst. That same set of concerns leads me to try not to cast bools to numbers in rust, and thankfully going the other way isn't allowed (unlike C!)

Path-tracking object naming…

I’ll narrow it down to PathWrapper and PathTracker. Not sure which I prefer. Maybe PathWrapper… But it’s not a clear favorite so I could be convinced to go with PathTracker or something else.

@luketpeterson
Copy link
Collaborator

Oh shit.... I just did this work...

@luketpeterson
Copy link
Collaborator

Lemme merge and see if I missed anything you got, or vice versa

@luketpeterson
Copy link
Collaborator

So far, our changes are line-for-line identical excepting things like variable names and doc updates. A good sign.

@imlvts
Copy link
Collaborator Author

imlvts commented Oct 10, 2025

Oh shit.... I just did this work...

Unfortunate. I just saw the comment and implemented the changes as a test.
And I think after the change the API is much easier to use, so it was a good idea.

…ppers. Making all `ascend_` variants always return the number of bytes ascended

UPDATE: Merging with Igor's nearly identical work
@luketpeterson
Copy link
Collaborator

I've never had so many conflicts resolve so easily. We literally wrote exactly the same code (structurally) in 90% of cases.

@Adam-Vandervorst
Copy link
Owner

Adam-Vandervorst commented Oct 10, 2025

> Option
I believe to_next_sibling_byte returning Option<u8> and descend_until (and descend_first) taking Option<&mut Vec<u8>> are burdensome options, both syntactically and performance (flexibility) wise.

to_next_sibling_byte does not inform the user whether we were at the root, or if it was the last active byte, which I think would be the only excuse to have an option here. Since at_root needs to be implemented for blind zippers, I propose a u8 return type. I also propose we finally get rid of the dual to_sibling(bool) and to_{next,prev}_sibling() functions. Depending on accepting the proposal, the did-not-move invariant should be last_byte() == to_sibling(_) or to_next_sibling() == 0 and to_prev_sibling() == 255.
Furthermore, believe to_sibling implies that even blind-zippers should know some of the state, not just at_root() -> bool but last_byte() -> Option<u8>.

On the usability side, descend_until should really return how many bytes are descended, and not require to take the difference between the old and new vector length. Having potential allocation inlined is also expensive, so the code should be specialized based on the now runtime Option. Moreover, demanding a Vec eliminates the possibility of using a fixed-sized buffer here. This can be solved by a Write which automatically works for &mut Vec, and some Discard in cases where you're not interested in retrieving the path. For a C-style, generics-free API, *mut u8 works, and the returned usize written bytes can be used to process the result (with a null pointer obviously not being written to).

@imlvts
Copy link
Collaborator Author

imlvts commented Oct 10, 2025

to_next_sibling_byte does not inform the user whether we were at the root, or if it was the last active byte, which I think would be the only excuse to have an option here

I would argue that this is not a problem, since the function returning None indicates that 1) the zipper didn't move 2) there's no next sibling. This information is useful, correct and consistent, across different situations.
On the other hand, returning 0 and 255 is too magical.

I agree there's some weirdness around what does it mean to have a blind zipper. Perhaps we should discuss this, and it's possible that this change is not needed if there's no consistency.
My idea of a "blind zipper" is one that doesn't have its entire path as a singular continuous buffer.
Existing ZipperMoving functions (even excluding path), allow reconstructing path. I think it's not possible to have a (useful) ZipperMoving which can't recover it's path.
So these are the options:

  1. "Blind zipper" means it doesn't have its entire path available immediately (but it can reconstruct it)
  2. "Blind zipper" means it can't reconstruct its path. (And therefore it's not possible to have a wrapper that has a path as a single buffer)
  3. There's an entirely different definition for what it is.
  4. "Blind zipper" is not a useful idea. And this PR can be closed.

@Adam-Vandervorst
Copy link
Owner

> For a C-style, generics-free API, *mut u8 works, and the returned usize written bytes can be used to process the result (with a null pointer obviously not being written to).

Implemented in 78bb6b5 but overlooked an important factor: descend_until needs to be bounded (either internally or externally) or something in the vein of "to allocate or not to allocate: that is the question".

This seems to favor a W: Write API, but I'm not sure until I try that one...

@luketpeterson
Copy link
Collaborator

luketpeterson commented Oct 11, 2025

I also propose we finally get rid of the dual to_sibling(bool) and to_{next,prev}_sibling() functions

to_sibling has been gone from the trait for many months. Some implementations (Like the ACT zippers and the ReadZipperCore) have it as a private method, but it hasn't been exposed for a long time.

Furthermore, believe to_sibling implies that even blind-zippers should know some of the state, not just at_root() -> bool but last_byte() -> Option.

I commented exactly this point here, a couple days ago:

//GOAT, this default impl is incompatible with the blind zippers. :-( That argues that perhaps it belongs in another trait, e.g. ZipperIteration

Although my intuition was that this argued for moving the lateral (sibling) movement into another trait. But after the discussion last night I'm feel pragmatically that adding fn focus_byte(&self) -> Option<u8> to ZipperMoving is a reasonable solution.

@luketpeterson
Copy link
Collaborator

  • "Blind zipper" means it doesn't have its entire path available immediately (but it can reconstruct it)

I almost agree with this definition. To me a "blind zipper" is the idea that a zipper doesn't need to track its path internally to implement ZipperMoving. In other words I don't think the zipper itself needs to be required to store everything needed to reconstruct its path. But an outsider should be able to do that by tracking how the zipper moved and where it started.

I don't think the phrase "blind zipper" is going to mean much to users (or even us) beyond just this PR. It's just a handy shorthand for the change we want to make. However I think this PR is useful and good. I.e. it is conceptually consistent and it provides a valuable practical benefit.

@luketpeterson
Copy link
Collaborator

I believe to_next_sibling_byte returning Option ... are burdensome options

I am not sure I understand the objection to returning Option<u8>, as opposed to the bool that was previously returned.

From a usability standpoint, just call is_some(), and that provides exactly the same information as the prior bool.

From a perf standpoint, Option<u8> packs into a single machine word on any platform we care about, so it's one register either way. Maybe there is a single extra instruction needed for access (although I'm not even sure about that) but I am virtually certain this isn't going to move any benchmarks.

…s opposed to `Option<&mut Vec<u8>>` and fixing a couple other bugs & todos along the way
@luketpeterson
Copy link
Collaborator

Just pushed a change to make the descend_until method accept a generic Write sink for descended bytes. In addition to the improved flexabilty, this has the added advantage that the monomorphism is guaranteed to strip away the runtime checks that used to be around the Option if you pass the "dev/null" option of std::io::sink()

fn descend_until<W: std::io::Write>(&mut self, mut desc_bytes: W) -> bool {

@Adam-Vandervorst
Copy link
Owner

Adam-Vandervorst commented Oct 11, 2025

> since the function returning None indicates that 1) the zipper didn't move 2) there's no next sibling
> provides exactly the same information as the prior

That's precisely the problem: it doesn't provide anything extra, while acting like it does!
The contract 1) OR 2) caused None literally has no use over the boolean.

If you map 2) to 0u8 or last_byte_unchecked(), now you have more information.

If you remove 1) (at root) and just call at_root(), now you have more information.

Hell with last_byte(), you can just return a boolean again, and the API usage just stays clean and no information is lost!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants