Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameterize Split #5523

Open
wants to merge 10 commits into
base: trunk
Choose a base branch
from
Open

Conversation

sellout
Copy link
Contributor

@sellout sellout commented Jan 4, 2025

Overview

This is motivated by making changes to UCM commands that require handling many different variants of path types. This results in a lot of almost-duplicated code that needs to be updated in many places whenever there is a change.

This primarily makes two changes:

  1. moves HashQualified outside of the HQSplit tuples and
  2. parameterizes Split by its path type.

This results in the following changes to types:

  • (path, NameSegment)Split path
  • AbsSplitSplit Absolute
  • HQSplit'HashQualified (Split Path')
  • HQSplitAbsoluteHashQualified (Split Absolute)
  • HQSplitHashQualified (Split Path)
  • Split'Split Path'
  • SplitSplit Path

There are a number of related changes that follow from this:

  • Pathy and Namey classes that provide common operations across these types,
  • better integration of BranchRelativePath and ProjectPath with these types,
  • elimination of duplicated functions that were needed to handle HQ variants of Split types (these are now all simply fmap or traverse), and
  • improving type safety in some cases (e.g., Split Path becoming Split Relative in some cases).

And a couple other changes that were just easy to make:

  • replacing sequence-oriented operations with path-oriented operations (e.g., snoc is now descend and unsnoc is now split),
  • removing Show from path serialization, and
  • making Path' a sum type, rather than a newtype.

Interesting/controversial decisions

One thing that feels controversial is the increased use of the unsafe Functor HashQualified instance. However, I think this PR just exposes already unsafe usage. Basically, modifying the name in a HashQualified can invalidate the hash, but Functor can’t manage that. Because of the change from (Path, HashQualified NameSegment) to HashQualified (Path, NameSegment), changes to the Path now require Functor HashQualified. But it was already the case that changing that path could make the hash not match the Path </> NameSegment, it’s just more apparent now.

Loose ends

There are still plenty of ways to improve the path handling in Unison. This brings things a bit closer. I think an eventual unification of Split and Name is worthwhile, combining the benefits of each.

  • Split’s structure allows for distinguishing between absolute and relative paths at the type level
  • Name’s reversed segment ordering allows for simpler code

They’re isomorphic to `Name` and `HQ'.HashQualified Name`. There are
some benefits to the alternative structure, but overall it just resulted
in a lot of duplicated code. `Name` can be restructured in future, but
first we need to eliminate the duplication.
All uses of `HashQualified` should now be outside the full pathy
structure. There are a couple motivations for this:

1. it makes the many `Name` ↔︎ `Split` conversions simpler and
2. changing the `Path` component of a `Split` should have to pass
   through the `HashQualified` structure, because it can potentially
   invalidate the hash[^1].

[^1]: `HashQualified` shouldn’t have a `Functor` instance, because it
should only be modifiable in carefully-controlled ways, but that’s a
battle for another day.

Types like `Split` and `AbsSplit` still exist because there aren’t
`Name` equivalents to those types (and there are additional types like
`(ProjectPath, NameSegment)` that are structured similarly to `Split`).
It now takes the path-like type as an argument, so we use `Split Path`,
`Split Absolute`, and `Split ProjectPath` as appropriate. Some
operations are also generalized over `Split p`.
This adds a type class, `Pathy`, with instances for the common path
types: `Path`, `Path'`, `Absolute`, and `Relative`. The intent is to
simplify the code, but also to make it easier to transition away from
direct `Path` usage in future.

This also renames a few operations, to make them more path-specific.
E.g., instead of `snoc`, it’s `descend`[^1]. and `absoluteEmpty` is now
`root`.

[^1]: `snoc` and `cons` are particularly confusing names, because we
store segments in different orders in different contexts, and it’s not
clear whether `snoc`/`cons` should refer to the internal ordering or the
conceptual ordering. Also, `Path` is likely to reverse it order at some
point to match `Name`, and if `snoc`/`cons` reflects the internal
ordering, then that’ll break things.

Also the `Path` constructor/field are no longer exported, and `Path'` is
now defined directly in terms of the `pattern`s that were used
everywhere.
With the various changes since removing `Split'`, it now makes more
sense to use `Split Path'` (structurally identical to `Split'`) than
`Name`.
This covers `ProjectPath` and `BranchRelativePath`.

It required splitting `Name` related functions into a separate class,
because these types don’t convert to `Name`. I think that’s a good idea
anyway, as we want to phase out those conversions.
This also tightens up path types in a few places.
@sellout
Copy link
Contributor Author

sellout commented Jan 4, 2025

I generally recommend reviewing PRs commit-by-commit, and I still do here, but it’s worth noting that some changes from the first commit were reverted in the sixth commit. Because of the intervening work, it was complicated to rewrite the history to avoid the reversion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant