Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splitting containers #169

Closed
treeowl opened this issue May 22, 2023 · 22 comments
Closed

Splitting containers #169

treeowl opened this issue May 22, 2023 · 22 comments
Labels
core-libraries Governance of Core Libraries other than base

Comments

@treeowl
Copy link

treeowl commented May 22, 2023

I have been wanting to break up the containers package for some time, for several reasons:

  1. The exposed internals violate PVP.
  2. The exposed internals give a very cluttered package overview on Hackage, which I'm sure is quite intimidating to beginners and makes it hard for anyone to find things.
  3. Many pieces of the package are quite thoroughly unrelated to each other.
  4. When considering the impact of changes on Hackage, a reverse dependency search is of limited utility—if I want to know how Data.Sequence or Data.Graph is being used, I have to wade through all the packages using Data.Map.

Since containers came about long before I was involved, I would like to get input from the community. Here's a draft concept of packages:

  • containers: Everything except internals. This would be maintained indefinitely, but might eventually drop Data.Tree and Data.Graph. Users would be advised to consider using more specific packages instead to reduce their dependencies.
  • containers-map: Data.Map, Data.Set, and associated merge and list utility modules
  • containers-map-internals: The internals for containers-map
  • containers-intmap: Data.IntMap, Data.IntSet, and associated merge and list utility modules
  • containers-sequence: Data.Sequence
  • containers-sequence-internals: Internals for containers-sequence
  • containers-tree: Data.Tree
  • containers-graph: Data.Graph

Possibly also:

  • strict-maybe: The strict version of Maybe
  • strict-tuples: Fully strict and left-strict pairs.
@Kleidukos
Copy link
Member

I can give my outside opinion: with what I can read, this seems like a very bad idea. The amount of energy asked for maintenance might take off very sharply. You would have to give to 8-10 packages the same amount of care as 1. Moreover I fear that discoverability would take a hit.

Any chance you would be open to having multiple public libraries in the containers package?

@tomjaguarpaw
Copy link
Member

  • strict-maybe: The strict version of Maybe

There are already packages for this. Let's not multiply them. Off the top of my head I can think of

@treeowl
Copy link
Author

treeowl commented May 22, 2023

Any chance you would be open to having multiple public libraries in the containers package?

I ... don't know what this means. Why do you think maintenance costs will increase significantly? Why do you fear for discoverability?

@Kleidukos
Copy link
Member

I ... don't know what this means

I'm specifically referring to https://fgaz.me/posts/2019-11-14-cabal-multiple-libraries/ that is implemented in Cabal & Hackage

My fears (but they're mostly fears) are mainly that the amount of administrative tasks (revisions, version bumps, dependency bumps) might multiply and become unwieldy.

Regarding discovery, well we suffer from quite a large problem of discoverability in the ecosystem, and I'm not sure it would be wise to prune Data.Tree & Graph from the package. That being I am absolutely willing to hear the argument that there will be documentation investments to point beginners in the right direction.

if I want to know how Data.Sequence or Data.Graph is being used, I have to wade through all the packages using Data.Map.

Is this something where https://hackage-search.serokell.io/?q=import.*Data.Graph could be useful?

@Bodigrim
Copy link
Collaborator

I'd probably just mechanically separate everything *.Internal into containers-internals, and leave the rest in containers proper. This gives you full PVP compliance, without making stuff too entangled.

Juggling 8 boot packages would likely be quite a nightmare.

@treeowl
Copy link
Author

treeowl commented May 22, 2023

I ... don't know what this means

I'm specifically referring to https://fgaz.me/posts/2019-11-14-cabal-multiple-libraries/ that is implemented in Cabal & Hackage

I will read up on that, but the fact that it requires GHC 8.8 is going to be a problem for a while—containers currently supports GHC back to 8.0.

My fears (but they're mostly fears) are mainly that the amount of administrative tasks (revisions, version bumps, dependency bumps) might multiply and become unwieldy.

That is a reasonable concern. However, the extremely limited dependencies between the different components should make it tractable. Data.Sequence is used to implement just one function in Data.Tree, and only uses a very old and very stable piece of its API. Just how bad would it be to split the Git repo? That would make it much easier to see at a glance what needs major/minor bumps.

Regarding discovery, well we suffer from quite a large problem of discoverability in the ecosystem, and I'm not sure it would be wise to prune Data.Tree & Graph from the package. That being I am absolutely willing to hear the argument that there will be documentation investments to point beginners in the right direction.

Data.Tree and Data.Graph don't seem to get terribly broad use anyway. Data.Graph is an odd and unpleasant artifact anyway; it's just the code from a particular paper splattered into a module with no consideration for API design or abstraction barriers, and updated/improved in limited ways since. I believe beginners are unlikely to need either of them, but certainly not Data.Graph.

if I want to know how Data.Sequence or Data.Graph is being used, I have to wade through all the packages using Data.Map.

Is this something where https://hackage-search.serokell.io/?q=import.*Data.Graph could be useful?

Sure, but what happens when Serokell no longer thinks that's worth its money to maintain? And how are people supposed to discover that resource anyway?

@hasufell
Copy link
Member

I have the same intention with filepath, just fyi: haskell/filepath#192

And I agree that the containers-internals is probably the right approach.

@chshersh
Copy link
Member

I agree with previous comments and would advise against splitting containers into multiple packages for several additional reasons:

  1. Some algorithms in Data.Map may require Data.Seq in the future and vice versa and mutually recursive packages are impossible. So you'll be faced with a serious problem.
  2. It's already perceived as inconvenient by many people to add containers to the list of dependencies to get access to the standard data structure whereas in other languages you'd expect them in the standard library. With multiple public libraries, it's even worse.
  3. I consider all Seq, Map, Set, etc. to be containers and it makes sense to me to have them in a single containers package.

I understand the maintenance burden, and I'm okay with having containers-internal because users are not expected to depend on this package. So, you can do the internal split today and users won't notice. This sounds like a safe move.

@chshersh chshersh added the core-libraries Governance of Core Libraries other than base label May 24, 2023
@treeowl
Copy link
Author

treeowl commented May 24, 2023

1. Some algorithms in `Data.Map` may require `Data.Seq` in the future and vice versa and mutually recursive packages are impossible. So you'll be faced with a serious problem.

This isn't a strong argument, because we can (hypothetically) have both containers-sequence and containers-map depend on both containers-sequence-internal and containers-map-internal. Anything that can't be broken up like that will need {-# SOURCE #-} imports, and I really don't think anything could justify those in containers.

2. It's already perceived as inconvenient by many people to add `containers` to the list of dependencies to get access to the standard data structure whereas in other languages you'd expect them in the standard library. With multiple public libraries, it's even worse.

containers won't go away regardless. What's the concern exactly?

3. I consider all `Seq`, `Map`, `Set`, etc. to be containers and it makes sense to me to have them in a single `containers` package.

Also hash maps, dependent maps, priority queues, priority search queues, and fgl-style graphs?

I understand the maintenance burden, and I'm okay with having containers-internal because users are not expected to depend on this package. So, you can do the internal split today and users won't notice. This sounds like a safe move.

Yes, it certainly seems that that's what the community is buying into at the moment. I wish I had a better understanding of why that's all. Regardless, it is something....

@hasufell
Copy link
Member

I wish I had a better understanding of why that's all.

https://nikita-volkov.github.io/internal-convention-is-a-mistake/

In short:

  1. using Internal module convention (usually) violates PVP in practice
  2. following PVP with Internal modules leads to either freezing the internal API or aggressively bumping major versions, making the PVP a moot point

I don't have a strong opinion about splitting up into many packages, as long as the main containers still reexports all those. It also makes PVP more meaningful: a user might not care whether the API of Seq broke, but whether the API of Map broke. Now they can define their "interests" better. And users who don't care can keep using containers.

But it's a maintenance burden for you, GHC devs and potential contributors. But both cabal and stack support multi package projects well.

@mixphix
Copy link
Collaborator

mixphix commented May 24, 2023

I'll say that I would consider this a good idea -- if not for the fact that it's a boot package.

Splitting modules across libraries this way doesn't increase the size of the codebase in a significant way. It does increase the maintenance burden, but only insofar as it gives you finer control (and thus more room for error) over versioning and organization. It may even improve compile times if done well. Such splits also allow newcomers to target their improvements more precisely.

However, because this is a boot package, I worry about the consequences of splitting it into more than, say, three or four. Even though the "user-facing surface area" of the library has not much changed, all of a sudden maintainers have to coordinate 8 sets of internal version bounds, release cycles, ticket labels... sounds like hell!

@jdkr
Copy link

jdkr commented May 24, 2023

The readme https://github.com/haskell/core-libraries-committee does not list the containers package as a core library. Wondering why it is missing there.

@jdkr
Copy link

jdkr commented May 24, 2023

I don't see any point in splitting off -internal packages in general. If they contain internal things, they are not intended for use by other packages. So why create an own package for the internals? Wouldn't it be better to just not expose the internals, using an explicit module export list?

@tomjaguarpaw
Copy link
Member

The readme https://github.com/haskell/core-libraries-committee does not list the containers package as a core library. Wondering why it is missing there.

Perhaps because it is not a core library, by the definition given in the readme?

Wouldn't it be better to just not expose the internals, using an explicit module export list?

Well no, sometimes it's really beneficial to have access to the internals.

@jdkr
Copy link

jdkr commented May 24, 2023

Perhaps because it is not a core library, by the definition given in the readme?

So actually this is not an issue to discuss on the CLC board?

@treeowl
Copy link
Author

treeowl commented May 24, 2023

@jdkr It was just a request for advisory opinions on a library of some significance.

@Kleidukos
Copy link
Member

It is indeed quite significant, which is why I've included it in the @haskell namespace on Flora: https://flora.pm/packages/@haskell/containers

@mixphix
Copy link
Collaborator

mixphix commented May 24, 2023

The boot libraries are not the same as the core libraries, though they generally overlap. Boot libraries are shipped with ghc. I'm personally not exactly sure what brought about our current list of core libraries.

@hasufell
Copy link
Member

Well no, sometimes it's really beneficial to have access to the internals.

Yes, an easy example is the new filepath API that abstracts over the raw bytes of each platform and preserves them.

The constructor to the bytes is not exposed by "proper" API, because it's really hard to use correctly. And yet, if you build another low-level library on top, that might be exactly what you want.

So, smart constructors is a very prominent example.

@Ericson2314
Copy link
Contributor

https://nikita-volkov.github.io/internal-convention-is-a-mistake/ I am glad this is getting spread around.

@alexfmpe and I have looked at non empty containers at times, and wondering just how much we should worry about stability in the internals was a (albeit minor in the trand scheme of things) hangup.

@Bodigrim
Copy link
Collaborator

Bodigrim commented Jun 4, 2023

(IMO ideally boot libraries should be a subset of "core", so I'd welcome donation of containers to CLC, but it's entirely up to maintainers)

@treeowl is there anything left to discuss here?

@treeowl
Copy link
Author

treeowl commented Jun 4, 2023

Nope.

@treeowl treeowl closed this as completed Jun 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libraries Governance of Core Libraries other than base
Projects
None yet
Development

No branches or pull requests

9 participants