Splitting containers #169

treeowl · 2023-05-22T19:21:32Z

I have been wanting to break up the containers package for some time, for several reasons:

The exposed internals violate PVP.
The exposed internals give a very cluttered package overview on Hackage, which I'm sure is quite intimidating to beginners and makes it hard for anyone to find things.
Many pieces of the package are quite thoroughly unrelated to each other.
When considering the impact of changes on Hackage, a reverse dependency search is of limited utility—if I want to know how Data.Sequence or Data.Graph is being used, I have to wade through all the packages using Data.Map.

Since containers came about long before I was involved, I would like to get input from the community. Here's a draft concept of packages:

containers: Everything except internals. This would be maintained indefinitely, but might eventually drop Data.Tree and Data.Graph. Users would be advised to consider using more specific packages instead to reduce their dependencies.
containers-map: Data.Map, Data.Set, and associated merge and list utility modules
containers-map-internals: The internals for containers-map
containers-intmap: Data.IntMap, Data.IntSet, and associated merge and list utility modules
containers-sequence: Data.Sequence
containers-sequence-internals: Internals for containers-sequence
containers-tree: Data.Tree
containers-graph: Data.Graph

Possibly also:

strict-maybe: The strict version of Maybe
strict-tuples: Fully strict and left-strict pairs.

The text was updated successfully, but these errors were encountered:

Kleidukos · 2023-05-22T19:36:14Z

I can give my outside opinion: with what I can read, this seems like a very bad idea. The amount of energy asked for maintenance might take off very sharply. You would have to give to 8-10 packages the same amount of care as 1. Moreover I fear that discoverability would take a hit.

Any chance you would be open to having multiple public libraries in the containers package?

tomjaguarpaw · 2023-05-22T19:36:32Z

strict-maybe: The strict version of Maybe

There are already packages for this. Let's not multiply them. Off the top of my head I can think of

https://hackage.haskell.org/package/strict
https://hackage.haskell.org/package/strict-wrapper (which I maintain)

treeowl · 2023-05-22T19:39:52Z

Any chance you would be open to having multiple public libraries in the containers package?

I ... don't know what this means. Why do you think maintenance costs will increase significantly? Why do you fear for discoverability?

Kleidukos · 2023-05-22T19:52:15Z

I ... don't know what this means

I'm specifically referring to https://fgaz.me/posts/2019-11-14-cabal-multiple-libraries/ that is implemented in Cabal & Hackage

My fears (but they're mostly fears) are mainly that the amount of administrative tasks (revisions, version bumps, dependency bumps) might multiply and become unwieldy.

Regarding discovery, well we suffer from quite a large problem of discoverability in the ecosystem, and I'm not sure it would be wise to prune Data.Tree & Graph from the package. That being I am absolutely willing to hear the argument that there will be documentation investments to point beginners in the right direction.

if I want to know how Data.Sequence or Data.Graph is being used, I have to wade through all the packages using Data.Map.

Is this something where https://hackage-search.serokell.io/?q=import.*Data.Graph could be useful?

Bodigrim · 2023-05-22T20:00:28Z

I'd probably just mechanically separate everything *.Internal into containers-internals, and leave the rest in containers proper. This gives you full PVP compliance, without making stuff too entangled.

Juggling 8 boot packages would likely be quite a nightmare.

treeowl · 2023-05-22T20:21:21Z

I ... don't know what this means

I'm specifically referring to https://fgaz.me/posts/2019-11-14-cabal-multiple-libraries/ that is implemented in Cabal & Hackage

I will read up on that, but the fact that it requires GHC 8.8 is going to be a problem for a while—containers currently supports GHC back to 8.0.

My fears (but they're mostly fears) are mainly that the amount of administrative tasks (revisions, version bumps, dependency bumps) might multiply and become unwieldy.

That is a reasonable concern. However, the extremely limited dependencies between the different components should make it tractable. Data.Sequence is used to implement just one function in Data.Tree, and only uses a very old and very stable piece of its API. Just how bad would it be to split the Git repo? That would make it much easier to see at a glance what needs major/minor bumps.

Regarding discovery, well we suffer from quite a large problem of discoverability in the ecosystem, and I'm not sure it would be wise to prune Data.Tree & Graph from the package. That being I am absolutely willing to hear the argument that there will be documentation investments to point beginners in the right direction.

Data.Tree and Data.Graph don't seem to get terribly broad use anyway. Data.Graph is an odd and unpleasant artifact anyway; it's just the code from a particular paper splattered into a module with no consideration for API design or abstraction barriers, and updated/improved in limited ways since. I believe beginners are unlikely to need either of them, but certainly not Data.Graph.

if I want to know how Data.Sequence or Data.Graph is being used, I have to wade through all the packages using Data.Map.

Is this something where https://hackage-search.serokell.io/?q=import.*Data.Graph could be useful?

Sure, but what happens when Serokell no longer thinks that's worth its money to maintain? And how are people supposed to discover that resource anyway?

hasufell · 2023-05-23T02:29:58Z

I have the same intention with filepath, just fyi: haskell/filepath#192

And I agree that the containers-internals is probably the right approach.

chshersh · 2023-05-24T07:04:07Z

I agree with previous comments and would advise against splitting containers into multiple packages for several additional reasons:

Some algorithms in Data.Map may require Data.Seq in the future and vice versa and mutually recursive packages are impossible. So you'll be faced with a serious problem.
It's already perceived as inconvenient by many people to add containers to the list of dependencies to get access to the standard data structure whereas in other languages you'd expect them in the standard library. With multiple public libraries, it's even worse.
I consider all Seq, Map, Set, etc. to be containers and it makes sense to me to have them in a single containers package.

I understand the maintenance burden, and I'm okay with having containers-internal because users are not expected to depend on this package. So, you can do the internal split today and users won't notice. This sounds like a safe move.

treeowl · 2023-05-24T07:46:39Z

1. Some algorithms in `Data.Map` may require `Data.Seq` in the future and vice versa and mutually recursive packages are impossible. So you'll be faced with a serious problem.

This isn't a strong argument, because we can (hypothetically) have both containers-sequence and containers-map depend on both containers-sequence-internal and containers-map-internal. Anything that can't be broken up like that will need {-# SOURCE #-} imports, and I really don't think anything could justify those in containers.

2. It's already perceived as inconvenient by many people to add `containers` to the list of dependencies to get access to the standard data structure whereas in other languages you'd expect them in the standard library. With multiple public libraries, it's even worse.

containers won't go away regardless. What's the concern exactly?

3. I consider all `Seq`, `Map`, `Set`, etc. to be containers and it makes sense to me to have them in a single `containers` package.

Also hash maps, dependent maps, priority queues, priority search queues, and fgl-style graphs?

I understand the maintenance burden, and I'm okay with having containers-internal because users are not expected to depend on this package. So, you can do the internal split today and users won't notice. This sounds like a safe move.

Yes, it certainly seems that that's what the community is buying into at the moment. I wish I had a better understanding of why that's all. Regardless, it is something....

hasufell · 2023-05-24T08:04:17Z

I wish I had a better understanding of why that's all.

https://nikita-volkov.github.io/internal-convention-is-a-mistake/

In short:

using Internal module convention (usually) violates PVP in practice
following PVP with Internal modules leads to either freezing the internal API or aggressively bumping major versions, making the PVP a moot point

I don't have a strong opinion about splitting up into many packages, as long as the main containers still reexports all those. It also makes PVP more meaningful: a user might not care whether the API of Seq broke, but whether the API of Map broke. Now they can define their "interests" better. And users who don't care can keep using containers.

But it's a maintenance burden for you, GHC devs and potential contributors. But both cabal and stack support multi package projects well.

mixphix · 2023-05-24T15:08:13Z

I'll say that I would consider this a good idea -- if not for the fact that it's a boot package.

Splitting modules across libraries this way doesn't increase the size of the codebase in a significant way. It does increase the maintenance burden, but only insofar as it gives you finer control (and thus more room for error) over versioning and organization. It may even improve compile times if done well. Such splits also allow newcomers to target their improvements more precisely.

However, because this is a boot package, I worry about the consequences of splitting it into more than, say, three or four. Even though the "user-facing surface area" of the library has not much changed, all of a sudden maintainers have to coordinate 8 sets of internal version bounds, release cycles, ticket labels... sounds like hell!

jdkr · 2023-05-24T20:49:00Z

The readme https://github.com/haskell/core-libraries-committee does not list the containers package as a core library. Wondering why it is missing there.

jdkr · 2023-05-24T20:50:21Z

I don't see any point in splitting off -internal packages in general. If they contain internal things, they are not intended for use by other packages. So why create an own package for the internals? Wouldn't it be better to just not expose the internals, using an explicit module export list?

tomjaguarpaw · 2023-05-24T20:53:07Z

The readme https://github.com/haskell/core-libraries-committee does not list the containers package as a core library. Wondering why it is missing there.

Perhaps because it is not a core library, by the definition given in the readme?

Wouldn't it be better to just not expose the internals, using an explicit module export list?

Well no, sometimes it's really beneficial to have access to the internals.

jdkr · 2023-05-24T20:59:19Z

Perhaps because it is not a core library, by the definition given in the readme?

So actually this is not an issue to discuss on the CLC board?

treeowl · 2023-05-24T21:31:31Z

@jdkr It was just a request for advisory opinions on a library of some significance.

Kleidukos · 2023-05-24T22:06:45Z

It is indeed quite significant, which is why I've included it in the @haskell namespace on Flora: https://flora.pm/packages/@haskell/containers

mixphix · 2023-05-24T22:16:37Z

The boot libraries are not the same as the core libraries, though they generally overlap. Boot libraries are shipped with ghc. I'm personally not exactly sure what brought about our current list of core libraries.

hasufell · 2023-05-25T02:23:04Z

Well no, sometimes it's really beneficial to have access to the internals.

Yes, an easy example is the new filepath API that abstracts over the raw bytes of each platform and preserves them.

The constructor to the bytes is not exposed by "proper" API, because it's really hard to use correctly. And yet, if you build another low-level library on top, that might be exactly what you want.

So, smart constructors is a very prominent example.

Ericson2314 · 2023-05-25T04:32:06Z

https://nikita-volkov.github.io/internal-convention-is-a-mistake/ I am glad this is getting spread around.

@alexfmpe and I have looked at non empty containers at times, and wondering just how much we should worry about stability in the internals was a (albeit minor in the trand scheme of things) hangup.

Bodigrim · 2023-06-04T15:45:46Z

(IMO ideally boot libraries should be a subset of "core", so I'd welcome donation of containers to CLC, but it's entirely up to maintainers)

@treeowl is there anything left to discuss here?

treeowl · 2023-06-04T15:48:08Z

Nope.

chshersh added the core-libraries Governance of Core Libraries other than base label May 24, 2023

treeowl closed this as completed Jun 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Splitting containers #169

Splitting containers #169

treeowl commented May 22, 2023 •

edited

Loading

Kleidukos commented May 22, 2023

tomjaguarpaw commented May 22, 2023

treeowl commented May 22, 2023

Kleidukos commented May 22, 2023

Bodigrim commented May 22, 2023

treeowl commented May 22, 2023

hasufell commented May 23, 2023

chshersh commented May 24, 2023

treeowl commented May 24, 2023

hasufell commented May 24, 2023

mixphix commented May 24, 2023 •

edited

Loading

jdkr commented May 24, 2023

jdkr commented May 24, 2023

tomjaguarpaw commented May 24, 2023

jdkr commented May 24, 2023

treeowl commented May 24, 2023

Kleidukos commented May 24, 2023

mixphix commented May 24, 2023

hasufell commented May 25, 2023

Ericson2314 commented May 25, 2023

Bodigrim commented Jun 4, 2023

treeowl commented Jun 4, 2023

Splitting containers #169

Splitting containers #169

Comments

treeowl commented May 22, 2023 • edited Loading

Kleidukos commented May 22, 2023

tomjaguarpaw commented May 22, 2023

treeowl commented May 22, 2023

Kleidukos commented May 22, 2023

Bodigrim commented May 22, 2023

treeowl commented May 22, 2023

hasufell commented May 23, 2023

chshersh commented May 24, 2023

treeowl commented May 24, 2023

hasufell commented May 24, 2023

mixphix commented May 24, 2023 • edited Loading

jdkr commented May 24, 2023

jdkr commented May 24, 2023

tomjaguarpaw commented May 24, 2023

jdkr commented May 24, 2023

treeowl commented May 24, 2023

Kleidukos commented May 24, 2023

mixphix commented May 24, 2023

hasufell commented May 25, 2023

Ericson2314 commented May 25, 2023

Bodigrim commented Jun 4, 2023

treeowl commented Jun 4, 2023

treeowl commented May 22, 2023 •

edited

Loading

mixphix commented May 24, 2023 •

edited

Loading