Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ToGraph maybe doesn't need so many methods? #151

Open
michaelpj opened this issue Nov 30, 2018 · 4 comments
Open

ToGraph maybe doesn't need so many methods? #151

michaelpj opened this issue Nov 30, 2018 · 4 comments

Comments

@michaelpj
Copy link

I understand that the intention of ToGraph, like Foldable, is that some of the methods can be overridden to provide better performance.

But in practice it seems like the only ones that actually matter are

  • foldG (special implementation for Fold)
  • toGraph, toAdjacencyMap, toAdjacencyIntMap (special implementations for those very types or similar ones)

AFAICT the overriding implementations are all equivalent to calling the default implementations on ToGraph with the specialised version of e.g. toAdjacencyMap. For example, LabelledAdjacencyMap just delegates to the AdjacencyMap functions after calling skeleton... which is just its definition of toAdjacencyMap.

So I think we could have a ToGraph class that just had foldG and the toX methods, and then provide some generic implementations of the others as functions with a ToGraph constraint, rather than class methods.

Is this actually a problem? Not enormously, it just looks odd to a reader to have such a profusion of class methods, which will potentially grow indefinitely as more algorithms are added.

(I might just be missing some overrides that are in fact performance-critical, of course!)

@michaelpj
Copy link
Author

As an example of how this can lead to some (accidental?) inconsistency, scc isn't on ToGraph, although it's perfectly definable in terms of toAdjacencyMap.

@snowleopard
Copy link
Owner

snowleopard commented Dec 1, 2018

I'm not a big fan of ToGraph either and I'd be happy to trim it down, so let me list all current methods and give some rational for why they are not just functions with a ToGraph constraint:

  • foldg is the only interesting method really.

  • toGraph could in principle be demoted to a function, although it feels natural to define a ToGraph instance by implementing the toGraph method. The Graph data type has a nice implementation id, but I think performance-wise we should be able to simplify the default implementation foldg Empty Vertex Overlay Connect to id using @nobrakal's rewrite rules.

  • isEmpty: this one is unlikely to ever become a performance bottleneck, but still there are instances with O(1) run time cost, which you can't achieve when going via Graph or AdjacencyMap.

  • size: I think this one should just be deleted from the ToGraph module altogether, since it only makes sense for algebraic representations.

  • hasVertex, hasEdge, vertexCount, edgeCount: like isEmpty, these should probably stay, because some instances have O(1) implementations, which you can't achieve when going via Graph or AdjacencyMap. As examples, consider Data.Graph from containers (which I plan to make an instance of ToGraph), or an adjacency matrix that can answer hasEdge queries in O(1).

  • vertexList, vertexSet, vertexIntSet, preSet, preIntSet, postSet, postIntSet: algebraic representations like Graph can implement these functions more efficiently than by going via an AdjacencyMap (even if they currently don't do that).

  • edgeList, edgeSet, adjacencyList: again, some instances like Relation give you faster access to various collections of edges.

  • adjacencyMap, adjacencyIntMap, adjacencyMapTranspose, adjacencyIntMapTranspose: these could probably be removed from ToGraph. They can be efficiently implemented by going via the corresponding to* methods (at least until we change the implementation of AdjacencyMaps).

  • dfsForest, dfsForestFrom, dfs, reachable, topSort, isAcyclic, isDfsForestOf, isTopSortOf: all of these are currently implemented by going via the AdjacencyMap, but I hope to provide alternative implementations for algebraic graphs, which will be faster on dense graphs.

  • toAdjacencyMap, toAdjacencyMapTranspose, toAdjacencyIntMap, toAdjacencyIntMapTranspose: these seem to be important, at least for now, when we don't know many interesting algorithms that operate directly on algebraic graph representations. Why do we need transpose versions? Because you can have instances for which computing the transpose is just O(1), for example adjacency maps that store both forward and reverse edge directions, separately.

As an example of how this can lead to some (accidental?) inconsistency, scc isn't on ToGraph, although it's perfectly definable in terms of toAdjacencyMap.

This is not accidental. I just don't know how to make it a method of the ToGraph type class, because it returns a doubly-layered graph. The best we could do is something like this:

scc ::  (ToGraph g, ToGraph h, ToVertex h ~ g) => g -> h

Alas, this doesn't really express the fact that the inner graphs in the scc result are non-empty.


Summary: Here are the methods we should probably remove from ToGraph: size (remove from the module altogether, it was just a mistake), adjacencyMap, adjacencyIntMap, adjacencyMapTranspose, adjacencyIntMapTranspose.

@michaelpj Does this sound reasonable?

@michaelpj
Copy link
Author

Hm, it seems like many of these do in fact have at least the potential for more efficient implementations.

I would be tempted to split out the "can be folded into a graph" class from the "overridable graph operations class" for clarity, but that's somewhat subjective and based on my confusion on first discovering the class.

i.e. something like

class ToGraph t where
  toGraph :: ...
  foldg :: ...

class ToGraph t => GraphOps t where
   hasVertex :: ...
   ...

You could even drop the ToGraph constraint and just offer default signatures that add it.

This is not accidental. I just don't know how to make it a method of the ToGraph type class, because it returns a doubly-layered graph. The best we could do is something like this:

Couldn't you concretely return an AdjacencyMap (NonEmpty.AdjacencyMap a) in this case? It's not maximally generic, but I'm not sure how much that matters.

@snowleopard
Copy link
Owner

I have a feeling that splitting off ToGraph and GraphOps might bring even more confusion... Let's keep this issue open, perhaps we'll figure out a better approach some day.

Couldn't you concretely return an AdjacencyMap (NonEmpty.AdjacencyMap a) in this case? It's not maximally generic, but I'm not sure how much that matters.

Well, in this case I won't be able to add scc for algebraic graphs that returns Graph (NonEmpty.Graph a), which is annoying. I think there should be some generic way of getting to the NonEmpty version for all our graph implementations.

snowleopard added a commit that referenced this issue May 27, 2019
This revision addresses a few issues:

* Improves the testsuite (#13) by switching to simpler graph `API` type class.

* Simplifies the `ToGraph` class, as discussed in #151.

* Removes `Algebra.Graph.Fold` which, while cool, appears to be useless in practice.

* Removes dependencies on `base-compat` and `base-orphans`, which are no longer critical.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants