Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Allow (read only) access to the directed flag complex via python #57

Open
flomlo opened this issue Dec 18, 2020 · 16 comments
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@flomlo
Copy link
Contributor

flomlo commented Dec 18, 2020

Motivation

The implementation of the data structure needed for the directed flag complex seems to be quite memory efficient in the flagser code. The same holds true for the generation of the directed flag complex.
Despite betti-numbers and simplex counts, which are currently available via the pyflagser api, one can make use of the flag complex to calculate other interesting characteristics about the underlying directed graph.

Goal

  • Enhance pyflagser with the option to generate the directed flag complex (via flagser)
  • Python bindings to this C++ flag complex object.

As the flag complex corresponding to a directed graph should not be muteable anyway (once generated), read-only access would be enough.

Current progress:

I volunteer to look into this matter, as I need it for my own project anyway.

  • I've got reasonable knowledge of directed flag complex generation (already implemented that in python myself)
  • I've got absolutely no clue about python/C++ bindings and thus have research that first.
  • I've never contributed to a project which takes itself serious. So please have patience with me :)

Hit me up if you're interested in discussing details!

Versions

Python: 3.9, CPython interpreter
pyflagser: 0.4.3

@flomlo flomlo changed the title Feature request: Allow (read only) access to the directed flag complex with python Feature request: Allow (read only) access to the directed flag complex via python Dec 18, 2020
@ulupo
Copy link
Contributor

ulupo commented Dec 18, 2020

Thanks @flomlo! I have some questions:

  • In ripser, the full boundary matrix is not stored in memory as a whole. Rather, its entries are computed on-demand. I thought this was also the case for the filtered flag complex itself, and hence that this was also the case for flagser. Does C++ flagser have the capability of outputting a full flag complex in a single data structure? I worry that this would be extremely large for all but very small graphs.
  • Do you expect that adding this feature might impact the runtime/performance of existing features? My impression is that this would not be the case but I just want to make sure (it would be a no-no most likely if that were to be the case).

@ulupo ulupo added good first issue Good for newcomers enhancement New feature or request labels Dec 18, 2020
@flomlo
Copy link
Contributor Author

flomlo commented Dec 18, 2020

Hi,

  • I've just glanced over the the flagser-code but there seems to be some kind of flag_complex_in_memory function/template/whatever in the code. I'm not very good at reading C++ code, but my impression was that there was at least the possibility of a flag complex in memory. But yes, that thing may grow massive. Still, less massive than a native Python implementation I suppose.
  • I would be most surprised if that would have any influence on the existing pyflagser code. I do not plan to touch them at all, and if I understand correctly the more or less directly link to the corresponding flagser functions.

@ulupo
Copy link
Contributor

ulupo commented Dec 18, 2020

I've just glanced over the the flagser-code but there seems to be some kind of flag_complex_in_memory function/template/whatever in the code. I'm not very good at reading C++ code, but my impression was that there was at least the possibility of a flag complex in memory

I see! I would be very interested to know.

Still, less massive than a native Python implementation I suppose.

I'm not sure I see why. Computational runtime would be greatly reduced, but the final product would presumably be a Python object.

@flomlo
Copy link
Contributor Author

flomlo commented Dec 18, 2020

I'll let you know once I know more details!

@flomlo
Copy link
Contributor Author

flomlo commented Dec 18, 2020

I've just glanced over the the flagser-code but there seems to be some kind of flag_complex_in_memory function/template/whatever in the code. I'm not very good at reading C++ code, but my impression was that there was at least the possibility of a flag complex in memory

I see! I would be very interested to know.

ok, so the flagser/include/complex/directed_flag_complex_in_memory.h has a suspicious file name and even contains the comment
// Loading the whole complex into memory, trading memory for computation speed

So I'm pretty sure this option exists :D

Now I just need to figure out how to bind C++ datastructures/objects to python (read only) in the least painful way, if even possible.
How much memory that then consumes is another question.

In the paper it is stated that flagser usually computes the locally useful parts of the flag complex on demand, as it is done quickly enough. I'm not sure if the "locally useful" parts for my computations align with whats necessary for homology-computation. So I'm not sure if I'm motivated enough to bind the not-in-memory-flag-complexes stuff as well. Let's see.

@MonkeyBreaker
Copy link
Collaborator

Hello everyone and happy new year !

@flomlo I never played with the in_memory option from flagser and I made some changes in the C++ backend of flagser. As I said in the PR corresponding to the changes (in flagser), I never tested the in_memory features, I don't know if in the current master in flagser repository, still works fine when enabling this feature. Maybe Daniel validate it on his setup, but on my side, I did not validate it (this would be the occasion to write some regression test for this feature :D).

I'll have a look on this by the end of the week, I need to understand what exactly can be outputted by the in_memory data structure.

If you have already something, feel free to prepare a PR, I can help you if you're stuck in a specific issue. But from a first feeling, this feature will require two PR, one in flagser to retrieve the in_memory data, and a second one in pyflagser to retrieve the new data from flagser

Best,
Julián

@flomlo
Copy link
Contributor Author

flomlo commented Jan 11, 2021

Hi!

So far I've only theoretically analzized the flagser code. My main problem is currently that (even with in_memory), the flag complex in flagser does not save all the cofaces of dimension d+1 (which, I think, are called coboundaries?).
See this simple sketch on what I've expected versus what is there: simplex stuff.pdf

The coboundaries seem to be computed on the fly are then directly written to a coboundary matrix. I'm still trying to figure out if/how I can restore the full flag complex from the flagser flag complex and the information in the coboundary matrix.

Once I feel like this works I'll play around with it a bit and will do a pull request then.

Binding with pybind11 seems to be a smaller issue that I thought, that tool is moderately well documented.

Just bear with my slowness, it's the first time reading (and binding) C++ code in my life, progress is slow and hard :D

Best,

Florian

@MonkeyBreaker
Copy link
Collaborator

Hi !

So far I've only theoretically analysed the flagser code. My main problem is currently that (even with in_memory), the flag complex in flagser does not save all the cofaces of dimension d+1 (which, I think, are called coboundaries?).

Yes, I think you're right, cofaces are generated on the fly and they are not stored in memory.

If I'm not wrong, what you're interested in is the data in directed_flag_complex_cell_in_memory_t class.
This data as I understood correspond to a pair of <index, value> where the index is the index corresponding to a certain simplex and the value to the filtration of the corresponding simplex (I'm not 100% sure about this).

In order to retrieve this data, the method get_data is there. This method takes as input the dimension and the corresponding vertices for a simplex. There's a third parameter "offset" but it's only used to iterate over the vertices calling recursively the method get_data.

Then what I think we should do is as follow:

  • Create a function to retrieve the directed flag complexes computed after compute_persistence was called in flagser.cpp
  • In this function iterate over all the dimensions, and look for all the simplicies (I'm not sure how to do that quite efficiently).
  • Retrieve the data in a vector by dimension

In my opinion, this should be discussed directly in the flagser repository, and then we could implement a retrieving method inside of flagser and only then create the necessary bindings.

@luetge Hi Daniel, I tag you directly here instead of creating a new issue in flagser, because there's an ongoing discussion for the possibility to retrieve (in python) the in memory data available when we compile with the option KEEP_FLAG_COMPLEX_IN_MEMORY.

From my first understanding and suggestion, do you think it's doable to retrieve this in memory data ?
If you don't see any issues,anyone interested could prepare a PR inside of flagser to retrieve this information.

Best,
Julián

@flomlo
Copy link
Contributor Author

flomlo commented Jan 14, 2021

Hi Julián,

first and foremost thanks for supporting me :)

After an additional day of staring at the code I'm now pretty sure that the coface-generating-magic is entirely hidden in the operator function of store_coboundaries_in_cache_t.

This would be line 258 to 306 in https://github.com/luetge/flagser/blob/7ffe92782ae639b4265a52403d2d90d3151bb67b/include/complex/directed_flag_complex_in_memory_computer.h#L258.

In that code we have
a) an outer loop, which loops over all simplices of a given dimension,
b) a middle loop, which loops over all possible positions i where a new vertex could be inserted
c) an innermost loop, which loops over all possible vertices b which could be inserted at position i.

This is precisely the information I would like to know about on the python side.

The only problem so far is that this operator routine now saves all this information in a coboundary_matrix, with entries I haven't fully understood yet. To be more precise: not at all.

If I can reconstruct the i and b data from the coboundary matrix I would be happy with just exporting that to python. This would mean zero(?) code changes on the flagser side I think.

Otherwise, the right angle of attack (at least for me) would be somewhere around there, separating the search for cofaces and the generation of the coboundary_matrix.

As far as my debugging shows, the ExtraData saved in the directed_flag_complex_cell_in_memory_t does not really save anything important to me. This might change if filtration comes into play though ... So far I've only analyzed the code on unweighted graphs (on test/e.flag mostly). But so far it saves only pairs of zeros.

Best,
Florian

@flomlo
Copy link
Contributor Author

flomlo commented Jan 15, 2021

Ok, I've now understood the final piece in the puzzle, I hope.

Simplices of dimension 2 and onwards get a number (simply starting with 0, counting upwards). The particular number (current_index set in https://github.com/luetge/flagser/blob/7ffe92782ae639b4265a52403d2d90d3151bb67b/include/complex/directed_flag_complex_in_memory_computer.h#L64) is then the first data point saved in directed_flag_complex_cell_in_memory_t<pair<a,b>> on position a. This is also the number saved in the coboundary_matrix.

I do not see however how to quickly jump to this simplex with just the simplex index given.

Furthermore, I do not see how to regain the insert_position i and insert_vertex b (see post above) from the information of the simplex index saved in the coboundary_matrix.

Thus I agree with you @MonkeyBreaker:
Exporting the relevant data for the flag complex to a python interface would require modifying flagser a little bit.

I'll have a look on where to split the operator function of store_coboundaries_in_cache_t so that it is well readable from python and won't slow down flagser notably.

@flomlo
Copy link
Contributor Author

flomlo commented Jan 15, 2021

Forgot to attach my pen-and-paper analysis of the flag complex e.flag which showcases what the coboundary_matrix saves

gdb_debug_flagser-memory_on_flag_e.pdf

@luetge
Copy link

luetge commented Mar 29, 2021

Hi everyone,

Sorry for the extremely late response. The storage of the flag complex simplices is quite straightforward (prefix trees either in memory or on the fly), all the magic that is performed in the code you are discussing is about choosing labels for the simplices of a fixed dimension (starting with label zero and counting up). This is needed to choose a basis for describing the coboundary operator with a coboundary matrix. The entries of that matrix are then computed by iterating over all possible insertion points as you rightfully pointed out. I did not have a look at that code for quite some time, but if I am not mistaken the entries of the matrix are given by (index of d+1 simplex, coefficient). Looking up index -> simplex is simply done by some sort of binary search on the tree, but the most efficient way would probably be too explicitly call the coboundary computation function to generate the matrix on demand when asked for in python (probably needs some refactoring of that part of the code).

This should not have performance implications for the rest of the code, so I am happy to receive a pull request.

Best,
Daniel

@MonkeyBreaker
Copy link
Collaborator

Hi Daniel,

Thank you so much for your answer and the details.
The idea I have (but didn't have the time to implement it): In class directed_flag_complex_in_memory_computer_t , create a method that allows you to retrieve all directed flag complex of the current dimension (after calling prepare_next_dimension(dim)).

If you think that this is a valid approach, I can prepare the necessary in order to have working example. If not, I need to dig a bit more 😅

Best,
Julián

@luetge
Copy link

luetge commented Mar 30, 2021

Hi,

That would work I think, happy to see it in flagser.

Best, Daniel

@flomlo
Copy link
Contributor Author

flomlo commented Apr 29, 2021

I'm considering offering a proper implementation of this interface and some documentation alongside to interested Master Students (CS) as a Master Thesis / Project.
It is a real problem and a proper, clean solution will be some work.

Not sure if the result will be up to everyone's standard, but it might be good starting point!

@MonkeyBreaker: You're way deeper into serious C++ programming that I am, do you think this is a reasonable idea? Or would you recommend against?

@MonkeyBreaker
Copy link
Collaborator

@flomlo well I wouldn't have at the moment any use cases on my side, but the more tools we have to explore the data, the better in my opinion !

I think it is a reasonable idea, it would be a great opportunity to think about the design when at the same time we add features. See if there are some parts that could be made more flexible.

I would be more than glad to discuss about the implementation and to review the work done, but I won't be able to follow actively during the implementation. I am afraid the following months will be quite loaded on my side ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

4 participants