Implement Chop #188

Merged: 23 commits, Sep 12, 2024

Contributor

@susan-garry susan-garry commented Jul 8, 2024

Chop works! After cargo build --release, try something like fgfa -I ../tests/k.gfa chop -c 3 -l. Here, -c 3 specifies that nodes are to be chopped into segments no longer than 3, and -l specifies that new links should be computed for the output. (At this time, it's still not clear to me what need we have for links, if any, but it would be easy to make computing links the default behavior, or to always compute them.) (Side note: slow_odgi does not compute links. Do we care to change this?)

The basic algorithm for chop is as follows:

seg_map;     // map from each old segment to its new, chopped counterparts
for each segment:
    chop into segments of size c or smaller
    if args.l:
        link the new segments together, from head to tail (i.e., in the forward orientation)
    update seg_map

for each path:
    new_path;
    for each step in path:
        for new_seg in seg_map(step.seg):
            append new_seg to new_path
    add new_path to new_fgfa

if args.l:
    for link (A -> B) in old_fgfa:
        add a new link from
            (A.forward ? (A.end, forward) : (A.begin, backwards))
                -> (B.forward ? (B.begin, forward) : (B.end, backwards))
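
For readers who want something concrete, here is a minimal, self-contained Rust sketch of the three phases above. It is not the code in this PR: it uses plain strings and index ranges instead of FlatGFA's store types, and the names Handle, chop_segments, internal_links, chop_path, and chop_link are all hypothetical. It also assumes that a backward step visits the chopped pieces in reverse order, and that no segment is empty.

```rust
/// One oriented reference to a segment (hypothetical stand-in type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Handle {
    pub seg: usize,
    pub forward: bool,
}

/// Split each sequence into pieces of at most `c` bases. Returns the new
/// sequences and a map from each old segment index to the half-open range
/// of new segment indices that replace it.
pub fn chop_segments(seqs: &[&str], c: usize) -> (Vec<String>, Vec<(usize, usize)>) {
    assert!(c > 0, "chop size must be positive");
    let mut new_seqs = Vec::new();
    let mut seg_map = Vec::with_capacity(seqs.len());
    for seq in seqs {
        let start = new_seqs.len();
        // Assumes ASCII sequence data, so splitting on bytes is safe.
        for piece in seq.as_bytes().chunks(c) {
            new_seqs.push(String::from_utf8_lossy(piece).into_owned());
        }
        seg_map.push((start, new_seqs.len()));
    }
    (new_seqs, seg_map)
}

/// Forward links joining the consecutive pieces of each old segment.
pub fn internal_links(seg_map: &[(usize, usize)]) -> Vec<(Handle, Handle)> {
    let mut links = Vec::new();
    for &(start, end) in seg_map {
        for i in (start + 1)..end {
            links.push((
                Handle { seg: i - 1, forward: true },
                Handle { seg: i, forward: true },
            ));
        }
    }
    links
}

/// Rewrite a path so that each step is replaced by its chopped pieces.
pub fn chop_path(path: &[Handle], seg_map: &[(usize, usize)]) -> Vec<Handle> {
    let mut out = Vec::new();
    for step in path {
        let (start, end) = seg_map[step.seg];
        if step.forward {
            out.extend((start..end).map(|seg| Handle { seg, forward: true }));
        } else {
            // Assumption: a backward step walks the pieces last-to-first.
            out.extend((start..end).rev().map(|seg| Handle { seg, forward: false }));
        }
    }
    out
}

/// Re-target an old link A -> B at the outermost pieces of A and B.
/// Assumes both segments are non-empty, so each range has at least one piece.
pub fn chop_link(a: Handle, b: Handle, seg_map: &[(usize, usize)]) -> (Handle, Handle) {
    let (a_begin, a_end) = seg_map[a.seg];
    let (b_begin, b_end) = seg_map[b.seg];
    let from = if a.forward { a_end - 1 } else { a_begin };
    let to = if b.forward { b_begin } else { b_end - 1 };
    (
        Handle { seg: from, forward: a.forward },
        Handle { seg: to, forward: b.forward },
    )
}
```

Storing a half-open range per old segment works because the pieces of one segment are appended contiguously, which is the same property the index math in the real implementation relies on.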

One weird note here: the implementation of chop is split between cmds.rs and main.rs. The bulk of the work is done in cmds.rs, but the logic for which aspects of our original graph to preserve is in main.rs. It's not clear that a nice fix exists: because our new graph borrows elements from a GFAStore created by chop in cmds.rs, ownership of the GFAStore must be passed to the main function in order for our new FlatGFA to be valid. The best fix may be to compute the FlatGFA in chop and return both the FlatGFA and the GFAStore, but right now we do not.

Collaborator

@sampsyo sampsyo left a comment

AWESOME!! This looks really great. I have just a few low-level suggestions, if you're interested!

flatgfa/src/cmds.rs
Comment on lines 350 to 351
let mut seg_map: Vec<(Id<Segment>, Id<Segment>)> = Vec::new();
let mut max_node_id = 1;
Collaborator

These could perhaps use comments to describe what they do and what invariants they maintain?

Contributor Author

@susan-garry susan-garry Jul 10, 2024

Aah, yep - I have added comments. max_node_id is actually the maximum node_id currently in existence + 1, so maybe I should pick a better name for it...

Comment on lines 429 to 436
path_end = flat.add_steps(
    (start_idx..end_idx).map(|idx| {
        Handle::new(
            Id::new(idx),
            Orientation::Forward
        )
    })
).end;
Collaborator

DEFINITELY not for this PR, but: I wonder if there's some kind of utility method that we can invent to help with this stuff. Impressionistically speaking, what we want here is...

let segs = seg_map[step.segment()];
flat.add_steps(segs.map(|s| Handle::new(s, Orientation::Forward)));

Like, we kind of want a way to do a map directly on a chunk of segments, without having to fiddle with the index math here. Maybe we can think of a clever way to make that look nice!

Contributor Author

Makes sense! seg_map currently maps from indexes (or segment names) to a tuple of Ids, which preserves some type safety but could be better. If I wrap Segment.name in an Id, it would provide more type safety, but I think I'd have to use a HashMap instead of a vector. Maybe there's a trait that could be implemented to automatically convert Ids to ints? Also, I had been thinking of the Id wrapper as a zero-cost abstraction, but actually, wouldn't it need to be stored on the heap instead of the stack, and wouldn't this add overhead in some cases?

Anyways, this probably means that doing a map directly on a chunk of segments is not particularly helpful here, since it would still entail mapping from indexes to segments. However, I think that implementing Range or something similar, such that we can write something like (start_id..end_id).map(|id| Handle::new(id, Orientation::Forward)), would help make this function more readable. In general, it seems like being able to treat Ids like regular numbers is a good thing (as long as they're distinct to the type checker). Something like this could probably be implemented in a small follow-up PR?
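
To make the two questions above concrete, here is a small sketch of a typed Id newtype. It is a stand-in written for illustration, not the Id type defined in flatgfa, and the helper names id_is_u32_sized and id_range are hypothetical. A Rust struct that wraps a u32 plus a zero-sized PhantomData marker occupies the same number of bytes as a bare u32 and lives wherever its owner places it (stack, Vec element, and so on), so the wrapper itself does not force a heap allocation.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Marker type standing in for the real segment record (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Segment;

/// A typed index: a u32 tagged with the kind of thing it indexes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Id<T> {
    index: u32,
    _marker: PhantomData<T>,
}

impl<T> Id<T> {
    pub fn new(index: usize) -> Self {
        Id { index: index as u32, _marker: PhantomData }
    }

    pub fn index(self) -> usize {
        self.index as usize
    }
}

/// Explicit conversion back to a plain integer, e.g. for Vec indexing.
impl<T> From<Id<T>> for usize {
    fn from(id: Id<T>) -> usize {
        id.index as usize
    }
}

/// True when the wrapper adds no bytes over the integer it wraps.
pub fn id_is_u32_sized() -> bool {
    size_of::<Id<Segment>>() == size_of::<u32>()
}

/// Iterate over the typed ids in the half-open range [start, end).
pub fn id_range<T>(start: Id<T>, end: Id<T>) -> impl Iterator<Item = Id<T>> {
    (start.index..end.index).map(|index| Id { index, _marker: PhantomData })
}
```

With a helper like id_range, the step-building loop would no longer need to convert raw indexes into Ids by hand, while Id<Segment> stays distinct from other Id types for the type checker.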

Collaborator

Maybe this is silly, but could the RHS of seg_map contain Span<Segment> instead of (Id<Segment>, Id<Segment>)? Then we could perhaps implement a map function on Spans...
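
A rough sketch of what that could look like, under the assumption that a span is just a pair of typed ids marking a half-open range. The Id, Span, and Handle types below are simplified stand-ins defined here for illustration, not flatgfa's own definitions, and forward_steps is a hypothetical helper.

```rust
use std::marker::PhantomData;

/// Marker type standing in for the real segment record (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Segment;

/// Simplified typed index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Id<T>(u32, PhantomData<T>);

impl<T> Id<T> {
    pub fn new(index: usize) -> Self {
        Id(index as u32, PhantomData)
    }

    pub fn index(self) -> usize {
        self.0 as usize
    }
}

/// Half-open range of ids, [start, end).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span<T> {
    pub start: Id<T>,
    pub end: Id<T>,
}

impl<T> Span<T> {
    pub fn new(start: usize, end: usize) -> Self {
        Span { start: Id::new(start), end: Id::new(end) }
    }

    /// Apply `f` to every id in the span, in order.
    pub fn map<U, F: FnMut(Id<T>) -> U>(self, mut f: F) -> impl Iterator<Item = U> {
        (self.start.0..self.end.0).map(move |i| f(Id(i, PhantomData)))
    }
}

/// Simplified oriented step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Handle {
    pub seg: Id<Segment>,
    pub forward: bool,
}

/// Look up the chopped pieces of `seg` and emit one forward step per piece.
pub fn forward_steps(seg_map: &[Span<Segment>], seg: Id<Segment>) -> Vec<Handle> {
    seg_map[seg.index()]
        .map(|s| Handle { seg: s, forward: true })
        .collect()
}
```

Here the index arithmetic lives inside Span::map, so the call site reads close to the impressionistic two-liner earlier in this thread.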

@sampsyo sampsyo mentioned this pull request Jul 12, 2024
@susan-garry susan-garry merged commit ccf1911 into main Sep 12, 2024
10 checks passed
@susan-garry susan-garry deleted the chop branch September 12, 2024 13:50