Implement Chop #188
AWESOME!! This looks really great. I have just a few low-level suggestions, if you're interested!
flatgfa/src/cmds.rs
Outdated
let mut seg_map: Vec<(Id<Segment>, Id<Segment>)> = Vec::new();
let mut max_node_id = 1;
These could perhaps use comments to describe what they do and what invariants they maintain?
Aah, yep - I have added comments. `max_node_id` is actually the maximum `node_id` currently in existence + 1, so maybe I should pick a better name for it...
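One way to make the off-by-one explicit is to name the counter after what it actually holds: the next unused id. A minimal sketch of that idea follows; `NodeIdCounter`, `starting_at`, and `fresh` are hypothetical names for illustration and do not come from the PR or from flatgfa.

```rust
/// Hands out fresh node ids. Hypothetical helper, not part of flatgfa.
///
/// Invariant: `next_id` is one greater than the largest id handed out so far
/// (or equal to the starting id if nothing has been handed out yet).
struct NodeIdCounter {
    next_id: usize,
}

impl NodeIdCounter {
    /// Start numbering at `first`; the reviewed code initializes its counter to 1.
    fn starting_at(first: usize) -> Self {
        NodeIdCounter { next_id: first }
    }

    /// Return an unused id and advance the counter, preserving the invariant.
    fn fresh(&mut self) -> usize {
        let id = self.next_id;
        self.next_id += 1;
        id
    }
}
```

With a name like `next_id`, the "+ 1" no longer needs a comment to explain it.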
flatgfa/src/cmds.rs
Outdated
path_end = flat.add_steps(
    (start_idx..end_idx).map(|idx| {
        Handle::new(
            Id::new(idx),
            Orientation::Forward
        )
    })
).end;
DEFINITELY not for this PR, but: I wonder if there's some kind of utility method that we can invent to help with this stuff. Impressionistically speaking, what we want here is...
let segs = seg_map[step.segment()];
flat.add_steps(segs.map(|s| Handle::new(s, Orientation::Forward)));
Like, we kind of want a way to do a `map` directly on a chunk of segments, without having to fiddle with the index math here. Maybe we can think of a clever way to make that look nice!
Makes sense! `seg_map` currently maps from indexes (or segment names) to a tuple of `Id`s, which preserves some type safety but could be better. (If I wrap `Segment.name` in an `Id`, it would provide more type safety, but I think I'd have to use a HashMap instead of a vector. Maybe there's a trait that could be implemented to automatically convert `Id`s to `int`s? Also, I had been thinking of the `Id` wrapper as a zero-cost abstraction, but actually, wouldn't it need to be stored on the heap instead of the stack, and wouldn't this add overhead in some cases?)
Anyways, this probably means that doing a `map` directly on a chunk of segments is not particularly helpful here, since it would still entail mapping from indexes to segments. However, I think that implementing `Range` or something similar, such that we can write something like `(start_id..end_id).map(|id| Handle::new(id, Orientation::Forward))`, would help make this function more readable. In general, it seems like being able to treat `Id`s like regular numbers is a good thing (as long as they're distinct to the type checker). Something like this could probably be implemented in a small follow-up PR?
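On the stack-versus-heap question: in Rust, a wrapper struct around an integer is laid out like the integer itself and is not heap-allocated unless it is explicitly boxed. The sketch below checks this for a stand-in `Id<T>` and shows one possible conversion trait for indexing. The definitions here are assumptions for illustration; flatgfa's real `Id` may be defined differently.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Marker type standing in for flatgfa's `Segment` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Segment;

/// Stand-in for a typed id: a `u32` index tagged with the type it refers to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Id<T>(u32, PhantomData<T>);

impl<T> Id<T> {
    /// Build an id from a plain index (truncates above `u32::MAX` in this toy).
    fn new(index: usize) -> Self {
        Id(index as u32, PhantomData)
    }
}

/// Lets an id be turned into a plain index with `usize::from(id)`.
impl<T> From<Id<T>> for usize {
    fn from(id: Id<T>) -> usize {
        id.0 as usize
    }
}

/// `PhantomData` is zero-sized, so the wrapper is exactly as big as a `u32`
/// and lives wherever the value itself lives (stack slot, `Vec` element, ...).
fn id_is_same_size_as_u32() -> bool {
    size_of::<Id<Segment>>() == size_of::<u32>()
}

/// Index a vector-like table by id, keeping the raw integer out of the call site.
fn name_of<'a>(names: &'a [&'a str], id: Id<Segment>) -> &'a str {
    names[usize::from(id)]
}
```

Because the conversion is an ordinary `From` impl, a `Vec` indexed by `usize::from(id)` can keep serving as the map, so a HashMap is not forced by the wrapper alone.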
Maybe this is silly, but could the RHS of `seg_map` contain `Span<Segment>` instead of `(Id<Segment>, Id<Segment>)`? Then we could perhaps implement a `map` function on `Span`s...
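To make the `Span` idea concrete, here is a rough sketch of a span that can iterate over its own ids, so the caller never touches the index arithmetic. All types below are simplified stand-ins written for this example, and the `ids` method is hypothetical; none of this is claimed to match flatgfa's actual `Span`, `Id`, or `Handle`.

```rust
use std::marker::PhantomData;

/// Marker type standing in for flatgfa's `Segment` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Segment;

/// Stand-in typed id: a `u32` index tagged with the type it refers to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Id<T>(u32, PhantomData<T>);

impl<T> Id<T> {
    fn new(index: u32) -> Self {
        Id(index, PhantomData)
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Orientation {
    Forward,
    Backward,
}

/// Stand-in for a step handle: a segment id plus an orientation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Handle {
    segment: Id<Segment>,
    orient: Orientation,
}

impl Handle {
    fn new(segment: Id<Segment>, orient: Orientation) -> Self {
        Handle { segment, orient }
    }
}

/// Half-open range of ids: `start` is included, `end` is not.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span<T> {
    start: Id<T>,
    end: Id<T>,
}

impl<T> Span<T> {
    /// Iterate over every id in the span (hypothetical helper).
    fn ids(&self) -> impl Iterator<Item = Id<T>> {
        (self.start.0..self.end.0).map(Id::new)
    }
}

/// Forward handles for all chopped pieces of one original segment.
fn forward_steps(pieces: Span<Segment>) -> Vec<Handle> {
    pieces
        .ids()
        .map(|s| Handle::new(s, Orientation::Forward))
        .collect()
}
```

With `seg_map` holding one such span per original segment, the call site would shrink to roughly the two-line shape suggested earlier in this thread.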
…away spans and ids would be nice, but if we're using ids, we might as well use spans
Chop works! After `cargo build --release`, try something like `fgfa -I ../tests/k.gfa chop -c 3 -l`. `-c 3` specifies that nodes are to be chopped into segments no longer than 3, and `-l` specifies that the output file should compute new `links` (at this time, it's still not clear to me what need we have for links, if any, but it would be easy to make computing links the default behavior or to always compute links). (Side note, `slow_odgi` does not compute links - do we care to change this?)
The basic algorithm for `chop` is as follows:
One weird note here: the implementation of `chop` is split between `cmd.rs` and `main.rs`. The brunt of the work is done in `cmd.rs`, but the logic for which aspects of our original graph to preserve is in `main.rs`. It's unclear that a nice fix exists; because our new graph is borrowing elements from a `GFAStore` created by `chop` in `cmd.rs`, ownership of the `GFAStore` must be passed to the `main` function in order for our new `FlatGFA` to be valid. The best fix may be to compute the `FlatGFA` in `chop` and return both the `FlatGFA` and `GFAStore`, but right now we do not.
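The ownership constraint described above can be illustrated with toy types. In Rust, a function cannot return a view that borrows from a store it created locally, so one arrangement is for the chopping function to return the owned store and for the caller to derive the borrowed view from it. The sketch below is not the PR's implementation: `ToyStore` and `ToyView` are hypothetical stand-ins for `GFAStore` and `FlatGFA`, and the chopping itself is reduced to splitting ASCII sequences into pieces no longer than the `-c` value.

```rust
/// Owns the chopped data. Toy stand-in for a `GFAStore` (assumption).
struct ToyStore {
    /// All chopped segment sequences, in order.
    seqs: Vec<String>,
}

/// Borrowed, read-only view over a store. Toy stand-in for a `FlatGFA`.
struct ToyView<'a> {
    seqs: &'a [String],
}

impl ToyStore {
    /// The view borrows from `self`, so it cannot outlive the store.
    fn view(&self) -> ToyView<'_> {
        ToyView { seqs: &self.seqs }
    }
}

/// Split every sequence into pieces of at most `max_len` bytes and return
/// the owned store. Returning a `ToyView` into a store created inside this
/// function would not compile, because the store is dropped on return.
fn chop(seqs: &[&str], max_len: usize) -> ToyStore {
    assert!(max_len > 0, "max_len must be positive");
    let mut out = Vec::new();
    for seq in seqs {
        // `chunks` yields full-size pieces followed by a shorter remainder.
        // Sequences are assumed to be ASCII, so byte chunks are valid text.
        for piece in seq.as_bytes().chunks(max_len) {
            out.push(String::from_utf8_lossy(piece).into_owned());
        }
    }
    ToyStore { seqs: out }
}
```

A caller would then write `let store = chop(&["ACGTACG"], 3); let view = store.view();`, keeping `store` alive for as long as `view` is in use. This mirrors the current split, where the store's ownership has to reach `main` before the view is valid.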