boink extensions: or, streaming cDBG stuff #1821

Open
wants to merge 202 commits into
base: master
18f0cad
Add basic skethes of streaming partitioner
camillescott Nov 2, 2016
e55b142
Sketch out searching and tag finding, component mapping
camillescott Nov 3, 2016
12141f9
First pass at cythonizing StreamingPartitioner
camillescott Nov 5, 2016
9f6c196
Move static member initialization to cc file
camillescott Nov 5, 2016
b3c6a3d
Working Cython bindings and initial tests for streaming partitioner
camillescott Nov 5, 2016
1967b74
Expose underlying StreamingPartitioner data
camillescott Nov 6, 2016
c3c57c7
Add in missing add_tag when creating component, debug statements
camillescott Nov 7, 2016
7f63817
Fix Component constructor, expose StreamingPartitioner containers
camillescott Nov 7, 2016
def411a
Fix Component n_merges tracking
camillescott Nov 7, 2016
9a3d827
Add tests for container access, n_merges, component merging, tag access
camillescott Nov 7, 2016
4648333
Add Cython to dev-packages
camillescott Nov 7, 2016
3d13cb8
Make ComponentPtr set compare on the component id; fix consume_sequen…
camillescott Nov 8, 2016
6e7201e
Pass sets by reference, map ALL tags from merged components to proper…
camillescott Nov 8, 2016
0479b8f
Add coverage guarantee to reads gen
camillescott Nov 8, 2016
b18d20d
Add tests for edge cases of k-1 neighbors and components from simulat…
camillescott Nov 8, 2016
efab055
Imrpve some code clarity, add tracker for number of components destro…
camillescott Nov 8, 2016
8d21366
Remove n_merges and expose n_destroyed to Python
camillescott Nov 8, 2016
fa9fde2
Manually collect garbage between tests to check for leaked Components
camillescott Nov 8, 2016
438826f
Add functions to check average component coverage, write out results
camillescott Nov 11, 2016
b03946b
Minor reorganization of control flow, always add at least one tag
camillescott Nov 11, 2016
72eeb5a
add the partitioning script
camillescott Nov 11, 2016
3659790
Use build_ext method for cython compilation
camillescott Nov 11, 2016
13b9578
gcc wants algorithm header for max_element
camillescott Nov 11, 2016
1dcd249
reduce report interval
camillescott Nov 11, 2016
19084f8
Add consume_fasta to c++ layer of partitioner
camillescott Nov 14, 2016
1007191
Expose consume_fasta and add staticmethod def'd for python for tag_co…
camillescott Nov 14, 2016
9d00007
consume_fasta dec and start porting Traversal
camillescott Nov 14, 2016
b50ae1b
Tag counts and consume fasta exposure
camillescott Nov 14, 2016
3cbb8d0
Add a pop_filter method to Traverser
camillescott Nov 14, 2016
8fe4326
Expose Traverser in pxd
camillescott Nov 14, 2016
aae706c
Reorganize oxli files
camillescott Nov 14, 2016
a99a482
Reorganization of Cython wrappers
camillescott Nov 14, 2016
77d0f29
Move all defs to _oxli.pxd
camillescott Nov 14, 2016
e3cad33
Update setup and Makefile for Cython reorg
camillescott Nov 14, 2016
d86dd80
Finish reorganization of cython files and add Kmer and Traverser
camillescott Nov 14, 2016
eeed8b8
Update cpp files
camillescott Nov 14, 2016
28095cf
Add normalized streaming option
camillescott Nov 15, 2016
748f5f5
basic wrapping of FastxParser
camillescott Nov 15, 2016
60f3597
Add cythonized BrokenPairedReader
camillescott Nov 17, 2016
e380028
Remove Cython-generated cpp files from main branch
camillescott Nov 17, 2016
9131a05
Update tests for new cython Extension organization
camillescott Nov 17, 2016
35e0d5a
Add paired-read consume function, split consume into tagging and comp…
camillescott Nov 17, 2016
3fd044d
Revert to always tagging the start and end k-mers in a read
camillescott Nov 17, 2016
6e9f909
Add tests for paired component merging
camillescott Nov 17, 2016
b58f3d6
Add SplitPairedReader
camillescott Nov 17, 2016
f7d7f90
Remove extraneous tag clear
camillescott Nov 17, 2016
cc0e4b1
Change name of Component.create to Component.wrap to clarify that it …
camillescott Nov 17, 2016
dc0d0a6
Isolated the partitioner into its own class
camillescott Nov 18, 2016
245921d
Update clean to remove __pycache__ folders and cython .so modulefiles
camillescott Nov 28, 2016
b0bda47
Add save methods to Component and StreamingPartitioner
camillescott Nov 28, 2016
9ec41bf
Add an option to require matching names for pairs or not
camillescott Nov 28, 2016
5a8a6ed
Modify output names, update reporting interval, expose graph
camillescott Nov 28, 2016
8139102
Add a load method for StreamingPartitioner
camillescott Nov 28, 2016
e41bd4d
Convert component_dict to property, update StreamingPartitioner prope…
camillescott Nov 28, 2016
7492b66
Add option to save partitioner to partitiong-streaming.py'
camillescott Nov 28, 2016
0a0ed67
Add a CompactingAssembler for building compact de bruijn graphs
camillescott Dec 3, 2016
f2d2c30
Add pxd files to MANIFEST to distribute cython bindings
camillescott Dec 3, 2016
825e23e
Always merge into largest component
camillescott Dec 3, 2016
f250009
Add locking to Partitioner
camillescott Dec 3, 2016
d7257d1
cdef some more variable types and fix stats writing at end of run
camillescott Dec 3, 2016
6270789
Expose LinearAssembler with cython bindings, flesh out Traverser
camillescott Dec 3, 2016
c91165e
Fix load issue with relative paths, fix enumeration of tags
camillescott Dec 3, 2016
9cb2fca
Add reverse complementation options to Kmer wrapper
camillescott Dec 3, 2016
b17c4e9
Add initial tests for cython wrapped assemblers
camillescott Dec 4, 2016
e65cd1c
add compacting assembler and tests
camillescott Dec 14, 2016
be2f166
add tag-density, track new kmers
camillescott Jan 10, 2017
5db005f
Modify partitioning test
camillescott Jan 17, 2017
5bd074e
Bring in updated Cython integration, new c++ arch
camillescott Feb 7, 2017
faa10c8
Use c++ prime function
camillescott Feb 7, 2017
312d90d
Bring in fixes for OSX python.h macro bug
camillescott Feb 8, 2017
29febda
Add add method to cython binding
camillescott Feb 9, 2017
0b0c67b
Bring in cython_all_the_things
camillescott Feb 22, 2017
ed518f5
Add citations to partitioning script
camillescott Feb 23, 2017
4038b9b
Add partitioning to lib Makefile
camillescott Mar 1, 2017
c98a331
Change output format
camillescott Mar 2, 2017
98fe61a
Fix wrong var name
camillescott Mar 2, 2017
665c4a1
Add a tag density getter
camillescott Mar 8, 2017
9c694ad
Finish update of output format
camillescott Mar 8, 2017
f971cc6
Move sample-reads-randompy to cythonized reader
camillescott Mar 9, 2017
c085fd0
merge
camillescott Mar 9, 2017
47cbf62
Fix Count and Nodetable namespaces in wrapper
camillescott Apr 4, 2017
eb0cbe4
Add repr to Kmer
camillescott Apr 4, 2017
5a322b4
Add kmer generator to Sequence
camillescott Apr 4, 2017
e2ab297
Make LinearAssembler functions virtual
camillescott Apr 10, 2017
0f5bbde
Remove smart pointer usage for Assembler wrappers
camillescott Apr 10, 2017
222deb4
Remove NonLoopingAT and merge its functionality into AssemblerTraverser
camillescott Apr 24, 2017
9240450
Split find connected tags call from consume_and_connect_tags, rename …
camillescott Apr 24, 2017
d8cb0cd
add args and kwargs for StreamingPartitioner subclassing
camillescott Apr 26, 2017
cfea1bf
Add slicing to Sequence
camillescott Apr 30, 2017
04b1614
Report number of partitions merged
camillescott Apr 30, 2017
eb47283
Use shared_ptr for StreamingPartiioner _this
camillescott Apr 30, 2017
d87348a
Fix bad string conversion
camillescott May 1, 2017
4533631
Add informative messages to KmerIterator exceptions
camillescott May 6, 2017
365fb49
Apply min_length filter to both pairs in SplitPairedReader
camillescott May 6, 2017
975591c
Move GuardedKmerMap into its own file and template in thread safety
camillescott May 6, 2017
9e1eb53
Remove unused merged method from Component
camillescott May 6, 2017
cf9b75f
Factor out tag to component mapping from Partitioner and remove most …
camillescott May 7, 2017
adfba25
remove stale comment
camillescott May 7, 2017
6417265
Have StreamingPartitioner derive ComponentnMap
camillescott May 8, 2017
2d3759b
Add some coverage dist tracking
camillescott May 10, 2017
1727b9e
comment debug stuff
camillescott May 10, 2017
2de73ce
colors
camillescott May 10, 2017
c11cc00
Add coverage tracking output
camillescott May 18, 2017
595b5b9
Bring in stuff from master cython_all_the_things branch
camillescott May 31, 2017
a83adc1
Merge new cythonization from master
camillescott Jul 6, 2017
faf5c8c
Merge in the cythonized Hashgraph PR
camillescott Sep 6, 2017
1378244
Move hashing functions to Cython
camillescott Sep 6, 2017
dd2e8e4
Convert get_version_cpp to Cython
camillescott Sep 6, 2017
17c6bab
Allow hash functions to accept string derivitives
camillescott Sep 6, 2017
533c57c
cythonize FILETYPES dict
camillescott Sep 6, 2017
80ba62c
move extraction functions to graph classes
camillescott Sep 7, 2017
c232d5b
Remove __future__ imports
camillescott Sep 7, 2017
8512174
Introduce paired_fastx_handler, update sample-reads-randomly
camillescott Sep 7, 2017
9682480
Split Sequence to its own module, add a clean method to Sequence, mak…
camillescott Sep 8, 2017
05452fb
Update trim-low-abund for cython
camillescott Sep 8, 2017
323167e
remove ReadParser import
camillescott Sep 8, 2017
aeb1e62
Switch split-paired-reads
camillescott Sep 8, 2017
7b798e1
Remove ReadParser from filter abund scripts
camillescott Sep 8, 2017
d009e84
Remove ReadParser from extract paired
camillescott Sep 8, 2017
64a9ff7
First pass at diginorm screed removal
camillescott Sep 8, 2017
caa692c
Convert diginorm to FastxParser, with exception of odd streaming issu…
camillescott Sep 8, 2017
8aaf112
First pass unifying consume functions and removing ReadParser from gr…
camillescott Sep 8, 2017
30ee23f
merge cleanup
camillescott Sep 8, 2017
198b893
update imports
camillescott Sep 8, 2017
255e522
Add headers to package data
camillescott Sep 8, 2017
8407500
Merge branch 'fix/installation' into projects/boink
camillescott Sep 8, 2017
57e8dc1
Move convert headers to include
camillescott Sep 8, 2017
742427e
Merge branch 'fix/installation' into projects/boink
camillescott Sep 8, 2017
66bd46c
Fix path
camillescott Sep 8, 2017
7749924
Merge branch 'fix/installation' into projects/boink
camillescott Sep 8, 2017
8b81ca5
PIMPL for CQF
camillescott Sep 8, 2017
3425b78
Merge branch 'fix/installation' into projects/boink
camillescott Sep 8, 2017
afcd7d7
Merge remote-tracking branch 'origin' into projects/boink
camillescott Sep 10, 2017
b84266e
Change gmap to unordered_map
camillescott Sep 12, 2017
813d367
Add a performance comparison of map v unordered_map
camillescott Sep 12, 2017
3c81f5c
Sketch Linked dbg and implemented get_junction_choices
camillescott Sep 12, 2017
1012ff3
stub
camillescott Sep 13, 2017
2e9a828
Merge branch 'feature/unordered_maps' into projects/boink
camillescott Sep 13, 2017
41c145a
More attributes on Link
camillescott Sep 14, 2017
6c4d382
Update terminology, add fw and rc junction discovery plus initial lin…
camillescott Sep 15, 2017
7d39c1d
First pass full c++ linked dbg
camillescott Sep 28, 2017
bf16d16
expand test to check count
camillescott Sep 28, 2017
f3f52f8
Require new junctions in order to create Link
camillescott Sep 29, 2017
9e55232
Link content assert in test
camillescott Sep 29, 2017
6fc247d
Sketch out linked assembler
camillescott Sep 29, 2017
12316d4
Update assembler sketching
camillescott Sep 29, 2017
56e6c75
work on link trversal abstractions
camillescott Oct 2, 2017
979199f
add left flanking node to junction
camillescott Oct 7, 2017
f49e7f1
expand 3 node junction
camillescott Oct 9, 2017
9fac2f4
move toward link tree
camillescott Oct 12, 2017
590662c
switch toward compact dbg?
camillescott Oct 20, 2017
b166e3b
First pass pre compile
camillescott Oct 24, 2017
6d36af4
Bring down more base class methods to ATs, have neighbors return dire…
camillescott Oct 25, 2017
f394557
First pass compiling, partially working...
camillescott Oct 25, 2017
a4d9abf
ignore ctags file
camillescott Oct 25, 2017
ac95a46
Modify seeding method to create tags around high degree nodes
camillescott Oct 25, 2017
e80e372
transition toward constrained approach...
camillescott Oct 26, 2017
6b696f0
First pass at bounded compact dbg updating...
camillescott Nov 1, 2017
814a4f1
compator redux with left exploration
camillescott Nov 1, 2017
c60c239
Fix kmer indexing for compact edges, now working on forks
camillescott Nov 2, 2017
e36dc8f
Add revcomp test and remove some debug output
camillescott Nov 2, 2017
975744f
WIP
camillescott Nov 8, 2017
0eb8551
wip 2, compiling directionally aware
camillescott Nov 9, 2017
e74fa05
fix bad strncmp
camillescott Nov 9, 2017
09efacb
Fix edge case where out or in edge from discovered node which skips i…
camillescott Nov 10, 2017
c3dcff0
add trivial edges
camillescott Nov 13, 2017
67d5d55
Include HDN sequence in edge sequence
camillescott Nov 14, 2017
c673b36
Fix trivial edge length
camillescott Nov 14, 2017
65bfb5f
Test right half of x structure
camillescott Nov 14, 2017
873232a
Convert K size to fixture
camillescott Nov 15, 2017
2b7487c
fix K value in hdn_counts
camillescott Nov 15, 2017
8486a40
Change set_ksize to using_ksize
camillescott Nov 15, 2017
9a02638
Update partitioning tests with ksize fixture, reduce test case sizes
camillescott Nov 15, 2017
03de927
basic triple fork test
camillescott Nov 15, 2017
74a32c0
Functionalize the graph structure fixtures
camillescott Nov 16, 2017
42f837d
add tandem hdn fixture
camillescott Nov 16, 2017
2a070c5
addtrivial edge test
camillescott Nov 16, 2017
efcf1f3
fix trivial edge meta deduction, names
camillescott Nov 16, 2017
af05744
Fix check for matching canonical orientations between edge sequence a…
camillescott Nov 17, 2017
92df38b
Add some docs to the confusing orientation functions
camillescott Nov 17, 2017
f55e367
Better trivial edge test granularity
camillescott Nov 17, 2017
d67927d
expose CompactNode.degree
camillescott Nov 17, 2017
228e61c
Add linear merge
camillescott Nov 17, 2017
69a4eb6
Add GML output and Edge map
camillescott Nov 22, 2017
c0ef229
add write_fasta
camillescott Dec 9, 2017
ca75991
Fix edge case with reads intersecting non-induced HDNs
camillescott Dec 9, 2017
d1d4c5c
wrap node and edge factories
camillescott Dec 9, 2017
d5bb069
fix cythonization in setup
camillescott Dec 9, 2017
ac53da7
update queue arg
camillescott Dec 16, 2017
c23ec4a
merge master into boink
camillescott Jan 9, 2018
b5a04f4
Merge remote-tracking branch 'origin' into tmp_mrg_boink
camillescott Jan 9, 2018
302bb16
Update streaming diginorm tests for FastxParser
camillescott Jan 10, 2018
495c1b0
cythonized script does not work with stdout and err injection
camillescott Jan 11, 2018
9957b2c
convet load-graph to FastxParser
camillescott Jan 11, 2018
2d9e79d
switch streaming input to dash instead of stdin
camillescott Jan 11, 2018
bbf7c4b
update assembly tests from master for ksize fixture
camillescott Jan 11, 2018
5d62b3e
Change links and graphlinks filenames to cdbg basename
camillescott Jan 11, 2018
cf9e6c3
unidirectional mrmur
camillescott Jan 27, 2018
c9456ce
fix murmurhash call
camillescott Feb 22, 2018
141f221
include memory header in storage.hh for export
camillescott Feb 22, 2018
fa76c85
Merge branch 'projects/boink' of github.com:dib-lab/khmer into projec…
camillescott Feb 22, 2018
f6e9efb
Improve Cyclic and Murmur forward hash impls
camillescott Feb 27, 2018
add left flanking node to junction
camillescott committed Oct 7, 2017
commit 979199f612074fc72b694319736eb53f51a0e1f7
164 changes: 149 additions & 15 deletions include/oxli/links.hh
@@ -37,6 +37,7 @@ Contact: [email protected]
#ifndef LINKS_HH
#define LINKS_HH

#include <algorithm>
#include <functional>
#include <memory>
#include <list>
@@ -54,21 +55,66 @@ Contact: [email protected]
namespace oxli {

struct Junction {
// [u]-->[v (HDN)]-->[w]
HashIntoType u;
HashIntoType v;
//uint32_t distance_prev;
HashIntoType w;
uint64_t count;
Junction() = default;
HashIntoType id() const { return u ^ v; }

HashIntoType id() const { return u ^ v ^ w; }
bool matches(HashIntoType u, HashIntoType v) { return (u ^ v ^ w) == id(); }

friend std::ostream& operator<< (std::ostream& stream,
const Junction& j);
friend bool operator== (const Junction& lhs,
const Junction& rhs) {
return lhs.id() == rhs.id();
}
};
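
Review note on the new `id()` and `matches()`: XOR is order-insensitive, so `u ^ v ^ w` gives the same id for any permutation of the three hashes, and in `matches(HashIntoType u, HashIntoType v)` the parameters shadow the members of the same name. The sketch below only illustrates those two points and is not the PR's code. `JunctionSketch`, the renamed parameters `prev` and `hdn`, and `ids_collide_on_swap` are hypothetical names, and `HashIntoType` is assumed to be a 64-bit unsigned integer.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: stand-in for oxli's HashIntoType as a 64-bit unsigned hash.
using HashIntoType = std::uint64_t;

// Simplified, hypothetical copy of the three-node junction [u]-->[v (HDN)]-->[w].
struct JunctionSketch {
    HashIntoType u;
    HashIntoType v;
    HashIntoType w;

    // XOR of the three hashes, as in the diff.
    HashIntoType id() const { return u ^ v ^ w; }

    // Same check as matches() in the diff, but with parameter names that do
    // not shadow the members: combine the candidate pair with this junction's w.
    bool matches(HashIntoType prev, HashIntoType hdn) const {
        return (prev ^ hdn ^ w) == id();
    }
};

// XOR is commutative, so the id does not record which hash was u, v or w.
inline bool ids_collide_on_swap(HashIntoType a, HashIntoType b, HashIntoType c) {
    JunctionSketch forward{a, b, c};
    JunctionSketch swapped{c, b, a};
    return forward.id() == swapped.id();
}
```

If orientation matters for junction identity, an order-sensitive combination of the hashes would be needed; whether that is required here is a design question for the author.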


typedef std::list<Junction*> JunctionList;

#define FW 1
#define RC 0

class LinkNode {
private:
static uint64_t counter;

public:

LinkNode* next;
Junction* junction;
HashIntoType from;
uint64_t node_id;

LinkNode(Junction* junction, HashIntoType from) :
junction(junction), from(from), time_created(time_created),
node_id(counter), next(nullptr)
{
++counter;
}
};


class LinkHead {
private:
static uint64_t counter;
public:
uint64_t link_id;
LinkNode* start;
HashIntoType from;
bool forward;

LinkHead(LinkNode* start, HashIntoType from, bool forward,
uint64_t time_created) :
start(start), from(from), forward(forward), time_created(time_created)
{
++counter;
}
};


class Link {
private:
@@ -174,50 +220,131 @@ class LinkCursor
{
public:
Link* link;
uint64_t age;
uint64_t traversal_age;
JunctionList::iterator cursor;

LinkTraversal(Link* link, uint64_t age) :
link(link), age(age), cursor(link->begin())
link(link), traversal_age(age), cursor(link->begin())
{
}

bool done() {
return cursor == link->end();
}

Junction* current() const {
return &cursor;
}

bool increment() {
++cursor;
return done();
}

friend bool operator==(const LinkCursor& lhs, const LinkCursor& rhs)
{
return lhs->link == rhs->link;
}
};


// Compare the two link cursors first by their age within the
// traversal, then by their global age. We prefer links
// that were found earlier in the traversal, but were created
// most recently.
bool CompareLinks(const LinkCursor& a, const LinkCursor& b)
{
if (a.traversal_age < b.traversal_age) { return false; }
if (b.traversal_age < a.traversal_age) { return true; }

if ((a.link)->time_created() > (b.link)->time_created()) { return false; }
if ((b.link)->time_created() > (a.link)->time_created()) { return true; }

return false;
}
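
Review note on the ordering described in the comment above `CompareLinks`: `std::priority_queue` takes a comparator type as its third template argument, and `top()` returns the element for which the comparator never says "lower priority". A minimal sketch of that ordering follows, under stated assumptions: `CursorSketch` is a hypothetical flattened cursor that stores `time_created` directly instead of reaching it through `link`, and `CompareCursors`, `CursorQueueSketch` and `drain` are illustrative names that do not appear in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical flattened cursor; the diff reads time_created via cursor.link.
struct CursorSketch {
    std::uint64_t traversal_age;  // when the link was found during the traversal
    std::uint64_t time_created;   // global creation time of the link
};

// Comparator type for std::priority_queue.
// Returning true means "a has lower priority than b".
struct CompareCursors {
    bool operator()(const CursorSketch& a, const CursorSketch& b) const {
        if (a.traversal_age != b.traversal_age) {
            return a.traversal_age > b.traversal_age;  // found earlier wins
        }
        return a.time_created < b.time_created;        // newer link wins ties
    }
};

using CursorQueueSketch =
    std::priority_queue<CursorSketch, std::vector<CursorSketch>, CompareCursors>;

// Drain a copy of the queue, returning cursors in the order top() yields them.
inline std::vector<CursorSketch> drain(CursorQueueSketch queue) {
    std::vector<CursorSketch> order;
    while (!queue.empty()) {
        order.push_back(queue.top());
        queue.pop();
    }
    return order;
}
```

With cursors `{5, 100}`, `{2, 10}` and `{2, 40}` pushed in that order, `drain` yields `{2, 40}`, then `{2, 10}`, then `{5, 100}`: lowest traversal age first, and the most recently created link first among ties.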

/*
class LinkTraversal
{
std::shared_ptr<std::list<LinkCursor>> link_cursors;
typedef std::priority_queue<LinkCursor,
std::vector<LinkCursor>,
CompareLinks> CursorQueue;
std::shared_ptr<CursorQueue> link_cursors;
std::shared_ptr<std::set<HashIntoType>> constraints; // constraint junction ids
LinkCursor last_link;

LinkTraversal()
LinkTraversal() : has_active_cursor(false)
{
link_cursors = std::make_shared<std::list<LinkCursor>>();

link_cursors = std::make_shared<CursorQueue>();
}

void add_links(std::shared_ptr<LinkList> links,
uint64_t age)
{
for (Link* link: &links) {
link_cursors->push_back(LinkCursor(link, age));
link_cursors->push(LinkCursor(link, age));
}
}

void pop_all(Junction* to_pop)
{
for (LinkCursor cursor : &link_cursors) {
if (!cursor.done() && (to_pop == cursor.current())) {
cursor.increment();
}
}
}

bool get_top_link(&LinkCursor result)
{
while(link_cusors->size() > 0) {
if (!link_cusors->top()->done()) {
result = link_cusors->top();
return true;
} else {
link_cusors->pop();
}
}
return false;
}

template<bool direction>
void try_link_neighbors(AssemblerTraverser<direction> cursor)
bool try_link_neighbors(AssemblerTraverser<direction> asmt)
{
KmerQueue neighbors;
cursor.neighbors(neighbors);
asmt.neighbors(neighbors);
bool decided = false;
Kmer src = asmt.cursor;
Kmer dst;

LinkCursor link;
bool has_link = get_top_link(link);
if (!has_link) {
return false;
}

if (has_active_cursor) {

for (Kmer neighbor : neighbors) {
if ((&active_cursor.current())->matches(neighbor.kmer_f,
neighbor.kmer_r)) {
decided = true;
dst = neighbor;
}
}


} else {


}

}



};
*/


typedef std::unordered_multimap<HashIntoType, Link*> LinkMap;
@@ -330,7 +457,7 @@ public:
{
std::cout << "build_links()" << std::endl;
KmerIterator kmers(sequence.c_str(), graph->ksize());
Kmer u, v;
Kmer u, v, w;
uint64_t d = 0;

u = kmers.next();
@@ -340,6 +467,11 @@ public:
return;
}
v = kmers.next();
if (kmers.done()) {
fw_link = rc_link = nullptr;
return;
}
w = kmers.next();
++d;

std::cout << " - build_links: allocate new Link*" << std::endl;
@@ -365,7 +497,8 @@ public:
}

u = v;
v = kmers.next();
v = w;
w = kmers.next();
++d;
}
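
Review note on the `u`, `v`, `w` update in `build_links`: the loop now slides a window of three consecutive k-mers instead of two. The sketch below shows the same window over plain substrings, as an illustration only. It assumes k-mers can be modelled as `std::string` slices rather than `Kmer` objects from `KmerIterator`, and `Triple` and `kmer_triples` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

using Triple = std::tuple<std::string, std::string, std::string>;

// Enumerate consecutive (u, v, w) k-mer triples of a sequence.
inline std::vector<Triple> kmer_triples(const std::string& sequence, std::size_t k) {
    std::vector<Triple> triples;
    // Three consecutive k-mers span k + 2 bases; shorter input yields nothing,
    // which plays the role of the early returns when the iterator is done.
    if (k == 0 || sequence.size() < k + 2) {
        return triples;
    }
    std::string u = sequence.substr(0, k);
    std::string v = sequence.substr(1, k);
    for (std::size_t i = 2; i + k <= sequence.size(); ++i) {
        std::string w = sequence.substr(i, k);
        triples.emplace_back(u, v, w);
        u = v;  // slide the window one base to the right
        v = w;
    }
    return triples;
}
```

For `kmer_triples("ACGTAC", 3)` this gives two triples, `(ACG, CGT, GTA)` and `(CGT, GTA, TAC)`.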

@@ -475,6 +608,7 @@ public:

};

/*

class LinkedAssembler
{
@@ -499,7 +633,7 @@ public:
void _assemble_directed(AssemblerTraverser<direction>& cursor,
StringVector& paths) const;
};

*/


}
3 changes: 2 additions & 1 deletion src/oxli/links.cc
@@ -16,6 +16,7 @@ namespace oxli {

uint64_t Link::n_links = 0;

/*
template <bool direction>
void LinkedAssembler::
_assemble_directed(AssemblerTraverser<direction> cursor,
@@ -64,6 +65,6 @@ const

}
}

*/

}