McCortex HTTP Webserver
The Python script scripts/mccortex-server.py starts a webserver that answers queries with JSON:
usage: python scripts/python/mccortex-server.py <port> [-k,--kmer <K>] [mccortex args]
The script starts McCortex and loads the graph and link files into memory. It then listens on the given port for HTTP connections. Starting the server:
./scripts/mccortex-server.py 2306 -k 21 -m 80G --coverages --edges -p NA12878.ctp.gz NA12878.ctx
You can then query the server about a kmer:
curl localhost:2306/CAGTGGCCA
Response:
{ "key": "CAGTGGCCA", "colours": [15], "left": "T", "right": "T", "edges": "88", "links": [{"forward": false, "juncs": "A", "colours": [1]}] }
You can query the server for a random kmer:
curl localhost:2306/random
Response:
{ "key": "CATCAGTGG", "colours": [1], "left": "GT", "right": "C", "edges": "c2", "links": [] }
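The same queries can be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names (`query_url`, `parse_kmer_response`, `fetch`) are hypothetical and not part of McCortex, and it assumes the server is reachable over plain HTTP on the host and port it was started on.

```python
import json
from urllib.request import urlopen


def query_url(port, what, host="localhost"):
    """Build the URL for a query; `what` is a kmer, 'random' or 'info'."""
    return "http://{}:{}/{}".format(host, port, what)


def parse_kmer_response(text):
    """Decode the JSON body of a kmer query into (key, colours, left, right)."""
    rec = json.loads(text)
    return rec["key"], rec["colours"], rec["left"], rec["right"]


def fetch(port, what, host="localhost", timeout=10):
    """Send one GET request and return the decoded JSON document."""
    with urlopen(query_url(port, what, host), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# With a server running on port 2306 (as started above), one could call:
#   fetch(2306, "CAGTGGCCA")
#   fetch(2306, "random")
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without a running server.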
You can request info about the graph:
curl localhost:2306/info
Response:
{"file_key":"ad13f293a8cf1794","graph":{"num_colours":1,"kmer_size":9,"num_kmers_in_graph":29,"colours":[{"colour":0,"sample":"Genome","total_sequence":48,"cleaned_tips":false,"cleaned_supernodes":false}]},"commands":[{"key":"1bdea7a5","cmd":["/Users/isaac/mccortex/scripts/../bin/mccortex31","server","--single-line","-p","reads.pe.two.ctp","genome.k9.ctx"],"cwd":"/Users/isaac/mccortex/tests/thread_pe_short","date":"2015-09-14 16:31:38","mccortex":"v0.0.3-330-g0aa99ba-dirty","htslib":"1.2.1-195-g81b173e-dirty","zlib":"1.2.5","user":"isaac","host":"Montag.local","os":"Darwin","osrelease":"14.5.0","osversion":"Darwin Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015; root:xnu-2782.40.9~1/RELEASE_X86_64","hardware":"x86_64","prev":["fc240a62"]},{"key":"fc240a62","cmd":["../../bin/mccortex31","thread","-m","1M","--print-contigs","--two-way","--seq2","read.1.fa:read.2.fa","-o","reads.pe.two.ctp","genome.k9.ctx"],"cwd":"/Users/isaac/mccortex/tests/thread_pe_short","out_path":"/Users/isaac/mccortex/tests/thread_pe_short/reads.pe.two.ctp","out_key":"3ed90113f3bbde93","date":"2015-09-09 17:42:15","mccortex":"v0.0.3-327-ge458358-dirty","htslib":"1.2.1-195-g81b173e-dirty","zlib":"1.2.5","user":"isaac","host":"Montag.local","os":"Darwin","osrelease":"14.5.0","osversion":"Darwin Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015; root:xnu-2782.40.9~1/RELEASE_X86_64","hardware":"x86_64","prev":[],"thread":{"inputs":[{"files":["read.1.fa","read.2.fa"],"interleaved":false,"fq_offset":0,"fq_cutoff":0,"hp_cutoff":0,"matepair":"FR","frag_len_min_bp":0,"frag_len_max_bp":1000,"one_way_gap_fill":false,"use_end_check":false,"max_context":1,"gap_variance":0.100000,"gap_wiggle":5}]}}],"paths":{"num_kmers_with_paths":2,"num_paths":2,"path_bytes":2}}
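The info response is large, so a client will usually want only a few fields. As a rough sketch, the hypothetical helper `summarise_info` below pulls the graph summary out of an already decoded response. It assumes the layout matches the example above; the sample string is a subset of that response, trimmed for brevity.

```python
import json


def summarise_info(info):
    """Pick the main graph statistics out of a decoded /info response."""
    graph = info["graph"]
    return {
        "kmer_size": graph["kmer_size"],
        "num_colours": graph["num_colours"],
        "num_kmers": graph["num_kmers_in_graph"],
        "samples": [c["sample"] for c in graph["colours"]],
        "num_paths": info["paths"]["num_paths"],
    }


# Subset of the /info response shown above (other fields omitted).
example = json.loads(
    '{"graph":{"num_colours":1,"kmer_size":9,"num_kmers_in_graph":29,'
    '"colours":[{"colour":0,"sample":"Genome"}]},'
    '"paths":{"num_kmers_with_paths":2,"num_paths":2,"path_bytes":2}}'
)
print(summarise_info(example))
```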
The fields of a kmer response are:
- key (string): the kmer key of the requested kmer
- colours (list): coverage of the kmer in each colour
- left (string): bases that can be prepended to the left of the kmer key to give another kmer in the graph
- right (string): bases that can be appended to the right of the kmer key to give another kmer in the graph
- edges (string): the edges in hexadecimal coding, two characters (two per sample if --edges is given)
- links (list): a list of links, each with:
  - forward (boolean): true if this link leaves to the right of the kmer
  - juncs (string): sequence of junction choices made by the link
  - colours (list): coverage of the link in each colour
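The left and right fields are enough to list the kmers adjacent to a queried kmer. The sketch below uses a hypothetical helper `neighbours`, not part of McCortex, which prepends or appends each base and drops the base at the opposite end. It returns the neighbours as read on the same strand as the key; whether the server accepts them in that form or expects a particular orientation is not stated here, so treat that as an assumption to verify.

```python
def neighbours(key, left, right):
    """Return (left_kmers, right_kmers) implied by the left/right fields."""
    # Prepending a base shifts the kmer left, so the last base drops off.
    left_kmers = [base + key[:-1] for base in left]
    # Appending a base shifts the kmer right, so the first base drops off.
    right_kmers = [key[1:] + base for base in right]
    return left_kmers, right_kmers


# Fields taken from the random-kmer response shown earlier.
left_kmers, right_kmers = neighbours("CATCAGTGG", "GT", "C")
print(left_kmers, right_kmers)
```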
Bases {A,C,G,T} are represented as {1,2,4,8} respectively. The first character encodes the edges to the left, the second the edges to the right. Example: kmer CAGTGGCCA with edges c3 (=> 0b 1100 0011) means {G,T} to the left, {A,C} to the right:
[G]CAGTGGCC __               __ AGTGGCCA[A]
              \_ CAGTGGCCA _/
[T]CAGTGGCC __/             \__ AGTGGCCA[C]
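The edge encoding described above can be decoded mechanically. The following sketch uses hypothetical helpers (`decode_nibble`, `decode_edges`) that turn an edges string into the sets of left and right bases. It assumes that when per-sample edges are returned, the two-character groups are simply concatenated in colour order.

```python
BASES = "ACGT"  # bit values 1, 2, 4, 8 respectively


def decode_nibble(hex_char):
    """Turn one hexadecimal character into the string of bases it encodes."""
    bits = int(hex_char, 16)
    return "".join(b for i, b in enumerate(BASES) if bits & (1 << i))


def decode_edges(edges):
    """Return one (left_bases, right_bases) pair per two-character group."""
    if len(edges) % 2 != 0:
        raise ValueError("edges string must have an even number of characters")
    return [
        (decode_nibble(edges[i]), decode_nibble(edges[i + 1]))
        for i in range(0, len(edges), 2)
    ]


print(decode_edges("c3"))  # the worked example above: G,T left and A,C right
```

Applied to the responses shown earlier, "88" decodes to T on both sides and "c2" to G,T on the left and C on the right, which agrees with the left and right fields of those responses.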
- -C,--coverages: if set then kmer/link coverages are counts, otherwise they are 0/1 values. Storing coverages requires more memory.
- -E,--edges: if set then edges is a list of edges in each sample, two characters per colour. Otherwise all sample edges are merged into colour 0. Storing sample rather than population edges increases memory.
- -m,--memory <mem>: how much memory for the server to use
- -p,--paths <in.ctp.gz>: links file to load
The idea is that the webserver can be used to aid visualisation, haplotyping and other third-party applications of de Bruijn graphs and McCortex. Loading a human sample can take ~20 minutes, so it is useful to load it once into a server and then get instant answers to kmer queries.