
McCortex HTTP Webserver

Isaac Turner edited this page Sep 28, 2016 · 7 revisions

The Python script scripts/mccortex-server.py starts a webserver that answers queries with JSON:

usage: python scripts/python/mccortex-server.py <port> [-k,--kmer <K>] [mccortex args]

The script starts up McCortex and loads the graph and link files into memory. It then listens on the given port for HTTP connections. To start the server:

./scripts/mccortex-server.py 2306 -k 21 -m 80G --coverages --edges -p NA12878.ctp.gz NA12878.ctx
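
Since the server only starts listening once the graph and link files are loaded, a client script may want to wait for the port to open before sending queries. A minimal sketch using only the Python standard library; the function name `wait_for_port` and its default timeout and polling interval are illustrative assumptions, not part of McCortex:

```python
import socket
import time


def wait_for_port(port, host="localhost", timeout=1800.0, interval=5.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made within `timeout` seconds.
    The defaults are arbitrary illustrative values.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the example above, `wait_for_port(2306)` would block until the server accepts connections or the timeout expires.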

Kmer query

You can then query the server about a kmer:

curl localhost:2306/CAGTGGCCA

Response:

{ "key": "CAGTGGCCA", "colours": [15], "left": "T", "right": "T", "edges": "88", "links": [{"forward": false, "juncs": "A", "colours": [1]}] }
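
The same query can be made from Python. A minimal sketch using only the standard library, assuming the server is reachable on localhost:2306 as in the example above; the helper names (`kmer_url`, `parse_kmer_response`, `query_kmer`) are hypothetical and not part of McCortex:

```python
import json
from urllib.request import urlopen


def kmer_url(kmer, host="localhost", port=2306):
    """Build the query URL for a kmer, e.g. http://localhost:2306/CAGTGGCCA."""
    return f"http://{host}:{port}/{kmer}"


def parse_kmer_response(text):
    """Parse the JSON body returned by the server into a dict."""
    return json.loads(text)


def query_kmer(kmer, host="localhost", port=2306, timeout=10):
    """Fetch and parse the response for one kmer (requires a running server)."""
    with urlopen(kmer_url(kmer, host, port), timeout=timeout) as resp:
        return parse_kmer_response(resp.read().decode("utf-8"))
```

With a server running, `query_kmer("CAGTGGCCA")` would return the response as a dictionary with the fields described under "Kmer response fields".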

Random kmer

You can query the server for a random kmer:

curl localhost:2306/random

Response:

{ "key": "CATCAGTGG", "colours": [1], "left": "GT", "right": "C", "edges": "c2", "links": [] }

Graph info

You can request info about the graph:

curl localhost:2306/info

Response:

{"file_key":"ad13f293a8cf1794","graph":{"num_colours":1,"kmer_size":9,"num_kmers_in_graph":29,"colours":[{"colour":0,"sample":"Genome","total_sequence":48,"cleaned_tips":false,"cleaned_supernodes":false}]},"commands":[{"key":"1bdea7a5","cmd":["/Users/isaac/mccortex/scripts/../bin/mccortex31","server","--single-line","-p","reads.pe.two.ctp","genome.k9.ctx"],"cwd":"/Users/isaac/mccortex/tests/thread_pe_short","date":"2015-09-14 16:31:38","mccortex":"v0.0.3-330-g0aa99ba-dirty","htslib":"1.2.1-195-g81b173e-dirty","zlib":"1.2.5","user":"isaac","host":"Montag.local","os":"Darwin","osrelease":"14.5.0","osversion":"Darwin Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015; root:xnu-2782.40.9~1/RELEASE_X86_64","hardware":"x86_64","prev":["fc240a62"]},{"key":"fc240a62","cmd":["../../bin/mccortex31","thread","-m","1M","--print-contigs","--two-way","--seq2","read.1.fa:read.2.fa","-o","reads.pe.two.ctp","genome.k9.ctx"],"cwd":"/Users/isaac/mccortex/tests/thread_pe_short","out_path":"/Users/isaac/mccortex/tests/thread_pe_short/reads.pe.two.ctp","out_key":"3ed90113f3bbde93","date":"2015-09-09 17:42:15","mccortex":"v0.0.3-327-ge458358-dirty","htslib":"1.2.1-195-g81b173e-dirty","zlib":"1.2.5","user":"isaac","host":"Montag.local","os":"Darwin","osrelease":"14.5.0","osversion":"Darwin Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015; root:xnu-2782.40.9~1/RELEASE_X86_64","hardware":"x86_64","prev":[],"thread":{"inputs":[{"files":["read.1.fa","read.2.fa"],"interleaved":false,"fq_offset":0,"fq_cutoff":0,"hp_cutoff":0,"matepair":"FR","frag_len_min_bp":0,"frag_len_max_bp":1000,"one_way_gap_fill":false,"use_end_check":false,"max_context":1,"gap_variance":0.100000,"gap_wiggle":5}]}}],"paths":{"num_kmers_with_paths":2,"num_paths":2,"path_bytes":2}}
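
The info response is large, so a client will usually pick out only a few fields. A sketch assuming the JSON layout shown in the response above; `summarise_info` is a hypothetical helper, not part of McCortex:

```python
import json


def summarise_info(info):
    """Extract a few headline numbers from a parsed /info response."""
    graph = info["graph"]
    return {
        "kmer_size": graph["kmer_size"],
        "num_colours": graph["num_colours"],
        "num_kmers": graph["num_kmers_in_graph"],
        # One sample name per colour, in colour order.
        "samples": [c["sample"] for c in graph["colours"]],
        # "paths" summarises the loaded links.
        "num_paths": info["paths"]["num_paths"],
    }


# A trimmed copy of the response above, keeping only the fields used here.
example = json.loads(
    '{"graph":{"num_colours":1,"kmer_size":9,"num_kmers_in_graph":29,'
    '"colours":[{"colour":0,"sample":"Genome"}]},'
    '"paths":{"num_kmers_with_paths":2,"num_paths":2,"path_bytes":2}}'
)
print(summarise_info(example))
```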

Kmer response fields:

  • key (string): the kmer key of the requested kmer
  • colours (list): the coverage of the kmer in each colour, one entry per colour (counts if --coverages is set, otherwise 0/1)
  • left (string): bases that prepend to the left of the kmer key to give another kmer in the graph
  • right (string): bases that append to the right of the kmer key to give another kmer in the graph
  • edges (string): the kmer's edges in hexadecimal coding: two characters per colour if --edges is given, otherwise two characters in total
  • links (list): a list of links:
    • forward (boolean): true if this link leaves to the right of the kmer, false otherwise
    • juncs (string): sequence of junction choices made by the link
    • colours (list): a list of link coverages in each colour
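
The left and right fields can be turned into neighbouring kmer sequences by shifting the key by one base. A minimal sketch, assuming a response already parsed into a dictionary; `neighbours` is a hypothetical helper, and whether these sequences must be converted to key form before being queried is not covered here:

```python
def neighbours(response):
    """Return (left_kmers, right_kmers) for a parsed kmer response.

    A left base is prepended and the last base of the key dropped;
    a right base is appended and the first base of the key dropped.
    """
    key = response["key"]
    left = [base + key[:-1] for base in response["left"]]
    right = [key[1:] + base for base in response["right"]]
    return left, right


# Values taken from the random kmer response shown earlier.
example = {"key": "CATCAGTGG", "left": "GT", "right": "C"}
print(neighbours(example))
```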

Hexadecimal edge coding

Bases {A,C,G,T} are represented as the bit values {1,2,4,8} respectively. The first character encodes the edges to the left, the second the edges to the right. Example: kmer CAGTGGCCA with edges c3 (=> 0b 1100 0011) means {G,T} to the left, {A,C} to the right:

[G]CAGTGGCC __               __ AGTGGCCA[A]
              \_ CAGTGGCCA _/
[T]CAGTGGCC __/             \__ AGTGGCCA[C]
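
The coding above can be decoded with a few lines of Python. A minimal sketch; `decode_edges` is a hypothetical helper that returns one (left, right) pair for every two characters, so it also handles the per-colour strings produced with --edges:

```python
# Bit value of each base, as described above.
BASE_BITS = {"A": 1, "C": 2, "G": 4, "T": 8}


def decode_nibble(hex_char):
    """Return the bases whose bits are set in one hexadecimal character."""
    value = int(hex_char, 16)
    return [base for base, bit in BASE_BITS.items() if value & bit]


def decode_edges(edges):
    """Decode an edges string into a list of (left_bases, right_bases) pairs."""
    if len(edges) % 2 != 0:
        raise ValueError("edges string must have an even number of characters")
    return [
        (decode_nibble(edges[i]), decode_nibble(edges[i + 1]))
        for i in range(0, len(edges), 2)
    ]


print(decode_edges("c3"))  # [(['G', 'T'], ['A', 'C'])]
```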

Server options:

  • -C,--coverages: if set then kmer/link coverages are counts, otherwise they are 0/1 values. Storing coverages requires more memory.
  • -E,--edges: if set then edges holds the edges of each sample, two characters per colour. Otherwise the edges of all samples are merged into colour 0. Storing per-sample rather than population edges requires more memory.
  • -m,--memory <mem>: how much memory the server may use (e.g. 80G)
  • -p,--paths <in.ctp.gz>: links file to load

Motivation

The idea is that the webserver can be used to aid visualisation, haplotyping and other third-party applications of de Bruijn graphs and McCortex. Loading a human sample can take ~20 minutes, so it is useful to load it once into a server that can then answer kmer queries instantly.
