Home
Pre-built Linux and Mac binaries are available from the Releases page.
- Qt 6
- CMake
- C++17-compliant compiler
mkdir build
cd build
cmake ..
make
Sometimes the graph layout in Bandage is not perfect and manual correction can be useful. In addition to the standard movement of nodes, nodes can now be rotated. To rotate a contig, hold down the right mouse button and drag one of the ends of the contig.
Hi-C links between different contigs can be visualized on the de Bruijn graph. Hi-C links are drawn as dotted lines connecting the midpoints of contigs.
To load Hi-C metadata in Bandage, choose the "Load Hi-C data" item in the "File" menu. A file with Hi-C data can be loaded only after a de Bruijn graph has been loaded.
Each row of the file should contain three fields separated by a tab character ('\t'): the IDs of the two connected nodes and the weight (number of Hi-C links). The first row should contain the column names.
Below is an example of a Hi-C metadata TXT file:
v1 v2 hic_w
1268598 831795 6516
1072702 831795 5454
1268598 524477 1548
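The file layout described above can be read with a few lines of Python. This is a minimal sketch, not part of BandageNG: the names `HiCLink` and `parse_hic_links` are hypothetical, and the weight is assumed to be an integer as in the example.

```python
import csv
import io
from typing import List, NamedTuple


class HiCLink(NamedTuple):
    node_a: str
    node_b: str
    weight: int  # number of Hi-C links (assumed to be an integer)


def parse_hic_links(text: str) -> List[HiCLink]:
    """Parse tab-separated Hi-C metadata whose first row holds column names."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader, None)  # skip the header row
    links = []
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        if len(row) != 3:
            raise ValueError(f"expected 3 fields, got {len(row)}: {row!r}")
        links.append(HiCLink(row[0], row[1], int(row[2])))
    return links


example = (
    "v1\tv2\thic_w\n"
    "1268598\t831795\t6516\n"
    "1072702\t831795\t5454\n"
    "1268598\t524477\t1548\n"
)
links = parse_hic_links(example)
```

Node IDs are kept as strings so that they can be compared directly with node names in the graph.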
To draw the de Bruijn graph with Hi-C links, click the "Draw graph" button after loading the Hi-C metadata.
You can adjust the filter settings listed below. After changing the parameters of the Hi-C links visualization, click the "Draw graph" button again to redraw the graph.
- Minimum Hi-C weight: Hi-C links with a weight below the chosen minimum are not visualized.
- Minimum length of a contig's sequence: Hi-C links connecting shorter contigs are not shown.
- Filter of Hi-C links display:
  - All edges - All Hi-C links are shown.
  - All edges link groups - All Hi-C links connecting contigs from different connected components of the graph are shown.
  - One edge links groups - Only one Hi-C link between different connected components of the graph is shown.
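The three filters can be sketched in Python as follows. This is an illustration under stated assumptions, not BandageNG's implementation: the function name, the `mode` strings, and the `contig_length` and `component` dictionaries are hypothetical. A link is assumed to be dropped when either of its contigs is shorter than the minimum length, and the "one edge" mode is assumed to keep the highest-weight link per pair of components.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, int]  # (node A, node B, Hi-C weight)


def filter_hic_links(
    links: List[Link],
    contig_length: Dict[str, int],  # node ID -> sequence length (hypothetical input)
    component: Dict[str, int],      # node ID -> connected component ID (hypothetical input)
    min_weight: int = 0,
    min_length: int = 0,
    mode: str = "all",
) -> List[Link]:
    # Weight and length thresholds apply in every mode.
    kept = [
        (a, b, w)
        for a, b, w in links
        if w >= min_weight
        and contig_length[a] >= min_length  # assumption: either short end hides the link
        and contig_length[b] >= min_length
    ]
    if mode == "all":
        return kept

    # Links whose ends lie in different connected components.
    between = [(a, b, w) for a, b, w in kept if component[a] != component[b]]
    if mode == "between_components":
        return between

    if mode == "one_per_component_pair":
        best: Dict[frozenset, Link] = {}
        for a, b, w in between:
            key = frozenset((component[a], component[b]))
            # Assumption: keep the heaviest link for each pair of components.
            if key not in best or w > best[key][2]:
                best[key] = (a, b, w)
        return list(best.values())

    raise ValueError(f"unknown mode: {mode!r}")


demo_links = [("a", "b", 10), ("a", "c", 5), ("b", "c", 7), ("a", "d", 1)]
demo_length = {"a": 1000, "b": 1000, "c": 1000, "d": 50}
demo_component = {"a": 0, "b": 0, "c": 1, "d": 1}
```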
BandageNG can visualize RandomForest, AdaBoost and Gradient Boosted Decision Trees machine learning models. If you use decision tree models based on features extracted from a metagenomic dataset (for example, features extracted by MetaFX), you can visualize the predictive model and the de Bruijn graph simultaneously in BandageNG. BandageNG also supports mapping the features used in the predictive model onto the nodes (contigs) of the de Bruijn graph.
To load an ML model in Bandage, choose the "Load features forest" item in the "File" menu.
All trees should be described in one TXT file. Every tree node in the forest model should have a unique ID. All fields should be separated by a tab character ("\t"). Every row starts with one of the special symbols: N, F, C or S.
There are four types of rows in an input file:
Row format | Description | Example |
---|---|---|
N <Node ID> [<Left child ID>] [<Right child ID>] | Describes a tree node: the node ID and, for an inner node, the IDs of its children | N 1 2 3 |
F <Node ID> <Feature ID> <Threshold> | Describes a feature: the node ID, the feature ID and the threshold value (float) used to split the node into children | F 1 f_1 0.25 |
C <Node ID> <Class> | Describes a node's class: the node ID and the class of the leaf (for leaves) or the class of the feature (for inner nodes) | C 1 NonIBD |
S <Feature ID> <Sequence> | Describes one nucleotide sequence of a feature: the feature ID and the nucleotide sequence | S 1 GGAGCG |
Some properties:
- Every tree node should have exactly one row with prefix "N" and exactly one row with prefix "C" in the input file.
- Every inner tree node should have exactly one row with prefix "F" in the input file.
- Every feature can have one or multiple rows with prefix "S" in the input file. If a feature's nucleotide sequences are unknown, it cannot be matched with contigs in the de Bruijn graph.
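The row formats and properties above can be checked with a small reader. The following is a minimal sketch, not BandageNG's loader: `parse_forest` and `_add_once` are hypothetical names, and the class label `IBD` in the sample is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def _add_once(table: dict, key: str, value, prefix: str) -> None:
    """Store value under key, rejecting a second row with the same prefix."""
    if key in table:
        raise ValueError(f"duplicate {prefix} row for node {key}")
    table[key] = value


def parse_forest(text: str) -> dict:
    children: Dict[str, List[str]] = {}      # node ID -> child IDs (empty for leaves)
    split: Dict[str, Tuple[str, float]] = {}  # node ID -> (feature ID, threshold)
    node_class: Dict[str, str] = {}           # node ID -> class
    sequences: Dict[str, List[str]] = defaultdict(list)  # feature ID -> sequences

    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        kind, *fields = line.split("\t")
        try:
            if kind == "N":
                node_id, *kids = fields
                _add_once(children, node_id, kids, "N")
            elif kind == "F":
                node_id, feature_id, threshold = fields
                _add_once(split, node_id, (feature_id, float(threshold)), "F")
            elif kind == "C":
                node_id, cls = fields
                _add_once(node_class, node_id, cls, "C")
            elif kind == "S":
                feature_id, seq = fields
                sequences[feature_id].append(seq)  # several S rows per feature are allowed
            else:
                raise ValueError(f"unknown row prefix {kind!r}")
        except ValueError as err:
            raise ValueError(f"line {line_no}: {err}") from err

    # Every node needs a C row; every inner node also needs an F row.
    for node_id, kids in children.items():
        if node_id not in node_class:
            raise ValueError(f"node {node_id} has no C row")
        if kids and node_id not in split:
            raise ValueError(f"inner node {node_id} has no F row")

    return {
        "children": children,
        "split": split,
        "class": node_class,
        "sequences": dict(sequences),
    }


sample = (
    "N\t1\t2\t3\n"
    "F\t1\tf_1\t0.25\n"
    "C\t1\tNonIBD\n"
    "N\t2\n"
    "C\t2\tIBD\n"      # hypothetical class label
    "N\t3\n"
    "C\t3\tNonIBD\n"
    "S\tf_1\tGGAGCG\n"
)
forest = parse_forest(sample)
```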
You can write the TXT file yourself or use the build_model_for_bandage.py script to generate it.
To run the script, provide the following parameters:
Parameter | Description |
---|---|
--model-file | Joblib dump of the trained model. Supported models: RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier |
--res-file | Name of the file to save the output to |
--source-dir | Directory that contains FASTA files {sourceDir}/contigs_<category>/components.seq.fasta with the nucleotide sequences of features, one file for every class in the ML model. Sequence names should be the same as the feature IDs |
build_model_for_bandage.py --model-file <RandomForest.joblib> --res-file <RandomForestModel.txt> --source-dir <source-dir>
Mapping the forest model onto the de Bruijn graph is based on a colour scheme: a tree node and a part of a contig in the de Bruijn graph have the same colour when the nucleotide sequences of the tree node map to that part of the contig. To synchronize the classification model and the de Bruijn graph, click the "Map features to De Bruijn graph" button.
- Labels can be shown for every tree node: node ID, class or custom notes.
- When you select a node, all of its information from the forest is shown in the right panel: ID, splitting rule description, class (class of feature or class of leaf) and set of nucleotide sequences.
You can select one of the following colour schemes:
- Uniform colour – all tree nodes have one uniform colour.
- Class colour – tree nodes are coloured according to their classes. Nodes of the same class have the same colour and nodes of different classes have different colours.
- BLAST hits (solid) – can be used only after mapping features to the de Bruijn graph. Tree nodes and the matched parts of contigs are drawn in the same random colours.
- BLAST hits (class colours) – can be used only after mapping features to the de Bruijn graph. Tree nodes and the matched parts of contigs are coloured according to the tree node classes.
BandageNG can display several de Bruijn graphs (from different files) on one screen simultaneously.
To load multiple graphs in Bandage, choose the "Load graphs from dir" item in the "File" menu. All graph files in the selected directory are loaded recursively. Only files with *.gfa and *.fastg extensions are added.
To draw the de Bruijn graphs, click the "Draw graph" button after loading the graphs. All graphs are shown on one screen. Every graph is named by its relative path in the folder. Different graphs can contain nodes with the same names, so a prefix with the graph ID is added to the names of all nodes. Every graph has a random unique ID. The new node names are also used in the GFA or FASTA files that BandageNG generates when saving a selected part of the graphs.
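To preview which files such a recursive load would pick up, a short Python sketch can walk the directory. It is not BandageNG code: `find_graph_files` is a hypothetical helper, and matching the extension case-insensitively is an assumption.

```python
from pathlib import Path
from typing import Dict

GRAPH_EXTENSIONS = {".gfa", ".fastg"}


def find_graph_files(root: Path) -> Dict[str, Path]:
    """Map each graph's relative path (used as its name) to the file on disk."""
    found = {}
    for path in sorted(root.rglob("*")):  # rglob walks subdirectories recursively
        # Assumption: extensions are compared case-insensitively.
        if path.is_file() and path.suffix.lower() in GRAPH_EXTENSIONS:
            found[path.relative_to(root).as_posix()] = path
    return found
```

For example, `find_graph_files(Path("graphs"))` on a folder containing `a.gfa`, `sub/b.fastg` and `notes.txt` would return entries for the first two files only.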
CSV metadata can be used to visualize the taxonomic annotation of graph nodes. In this case, use a CSV table with the columns Superkingdom, Phylum, Class, Order, Family, Genus, Species, Serotype and Strains. This table can be filled with annotation data from Kraken2 output.
Perform the following steps:
- Run taxonomic classification with Kraken2, with names included in the output.
kraken2 --threads 8 --use-names --db ./kraken2/k2_standart ./components.seq.fasta > kraken_class.txt
- Convert the Kraken2 output to a CSV file with the custom Python script tax_to_csv.py. To run this script you need python3 and the ete3 library (pip install ete3):
py tax_to_csv.py --class-file=kraken_class.txt --res-file=graph.csv
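For orientation, each line of Kraken2's standard output produced with `--use-names` is tab-separated and carries the taxon as `name (taxid N)` in its third column. The sketch below only reads that column. It is not tax_to_csv.py: `KrakenHit` and `parse_kraken_line` are hypothetical names, and the expansion of a taxid into the full lineage, which the script does with ete3, is left out.

```python
import re
from typing import NamedTuple


class KrakenHit(NamedTuple):
    classified: bool  # True for "C" lines, False for "U" lines
    seq_id: str
    taxon_name: str
    taxid: int


# With --use-names the third column looks like "Escherichia coli (taxid 562)".
_TAXON = re.compile(r"^(?P<name>.*) \(taxid (?P<taxid>\d+)\)$")


def parse_kraken_line(line: str) -> KrakenHit:
    """Extract the sequence ID and taxon from one Kraken2 output line."""
    status, seq_id, taxon, *_ = line.rstrip("\n").split("\t")
    match = _TAXON.match(taxon)
    if match is None:
        raise ValueError(f"unexpected taxon column: {taxon!r}")
    return KrakenHit(status == "C", seq_id, match["name"], int(match["taxid"]))


# Made-up line for illustration; the sequence ID reuses a node ID from the Hi-C example.
hit = parse_kraken_line("C\t1268598\tEscherichia coli (taxid 562)\t1500\t562:100")
```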
Load CSV data (Single graph):
To load CSV data, choose the "Load CSV data" item in the "File" menu.
Load CSV data (Multiple graphs):
- CSV data is loaded automatically together with multiple graphs if the name of the CSV file matches the name of the graph file. For example, if folder A contains three files, graph_1.gfa, graph_2.gfa and graph_2.csv, then the CSV metadata is applied to graph_2.
- You can also load one CSV file for all graphs, but in this case all node names across all graphs should be unique. To do this, choose the "Load CSV data" item in the "File" menu.
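The name-matching rule for multiple graphs can be illustrated with a short Python sketch. It is an assumption-based illustration rather than BandageNG's logic: `match_csv_to_graphs` is a hypothetical helper, and the CSV file is assumed to sit in the same folder as the graph and to differ only in its extension.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Optional


def match_csv_to_graphs(file_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Pair each graph file with the CSV file that shares its path and base name."""
    paths = [PurePosixPath(name) for name in file_names]
    # Index CSV files by their path without the extension.
    csv_by_stem = {p.with_suffix(""): p for p in paths if p.suffix == ".csv"}
    pairs: Dict[str, Optional[str]] = {}
    for p in paths:
        if p.suffix in {".gfa", ".fastg"}:
            csv_path = csv_by_stem.get(p.with_suffix(""))
            pairs[str(p)] = str(csv_path) if csv_path is not None else None
    return pairs


pairs = match_csv_to_graphs(["A/graph_1.gfa", "A/graph_2.gfa", "A/graph_2.csv"])
```

With the three files from the example above, only `A/graph_2.gfa` is paired with a CSV file, and `A/graph_1.gfa` maps to `None`.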