Below are some examples of using the gfak command-line tools to manipulate GFA files.
./gfak convert -S 2.0 data/v1.gfa
Produces the following output:
H VN:Z:2.0
O path1 1+ 4+ 5+ 2+
O path2 1+ 4+ 5+ 6+ 3+
O ref 1+ 2+ 3+
S 1 1 G
E 8 1+ 2+ 1$ 1$ 0 0 0M
E 9 1+ 4+ 1$ 1$ 0 0 0M
S 2 1 T
E 10 2+ 3+ 1$ 1$ 0 0 0M
S 3 1 G
S 4 1 C
E 11 4+ 5+ 1$ 1$ 0 0 0M
S 5 1 C
E 12 5+ 2+ 1$ 1$ 0 0 0M
E 13 5+ 6+ 1$ 1$ 0 0 0M
S 6 1 T
E 14 6+ 3+ 1$ 1$ 0 0 0M
./gfak convert -S 2.0 data/v1.gfa | md5sum: 268e075f19c7600304b51247b11e5f0f
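The checksum line above records the MD5 digest of the command's output. A minimal sketch of automating that comparison in a shell script follows. The function name `check_md5` is hypothetical and not part of gfak, and the sketch assumes `md5sum` (GNU coreutils) is on the PATH.

```shell
# check_md5 EXPECTED: read stdin and compare its MD5 digest with EXPECTED.
# Hypothetical helper for illustration only; not part of gfak.
check_md5() {
    expected=$1
    # md5sum prints "<digest>  -" for stdin; keep only the digest field.
    actual=$(md5sum | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "ok"
    else
        echo "mismatch: got $actual, expected $expected" >&2
        return 1
    fi
}

# Usage, assuming ./gfak and data/v1.gfa exist as in the example above:
#   ./gfak convert -S 2.0 data/v1.gfa | check_md5 268e075f19c7600304b51247b11e5f0f
```

The same helper can be reused for the other checksum lines in this document by swapping in the command and its recorded digest.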
./gfak convert -S 1.0 data/gfa_2.gfa
H VN:Z:1.0
P 1p 12-,11+,32+,28-,20-,16+ 140M,22M,140M,22M,81M,70M
P 2p 12-,8+,32-,31-,20-,16+,23-,16+ 140M,22M,140M,22M,81M,70M,22M,70M
S 8 AAAGATAGAAAAGTGAGTGTAT
C 8 + 32 - 11 11M
S 11 AAAGATAGAAATACACGATGCG
C 11 + 32 + 11 11M
S 12 TTTCTATCTTTAATCGATAAAAGTAAAAAAATTGAGCAGTAGTATAAAATGAACTTGCGTTATAAAAAGGATTTTGTTATATTGTAGTAGTTGCTTGAATTATGACTAGATAATCAATGAGCTAATACGAGAATTTTAAT
C 12 - 11 + 0 11M
C 12 - 8 + 0 11M
S 16 AGAAATTACACACAAAGTTATACTATTTTTAGCAACATATTCACAGGTATTTGACATATAGAGAACTGAA
C 16 + 23 - 59 11M
S 20 GTGTAATTTCTAATTATCCACAATTCTGAAAACTATAAATGTGCATAAGTGGATAACTTTTCCTTCTATAGAATATCTGTT
C 20 - 16 + 0 11M
C 20 - 16 + 0 11M
S 23 GTGTAATTTCTTTCAGTTCTCT
C 23 - 16 + 0 11M
S 28 GAATATCTGTTAGTGAGTGTAT
C 28 - 20 - 0 11M
S 31 GAATATCTGTTTACACGATGCG
C 31 - 20 - 0 11M
S 32 TACACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTTGGTCCGAAGCATGAGTGTTTACAATGTTTGAATACCTTATACAGTTCTTATACATACTTTATAAATTATTTCCCAAGCTGTTTTGATACACTCACT
C 32 + 28 - 129 11M
C 32 - 31 - 0 11M
./gfak convert -S 1.0 data/gfa_2.gfa | md5sum: d7bb881a8880850acb2977efa28c7979
./gfak sort data/test.gfa
H VN:Z:1.0
S 1 CGATGCAA
S 2 TGCAAAGTAC
S 3 TGCAACGTATAGACTTGTCAC RC:i:4
S 4 GCATATA
S 5 CGATGATA
S 6 ATGA
L 1 + 2 + 5M
L 3 + 2 + 0M
L 3 + 4 - 1M1D2M1S
L 4 - 5 + 0M
C 5 + 6 + 2 4M
./gfak sort data/test.gfa | md5sum: 6dd44a9a0cc7308c7d6b92e8f0d9e648
./gfak sort data/gfa_2.gfa
H VN:Z:2.0
S 8 22 AAAGATAGAAAAGTGAGTGTAT
S 11 22 AAAGATAGAAATACACGATGCG
S 12 140 TTTCTATCTTTAATCGATAAAAGTAAAAAAATTGAGCAGTAGTATAAAATGAACTTGCGTTATAAAAAGGATTTTGTTATATTGTAGTAGTTGCTTGAATTATGACTAGATAATCAATGAGCTAATACGAGAATTTTAAT
S 16 70 AGAAATTACACACAAAGTTATACTATTTTTAGCAACATATTCACAGGTATTTGACATATAGAGAACTGAA
S 20 81 GTGTAATTTCTAATTATCCACAATTCTGAAAACTATAAATGTGCATAAGTGGATAACTTTTCCTTCTATAGAATATCTGTT
S 23 22 GTGTAATTTCTTTCAGTTCTCT
S 28 22 GAATATCTGTTAGTGAGTGTAT
S 31 22 GAATATCTGTTTACACGATGCG
S 32 140 TACACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTTGGTCCGAAGCATGAGTGTTTACAATGTTTGAATACCTTATACAGTTCTTATACATACTTTATAAATTATTTCCCAAGCTGTTTTGATACACTCACT
F 11 1+ 0 22$ 129 151 11M
F 12 1- 0 140$ 0 140 11M
F 12 2- 0 140$ 0 140 11M
F 16 1+ 0 70$ 350 420$ 11M
F 16 2+ 0 70$ 350 420 11M
F 16 2+ 0 70$ 420 490$ 11M
F 20 1- 0 81$ 280 361 11M
F 20 2- 0 81$ 280 361 11M
F 23 2- 0 22$ 409 431 11M
F 28 1- 0 22$ 269 291 11M
F 31 2- 0 22$ 269 291 11M
F 32 1+ 0 140$ 140 280 11M
F 32 2- 0 140$ 140 280 11M
F 8 2+ 0 22$ 129 151 11M
E 34 8+ 32- 11 22$ 129 140$ 11M
E 35 11+ 32+ 11 22$ 0 11 11M
E 36 12- 11+ 0 11 0 11 11M
E 37 12- 8+ 0 11 0 11 11M
E 38 16+ 23- 59 70$ 11 22$ 11M
E 39 20- 16+ 0 11 0 11 11M
E 40 20- 16+ 0 11 0 11 11M
E 41 23- 16+ 0 11 0 11 11M
E 42 28- 20- 0 11 70 81$ 11M
E 43 31- 20- 0 11 70 81$ 11M
E 44 32+ 28- 129 140$ 11 22$ 11M
E 45 32- 31- 0 11 11 22$ 11M
O 1p 12- 11+ 32+ 28- 20- 16+
O 2p 12- 8+ 32- 31- 20- 16+ 23- 16+
./gfak sort data/gfa_2.gfa | md5sum: fa3b92296d3a23f9db99e611815788d4
./gfak extract data/gfa_2.gfa
>8
AAAGATAGAAAAGTGAGTGTAT
>11
AAAGATAGAAATACACGATGCG
>12
TTTCTATCTTTAATCGATAAAAGTAAAAAAATTGAGCAGTAGTATAAAATGAACTTGCGTTATAAAAAGGATTTTGTTATATTGTAGTAGTTGCTTGAATTATGACTAGATAATCAATGAGCTAATACGAGAATTTTAAT
>16
AGAAATTACACACAAAGTTATACTATTTTTAGCAACATATTCACAGGTATTTGACATATAGAGAACTGAA
>20
GTGTAATTTCTAATTATCCACAATTCTGAAAACTATAAATGTGCATAAGTGGATAACTTTTCCTTCTATAGAATATCTGTT
>23
GTGTAATTTCTTTCAGTTCTCT
>28
GAATATCTGTTAGTGAGTGTAT
>31
GAATATCTGTTTACACGATGCG
>32
TACACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTTGGTCCGAAGCATGAGTGTTTACAATGTTTGAATACCTTATACAGTTCTTATACATACTTTATAAATTATTTCCCAAGCTGTTTTGATACACTCACT
./gfak extract data/gfa_2.gfa | sort | md5sum: 43bbe8fee3f67fd90b90ee885ddb15e3
cat data/no_seqs.fa | sort | md5sum: 43bbe8fee3f67fd90b90ee885ddb15e3
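The two matching checksums above indicate that, once sorted, the `extract` output is identical to `data/no_seqs.fa`. To illustrate what that output contains, here is a rough awk sketch that produces the same kind of FASTA records from GFA 2.0 `S` lines (`S <id> <length> <sequence>`). It is an illustration only, not how gfak implements `extract`. The function name `gfa2_segments_to_fasta` is made up for this example, and GFA 1.0 input would need the sequence taken from the third field instead of the fourth.

```shell
# gfa2_segments_to_fasta [FILE...]: print each GFA 2.0 segment as a FASTA record.
# Hypothetical helper for illustration only; not part of gfak.
gfa2_segments_to_fasta() {
    # Field 1 is the record type, field 2 the segment id,
    # field 3 the length, field 4 the sequence.
    awk '$1 == "S" { print ">" $2; print $4 }' "$@"
}

# Usage, assuming data/gfa_2.gfa exists as in the example above:
#   gfa2_segments_to_fasta data/gfa_2.gfa
```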
./gfak fillseq -f data/no_seqs.fa data/no_seqs.gfa
H VN:Z:2.0
O 1p 12- 11+ 32+ 28- 20- 16+
O 2p 12- 8+ 32- 31- 20- 16+ 23- 16+
S 8 22 AAAGATAGAAAAGTGAGTGTAT
F 8 2+ 0 22$ 129 151 11M
E 34 8+ 32- 11 22$ 129 140$ 11M
S 11 22 AAAGATAGAAATACACGATGCG
F 11 1+ 0 22$ 129 151 11M
E 35 11+ 32+ 11 22$ 0 11 11M
S 12 140 TTTCTATCTTTAATCGATAAAAGTAAAAAAATTGAGCAGTAGTATAAAATGAACTTGCGTTATAAAAAGGATTTTGTTATATTGTAGTAGTTGCTTGAATTATGACTAGATAATCAATGAGCTAATACGAGAATTTTAAT
F 12 1- 0 140$ 0 140 11M
F 12 2- 0 140$ 0 140 11M
E 36 12- 11+ 0 11 0 11 11M
E 37 12- 8+ 0 11 0 11 11M
S 16 70 AGAAATTACACACAAAGTTATACTATTTTTAGCAACATATTCACAGGTATTTGACATATAGAGAACTGAA
F 16 1+ 0 70$ 350 420$ 11M
F 16 2+ 0 70$ 350 420 11M
F 16 2+ 0 70$ 420 490$ 11M
E 38 16+ 23- 59 70$ 11 22$ 11M
S 20 81 GTGTAATTTCTAATTATCCACAATTCTGAAAACTATAAATGTGCATAAGTGGATAACTTTTCCTTCTATAGAATATCTGTT
F 20 1- 0 81$ 280 361 11M
F 20 2- 0 81$ 280 361 11M
E 39 20- 16+ 0 11 0 11 11M
E 40 20- 16+ 0 11 0 11 11M
S 23 22 GTGTAATTTCTTTCAGTTCTCT
F 23 2- 0 22$ 409 431 11M
E 41 23- 16+ 0 11 0 11 11M
S 28 22 GAATATCTGTTAGTGAGTGTAT
F 28 1- 0 22$ 269 291 11M
E 42 28- 20- 0 11 70 81$ 11M
S 31 22 GAATATCTGTTTACACGATGCG
F 31 2- 0 22$ 269 291 11M
E 43 31- 20- 0 11 70 81$ 11M
S 32 140 TACACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTTGGTCCGAAGCATGAGTGTTTACAATGTTTGAATACCTTATACAGTTCTTATACATACTTTATAAATTATTTCCCAAGCTGTTTTGATACACTCACT
F 32 1+ 0 140$ 140 280 11M
F 32 2- 0 140$ 140 280 11M
E 44 32+ 28- 129 140$ 11 22$ 11M
E 45 32- 31- 0 11 11 22$ 11M
./gfak fillseq -f data/no_seqs.fa data/no_seqs.gfa | md5sum: caaf91eac390521d68d56bad57f7b3b3
./gfak ids -s 9:9:9 data/test.gfa
H VN:Z:1.0
S 10 CGATGCAA
L 10 + 11 + 5M
S 11 TGCAAAGTAC
S 12 TGCAACGTATAGACTTGTCAC RC:i:4
L 12 + 11 + 0M
L 12 + 13 - 1M1D2M1S
S 13 GCATATA
L 13 - 14 + 0M
S 14 CGATGATA
C 14 + 15 + 2 4M
S 15 ATGA
diff <(./gfak ids -s 9:9:9 data/test.gfa) <(cat data/re_id.gfa)
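`diff` prints nothing and exits with status 0 when its two inputs are identical, so the command above passes silently when the re-numbered output matches `data/re_id.gfa`. A small sketch of turning that into an explicit pass/fail message follows. `expect_same` is a hypothetical helper name, and the process substitution shown in the usage comment requires bash.

```shell
# expect_same A B: report whether files A and B have identical contents.
# Hypothetical helper for illustration only; not part of gfak.
expect_same() {
    if diff "$1" "$2" >/dev/null; then
        echo "PASS"
    else
        echo "FAIL"
        return 1
    fi
}

# Usage (bash), assuming the files from the example above exist:
#   expect_same <(./gfak ids -s 9:9:9 data/test.gfa) data/re_id.gfa
```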
cat data/re_id.gfa
./gfak merge -S 2.0 data/test.gfa data/gfa_2.gfa
H VN:Z:2.0
O 1p 12- 11+ 32+ 28- 20- 16+
O 2p 12- 8+ 32- 31- 20- 16+ 23- 16+
S 1 8 CGATGCAA
E 8 1+ 2+ 8$ 8$ 0 0 5M
S 2 10 TGCAAAGTAC
S 3 21 TGCAACGTATAGACTTGTCAC RC:i:4
E 9 3+ 2+ 21$ 21$ 0 0 0M
E 10 3+ 4- 21$ 21$ 0 0 1M1D2M1S
S 4 7 GCATATA
E 11 4- 5+ 7$ 7$ 0 0 0M
S 5 8 CGATGATA
E 12 5+ 6+ 2 6 0 4$ 4M
S 6 4 ATGA
S 8 22 AAAGATAGAAAAGTGAGTGTAT
F 8 2+ 0 22$ 129 151 11M
E 34 8+ 32- 11 22$ 129 140$ 11M
S 11 22 AAAGATAGAAATACACGATGCG
F 11 1+ 0 22$ 129 151 11M
E 35 11+ 32+ 11 22$ 0 11 11M
S 12 140 TTTCTATCTTTAATCGATAAAAGTAAAAAAATTGAGCAGTAGTATAAAATGAACTTGCGTTATAAAAAGGATTTTGTTATATTGTAGTAGTTGCTTGAATTATGACTAGATAATCAATGAGCTAATACGAGAATTTTAAT
F 12 1- 0 140$ 0 140 11M
F 12 2- 0 140$ 0 140 11M
E 36 12- 11+ 0 11 0 11 11M
E 37 12- 8+ 0 11 0 11 11M
S 16 70 AGAAATTACACACAAAGTTATACTATTTTTAGCAACATATTCACAGGTATTTGACATATAGAGAACTGAA
F 16 1+ 0 70$ 350 420$ 11M
F 16 2+ 0 70$ 350 420 11M
F 16 2+ 0 70$ 420 490$ 11M
E 38 16+ 23- 59 70$ 11 22$ 11M
S 20 81 GTGTAATTTCTAATTATCCACAATTCTGAAAACTATAAATGTGCATAAGTGGATAACTTTTCCTTCTATAGAATATCTGTT
F 20 1- 0 81$ 280 361 11M
F 20 2- 0 81$ 280 361 11M
E 39 20- 16+ 0 11 0 11 11M
E 40 20- 16+ 0 11 0 11 11M
S 23 22 GTGTAATTTCTTTCAGTTCTCT
F 23 2- 0 22$ 409 431 11M
E 41 23- 16+ 0 11 0 11 11M
S 28 22 GAATATCTGTTAGTGAGTGTAT
F 28 1- 0 22$ 269 291 11M
E 42 28- 20- 0 11 70 81$ 11M
S 31 22 GAATATCTGTTTACACGATGCG
F 31 2- 0 22$ 269 291 11M
E 43 31- 20- 0 11 70 81$ 11M
S 32 140 TACACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTTGGTCCGAAGCATGAGTGTTTACAATGTTTGAATACCTTATACAGTTCTTATACATACTTTATAAATTATTTCCCAAGCTGTTTTGATACACTCACT
F 32 1+ 0 140$ 140 280 11M
F 32 2- 0 140$ 140 280 11M
E 44 32+ 28- 129 140$ 11 22$ 11M
E 45 32- 31- 0 11 11 22$ 11M
./gfak merge -S 2.0 data/test.gfa data/gfa_2.gfa | md5sum: ca3b52673b63de931cd64a50669e7147