Cant manage to get working with alternative FASTA #13

ebioman · 2021-01-18T11:22:23Z

I was eager to try your tool and the examples in the repository work well.
I tried directly plotting from your examples as well as re-alignment (minimap2) on hg38 with subsequent plotting.
In that case both without and with provided fasta sequence worked.

Whenever I try though to use a non-human reference genome, I get an error :

2021-01-18 10:51:57,140 : [INFO] Total running time: 0.0 sec
Process proc 1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/site-packages/bamsnap/bamsnap.py", line 227, in run_process_drawplot_bamlist
    refseq = rseq.get_refseq(pos1)
  File "/usr/local/lib/python3.7/site-packages/bamsnap/bamsnap.py", line 537, in get_refseq
    refseq = self.get_refseq_from_localfasta(pos1)
  File "/usr/local/lib/python3.7/site-packages/bamsnap/bamsnap.py", line 586, in get_refseq_from_localfasta
    refseq[gpos+1] = seq[i]
IndexError: string index out of range

In order to assure that is should work I took really headers such as ">sequence1",">sequence2" and so on and verified that reads mapped actually on provided positions but it always gave the same error.
Any hints how to fix that, or plans to add to your test repository some user reference sequences to make it easier to understand what is the proper syntax in that case ?

The text was updated successfully, but these errors were encountered:

erinyoung · 2021-01-19T18:28:46Z

Can I second this issue?

I want to use this to visualize reads of SARS-CoV-2 with reference MN908947.3

The command:

$ bamsnap -ref MN908947.3.fasta -bam test.primertrim.sorted.bam -pos MN908947.3:241 -out test.bamsnap.png

The error:

2021-01-19 11:16:15,958 : [INFO] Total running time: 0.0 sec
Process proc 1:
Traceback (most recent call last):
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 227, in run_process_drawplot_bamlist
    refseq = rseq.get_refseq(pos1)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 537, in get_refseq
    refseq = self.get_refseq_from_localfasta(pos1)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 586, in get_refseq_from_localfasta
    refseq[gpos+1] = seq[i]
IndexError: string index out of range

ebioman · 2021-01-21T07:35:33Z

Looking at the code, this part seems to be source of the error:

    def get_refseq_from_localfasta(self, pos1):
        spos = pos1['g_spos']-self.opt['margin'] - 500
        epos = pos1['g_epos']+self.opt['margin'] + 1 + 500
        seq = self.get_refseq_from_fasta(pos1['chrom'], spos, epos, self.opt['ref_index_rebuild'])
        i = 0
        refseq = {}
        for gpos in range(spos, epos):
            refseq[gpos+1] = seq[i]
            i += 1
        return refseq

Where we can see already that start and end-position take the provided position and shift by 500bp + the given margin (default 50bp) in order to find anything to plot at all.
@erinyoung Might that be the issue in your case, that since you have the position 241bp that start would be 241 - 500 -50bp = -309bp which would be out of range ?

Will have to go back to mine and see whether that applies - certainly not for all which I tested.....

erinyoung · 2021-02-08T23:11:39Z

This still did not work:

$ bamsnap -ref MN908947.3.fasta -bam test.sorted.bam -pos MN908947.3:1181 -out test.bamsnap.png
 
2021-02-08 15:47:54,183 : [INFO] Total running time: 0.0 sec
Process proc 1:
Traceback (most recent call last):
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 229, in run_process_drawplot_bamlist
    imagefname = bsplot.drawplot_bamlist(pos1, image_w, bamlist, xscale, refseq)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 475, in drawplot_bamlist
    ia = self.append_geneplot_image(ia, pos1, image_w, xscale)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 429, in append_geneplot_image
    geneplot = GenePlot(pos1['chrom'], pos1['g_spos'], pos1['g_epos'], xscale,
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/geneplot.py", line 101, in __init__
    self.load_gene_structure()
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/geneplot.py", line 114, in load_gene_structure
    for rec in self.gene_annot_tb.querys(pos_str):
tabix.TabixError: query failed

I was reading #11 (comment) and finally got bamsnap working for me:

bamsnap -draw coordinates bamplot coverage base \
     -ref MN908947.3.fasta -bam test.sorted.bam -pos MN908947.3:1181 -out test.bamsnap.png

I guess I just need to be wary of the edges.

tkonopka · 2021-02-16T20:56:33Z

Encountered similar issues when drawing short chromosomes and also near chromosome edges. I made some edits in a branch in a fork. (branch cli_margin)

An example synthetic fasta sequence and synthetic alignment:

>random_small
GTATGATACCTAGATAATAAAGGCTGCTAGGAGCCCTTTAACAGAGGACCTATACGTAAG
GCACATGAAAAATGAGTTAGCGGCAGACCTATGTTGTAAGCTAGCTCGCGGAGTAAACGT

@SQ	SN:random_small	LN:120
@PG	ID:bwa	PN:bwa	VN:0.7.17-r1188	CL:bwa mem -o example.sam small.fa small.fastq
random_small-1	0	random_small	1	60	60M	*	0	0	GTATGATACCTAGATAATAAAGGCTGCTAGGAGCCCTTTAACAGAGGACCTATACGTAAG	ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss	NM:i:0MD:Z:60	AS:i:60	XS:i:0
random_small-2	0	random_small	61	60	60M	*	0	0	GCACATGAAAAATGAGTTAGCGGCAGACCTATGTTGTAAGCTAGCTCGCGGAGTAAACGT	ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss	NM:i:0MD:Z:60	AS:i:60	XS:i:0

The current version gives IndexError on this data. In the edits, setting -pos random_small:1-120 (entire chromosome) and -margin 0, it becomes possible to draw the chromosome edge-to-edge.

When -margin is set to nonzero and would cause the genomic region off the chromosome boundaries, the alignment becomes left-justified and leaves empty space on the right.

The examples on readthedocs also seem to generate images (sometimes with one base different on the right). Does this seem reasonable?

@danielmsk - I'd be happy to send a pull request with this. There is potential these changes may break other things that I may not be aware of. It would be great to get your input.

gtrichard · 2022-09-01T16:21:26Z

This is still an issue and really undermines bamsnap scope and usefulness.

lskatz · 2022-09-06T14:45:41Z

We added 1000 to the start and subtracted 1000 from the end position and I really really hate that this worked.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cant manage to get working with alternative FASTA #13

Cant manage to get working with alternative FASTA #13

ebioman commented Jan 18, 2021

erinyoung commented Jan 19, 2021

ebioman commented Jan 21, 2021

erinyoung commented Feb 8, 2021

tkonopka commented Feb 16, 2021

gtrichard commented Sep 1, 2022

lskatz commented Sep 6, 2022

Cant manage to get working with alternative FASTA #13

Cant manage to get working with alternative FASTA #13

Comments

ebioman commented Jan 18, 2021

erinyoung commented Jan 19, 2021

ebioman commented Jan 21, 2021

erinyoung commented Feb 8, 2021

tkonopka commented Feb 16, 2021

gtrichard commented Sep 1, 2022

lskatz commented Sep 6, 2022