Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cant manage to get working with alternative FASTA #13

Open
ebioman opened this issue Jan 18, 2021 · 6 comments
Open

Cant manage to get working with alternative FASTA #13

ebioman opened this issue Jan 18, 2021 · 6 comments

Comments

@ebioman
Copy link

ebioman commented Jan 18, 2021

I was eager to try your tool and the examples in the repository work well.
I tried directly plotting from your examples as well as re-alignment (minimap2) on hg38 with subsequent plotting.
In that case both without and with provided fasta sequence worked.

Whenever I try though to use a non-human reference genome, I get an error :

2021-01-18 10:51:57,140 : [INFO] Total running time: 0.0 sec
Process proc 1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/site-packages/bamsnap/bamsnap.py", line 227, in run_process_drawplot_bamlist
    refseq = rseq.get_refseq(pos1)
  File "/usr/local/lib/python3.7/site-packages/bamsnap/bamsnap.py", line 537, in get_refseq
    refseq = self.get_refseq_from_localfasta(pos1)
  File "/usr/local/lib/python3.7/site-packages/bamsnap/bamsnap.py", line 586, in get_refseq_from_localfasta
    refseq[gpos+1] = seq[i]
IndexError: string index out of range

In order to assure that is should work I took really headers such as ">sequence1",">sequence2" and so on and verified that reads mapped actually on provided positions but it always gave the same error.
Any hints how to fix that, or plans to add to your test repository some user reference sequences to make it easier to understand what is the proper syntax in that case ?

@erinyoung
Copy link

Can I second this issue?

I want to use this to visualize reads of SARS-CoV-2 with reference MN908947.3

The command:

$ bamsnap -ref MN908947.3.fasta -bam test.primertrim.sorted.bam -pos MN908947.3:241 -out test.bamsnap.png 

The error:

2021-01-19 11:16:15,958 : [INFO] Total running time: 0.0 sec
Process proc 1:
Traceback (most recent call last):
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 227, in run_process_drawplot_bamlist
    refseq = rseq.get_refseq(pos1)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 537, in get_refseq
    refseq = self.get_refseq_from_localfasta(pos1)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 586, in get_refseq_from_localfasta
    refseq[gpos+1] = seq[i]
IndexError: string index out of range

@ebioman
Copy link
Author

ebioman commented Jan 21, 2021

Looking at the code, this part seems to be source of the error:

    def get_refseq_from_localfasta(self, pos1):
        spos = pos1['g_spos']-self.opt['margin'] - 500
        epos = pos1['g_epos']+self.opt['margin'] + 1 + 500
        seq = self.get_refseq_from_fasta(pos1['chrom'], spos, epos, self.opt['ref_index_rebuild'])
        i = 0
        refseq = {}
        for gpos in range(spos, epos):
            refseq[gpos+1] = seq[i]
            i += 1
        return refseq

Where we can see already that start and end-position take the provided position and shift by 500bp + the given margin (default 50bp) in order to find anything to plot at all.
@erinyoung Might that be the issue in your case, that since you have the position 241bp that start would be 241 - 500 -50bp = -309bp which would be out of range ?

Will have to go back to mine and see whether that applies - certainly not for all which I tested.....

@erinyoung
Copy link

This still did not work:

$ bamsnap -ref MN908947.3.fasta -bam test.sorted.bam -pos MN908947.3:1181 -out test.bamsnap.png
 
2021-02-08 15:47:54,183 : [INFO] Total running time: 0.0 sec
Process proc 1:
Traceback (most recent call last):
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 229, in run_process_drawplot_bamlist
    imagefname = bsplot.drawplot_bamlist(pos1, image_w, bamlist, xscale, refseq)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 475, in drawplot_bamlist
    ia = self.append_geneplot_image(ia, pos1, image_w, xscale)
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/bamsnap.py", line 429, in append_geneplot_image
    geneplot = GenePlot(pos1['chrom'], pos1['g_spos'], pos1['g_epos'], xscale,
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/geneplot.py", line 101, in __init__
    self.load_gene_structure()
  File "/home/eriny/anaconda3/envs/bamsnap/lib/python3.9/site-packages/bamsnap/geneplot.py", line 114, in load_gene_structure
    for rec in self.gene_annot_tb.querys(pos_str):
tabix.TabixError: query failed

I was reading #11 (comment) and finally got bamsnap working for me:

bamsnap -draw coordinates bamplot coverage base \
     -ref MN908947.3.fasta -bam test.sorted.bam -pos MN908947.3:1181 -out test.bamsnap.png

I guess I just need to be wary of the edges.

@tkonopka
Copy link
Contributor

Encountered similar issues when drawing short chromosomes and also near chromosome edges. I made some edits in a branch in a fork. (branch cli_margin)

An example synthetic fasta sequence and synthetic alignment:

>random_small
GTATGATACCTAGATAATAAAGGCTGCTAGGAGCCCTTTAACAGAGGACCTATACGTAAG
GCACATGAAAAATGAGTTAGCGGCAGACCTATGTTGTAAGCTAGCTCGCGGAGTAAACGT
@SQ	SN:random_small	LN:120
@PG	ID:bwa	PN:bwa	VN:0.7.17-r1188	CL:bwa mem -o example.sam small.fa small.fastq
random_small-1	0	random_small	1	60	60M	*	0	0	GTATGATACCTAGATAATAAAGGCTGCTAGGAGCCCTTTAACAGAGGACCTATACGTAAG	ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss	NM:i:0MD:Z:60	AS:i:60	XS:i:0
random_small-2	0	random_small	61	60	60M	*	0	0	GCACATGAAAAATGAGTTAGCGGCAGACCTATGTTGTAAGCTAGCTCGCGGAGTAAACGT	ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss	NM:i:0MD:Z:60	AS:i:60	XS:i:0

The current version gives IndexError on this data. In the edits, setting -pos random_small:1-120 (entire chromosome) and -margin 0, it becomes possible to draw the chromosome edge-to-edge.

entire-chromosome

When -margin is set to nonzero and would cause the genomic region off the chromosome boundaries, the alignment becomes left-justified and leaves empty space on the right.

margin-outside

The examples on readthedocs also seem to generate images (sometimes with one base different on the right). Does this seem reasonable?

@danielmsk - I'd be happy to send a pull request with this. There is potential these changes may break other things that I may not be aware of. It would be great to get your input.

@gtrichard
Copy link

This is still an issue and really undermines bamsnap scope and usefulness.

@lskatz
Copy link

lskatz commented Sep 6, 2022

We added 1000 to the start and subtracted 1000 from the end position and I really really hate that this worked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants