Error: Add anchor information into re-alignment BAM file #15

Open
samarth51 opened this issue Jul 4, 2015 · 14 comments

Comments

@samarth51

Hi,
I have BWA BAM files. I tried to run the Socrates realignment step but got this error message:
Add anchor information into re-alignment BAM file

The command I used is:
./Socrates inputbam (generated in the first step of Socrates) outputbam

Any help would be appreciated.
Thanks

@jibsch
Owner

jibsch commented Jul 6, 2015

Try running the program from start to finish by calling Socrates all and filling in the necessary parameters.
Please let me know if that gets you there.

@samarth51
Author

Hi,
I also tried running Socrates all, using the command:
./Socrates all InputBAM
This command prints the following message:

"Bowtie2 DB is required to perform soft-clip realignment. Please specify this parameter with --bowtie2_db "

In the help section this parameter is described as:
bowtie2_db BOWTIE2_DB -- Prefix of Bowtie2 indexed database for sample (default: None)
What does "None" mean here? It is confusing.

Also, I have BWA-generated BAM files, so what should the value of the "bowtie2_db" parameter be in that case? Do I need Bowtie2-generated BAM files to use Socrates?

Any suggestions would be appreciated.

@jibsch
Owner

jibsch commented Jul 6, 2015

The "None" default is supposed to indicate that this parameter is required,
as it depends on the data.
It is fine to use BWA alignments, but the re-alignment is to be done with
bowtie2 (at least for the time being).
Create a bowtie2 index with the bowtie2-build command. Provide the prefix
of the resulting set of files (without the "." and the index suffix) as the
bowtie2_db parameter.
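For example, bowtie2-build reference.fa hg19 writes hg19.1.bt2 through hg19.4.bt2 plus hg19.rev.1.bt2 and hg19.rev.2.bt2, and the value to pass is hg19 (with any directory in front of it). A minimal Python sketch that recovers that prefix from any one index file name follows; bowtie2_prefix is a hypothetical helper, not part of Socrates or Bowtie2. It strips only the known index suffix, because cutting at the first "." would mangle prefixes such as GRCh37.p13.

```python
import re

# bowtie2-build writes PREFIX.1.bt2 .. PREFIX.4.bt2 plus PREFIX.rev.1.bt2 and
# PREFIX.rev.2.bt2 (".bt2l" instead of ".bt2" for large indexes).
_BT2_SUFFIX = re.compile(r"\.(?:rev\.)?[1-4]\.bt2l?$")


def bowtie2_prefix(index_file: str) -> str:
    """Return the value for --bowtie2_db given the path of any one index file."""
    prefix, n = _BT2_SUFFIX.subn("", index_file)
    if n == 0:
        raise ValueError(f"not a bowtie2 index file: {index_file}")
    return prefix


print(bowtie2_prefix("/data/hg19.1.bt2"))           # /data/hg19
print(bowtie2_prefix("ref/GRCh37.p13.rev.1.bt2"))   # ref/GRCh37.p13
```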

@samarth51
Author

Hi,
Thanks for the valuable suggestions. Everything is going well so far.
I have another question. I have paired samples (tumor vs normal); which parameters do I need to adjust for paired samples?

Thanks

@jibsch
Owner

jibsch commented Jul 8, 2015

Glad that worked.
For tumour vs normal: Run both samples independently first. Then use the
Socrates annotate module and specify the normal results file (paired) after
"--normal".

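The order of operations described in this reply can be sketched in Python as a list of command lines. This is only an illustration under stated assumptions: somatic_commands is a hypothetical helper, the BAM names, the index prefix hg19 and the result names raw_tumor/raw_blood are placeholders, and it assumes --bowtie2_db is the only extra parameter Socrates all needs for your data.

```python
import shlex
from typing import List


def somatic_commands(tumour_bam: str, normal_bam: str, bowtie2_db: str,
                     tumour_results: str, normal_results: str) -> List[List[str]]:
    """Hypothetical helper: the three Socrates calls for one tumour/normal pair."""
    return [
        # Run each sample through the whole pipeline independently ...
        ["./Socrates", "all", "--bowtie2_db", bowtie2_db, tumour_bam],
        ["./Socrates", "all", "--bowtie2_db", bowtie2_db, normal_bam],
        # ... then annotate the tumour results, passing the normal results via --normal.
        ["./Socrates", "annotate", "--normal", normal_results, tumour_results],
    ]


for cmd in somatic_commands("tumour.bam", "normal.bam", "hg19", "raw_tumor", "raw_blood"):
    print(shlex.join(cmd))
```

Each list could be handed to subprocess.run(cmd, check=True) in turn, as long as the two all runs have actually produced the result files named in the last step.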
@samarth51
Author

Hi,
I ran the tumor and normal samples separately and got raw output for both samples; everything went well so far. Now, to find somatic SVs, which parameters should be used?

./Socrates annotate --features raw_tumor -- normal raw_blood
Is this the right way?

Thanks

@jibsch
Owner

jibsch commented Jul 16, 2015

Almost:
./Socrates annotate --normal raw_blood raw_tumor
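To see why the first attempt differed from the corrected command, here is a toy Python argparse parser that mimics only the options quoted in this thread (--normal, --features and one positional results file). It is an assumption for illustration, not Socrates' own argument handling. In the first attempt, --features swallows raw_tumor, and "-- normal" with a space is the end-of-options marker followed by the plain word normal, so no normal results reach --normal.

```python
import argparse

# Toy parser: mimics ONLY the annotate options quoted in this thread.
# It is not Socrates' own argument handling.
parser = argparse.ArgumentParser(prog="annotate")
parser.add_argument("--normal", help="results file of the normal sample")
parser.add_argument("--features", help="annotation track")
parser.add_argument("results", help="results file to annotate (tumour)")

# Corrected command: ./Socrates annotate --normal raw_blood raw_tumor
good = parser.parse_args(["--normal", "raw_blood", "raw_tumor"])
print(good.normal, good.results)  # raw_blood raw_tumor

# First attempt: --features consumes raw_tumor, and a bare "--" ends option
# parsing, so "normal" is treated as an ordinary positional word.
bad, _extras = parser.parse_known_args(
    ["--features", "raw_tumor", "--", "normal", "raw_blood"])
print(bad.features, bad.normal)  # raw_tumor None
```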

@samarth51
Author

Hi,
I performed the annotation and got SV breakpoints, but I have a few doubts about the output.

  1. According to the SV typing criterion (assuming C1 realign pos < C1 anchor pos), a DELETION would be: C1_realign_dir + and C1_anchor_dir -.
    Applying this criterion, I get a few deletion breakpoints whose two ends lie on different chromosomes.

chr1:9849740 + CCTTTAAGCTCTATTGGACTTGATATGGTTAGTTTTAAAAAGA chr2:130489048 - TAAAGAACAATAAAGGCCAGGCACTGTGGCTCATACCTGTAATCCCAGCACTTTGGG 1 43 0 0 0 39.0 chr2:130489052 - GAACAATAAAGGCCAGGCACTGTGGCTCA chr1:9849744 + TTTACCTTTAAGCTCTATTGGACTTGATATGGTTAGTTTTAAAAAGAGTTGTTAGCTTTTAGAGATGTATG 1 29 0 0 0 38.0 Micro-homology: 4bp homology found! (TAAA)

This read has C1_realign chr1:9849740 and C1_anchor chr2:130489048. Why does this happen for a DELETION event?

  2. I also have SV breakpoints in the output where C1 realign pos > C1 anchor pos. How should I deal with these breakpoints?

Please give some suggestions.

@jibsch
Owner

jibsch commented Jul 24, 2015

Hi,
Socrates calls fusions between two coordinates. The orientation of the
breakpoints (+/-) determines what kind of event it is. In case 1) it is
indeed a deletion signature: coordinate 1 < coordinate 2, and orientations
+, -. Case 2) occurs if realignment was successful on one side of the
fusion only. You can treat them the same way: if the smaller coordinate has
a +, the second a -, it's the deletion signature, -+ for tandem
duplication, ++, -- for inversions types. More complex events contain more
than one fusion and are not as easily identifiable.
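The orientation rule in this reply can be written as a small Python function. It is a sketch, not Socrates code: classify_fusion, parse_coord and chrom_key are hypothetical helpers, coordinates are assumed to look like chr1:9849740 as in the output above, and chromosomes are assumed to sort as 1 to 22 followed by named ones (X, Y, ...), so that "smaller coordinate" is defined across chromosomes as in case 1. When the two ends are on different chromosomes the function still reports the orientation signature but marks it inter-chromosomal, since only the orientation pattern carries over.

```python
from typing import Tuple

# Orientation pairs from the reply, read with the smaller coordinate first.
SIGNATURES = {
    "+-": "deletion",
    "-+": "tandem duplication",
    "++": "inversion",
    "--": "inversion",
}


def parse_coord(coord: str) -> Tuple[str, int]:
    """Split 'chr1:9849740' into ('chr1', 9849740)."""
    chrom, pos = coord.rsplit(":", 1)
    return chrom, int(pos)


def chrom_key(chrom: str) -> Tuple[int, int, str]:
    """Assumed ordering: numbered chromosomes first (1..22), then named ones (X, Y, M, ...)."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def classify_fusion(coord_a: str, dir_a: str, coord_b: str, dir_b: str) -> str:
    """Name the orientation signature of one fusion, e.g. C1_realign vs C1_anchor."""
    (chrom_a, pos_a), (chrom_b, pos_b) = parse_coord(coord_a), parse_coord(coord_b)
    # Put the orientation of the smaller coordinate first; this also covers
    # case 2 (realign pos > anchor pos).
    if (chrom_key(chrom_a), pos_a) > (chrom_key(chrom_b), pos_b):
        dir_a, dir_b = dir_b, dir_a
    label = SIGNATURES[dir_a + dir_b]
    if chrom_a != chrom_b:
        # Ends on different chromosomes: only the orientation pattern carries over.
        label += " signature (inter-chromosomal)"
    return label


print(classify_fusion("chr1:821605", "-", "chr1:821634", "+"))      # tandem duplication
print(classify_fusion("chr1:9849740", "+", "chr2:130489048", "-"))  # deletion signature (inter-chromosomal)
```

Swapping the two ends gives the same answer, which is why case 2 needs no separate handling.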
Cheers,
Jan

@samarth51
Author

Hi,
I used --repeatmask (UCSC repeat masker track) for annotation. The output has columns named "repeat1" and "repeat2", but I do not get any values in them. What values are expected in those columns when I use the repeat masker track?
UCSC repeat masker track format:
chr1 16777160 16777470 AluSp 2147 +
chr1 25165800 25166089 AluY 2626 -

@jibsch
Owner

jibsch commented Aug 5, 2015

That's interesting. Can you try again by using --features instead of
--repeatmask?
Otherwise, are the chromosome names in the Socrates output the same as in
the annotation?
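To check by hand which repeat a breakpoint falls in, and whether the chromosome names agree as asked above, a minimal Python sketch can read the six-column track shown in the question (chrom, start, end, name, score, strand). It is not how Socrates annotates internally; it assumes the track uses UCSC/BED-style 0-based half-open intervals and that breakpoint positions are 1-based. The names load_track, repeat_at and unmatched_chroms are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Interval = Tuple[int, int, str]  # (start, end, repeat name)


def load_track(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Read 'chrom start end name score strand' lines, as in the track excerpt."""
    track: Dict[str, List[Interval]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue  # skip blank, short or header lines
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        track.setdefault(chrom, []).append((start, end, name))
    return track


def repeat_at(track: Dict[str, List[Interval]], chrom: str, pos: int) -> Optional[str]:
    """Repeat name covering 1-based pos, assuming 0-based half-open track intervals."""
    for start, end, name in track.get(chrom, []):
        if start < pos <= end:
            return name
    return None


def unmatched_chroms(breakpoint_chroms: Iterable[str],
                     track: Dict[str, List[Interval]]) -> Set[str]:
    """Breakpoint chromosome names that never occur in the track (e.g. '1' vs 'chr1')."""
    return set(breakpoint_chroms) - set(track)


track = load_track([
    "chr1 16777160 16777470 AluSp 2147 +",
    "chr1 25165800 25166089 AluY 2626 -",
])
print(repeat_at(track, "chr1", 16777200))     # AluSp
print(unmatched_chroms(["chr1", "1"], track))  # {'1'}
```

A linear scan is enough for spot checks; a whole-genome track would call for a sorted or interval-tree lookup.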

@samarth51
Author

I tried --features (gene coordinates); this gives me "feature1" and "feature2" with gene names, but the --repeatmask parameter does not output anything. Yes, the chromosome names in the output and the annotation are the same. Any suggestions for --repeatmask?

@jibsch
Owner

jibsch commented Aug 5, 2015

Sorry, I meant using --features and the repeat track. Does that give you
repeat annotation, or does the problem persist?

@samarth51
Author

Hi,
I tried all possible combinations of parameters. I also ran --features with the UCSC repeat masker track and got repeat annotation (output pasted below).
C1_realign C1_realign_dir C1_realign_consensus C1_anchor C1_anchor_dir C1_anchor_consensus C1_long_support C1_long_support_bases C1_short_support C1_short_support_bases C1_short_support_max_len C1_avg_realign_mapq C2_realign C2_realign_dir C2_realign_consensus C2_anchor C2_anchor_dir C2_anchor_consensus C2_long_support C2_long_support_bases C2_short_support C2_short_support_bases C2_short_support_max_len C2_avg_realign_mapq BP_condition normal feature1 feature2

chr1:821605 - GCCCTTTGGCAGAGCAGGTGTGCTGTGCTGTGCTGATCCCCGGGAGTC chr1:821634 + CAGCACAGCACACCTGCTCTGCCAAAGGGCAGCCAGACTGCTTCTTTAAGCAGTTCCTGATCTTGTTT 11 443 15 215 23 37.363636 chr1:821634 + CAGCACAGCACACCTGCTCTGCCAAAGGGCAGCCAGACTGCTTCTTTA chr1:821605 - GCCCTTTGGCAGAGCAGGTGTGCTGTGCTGTGCTGATCCCCGGGAGTCTCCAGAGCCAGCAGGCTGGA 13 519 11 100 15 29.692308 Blunt-end joining normal L1PA7 L1PA7

When I provide the repeat masker file to --repeatmask it does not work, but when I provide the same file to --features it works (output pasted above).
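Since the repeat names do come through in feature1/feature2 when the track is passed via --features, those columns can be read programmatically. A minimal Python sketch, assuming the results file is tab-separated with the header row shown above (an assumption, as is the helper name read_socrates_table). Splitting on any whitespace instead would break values such as "Blunt-end joining".

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def read_socrates_table(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per fusion, keyed by the header row (C1_realign, ..., feature2)."""
    # Assumption: columns are tab-separated, so values containing spaces
    # (e.g. BP_condition "Blunt-end joining") stay in one field.
    yield from csv.DictReader(handle, delimiter="\t")


# Demo on a tab-separated excerpt of the row above (only some columns kept).
sample = io.StringIO(
    "C1_realign\tC1_realign_dir\tBP_condition\tnormal\tfeature1\tfeature2\n"
    "chr1:821605\t-\tBlunt-end joining\tnormal\tL1PA7\tL1PA7\n"
)
for row in read_socrates_table(sample):
    print(row["C1_realign"], row["feature1"], row["feature2"])  # chr1:821605 L1PA7 L1PA7
```

For a real file, open it with open(path, newline="") and pass the handle in; the path is whatever your annotate run wrote.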
