You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on May 3, 2024. It is now read-only.
I was trying to use IsoPhase do call SNPs from the Pacbio Isoseq data for a single sample. However, when I run the select_loci_to_phase.py with all the files required, I was keeping getting 0 loci.
The GFF file is obtained after running SQANTI3 filtering
The FASTQ file is obtained after doing CCS analyses - LIMA - Isoseq refine, but before clustering. The headers of each sequence are in the format @m64319e_211227_144417/3211320/ccs.
The read_stat file is the count information after collapse obtained by running get_abundance_post_collapse.py
id is_fl stat pbid
m64319e_211227_144417/3211320/ccs Y unique PB.1.1
m64319e_211227_144417/65145123/ccs Y unique PB.1.1
m64319e_211227_144417/111935771/ccs Y unique PB.2.1
m64319e_211227_144417/67240066/ccs Y unique PB.2.1
m64319e_211227_144417/123208230/ccs Y unique PB.3.1
m64319e_211227_144417/160236121/ccs Y unique PB.3.1
m64319e_211227_144417/42860688/ccs Y unique PB.3.1
I have also manually checked that the coverage at some loci is over 40, even some isoforms (e.g. P.B.78.13) are having support from more than 40 reads.
It was very mysterious since when I used the exactly same reference genome and GFF file, but changed the FASTQ file to sample1..FL.RE.CL.hq.fasta (obtained after doing clustering, headers are formated as >transcript/0 full_length_coverage=4;length=5054), as well as changed the read_stat file into a new file converted using python <path_to>/phasing/utils/convert_group_to_read_stat_file.py, everything worked fine.
The new read_stat file is like:
id pbid is_fl stat
transcript/304 PB.1.1 Y unique
transcript/3111 PB.2.1 Y unique
transcript/3713 PB.3.1 Y unique
transcript/3605 PB.4.1 Y unique
transcript/2086 PB.5.1 Y unique
transcript/2546 PB.6.1 Y unique
transcript/1800 PB.7.1 Y unique
I was very confused, since I was hoping the phasing step would take the information of the full-length read sequences before clustering into isoform sequences. It would be very helpful if you could help with this problem. Thank you.
Monica
The text was updated successfully, but these errors were encountered:
@Yexin-Zhang did you use the --fq flag in the command when you ran select_loci_to_phase.py? I had the same problem as you, but when I added --fq to the commands, everything ran as expected.
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Hi,
I was trying to use IsoPhase do call SNPs from the Pacbio Isoseq data for a single sample. However, when I run the
select_loci_to_phase.py
with all the files required, I was keeping getting 0 loci.The FASTQ file is obtained after doing
CCS analyses - LIMA - Isoseq refine
, but before clustering. The headers of each sequence are in the format @m64319e_211227_144417/3211320/ccs.The read_stat file is the count information after collapse obtained by running
get_abundance_post_collapse.py
I have also manually checked that the coverage at some loci is over 40, even some isoforms (e.g. P.B.78.13) are having support from more than 40 reads.
It was very mysterious since when I used the exactly same reference genome and GFF file, but changed the FASTQ file to
sample1..FL.RE.CL.hq.fasta
(obtained after doing clustering, headers are formated as >transcript/0 full_length_coverage=4;length=5054), as well as changed theread_stat
file into a new file converted usingpython <path_to>/phasing/utils/convert_group_to_read_stat_file.py
, everything worked fine.The new read_stat file is like:
I was very confused, since I was hoping the phasing step would take the information of the full-length read sequences before clustering into isoform sequences. It would be very helpful if you could help with this problem. Thank you.
Monica
The text was updated successfully, but these errors were encountered: