Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZipperBams failed. Error: processed all unmapped reads but there are mapped reads remaining to be read. #978

Open
teryyoung opened this issue Apr 10, 2024 · 3 comments
Labels

Comments

@teryyoung
Copy link

teryyoung commented Apr 10, 2024

hello! I'm doing with sequencing data with UMI, I followed #fgbio-best-practise-fastq---consensus-pipeline
When I run fgbio ZipperBams to merge my consensus.ubam and consensus.mapped.bam, there is an exception:
Exception in thread "main" java.lang.IllegalStateException: Error: processed all unmapped reads but there are mapped reads remaining to be read.

and I check my two bam files; they all are queryname sorted, have same number of reads, the difference between them is all of queryname of ubam have suffix "/A". like this:
image
image
I found it is caused by fgbio CallConsensusReads...

I wanna know whether this is the problem caused ZipperBams failed, or how to deal with this problem(ZipperBams)?
#my fgbio version is 2.2.2-3a74fd2-SNAPSHOT

@nh13 nh13 added the question label Apr 11, 2024
@nh13
Copy link
Member

nh13 commented Apr 11, 2024

Somehow the /A is getting lost between the uBAM and the mapped BAM. I see your filename has picard in it, but we don't use picard in the best-practices pipeline. Can you provide the commands you used to go from the uBAM to the mapped BAM?

@teryyoung
Copy link
Author

I used STAR to map the uBAM, so the code looks weird,
1)First I FastqtoBam the raw reads, and picard sorted it;
java -Xmx10g -jar /share/apps/softwares/fgbio/target/scala-2.13/fgbio-2.2.2-3a74fd2-SNAPSHOT.jar FastqToBam \
--input $r1 $r2 \
--read-structures 5M2S+T 5M2S+T \
--sample ${sample} \
--library ${sample} \
--platform-unit BGIT7 \
--output ./0.1.uBAM/${sample}.uBAM
java -Xmx10G -jar /share/apps/softwares/picard-2.26.2/picard.jar SortSam I=0.1.uBAM/${sample}.uBAM \
O=0.1.uBAM/${sample}.picard.sorted.uBAM\
SORT_ORDER=queryname

2)then I extract fastq from uBAM, mapped them using STAR, forgive me that I didn't find ways to mapping one fastq file using STAR.
samtools fastq 0.1.uBAM/${sample}.picard.sorted.uBAM -1 tmp/${sample}.1.fq.bak -2 tmp/${sample}.2.fq.bak

awk 'NR%4==1 {print $0"/1"} NR%4 !=1 {print $0}' tmp/${sample}.1.fq.bak >tmp/${sample}.uBAM_1.fq
awk 'NR%4==1 {print $0"/2"} NR%4 !=1 {print $0}' tmp/${sample}.2.fq.bak >tmp/${sample}.uBAM_2.fq

ref_dir="~/reference_genome/GRC38_p14/hg38_ucsc/STAR_index/"
rm tmp/${sample}/ -rf
/share/apps/softwares/STAR-2.7.10b/bin/Linux_x86_64_static/STAR --twopassMode Basic\
--quantMode TranscriptomeSAM GeneCounts\
--runThreadN 6\
--genomeDir ${ref_dir}\
--alignIntronMin 20\
--alignIntronMax 50000\
--outSAMtype BAM Unsorted\
--outSAMattrRGline ID:"${sample}" SM:"${sample}" PL:BGI\
--outFilterMismatchNmax 2\
--outSJfilterReads Unique --outSAMmultNmax 1\
--outFileNamePrefix ./0.1.uBAM/${sample}/${sample}.\
--outSAMmapqUnique 60\
--outSAMunmapped Within KeepPairs\
--readFilesCommand cat\
--readFilesIn tmp/${sample}.uBAM_1.fq tmp/${sample}.uBAM_2.fq\
--outTmpDir ./tmp/${sample}
3)After first mapping, I sorted result bam file using picard, and ZipperBams;
java -Xmx10G -jar /share/apps/softwares/picard-2.26.2/picard.jar SortSam I=0.1.uBAM/${sample}/${sample}.Aligned.out.bam\
O=0.1.uBAM/${sample}/${sample}.mapped.picard.sorted.BAM\
SORT_ORDER=queryname
java -jar /share/apps/softwares/fgbio/target/scala-2.13/fgbio-2.2.2-3a74fd2-SNAPSHOT.jar ZipperBams \
--unmapped 0.1.uBAM/${sample}.picard.sorted.uBAM \
--input 0.1.uBAM/${sample}/${sample}.mapped.picard.sorted.BAM \
--ref ${ref}/../hg38.p14.fa \
--output 0.1.uBAM/${sample}.umi.mapped.bam

All commands above runs fine, but After I GroupReadsByUmi, CallMolecularConsensusReads, FilterConsensusReads, and mapping them sencond time, ZipperBams have problem;
4)java -jar /share/apps/softwares/fgbio/target/scala-2.13/fgbio-2.2.2-3a74fd2-SNAPSHOT.jar GroupReadsByUmi \
--input=${dir01}/1.umi.mapped.bam \
--output=${dir02}/1.umi.group.BAM \
--strategy=paired \
--min-map-q=10 \
--edits=1 \
--raw-tag=RX \ --family-size-histogram=${dir02}/1.umi.tag_family_histogram.txt`

java -jar /share/apps/softwares/fgbio/target/scala-2.13/fgbio-2.2.2-3a74fd2-SNAPSHOT.jar CallMolecularConsensusReads \
--min-reads=1 \
--min-input-base-quality=20 \
--input=${dir02}/1.umi.group.BAM \
--output=${dir03}/1.consensus.uBAM \
--threads=4 \
--read-group-id=false

java -jar /share/apps/softwares/fgbio/target/scala-2.13/fgbio-2.2.2-3a74fd2-SNAPSHOT.jar FilterConsensusReads \
--input=${dir03}/1.consensus.uBAM \
--output=${dir03}/1.consensus.filter.uBAM \
--ref="${ref}/../hg38.p14.fa" --min-reads=3 \
--max-read-error-rate=0.2 \
--max-base-error-rate=0.2 \
--min-base-quality=30 \
--max-no-call-fraction=0.20

java -Xmx10G -jar /share/apps/softwares/picard-2.26.2/picard.jar SortSam \
I=${dir03}/1.consensus.filter.uBAM\
O=${dir03}/1.consensus.filter.sorted.uBAM\
SORT_ORDER=queryname

samtools fastq ./${dir03}/1.consensus.filter.sorted.uBAM -1 tmp/1.consensus.uBAM_1.fq.bak -2 tmp/1.consensus.uBAM_2.fq.bak
awk 'NR%4 ==1 {print $0"/1"} NR%4 !=1 {print$0}' tmp/1.consensus.uBAM_1.fq.bak >tmp/1.consensus.uBAM_1.fq
awk 'NR%4==1 {print $0"/2"} NR%4!=1 {print$0}' tmp/1.consensus.uBAM_2.fq.bak >tmp/1.consensus.uBAM_2.fq

STAR alignment code is like it above, and then picard sorted;
then ZipperBams;
java -Xmx10G -jar /share/apps/softwares/fgbio/target/scala-2.13/fgbio-2.2.2-3a74fd2-SNAPSHOT.jar ZipperBams \
--unmapped ./${dir03}/1.consensus.filter.sorted.uBAM \
--ref ${ref}/../hg38.p14.fa \
--input ${dir03}/1.consensus.mapped.picard.sorted.bam \
--output ${dir03}/1.consensus.mapped.zipper.bam

Tips: using picard sortBam is because I used to use picard mergeBam to merge unmapped and mapped reads; I know the sort_order need to be the same when using picard mergeBam; (why I didnt use samtools sort is that some BAM files didnt support samtools's sort somehow.)

@teryyoung
Copy link
Author

Did you mean that '/A' is necessary for ZipperBams? And that should be retain whatever I run command on it?
now I think my fault is misuse the STAR alignment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants