With BWA, R1 & R2 files can have very different mapping signatures #36
Comments
I just noticed that the databases are different (another member of our team is running fastq_screen). That could be the reason for the problem. I'll track that down.
All of the databases are copies of one another, with the exception of the human one. Anyway, it seems there is some odd behavior when the aligner is set to bwa.
Here I met another problem when the aligner is set to … Since it's ChIP-seq data, I took the same steps to analyze the Input data, and the mapping results were still largely different. The … When the aligner is set to … I'm new to bioinformatics and have been confused by the different mapping results of bwa and bowtie2 for a very long time. I've searched and tried the method described in https://www.biostars.org/p/117225/, but … So far, the bowtie2 results seem trustworthy to me. And I've started to wonder whether published research data can be trusted, since there must be some people, new to bioinformatics like me, doing the analysis work.
Hi, Interesting. To comment on the first point, my guess is that there is some kind of non-genomic sequence towards the start of the poorly mapping reads. If you remove, say, the first 20 bp from the start of each read, are you then able to map them with BWA? Other than that, do the R1/R2 reads look similar in FastQC? Thanks,
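A minimal sketch of the trimming test suggested above, assuming plain four-line FASTQ records (optionally gzipped). The function names (read_fastq, trim_front, trim_fastq_file) are hypothetical; in practice an existing tool such as cutadapt (-u 20) or fastp (--trim_front1 / --trim_front2) would do the same job. The trimmed R1 and R2 files would then be run through fastq_screen with bwa again to see whether the discrepancy disappears.

```python
import gzip
import io
from typing import Iterable, Iterator, TextIO, Tuple

# One FASTQ record: (header, sequence, plus line, quality string)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records, raising on obviously malformed input."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header, seq, plus, qual


def trim_front(records: Iterable[Record], n: int = 20) -> Iterator[Record]:
    """Hard-trim the first n bases, and the matching quality characters, from each read."""
    for header, seq, plus, qual in records:
        yield header, seq[n:], plus, qual[n:]


def write_fastq(records: Iterable[Record], handle: TextIO) -> None:
    """Write records back out in 4-line FASTQ format."""
    for record in records:
        handle.write("\n".join(record) + "\n")


def trim_fastq_file(in_path: str, out_path: str, n: int = 20) -> None:
    """Trim one FASTQ file (R1 or R2); gzipped input is read transparently, output is plain text."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        write_fastq(trim_front(read_fastq(src), n), dst)


if __name__ == "__main__":
    # Tiny in-memory demo: trim 4 bases from a single 10 bp read
    demo = io.StringIO("@read1\nACGTACGTAC\n+\nIIIIIHHHHH\n")
    out = io.StringIO()
    write_fastq(trim_front(read_fastq(demo), n=4), out)
    print(out.getvalue(), end="")
```

Reads shorter than n become empty records, so a length filter may be needed before aligning.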
Thank you for the suggestions. There's nothing noticeably different between the R1 and R2 reads in the libs where this is happening. We reprocessed the data prior to employing …
… question: do we need the FASTA files from the background contaminant genomes, or just their index files? Disk-space limitations on our HPC led us to put only the index files in the background contaminant genome directory. Given the odd results above, I'm wondering whether leaving the FASTAs out of that directory caused a problem.
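Whatever the answer on the FASTA files, one thing that can be checked directly is whether each background-genome directory still holds a complete index for each aligner configured in fastq_screen. The sketch below assumes the default file suffixes produced by bwa index and bowtie2-build; the script and function names are hypothetical, and other BWA builds (e.g. bwa-mem2) write different files.

```python
from pathlib import Path
from typing import Dict, List

# Default suffixes written by `bwa index` and `bowtie2-build` for a given index prefix.
INDEX_SUFFIXES: Dict[str, List[str]] = {
    "bwa": [".amb", ".ann", ".bwt", ".pac", ".sa"],
    "bowtie2": [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"],
}


def missing_index_files(prefix: str, aligner: str) -> List[str]:
    """Return the expected index files for `prefix` that are not present on disk."""
    missing = []
    for suffix in INDEX_SUFFIXES[aligner]:
        candidates = [Path(prefix + suffix)]
        if aligner == "bowtie2":
            # bowtie2-build writes .bt2l files instead when building a large index
            candidates.append(Path(prefix + suffix + "l"))
        if not any(path.exists() for path in candidates):
            missing.append(prefix + suffix)
    return missing


if __name__ == "__main__":
    import sys

    # Usage (hypothetical paths): python check_index.py /db/fish/fish_genome
    for prefix in sys.argv[1:]:
        for aligner in INDEX_SUFFIXES:
            gaps = missing_index_files(prefix, aligner)
            status = "complete" if not gaps else "missing " + ", ".join(gaps)
            print(f"{prefix} [{aligner}]: {status}")
```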
Hi, Many thanks, |
When the aligner is set to bwa, in two separate data sets we are seeing some libs with very different "hit profiles" or "mapping signatures" between the R1 and R2 files. For example, R1 may have 40% of reads map to our fish genome FASTA, while R2 from the same library has 0%. We have not observed this behavior when the aligner is set to bowtie2.

I've attached two report files, one from bwa and one from bowtie2. Note that we are still optimizing the Slurm array that runs fastq_screen, which explains why some results are missing. I will get the exact options and arguments that were used from the person running the analyses and post them.

The fastq files were previously processed with fastp to remove questionable reads (reads <140 bp after trimming and low-complexity reads were filtered out) and with clumpify.sh to remove PCR and optical duplicates, so the data fed to fastq_screen should be fairly pristine.

bwa_vs_bowtie2.zip
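For readers trying to reproduce the preprocessing described above, here is a rough sketch of the two read-level filters (at least 140 bp after trimming, plus a low-complexity filter). It is not fastp's code: the complexity score follows the definition in fastp's documentation (the fraction of bases that differ from the next base), and the 30% cut-off is an assumed value. Dropping both mates whenever either one fails is a design choice of this sketch that keeps R1 and R2 in the same order, which matters when comparing per-read results between the two files.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: (header, sequence, plus line, quality string)
Record = Tuple[str, str, str, str]


def complexity(seq: str) -> float:
    """Fraction of adjacent positions where the base changes (seq[i] != seq[i + 1])."""
    if len(seq) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return changes / (len(seq) - 1)


def passes(seq: str, min_length: int = 140, min_complexity: float = 0.30) -> bool:
    """Keep reads of at least min_length bp whose complexity reaches min_complexity."""
    return len(seq) >= min_length and complexity(seq) >= min_complexity


def filter_pairs(
    pairs: Iterable[Tuple[Record, Record]],
    min_length: int = 140,
    min_complexity: float = 0.30,
) -> Iterator[Tuple[Record, Record]]:
    """Yield only pairs where BOTH mates pass, so R1 and R2 stay in sync."""
    for r1, r2 in pairs:
        if passes(r1[1], min_length, min_complexity) and passes(r2[1], min_length, min_complexity):
            yield r1, r2
```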