Unexpected Zero Reads Pairs and Output Size with yahs and Juicer Preprocessing Pipeline #85

Open
Hanjiangna opened this issue Apr 16, 2024 · 4 comments

Comments

@Hanjiangna

Hello Developers,

I have been utilizing yahs for Hi-C data processing followed by Juicer's preprocessing pipeline. My command for running yahs was:

yahs /NGS/Fungi/Rc/juicer/references/R0301HifiHicOnt.asm.hic.hap2.p_ctg.fa R0301onthap2.sort.bam -e GATC

During the execution of yahs, it recognized and logged the following information:

[I::find_re_from_seqs] number restriction enzyme cutting sites found in sequences: 336082
[I::find_re_from_seqs] restriction enzyme cutting sites density: 0.008152
[I::main] dump hic links (BAM) to binary file yahs.out.bin
[I::dump_links_from_bam_file] 1 million records processed, 0 read pairs
[I::dump_links_from_bam_file] 2 million records processed, 0 read pairs

However, even after processing millions of records from the BAM file, there were no read pairs detected. This was confirmed by the message:

0 read pairs processed

Subsequently, I proceeded with the Juicer preprocessing step using the following command:

juicer pre -a -o out_JBAT yahs.out.bin yahs.out_scaffolds_final.agp /NGS/Fungi/Rc/juicer/references/R0301HifiHicOnt.asm.hic.hap2.p_ctg.fa.fai

Upon completion, the output out_JBAT.txt file contained no data, i.e., its size was effectively 0 bytes.

My question is, given the absence of read pairs in the yahs output, is it normal for the Juicer preprocessing step to generate an empty out_JBAT.txt file? Could the lack of detected read pairs indicate an issue with either the alignment in the BAM file (R0301onthap2.sort.bam) or how yahs is handling the data?

It seems unusual that no valid interactions would be identified, especially considering the large number of records processed. I would appreciate any insights into what might cause such an outcome and suggestions on how to troubleshoot this issue.

Thank you for your attention and assistance.

Best regards,
Han jiangna

@c-zhou
Owner

c-zhou commented Jun 4, 2024

Hello @Hanjiangna,

Sorry for the delayed reply. This is usually caused by a malformatted BAM file. How did you generate your BAM file? If you can show me the header lines of your BAM file and a few lines of records, I can probably tell the reason.

Best,
Chenxi

@Hanjiangna
Author

Hello Developer,
Sorry for the late response, as I was occupied with various exams. Below is a screenshot of the header section and a few lines of the record entries.
[Two screenshots: the BAM header section and a few alignment records]
Best regards,
Han jiangna

@c-zhou
Owner

c-zhou commented Jun 28, 2024

Hi Jiangna,

I can see two problems regarding the BAM file you showed.

  1. The SAM flags say the three read pairs were all properly mapped (the 2nd column), but none of them are really paired. They all have different read names (the 1st column). Two paired reads should be grouped together if sorted by read names.
  2. For all three read pairs, the two reads were mapped to the exact same position (as indicated by the 4th, 7th, 8th and 9th columns), which does not look right.

There is probably something wrong with your read mapping.
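
The two checks above can be sketched in a few lines of Python. This is a minimal illustration, not part of yahs: it assumes the records are supplied as SAM text lines (for example, the text output of `samtools view`), and the names `SamRecord`, `parse_sam_line` and `diagnose_pairing` are hypothetical helpers introduced here. It counts how many records carry the "properly paired" flag, how many adjacent records share a read name, and how many paired records report their mate at exactly the same position.

```python
from typing import Dict, Iterable, List, NamedTuple


class SamRecord(NamedTuple):
    qname: str  # column 1: read name
    flag: int   # column 2: bitwise SAM flag
    rname: str  # column 3: reference sequence of this read
    pos: int    # column 4: 1-based leftmost mapping position
    rnext: str  # column 7: reference of the mate ("=" means same as rname)
    pnext: int  # column 8: mapping position of the mate
    tlen: int   # column 9: observed template length


def parse_sam_line(line: str) -> SamRecord:
    """Parse the mandatory SAM columns needed for the pairing checks."""
    f = line.rstrip("\n").split("\t")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), f[6], int(f[7]), int(f[8]))


def diagnose_pairing(lines: Iterable[str]) -> Dict[str, int]:
    """Summarise pairing symptoms in a sample of SAM records (hypothetical helper)."""
    # Skip header lines (starting with "@") and blank lines.
    records: List[SamRecord] = [
        parse_sam_line(line)
        for line in lines
        if line.strip() and not line.startswith("@")
    ]
    return {
        "records": len(records),
        # Flag bit 0x2: each segment properly aligned according to the aligner.
        "flagged_proper_pair": sum(1 for r in records if r.flag & 0x2),
        # In a name-sorted file, mates sit next to each other with the same name.
        "adjacent_same_name": sum(
            1 for a, b in zip(records, records[1:]) if a.qname == b.qname
        ),
        # Paired records (bit 0x1) whose mate is reported at the identical position.
        "mate_at_same_position": sum(
            1
            for r in records
            if r.flag & 0x1 and r.rnext in ("=", r.rname) and r.pnext == r.pos
        ),
    }
```

On a sample taken from a name-sorted file, a high `flagged_proper_pair` count together with `adjacent_same_name` at zero and `mate_at_same_position` close to the number of records would match the symptoms described above. In a coordinate-sorted file, mates are not expected to be adjacent, so only the last count is informative there.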

Best,
Chenxi

@Hanjiangna
Author

Hello Developer,
Thanks for your reply. I will check the read mapping step.
Best wishes!
Han jiangna
