generate hic contact map based on purged genome? #79

Open
SmnYanGe opened this issue Jan 15, 2024 · 3 comments

@SmnYanGe

Hi, thank you for creating and maintaining YaHS. I have a question about the following:
After I assembled the genome using YaHS, I purged the assembly using purge_dups. I would like to know whether I can use the purged genome to generate a new Hi-C contact map. My steps were as follows:
I followed the procedure at 'https://github.com/c-zhou/yahs#generate-hic-contact-maps'.
First, I filtered the alignments_sorted.txt generated in that procedure to keep only the records for sequences still present in the purged assembly.
Then I ran juicer_tools to generate a new Hi-C contact heatmap from the new alignments_sorted.txt and a new scaffolds_final.chrom.sizes, but it failed with this error:

java.lang.ArrayIndexOutOfBoundsException: Index 174 out of bounds for length 174
at juicebox.tools.utils.original.ExpectedValueCalculation.addDistance(ExpectedValueCalculation.java:171)
at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.incrementCount(Preprocessor.java:1670)
at juicebox.tools.utils.original.Preprocessor$MatrixPP.incrementCount(Preprocessor.java:1511)
at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:777)
at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:419)
at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:122)
at juicebox.tools.HiCTools.main(HiCTools.java:94)

I would appreciate your opinion and advice. Thanks for your help!

@c-zhou
Owner

c-zhou commented Jan 18, 2024

Hello @SmnYanGe,

I cannot tell where the problem is from the error message. My guess is that you have a very short sequence in your scaffolds, maybe 174 bp long, and that this triggered a juicer_tools problem. One solution could be to remove those extremely short sequences. You just need to update scaffolds_final.chrom.sizes to exclude the corresponding lines.
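
As an illustration of the suggestion above, here is a minimal Python sketch that drops short sequences from a chrom.sizes listing. It assumes each line holds a sequence name and a length separated by whitespace. The function name `drop_short_scaffolds` and the 1000 bp cutoff are hypothetical choices for the example, not something taken from YaHS or juicer_tools.

```python
def drop_short_scaffolds(chrom_sizes_lines, min_len):
    """Keep only 'name length' lines whose length is at least min_len."""
    kept = []
    for line in chrom_sizes_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if int(fields[1]) >= min_len:
            # Assumption: tab-separated output is acceptable downstream.
            kept.append(f"{fields[0]}\t{fields[1]}")
    return kept


# Made-up example data, not taken from the issue.
example = ["scaffold_1\t1500000", "scaffold_2\t174", "scaffold_3\t52000"]
filtered = drop_short_scaffolds(example, min_len=1000)
print("\n".join(filtered))
```

To apply it to a real file, read the lines of scaffolds_final.chrom.sizes, pass them through the function, and write the result to a new file so the original stays available for comparison.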

Best,
Chenxi

@SmnYanGe
Author

Hi @c-zhou and thank you for your response!

I have identified the issue: running purge_dups on the assembly generated by YaHS changes the lengths of the scaffolds, which causes a mismatch between the two files 'alignments_sorted.txt' and 'scaffolds_final.chrom.sizes'. If I simply remove some low-coverage scaffolds without performing a purge, I can successfully generate a Hi-C map.
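
A mismatch of this kind can be checked before running juicer_tools. The sketch below is one possible check, written under assumptions: each alignment line is whitespace-separated with the two sequence names in columns 2 and 6 and the two positions in columns 3 and 7 (0-based indices 1, 5 and 2, 6). Adjust `name_cols` and `pos_cols` if your file is laid out differently. The function name `find_mismatches` is hypothetical.

```python
def find_mismatches(chrom_sizes_lines, alignment_lines,
                    name_cols=(1, 5), pos_cols=(2, 6)):
    """Report alignment records that do not fit the chrom.sizes table."""
    # Build a name -> length lookup from the chrom.sizes lines.
    sizes = {}
    for line in chrom_sizes_lines:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])

    problems = []
    for lineno, line in enumerate(alignment_lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Check both ends of the read pair (column layout is an assumption).
        for name_col, pos_col in zip(name_cols, pos_cols):
            name, pos = fields[name_col], int(fields[pos_col])
            if name not in sizes:
                problems.append((lineno, name, pos, "unknown scaffold"))
            elif pos > sizes[name]:
                problems.append((lineno, name, pos, "position beyond scaffold length"))
    return problems


# Made-up example data, not taken from the issue.
sizes_example = ["scaffold_1 1000", "scaffold_2 500"]
alignments_example = [
    "0 scaffold_1 100 0 0 scaffold_2 450 1",
    "0 scaffold_1 900 0 0 scaffold_2 800 1",
    "0 scaffold_1 10 0 0 scaffold_9 5 1",
]
problems = find_mismatches(sizes_example, alignments_example)
for problem in problems:
    print(problem)
```

An empty result means every record refers to a listed sequence and falls within its length. It does not prove that the coordinates are correct for the purged sequences.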

This leads to a second problem. I wanted to manually curate the Hi-C map, so I used the method described at 'https://github.com/c-zhou/yahs?tab=readme-ov-file#manual-curation-with-juicebox-jbat' to generate a Hi-C map of the assembly with the low-coverage scaffolds removed. However, this does not seem to be supported, and I received the error message 'Segmentation fault (core dumped)'.

From my attempts, I believe the issue is that I cannot modify the 'scaffolds_final.agp' file. When I deleted only the entries for the second scaffold, I got the same error. However, when I deleted the entries for the last scaffold, it ran normally. I would like to hear your opinion on this. Thanks again!

Best wishes,
Yang

@frwjo

frwjo commented Mar 29, 2024


Hi, I ran into the same problem, but I solved it. The cause was that I had written the wrong value in the `echo "assembly num"` step, so you can check your command at that location. If you don't know how to get the number, you can try this command: awk '{s+=$2} END{print s}' your.fasta.fai
Best wishes,
Wu
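
For readers who prefer Python, the awk command above can be sketched as follows. It sums the second column of a .fai index, which holds the sequence lengths. The function name `total_length_from_fai` is hypothetical, and the example lines are made up.

```python
def total_length_from_fai(fai_lines):
    """Sum column 2 (sequence length) over all lines of a .fai index."""
    total = 0
    for line in fai_lines:
        fields = line.split()
        if len(fields) >= 2:
            total += int(fields[1])
    return total


# Made-up .fai lines: name, length, offset, bases per line, bytes per line.
example_fai = [
    "contig_1\t1000\t10\t60\t61",
    "contig_2\t250\t1040\t60\t61",
]
total = total_length_from_fai(example_fai)
print(f"assembly {total}")  # prints "assembly 1250"
```

With a real index you would pass the lines of your.fasta.fai instead of `example_fai`.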
