There is no longer a need to manually set the scale factor in Juicebox #94
Comments
Thanks for the wonderful software, @c-zhou, and for this update, @zengxiaofei. The *.hic and *.assembly files open normally (see the figure below), and there seems to be no need to manually set the scale factor. However, something goes wrong after manual curation. Could you please have a look at my script and let me know what went wrong?
This generated a nice HiC map, which I had to modify a little.
While I was expecting the final FASTA file of the genome to be around 5.8G, the last command generated a 399G file. Below are the input contig, intermediate scaffold and final scaffold FASTA files. Could you please let me know what went wrong?
Thanks in advance.
Hi @gunjanpandey, You are currently using an outdated version of Best wishes, |
Hi @zengxiaofei, Thanks for your prompt reply. When I try to run the pipeline with the latest version (v1.2.1), released 3 days ago, I get this error.
I am not the author of YaHS; you may need to ask @c-zhou by creating a new issue. My modified version of |
Hi @gunjanpandey, Your commands seem all good to me. Something might be wrong with the For the error you saw when running Best, |
Hi Xiaofei @zengxiaofei, Thank you very much for the updates. They are very useful. For scaling of the genome size, I have not had a chance to test it yet, so I did not include it in the new release. May I ask if this applies to all juicer_tools versions? I would certainly like to add this to YaHS at some point. For the quality score filtering in coordinate-sorted BAMs, the problem is that we cannot know the mapping quality of the mate read when processing a SAM record. For example, if the mapping qualities of the two reads are 10 and 0, we will only see mapping quality 10 when reading the first SAM record. However, we would not want to include the pair, as the mapping quality of the second read is 0. Because of this, we decided not to allow quality filtering for coordinate-sorted BAMs. Best,
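To illustrate the point about mate mapping quality, here is a minimal Python sketch of pair-level MAPQ filtering. It is not the code in `juicer.c`. The `Record` tuple and the `filter_pairs` function are hypothetical names, and the sketch assumes each pair appears as exactly two adjacent records with the same read name (no secondary or supplementary alignments), which is what makes it possible to see both MAPQ values before deciding.

```python
from typing import Iterable, Iterator, Tuple

# Simplified stand-in for a SAM record: (read name, MAPQ). Hypothetical type.
Record = Tuple[str, int]


def filter_pairs(records: Iterable[Record], min_mapq: int) -> Iterator[Tuple[Record, Record]]:
    """Yield read pairs in which BOTH mates have MAPQ >= min_mapq.

    Assumes mates are adjacent in the input (two records per pair).
    """
    it = iter(records)
    for first in it:
        second = next(it, None)
        if second is None or first[0] != second[0]:
            # In a coordinate-sorted file the mate is usually far away,
            # so its MAPQ is unknown at this point.
            raise ValueError("mates are not adjacent; pair-level MAPQ filtering is not possible")
        if min(first[1], second[1]) >= min_mapq:
            yield first, second


if __name__ == "__main__":
    records = [("r1", 10), ("r1", 0), ("r2", 30), ("r2", 40)]
    # Pair r1 is dropped because its second mate has MAPQ 0.
    print(list(filter_pairs(records, 10)))
```

With the 10 and 0 example from the comment, the pair is rejected only because both records are visible together; a reader that sees only the first record would have accepted it.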
Hi @c-zhou, Could you please help me understand the reason for the error below and how to fix it? I did not have this problem with your earlier version but there was a different problem. Is there a fix for that?
Hi @gunjanpandey, Have you checked your When I refer to Best, |
Also @gunjanpandey, have you solved the unexpected file ( |
Thanks, @c-zhou. Meanwhile, if you could suggest anything for the file size problem, I would appreciate it. If I sort it out, I will post an update.
Hi @gunjanpandey, if you could share these files, I can take a look. Chenxi
Hello @gunjanpandey, Which version of juicer post are you using? I discovered a bug in the latest version (v1.2.1), but it should be fine with version v1.2 and earlier. Best, |
I was using v1.2.1. I will try with an earlier version. I am trying to sort out the problem. If I am not able to, I will share the files. Otherwise, I will post the fix. I am also running ALLHiC and 3D-DNA, which are working fine, but this program gave me the best-looking map and I would like to proceed with it for finalizing the genome. Interestingly, it is aggressive in breaking contigs under default settings. Mabs has generated all 11 ungapped T2T chromosomes from Hi-C, HiFi and UL-ONT data, but this program still breaks contigs. I am keen to get an assembly from this, gap-fill it with HiFi or UL-ONT reads (and polish if needed), and compare it with other programs.
For the file size problem, you do not need to redo everything from scratch; you only need to rerun the
YaHS is sometimes quite sensitive to changes in the HiC signal along the contigs, which it treats as assembly errors. This mostly depends on the HiC library you have. If you believe it is too aggressive in breaking contigs, you could try to run it with the
Best,
Hi Chenxi,
I have not tested different versions of juicer_tools yet, but I think they should be compatible, since the main compatibility issue comes from Juicebox. Juicebox automatically infers the scale factor from the genome size in the .assembly file and the genome size in the .hic file.
Yes, I understand that MAPQ filtering is not compatible with coordinate-sorted BAM files. According to the SAM Format Specification, valid values of sorting order of alignments (SO) include unknown, unsorted, queryname, and coordinate. Here I have enabled MAPQ filtering for unsorted BAM. The SAM files produced by bwa mem are now marked as unsorted (lh3/bwa#336), where paired reads appear in adjacent lines, which should also be compatible with MAPQ filtering. However, if you feel this is not appropriate, you can leave it unchanged. Best regards, |
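As a rough illustration of the policy described above, the following Python sketch reads the `SO` tag from a SAM `@HD` header line and decides whether MAPQ filtering would be enabled. The function names `sort_order` and `mapq_filter_allowed` are hypothetical and this is not the logic in `juicer.c`; it simply encodes the rule stated in the comment (allow `queryname` and `unsorted`), and it relies on the stated assumption that mates are adjacent in `unsorted` output from bwa mem, which the SAM header alone does not guarantee.

```python
# Valid SO values according to the SAM Format Specification.
VALID_SO = {"unknown", "unsorted", "queryname", "coordinate"}


def sort_order(hd_line: str) -> str:
    """Return the SO value of an @HD header line ('unknown' if SO is absent)."""
    fields = hd_line.rstrip("\n").split("\t")
    if fields[0] != "@HD":
        raise ValueError("not an @HD header line")
    for field in fields[1:]:
        if field.startswith("SO:"):
            so = field[3:]
            if so not in VALID_SO:
                raise ValueError(f"invalid sort order: {so}")
            return so
    return "unknown"  # the specification's default when SO is missing


def mapq_filter_allowed(so: str) -> bool:
    """Policy sketched in the comment: filter only when mates are expected to be adjacent."""
    return so in {"queryname", "unsorted"}


if __name__ == "__main__":
    for line in ("@HD\tVN:1.6\tSO:unsorted", "@HD\tVN:1.6\tSO:coordinate"):
        so = sort_order(line)
        print(so, mapq_filter_allowed(so))
```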
Dear YaHS Developers and Users,
For large genomes, the .assembly and .hic files generated by YaHS are not fully compatible with Juicebox. Manually setting the scale factor in Juicebox may be necessary. Additionally, Juicebox cannot correctly import and parse a modified assembly (i.e., `.review.assembly`) in these scenarios. To address this, I made some modifications in `juicer.c`, `asset.c` and `asset.h` to eliminate the need for manually setting the scale factor in Juicebox for YaHS.

This issue arises because the method for calculating the scale factor in YaHS differs from that of Juicebox. Juicebox uses `1 + genome_size / 2,100,000,000` as the scale factor, while YaHS uses the smallest `n` that fulfills `genome_size / 2^n < INT_MAX`, resulting in Juicebox being unable to infer the correct scale factor. For example, a genome with a size of 9 Gb will have a scale factor of `5` in Juicebox, whereas YaHS will calculate a scale factor of `2^3 = 8`.

I also made another modification for the MAPQ filtering function in `juicer.c`. In the original version, MAPQ filtering is enabled only when the BAM is queryname sorted. However, the filtering process should also support unsorted BAM files.

The modified version of `juicer.c`, `asset.c` and `asset.h` is available at: https://github.com/zengxiaofei/yahs. Please feel free to use it in your work. I can make a pull request if the developers think it appropriate to merge these changes.

Best regards,
Xiaofei
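The scale-factor mismatch described in this post can be reproduced with a few lines of Python. This is only a sketch of the two formulas as stated above, not code taken from Juicebox or YaHS. The function names are hypothetical, and the Juicebox formula is assumed to use integer division, which is consistent with the 9 Gb example giving 5.

```python
INT_MAX = 2**31 - 1  # largest value of a 32-bit signed C int


def juicebox_scale(genome_size: int) -> int:
    """Scale factor as described for Juicebox: 1 + genome_size / 2,100,000,000.

    Integer division is an assumption made for this sketch.
    """
    return 1 + genome_size // 2_100_000_000


def yahs_scale(genome_size: int) -> int:
    """Scale factor as described for the original YaHS:
    2^n for the smallest n with genome_size / 2^n < INT_MAX.
    """
    n = 0
    while genome_size / 2**n >= INT_MAX:
        n += 1
    return 2**n


if __name__ == "__main__":
    size = 9_000_000_000  # the 9 Gb genome from the example
    print(juicebox_scale(size), yahs_scale(size))  # prints "5 8"
```

For genomes below `INT_MAX` bases both functions return 1, which matches the observation that the problem only shows up for large genomes.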