UnicodeDecodeError during linked_fragment_file generation #77
I have the same problem. I've also identified and removed the row causing the error (by reading the error message).

Commands used to get the input files:
What happens if you change into the "utilities" directory and run test_LinkFragment.py? (Please change the value of the EXTRACTHAIRS variable in the test script to the path of the extractHAIRS binary.)
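For concreteness, the requested edit presumably amounts to a single assignment near the top of the test script; the path below is only a placeholder for wherever extractHAIRS was built:

# Hypothetical line in the test script under utilities/; replace the
# placeholder with the real location of the compiled extractHAIRS binary.
EXTRACTHAIRS = "/path/to/HapCUT2/build/extractHAIRS"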
Thanks for taking the time to answer. I'm running it through now.

Results: It seems to finish without any errors.

What I did: To run test_LinkFragments.py I downloaded the latest release's source files (v1.2) and then changed the EXTRACTHAIRS variable to the output given by:
Gotcha. So this bug is not caught by the existing tests. Can you show me which row of the unlinked fragment file was "causing" the crash?

From the first error where it couldn't decode position 7010:
This is very helpful. Would you mind seeking to a little earlier, and reading more, so that I can have the equivalent of 5-10 lines from the fragment file around that spot?
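A minimal sketch of how that can be done, assuming the fragment file is called unlinked.txt (as in the snippet later in this thread) and using the byte offset 7010 from the reported error; the window sizes are arbitrary:

with open("unlinked.txt", "rb") as f:
    f.seek(max(0, 7010 - 300))                      # start a bit before the undecodable byte
    chunk = f.read(600)                             # wide enough to cover several fragment lines
    print(chunk.decode("utf-8", errors="replace"))  # bad bytes render as U+FFFD instead of raising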
I think I reproduced this bug with a 10x dataset of my own. Should be fixed soon.

That's great, I'll leave the closing of the issue to you when it's been implemented. Let me know if you need anything more. Thank you so much for the help! If I could rate this interaction I'd give it a 10/10!

You'll find the row posted before, highlighted in bold.

So I didn't actually reproduce this same issue; it was a separate utf-8 issue that looked similar. Would it be possible for you to send your entire unlinked.txt (just that file) to me? edge dot peterj at gmail dot com

Alright, I've mailed the file to you.
A little bit down the road... Fixed by two commits:

There was also a downstream problem with buffer sizes which wasn't part of the original issue, but was patched while verifying the initial fix. I've just finished running my data through all the steps without any further hiccups, so I consider the issue closed. Thanks to @pjedge for his excellent perseverance in resolving this issue.
<Anyone who has author privileges, feel free to close this issue>

Thanks.

Still have the same errors: LinkFragments.py 10X ...

Our systems have the v1.2 release installed, which may not include all the recent fixes. I guess this may be the reason I still got the same error.
Hi, I hit this same issue on the conda install, and when I tried a quick install from GitHub directly to troubleshoot, "make" generates this error:

gcc -g -O3 -Wall -D_GNU_SOURCE -Ihtslib -c hairs-src/bamread.c -lhts -o build/bamread.o
In file included from hairs-src/bamread.c:1:0:

htslib v1.9 is in my path.

Best,
If you add the actual path to 'htslib' to the Makefile (variable HTSLIB), the code should compile. I pushed a minor change to LinkFragments.py that could potentially fix the UnicodeDecodeError. If you could please try it and let me know, it would be really helpful.
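The commit itself isn't quoted in this thread; purely as a hedged guess at the shape of such a change (the helper name below is made up, while hairs_file and fm mirror the names in the traceback quoted in this issue), one way to tolerate a stray non-UTF-8 byte while iterating over the fragment file is:

# Sketch only, not the actual patch: open the fragment file so that a stray
# non-UTF-8 byte is replaced with U+FFFD instead of raising mid-iteration.
def open_fragment_file(path):
    return open(path, "r", encoding="utf-8", errors="replace")

# e.g. inside read_fragment_matrix:
# with open_fragment_file(hairs_file) as fm:
#     for line in fm:
#         ...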
Downloaded the new LinkFragments.py and successfully installed htslib and HapCUT2 from GitHub, and everything appears to be working!! Will post confirmation when it finishes completely, but it has passed the place where it used to error out. Thanks so much for your quick response and fix to this issue!! Really needed a win today!
Hi Bansal lab,

It appears that LinkFragments.py has been successfully patched, although it's still running (going on 10 days) and is only about 1/3 of the way through the genome (purely by scaffold #). Is this a normal run-time for this script?

Thanks again!

Best,
bjp
Not normal, but depends on the data. What's the scale of the dataset?
1.8 Gb genome, ~3,000 scaffolds (w/ scaffold L90 of 112) w/ ~1.7 billion 10X reads. Currently approaching scaffold 1,000 w/o any notable increase in speed (although these scaffolds are mighty small).
I'm experiencing the same error as @chunlinxiao and @FrickTobias. I installed HapCUT2 via bioconda. Just to be clear, I don't think this is an issue with the bioconda recipe; in the recipe YAML it clearly uses:
Can you please share a dataset for reproducing this error? I can provide an upload link if needed. Thanks.
@vibansal If I use the method used above to find the particular line that causes the problem, will that suffice? I cannot share the full dataset.

EDIT: The "method used above" being the following:

with open("unlinked.txt", "rb") as f:
    f.seek(7100)
    f.read(60)
Yes, having a few lines around the problematic line will be helpful.
Hi, sorry about leaving this hanging. The problem solved itself. Most likely there was some versioning confusion. You can close this issue. |
Traceback (most recent call last):
  File "./HapCUT2/utilities/LinkFragments.py", line 492, in <module>
    link_fragments(args.fragments,args.VCF,args.bam_file, args.outfile, args.distance, args.single_SNP_frags)
  File "./HapCUT2/utilities/LinkFragments.py", line 327, in link_fragments
    flist = read_fragment_matrix(hairs_file,vcf_file,chrom_filter=curr_chrom)
  File "./HapCUT2/utilities/LinkFragments.py", line 212, in read_fragment_matrix
    for line in fm:
  File "./HapCUT2/myenv3/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 644: invalid start byte
Step 1 of extractHAIRS for the unlinked_fragment_file is okay, but step 2 of LinkFragments.py for linked_fragment_file generation had the error above.
Do you have any idea? Thanks.
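As a side note, a quick way to confirm that the unlinked fragment file really does contain a non-UTF-8 byte, and to find its offset, is a short check like the following (sketch only; the file name is a placeholder):

def first_bad_byte(path):
    # Return the byte offset of the first non-UTF-8 byte, or None if the
    # whole file decodes cleanly.
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as e:
        return e.start

print(first_bad_byte("unlinked.txt"))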