some sequences are missing in pyfastx.Fasta object #41

dawnmy · 2022-03-12T22:06:42Z

I loaded a FASTA file containing 4542 sequences with an average length of 2.5 kb; however, only 4539 sequences were in the pyfastx.Fasta object.

import pyfastx

fa = pyfastx.Fasta('assembly.fasta')
fa['contig_4540']  # raises KeyError

In addition, I could access a sequence, e.g. fa['contig_999'], the first time, but when I tried to access it again I got a KeyError.

I used pyfastx 0.8.4 with Python 3.7.


lmdu · 2022-03-15T13:07:38Z

Thank you for reporting this issue. I will check that. A new version will be released soon.

floccinauc · 2023-08-31T09:16:27Z

Any updates on this? I'm getting the same error: I'm loading a large FASTA file (~59M entries), and for some of the entries (both when accessing by string key and by integer index) I get a "key does not exist" error. Reloading the file solves the problem for the given keys, but shifts it to others.
I'm using pyfastx 1.1.0.

lmdu · 2023-08-31T09:21:17Z

Thanks. Could you provide your code and an https link to your data?

floccinauc · 2023-08-31T11:18:51Z

I'm using the unzipped version of this file https://stringdb-downloads.org/download/protein.sequences.v12.0.fa.gz.
As for my code, the simple snippet below does not seem to reproduce this error:

import pyfastx
from tqdm import tqdm

FILEPATH = "/dccstor/bmfmbio/datasets/STRING/all/protein.sequences.v12.0.fa"
loaded_fasta = pyfastx.Fasta(FILEPATH)
for idx in tqdm(range(int(5e7))):
    a = loaded_fasta[idx]  # sequential, single-process access by integer index

Maybe it has to do with multiple workers accessing the same fasta file? I'm afraid I cannot post the actual code I'm using at this point.
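
One way to test the multiple-workers hypothesis is to make sure no pyfastx.Fasta object is shared across processes, so that each worker opens its own handle. The sketch below is only an illustration under that assumption; it is not confirmed in this thread as either the cause or the fix. The names `get_handle`, `_open_fasta` and `_HANDLES` are hypothetical helpers, not part of pyfastx, and the sketch assumes the index file has already been built once (for example by opening the file in the parent process) before workers start.

```python
import os

# Hypothetical per-process cache: (pid, path) -> open handle.
# Keying on the PID means a forked child never reuses the parent's handle.
_HANDLES = {}


def _open_fasta(path):
    # Imported lazily so the caching logic can be exercised without pyfastx.
    import pyfastx
    return pyfastx.Fasta(path)


def get_handle(path, opener=_open_fasta):
    """Return a handle owned by the current process, opening it on first use."""
    key = (os.getpid(), path)
    if key not in _HANDLES:
        _HANDLES[key] = opener(path)
    return _HANDLES[key]
```

Inside a worker, `get_handle(FILEPATH)[idx]` would then replace direct use of a `loaded_fasta` object created in the parent. If the missing-key errors disappear with this change, that would point toward shared handles as the trigger.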
