Many things depend on Project.records being a list of SeqRecord objects. For now, I have a private function __get_qualifiers_dictionary__ in parallel to the public one, get_qualifiers_dictionary.
To use it, one needs to first do:
Project.__records_list_to_dict__()
It is not done automatically to i) avoid a duplicate representation of the data if we don't want to use the private version and ii) get the latest data into the dict.
Then __get_qualifiers_dictionary__(Project, 'feature_id') can be used.
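To illustrate point ii) above, here is a minimal sketch of why the dict has to be rebuilt by hand. It does not use the real Project class: `Record` and `records_list_to_dict` are hypothetical stand-ins, and the only assumption is that the dict is built from whatever is in the list at the time of the call.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a SeqRecord: only an id and a feature list."""
    id: str
    features: list = field(default_factory=list)


def records_list_to_dict(records):
    """Index the records by id. The result reflects the list as it is right now."""
    return {rec.id: rec for rec in records}


records = [Record("A1"), Record("B2")]
by_id = records_list_to_dict(records)

# A record added after the dict was built is not visible through the dict...
records.append(Record("C3"))
stale = "C3" in by_id

# ...until the dict is rebuilt from the list.
by_id = records_list_to_dict(records)
fresh = "C3" in by_id
```

So the conversion would presumably need to be called again after anything that adds or removes records.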
get_qualifiers_dictionary is a dependency of several functions and methods. My plan is to add a private version of each of them that uses __get_qualifiers_dictionary__ instead of the public version. For now I've written __make_concatenation_alignments__. This step was the most noticeable bottleneck. It can now be avoided like this: Instead of doing:
This will make the concatenation process much much faster. Downside: you'll now have all the data twice, once as a list and again as a dict, in your Project.
I am working towards making the changes throughout and eliminating the duplication.
Records are stored as a list. This means that fetching a record by its accession or a feature by its feature_id requires iteration.
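A rough sketch of what that iteration looks like, to show where the cost comes from. `Feature`, `Record` and `find_feature_by_scan` are hypothetical stand-ins rather than the actual code, and the sketch assumes each feature keeps its id as a list under a 'feature_id' qualifier.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical stand-in for a SeqFeature: only a qualifiers dict."""
    qualifiers: dict


@dataclass
class Record:
    """Hypothetical stand-in for a SeqRecord."""
    id: str
    features: list = field(default_factory=list)


def find_feature_by_scan(records, feature_id):
    """Walk every feature of every record until the id matches.

    In the worst case this touches all features in the project, and it is
    repeated for every single lookup.
    """
    for record in records:
        for feature in record.features:
            if feature_id in feature.qualifiers.get("feature_id", []):
                return feature
    return None


records = [
    Record("A1", [Feature({"feature_id": ["A1_f0"], "gene": ["18S"]})]),
    Record("B2", [Feature({"feature_id": ["B2_f0"], "gene": ["cox1"]})]),
]
hit = find_feature_by_scan(records, "B2_f0")
miss = find_feature_by_scan(records, "Z9_f0")
```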
Records should be stored as a dictionary (SeqIO.to_dict(SeqIO.parse(...))). This way, getting to a feature will be much much faster.
get_qualifiers_dictionary can then be done by getting the record using a key and the feature index based on the number in the _f suffix.
Metadata editing methods that take a feature_id will become much much faster.
Any input file writer will be much faster.
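The lookup described above could be sketched roughly as follows. This is not the actual implementation: it assumes a feature_id has the form `<record id>_f<index>`, that the index is the feature's position in the record's feature list, and that the dict is keyed by record id (as SeqIO.to_dict does by default). `split_feature_id` and `get_feature` are hypothetical names, and SimpleNamespace objects stand in for records and features.

```python
from types import SimpleNamespace


def split_feature_id(feature_id):
    """Split '<record id>_f<index>' into (record id, integer index)."""
    record_id, _, index = feature_id.rpartition("_f")
    if not record_id or not index.isdigit():
        raise ValueError(f"not a valid feature_id: {feature_id!r}")
    return record_id, int(index)


def get_feature(records_by_id, feature_id):
    """One dict lookup plus one list index, with no iteration over records."""
    record_id, index = split_feature_id(feature_id)
    return records_by_id[record_id].features[index]


# Stand-in data: a dict keyed by record id.
records_by_id = {
    "KC1.1": SimpleNamespace(features=[
        SimpleNamespace(qualifiers={"gene": ["18S"]}),
        SimpleNamespace(qualifiers={"gene": ["cox1"]}),
    ]),
}

feature = get_feature(records_by_id, "KC1.1_f1")
gene = feature.qualifiers["gene"][0]
```

Using rpartition splits on the last `_f`, so a record id that itself contains `_f` would still be handled.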
This requires some work changing the way pj.records is iterated throughout.