consider to_dict or index for pj.records. IMPORTANT #43

szitenberg · 2014-11-26T12:29:03Z

Records are stored as a list. This means that fetching a record by its accession, or a feature by its feature_id, requires iterating over the list.

Records should be stored as a dictionary keyed by record id (SeqIO.to_dict(SeqIO.parse(...))). This way, getting to a feature becomes a key lookup instead of a scan, which will be much faster.
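A minimal sketch of the difference, using only the standard library so it runs without Biopython. `Record` is a hypothetical stand-in for `SeqRecord` (only `id` matters here), the accessions are made up, and `to_dict` is a hand-written imitation of the default behaviour of Biopython's `SeqIO.to_dict`, which also keys records by `record.id` and raises `ValueError` on a duplicate key.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord."""
    id: str
    seq: str


def find_in_list(records, accession):
    # Current approach: a linear scan, repeated for every lookup.
    for record in records:
        if record.id == accession:
            return record
    raise KeyError(accession)


def to_dict(records):
    # Imitates SeqIO.to_dict's default: key each record by its id and
    # refuse duplicate ids instead of silently overwriting one.
    index = {}
    for record in records:
        if record.id in index:
            raise ValueError("Duplicate key %r" % record.id)
        index[record.id] = record
    return index


records = [Record("AB000001.1", "ACGT"), Record("AB000002.1", "GGCC")]
records_dict = to_dict(records)

# Proposed approach: a single hash lookup returning the same object.
same = records_dict["AB000002.1"] is find_in_list(records, "AB000002.1")
```

The list scan gets slower as the number of records grows and is repeated for every lookup, while the dict lookup does not. SeqIO.index, the other option in the title, offers the same key-based access without loading every record into memory, but it parses a fresh record on each access, so in-place metadata edits on those records would not persist.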

get_qualifiers_dictionary can then fetch the record by its key and the feature by its index, taken from the number in the feature_id's _f suffix.
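A sketch of that lookup, assuming feature_ids have the form `<record id>_f<N>` with `N` the 0-based position of the feature in `record.features`; the issue only says the index comes from the number after `_f`, so the exact format and base are assumptions. `Feature`, `Record`, `split_feature_id` and `qualifiers_for_feature` are hypothetical names, and the real get_qualifiers_dictionary may return more than the feature's own qualifiers; this only shows the lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical stand-in for Bio.SeqFeature.SeqFeature."""
    type: str
    qualifiers: dict = field(default_factory=dict)


@dataclass
class Record:
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord."""
    id: str
    features: list = field(default_factory=list)


def split_feature_id(feature_id):
    # Assumed format "<record id>_f<N>". rpartition splits on the LAST "_f",
    # so record ids that themselves contain "_f" still parse correctly.
    record_id, sep, number = feature_id.rpartition("_f")
    if not sep or not number.isdigit():
        raise ValueError("not a feature id: %r" % feature_id)
    return record_id, int(number)


def qualifiers_for_feature(records_dict, feature_id):
    # One dict lookup for the record and one list index for the feature:
    # no iteration over all records or over all of a record's features.
    record_id, index = split_feature_id(feature_id)
    feature = records_dict[record_id].features[index]
    return dict(feature.qualifiers)


records_dict = {
    "AB000001.1": Record("AB000001.1", [Feature("source"),
                                        Feature("CDS", {"gene": ["cox1"]})]),
}
qualifiers = qualifiers_for_feature(records_dict, "AB000001.1_f1")
```

Returning a copy keeps callers from mutating the feature by accident; a metadata editing method would instead write to `feature.qualifiers` directly after the same two-step lookup.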

Metadata editing methods that take a feature_id will also become much faster.

Input file writers will be much faster as well.

This requires some work, since it changes the way pj.records is iterated throughout the code.
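One way to limit that work, sketched with hypothetical names: a container that indexes records by id but still iterates over the records themselves, so existing `for record in pj.records:` loops keep working. `OrderedDict` is used so record order is preserved on any Python version. Code that indexes `pj.records` by position or slices it would still need to change.

```python
from collections import OrderedDict, namedtuple


class RecordStore:
    """Hypothetical container: keyed by record id, iterates like the old list."""

    def __init__(self, records=()):
        # OrderedDict keeps the original record order regardless of Python version.
        self._by_id = OrderedDict()
        for record in records:
            self.append(record)

    def append(self, record):
        # Keeps list-style additions working while maintaining the index.
        if record.id in self._by_id:
            raise ValueError("Duplicate key %r" % record.id)
        self._by_id[record.id] = record

    def __iter__(self):
        # Yields records, not keys, so existing loops over pj.records are unchanged.
        return iter(self._by_id.values())

    def __len__(self):
        return len(self._by_id)

    def __getitem__(self, record_id):
        # Lookup by accession rather than by list position.
        return self._by_id[record_id]


# Hypothetical stand-in for SeqRecord; only the id matters here.
Record = namedtuple("Record", "id seq")

store = RecordStore([Record("AB000001.1", "ACGT"), Record("AB000002.1", "GGCC")])
store.append(Record("AB000003.1", "TTAA"))
ids = [record.id for record in store]  # same loop shape as over a list
```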


szitenberg · 2015-03-09T17:24:41Z

Many things depend on Project.records being a list of SeqRecord objects. For now, I have added a private function, __get_qualifiers_dictionary__, alongside the public one, get_qualifiers_dictionary.

To use it, one needs to first do:

Project.__records_list_to_dict__()
It is not done automatically, in order to i) avoid a duplicate representation of the data when the private version is not used, and ii) make sure the dict holds the latest data, since it is built from the list at the moment the method is called.

Then the private function can be used:

__get_qualifiers_dictionary__(Project, 'feature_id')
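The reason for running the conversion just before use can be shown with a minimal sketch: a dict built from the list is a snapshot, so records added afterwards are missing until the conversion runs again. `Project`, `records_dict` and `records_list_to_dict` here are hypothetical simplifications, not the actual Project implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for SeqRecord; only the id matters here.
Record = namedtuple("Record", "id seq")


class Project:
    """Hypothetical minimal Project: the list remains the source of truth."""

    def __init__(self):
        self.records = []
        self.records_dict = None  # built only on request

    def records_list_to_dict(self):
        # Rebuild from the current list so the dict reflects the latest data.
        self.records_dict = {record.id: record for record in self.records}


pj = Project()
pj.records.append(Record("AB000001.1", "ACGT"))
pj.records_list_to_dict()
pj.records.append(Record("AB000002.1", "GGCC"))  # added after the snapshot
stale = "AB000002.1" in pj.records_dict          # False: the dict is out of date
pj.records_list_to_dict()
fresh = "AB000002.1" in pj.records_dict          # True after rebuilding
```

In this sketch the dict holds references to the same record objects as the list, so the second representation costs an index rather than a full copy of every record.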

get_qualifiers_dictionary is a dependency of several functions and methods. My plan is to add a private version of each of them that calls __get_qualifiers_dictionary__ instead of the public version. So far I have written __make_concatenation_alignments__, since this step was the most noticeable bottleneck. The bottleneck can now be avoided as follows. Instead of doing:

concat = Concatenation(...)
Project.add_concatenation(concat)
Project.make_concatenation_alignments()

You can do:

concat = Concatenation(...)
Project.add_concatenation(concat)
Project.__records_list_to_dict__()
Project.__make_concatenation_alignments__()

This makes the concatenation process much faster. The downside is that your Project now holds all the data twice: once as a list and again as a dict.

I am working towards making these changes throughout the code and eliminating the duplication.

szitenberg added the enhancement label Nov 26, 2014

szitenberg added bug and removed enhancement labels Jan 22, 2015
