Refactorization of Analyze schema #11

luissian · 2023-12-19T22:57:08Z

Test results for 70 loci.

saramonzon · 2024-01-05T09:18:02Z

setup.py

+
+from setuptools import setup, find_packages
+
+version = "2.2.0"


Maybe version 3.0.0?

Yes, changed to version 3.0.0

saramonzon · 2024-01-05T09:21:41Z

taranis/__main__.py

+    # for schema_file in schema_files:
+    results = []
+    start = time.perf_counter()
+    with concurrent.futures.ProcessPoolExecutor() as executor:


Please include a parameter for setting the number of threads to use, so we can test it on the HPC.

Included a parameter for the number of CPUs to use. Prokka's CPU usage was set to 3; more testing is required to find the optimal value.
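A minimal sketch of how a user-supplied CPU count could be passed to `ProcessPoolExecutor` through its `max_workers` argument. The names `resolve_cpus`, `analyze_locus` and `analyze_schema_parallel` are hypothetical and not the actual taranis code; the worker body is a placeholder for the real per-locus checks.

```python
import concurrent.futures
import os


def resolve_cpus(requested: int) -> int:
    """Clamp a user-supplied CPU count to the range [1, available CPUs]."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, available))


def analyze_locus(schema_file: str) -> str:
    """Hypothetical per-locus worker; the real allele checks would go here.

    It must be a module-level function so it can be pickled for the pool.
    """
    return os.path.basename(schema_file)


def analyze_schema_parallel(schema_files: list, cpus: int) -> list:
    """Run analyze_locus on every schema file using at most `cpus` processes."""
    workers = resolve_cpus(cpus)
    with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
        # executor.map returns results in the same order as schema_files
        return list(executor.map(analyze_locus, schema_files))
```

If each worker also launches Prokka with its own `--cpus` value (3 here), the total load is roughly workers × Prokka CPUs, which is worth keeping in mind when searching for the optimal setting on the HPC.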

saramonzon · 2024-01-05T10:08:22Z

taranis/analyze_schema.py

+                a_quality[record.id] = {"quality": "Good quality", "reason": "-"}
+                allele_seq[record.id] = str(record.seq)
+                a_quality[record.id]["length"] = len(str(record.seq))
+                if len(record.seq) % 3 != 0:


Have you checked that Biopython's translate function only fails when the sequence length is not a multiple of 3? It would make sense to me, but just in case. Also, I think keeping the protein info output, as in the previous code, could be useful.
Can you run the previous analyze schema code so we can compare the outputs and see if we are missing something?

This part of the code has now been replaced to use Biopython.
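For reference, with `cds=True` Biopython's `Seq.translate` raises `TranslationError` not only for lengths that are not a multiple of three, but also for a missing start codon, a missing final stop codon, or an extra in-frame stop codon. The dependency-free sketch below mirrors those kinds of checks to show which bad-quality reasons can arise. It assumes NCBI translation table 11 (bacterial); the functions `cds_problem` and `assess_allele` and their reason strings are hypothetical and do not reproduce Biopython's exact messages.

```python
# NCBI translation table 11 (bacterial/archaeal); assumed for this sketch
START_CODONS = {"ATG", "GTG", "TTG", "CTG", "ATT", "ATC", "ATA"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_problem(seq: str):
    """Return None if seq looks like a complete CDS, else a reason string."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        return f"Sequence length {len(seq)} is not a multiple of three"
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if not codons or codons[0] not in START_CODONS:
        return "No start codon"
    if codons[-1] not in STOP_CODONS:
        return "No stop codon"
    # Any stop codon before the last one would truncate the protein
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return "Extra in-frame stop codon"
    return None


def assess_allele(seq: str) -> dict:
    """Build a quality record shaped like the a_quality entries in the diff."""
    problem = cds_problem(seq)
    if problem is None:
        return {"quality": "Good quality", "reason": "-", "length": len(seq)}
    return {"quality": "Bad quality", "reason": problem, "length": len(seq)}
```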

saramonzon · 2024-01-05T10:09:00Z

taranis/analyze_schema.py

+                a_quality[record.id]["length"] = len(str(record.seq))
+                if len(record.seq) % 3 != 0:
+                    a_quality[record.id]["quality"] = "Bad quality"
+                    a_quality[record.id]["reason"] = "Can not be converted to protein"


I would label this as in the previous code: "Not a CDS, cannot be converted to protein", or something like that.

The error messages are now taken from Biopython's error text.

saramonzon · 2024-01-05T10:10:30Z

taranis/analyze_schema.py

+                ):
+                    bad_quality_record.append(record.id)
+
+        if self.remove_duplicated:


Duplicate and subset checks must always be done. Options such as remove_duplicated or remove_subset should only control whether those alleles are removed from the schema; the statistics and the quality flag should still indicate when an allele is a duplicate or a subset.

Duplicated and subset alleles are now shown in the quality data.
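A minimal sketch of the behaviour discussed above: flags are always computed, and the remove options only decide what is dropped from the schema. It assumes "subset" means an allele whose sequence is contained in a longer allele of the same locus; `flag_duplicates_and_subsets` and `filter_schema` are hypothetical names. The pairwise subset check is quadratic in the number of unique alleles, which is acceptable for a sketch but may need indexing for very large loci.

```python
def flag_duplicates_and_subsets(alleles: dict) -> dict:
    """Map each allele id to a list of flags such as 'duplicate of X'."""
    flags = {allele_id: [] for allele_id in alleles}
    first_seen = {}  # sequence -> id of the first allele carrying it
    for allele_id, seq in alleles.items():
        if seq in first_seen:
            flags[allele_id].append(f"duplicate of {first_seen[seq]}")
        else:
            first_seen[seq] = allele_id
    unique = list(first_seen.items())  # (sequence, allele id) pairs
    for seq, allele_id in unique:
        for other_seq, other_id in unique:
            # A subset is a strictly shorter sequence contained in another allele
            if other_id != allele_id and len(seq) < len(other_seq) and seq in other_seq:
                flags[allele_id].append(f"subset of {other_id}")
                break
    return flags


def filter_schema(alleles, flags, remove_duplicated=False, remove_subset=False):
    """Drop flagged alleles only when requested; the flags are always kept."""
    kept = {}
    for allele_id, seq in alleles.items():
        is_dup = any(f.startswith("duplicate") for f in flags[allele_id])
        is_sub = any(f.startswith("subset") for f in flags[allele_id])
        if (remove_duplicated and is_dup) or (remove_subset and is_sub):
            continue
        kept[allele_id] = seq
    return kept
```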

saramonzon · 2024-01-05T10:33:12Z

After comparing the new and old code, I would also add information about the number of distinct proteins found per gene, and some graphs and statistics on the length variability of the alleles of each gene, as in the previous code.
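A sketch of the per-gene summary being requested, using only the standard `statistics` module. It assumes the caller already has the allele sequences and their translated proteins for one locus; `locus_summary` and its output keys are hypothetical. Because synonymous codons encode the same amino acid, the number of distinct proteins can be lower than the number of alleles.

```python
import statistics


def locus_summary(alleles: dict, proteins: dict) -> dict:
    """Summarise one locus.

    alleles: allele id -> DNA sequence (assumes at least one allele)
    proteins: allele id -> translated protein (e.g. good-quality alleles only)
    """
    lengths = [len(seq) for seq in alleles.values()]
    return {
        "num_alleles": len(lengths),
        "distinct_proteins": len(set(proteins.values())),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "mean_length": statistics.mean(lengths),
        "median_length": statistics.median(lengths),
        # Sample standard deviation needs at least two values
        "stdev_length": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
    }
```

The returned values could feed a per-locus table or a length-distribution plot, as in the previous code.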

saramonzon · 2024-01-05T10:53:11Z

In the pie chart:
blue should be known gene names
red should be hypothetical proteins; I assume Prokka labels them like this? (please check)
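A sketch of how loci could be binned for that pie chart from Prokka-style annotations, given as (gene name, product) pairs. It assumes that a locus with a gene name counts as "known gene name" and that unannotated loci carry the product text "hypothetical protein", which is the labelling that still needs to be confirmed against Prokka's output. The extra "product only" bin and the colour mapping are hypothetical.

```python
from collections import Counter

# Hypothetical colour mapping following the review comment
PIE_COLORS = {
    "known gene name": "blue",
    "hypothetical protein": "red",
    "product only": "grey",
}


def annotation_category(gene: str, product: str) -> str:
    """Assign one locus to a pie-chart category."""
    if gene:
        return "known gene name"
    if product.strip().lower() == "hypothetical protein":
        return "hypothetical protein"
    return "product only"  # annotated product but no gene name


def pie_counts(annotations) -> Counter:
    """Count loci per category from (gene, product) pairs."""
    return Counter(annotation_category(gene, product) for gene, product in annotations)
```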

saramonzon · 2024-01-05T10:54:25Z

In the good quality/bad quality pie chart, I assume that all bad-quality reasons (if any) would appear, right?
For example: no start codon, no stop codon, etc.
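One way to make every bad-quality reason show up is to tally bad alleles by their reason rather than as a single "Bad quality" slice. The sketch assumes quality records shaped like the `a_quality` entries in the diff (`quality` and `reason` keys); `quality_pie_data` is a hypothetical name.

```python
from collections import Counter


def quality_pie_data(quality: dict) -> dict:
    """Return slice label -> allele count, with one slice per bad-quality reason."""
    counts = Counter()
    for record in quality.values():
        if record["quality"] == "Good quality":
            counts["Good quality"] += 1
        else:
            counts[record["reason"]] += 1
    return dict(counts)
```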

luissian added 6 commits November 2, 2023 16:53

first change to new format

6022168

First changes to the new refactorization

5c7e3e3

Implemented analyze_schema in multiple cpus

dc10a7a

Liting analyze_schema

dacde99

Added code to test alfaclust results

beb0413

Implemented Analyze schema

a25365c

luissian changed the title ~~Refactorization~~ Refactorization of Analyze schema Jan 4, 2024

luissian marked this pull request as ready for review January 4, 2024 16:59

saramonzon approved these changes Jan 5, 2024


luissian merged commit d212ab4 into BU-ISCIII:develop Jan 5, 2024


Review threads: comments by saramonzon on Jan 5, 2024, with replies by luissian on Jan 8, 2024.
