Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changes to maintenance script / ensembl push #1646

Merged
merged 13 commits into from
Jan 17, 2025
5 changes: 5 additions & 0 deletions maintenance/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -185,6 +185,11 @@ def black_format(code):
def ensembl_stdpopsim_id(ensembl_id):
# below is do deal with name changes in Ensembl
# TODO: remove this once we have moved to the new names
# this was added as a temporary fix to allow the release to be
# updated to 113, where the names of these species were changed
# in the future we might keep this code block, but comment it out
# to show others how maintenance updates were performed in the case
# where the species name changed in Ensembl
if ensembl_id == "canis_lupus_familiaris":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps insert a comment explaining what this is doing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, explaining why this is necessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, actually, I don't understand why it's necessary. I see below that now species.ensembl_id == "canis_lupus_familiaris", so where does ensembl_id equal "canis_familiaris"? Just missing something here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hah this is because of the transition that happened when running the maintenance script! now i bet i can take it out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay - could you either take this out, or say in the comment exactly what needs to happen to take it out? (It says "once we have moved to the new names" - but: moved what names? where? I am confused. =)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a larger comment now explaining the context that says:

    # below is do deal with name changes in Ensembl
    # TODO: remove this once we have moved to the new names
    # this was added as a temporary fix to allow the release to be
    # updated to 113, where the names of these species were changed
    # in the future we might keep this code block, but comment it out
    # to show others how maintenance updates were performed in the case
    # where the species name changed in Ensembl

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't what I understood from talking to you - I thought that (a) we could remove this code right now if we wanted, since this if clause will never get triggered; but (b) we were going to leave the code in as an example of how to do this. Is that right?

ensembl_id = "canis_familiaris"
elif ensembl_id == "escherichia_coli_str_k_12_substr_mg1655_gca_000005845":
Expand Down
7 changes: 5 additions & 2 deletions stdpopsim/catalog/ApiMel/species.py
Original file line number Diff line number Diff line change
Expand Up @@ -134,8 +134,11 @@

def add_if_unique(chrom, synonyms):
for syn in synonyms:
if syn not in chrom.synonyms:
chrom.synonyms.append(syn)
# commented this out for now as this case
# is not occurring and it's messing with code coverage
# if syn not in chrom.synonyms:
# chrom.synonyms.append(syn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry - I misled you here. Note that two lines down, this function is called on chr_synonyms_dict, so this function is used. And, codecov is telling us not that this line isn't called it's that the other branch of the if is never evaluated - i.e., that it's always true that syn not in chrom.synonyms. So lines 135-145 are equivalent to

for chrom in _genome.chromosomes:
    chrom.synonyms.append(chr_synonyms_dict[chrom.id])

I think what's going on here is that usually we get synonyms from genome_data.py (ie ensembl); but here we wanted to add some synonyms manually; however some of these overlapped with ensembl.

And, deleting these lines will remove these synonyms (ie, we won't use chr_synonyms_dict any more)! So, we shouldn't do that.

How about this: delete add_if_unique and change lines 144-145 to

for chrom in _genome.chromosomes:
    syns = list(set(chrom.synonyms + chr_synonyms_dict[chrom.id])
    chrom.synonyms = syns

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay made this change. it passed the ApiMel test. Let's see what codecov says

pass


for chrom in _genome.chromosomes:
Expand Down
25 changes: 5 additions & 20 deletions stdpopsim/catalog/EscCol/species.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,33 +29,18 @@
)

_species_ploidy = 1
_chromosomes = []
for name, data in genome_data.data["chromosomes"].items():
_chromosomes.append(
stdpopsim.Chromosome(
id=name,
length=data["length"],
synonyms=data["synonyms"],
# Wielgoss et al. (2011) calculated for strain REL606,
# from synonymous substitutions over 40,000 generations.
mutation_rate=8.9e-11,
ploidy=_species_ploidy,
recombination_rate=8.9e-11,
gene_conversion_length=542,
)
)
_chromosomes_names = genome_data.data["chromosomes"].keys()


_ploidy_data = {str(i.id): _species_ploidy for i in _chromosomes}
_ploidy_data = {str(i): _species_ploidy for i in _chromosomes_names}

_mutation_rate = 8.9e-11
_mutation_rate_data = {str(i.id): _mutation_rate for i in _chromosomes}
_mutation_rate_data = {str(i): _mutation_rate for i in _chromosomes_names}

_recombination_rate = 8.9e-11
_recombination_rate_data = {str(i.id): _recombination_rate for i in _chromosomes}
_recombination_rate_data = {str(i): _recombination_rate for i in _chromosomes_names}

_gene_conversion_length = 542
_gene_conversion_data = {str(i.id): _gene_conversion_length for i in _chromosomes}
_gene_conversion_data = {str(i): _gene_conversion_length for i in _chromosomes_names}
_genome = stdpopsim.Genome.from_data(
genome_data=genome_data.data,
mutation_rate=_mutation_rate_data,
Expand Down
Loading