Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Duplication_model.slim #33

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

DanteWensby
Copy link

First draft of SLiM model of single duplication. Where part of the genome is allocated to house the duplication once it is introduced, since modification of genome length is not possible in SLiM

First draft of SLiM model of single duplication. Where part of the genome is allocated to house the duplication once it is introduced, since modification of genome length is not possible in SLiM
Copy link
Contributor

@bhaller bhaller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Dante! Sorry for the delay in getting this done. Code review like this is a bit time-consuming so I needed a good block of undistracted time to do it. Overall the model looks great; you've done a nice job of implementing the idea. I think there are a handful of minor errors, and some cleanup and commenting to be done. Once you've pushed fixes from this review, I'll review it again including actually running the model and observing it in SLiMgui, etc., to make sure it seems to be working as intended. (You should be doing that too, of course! :->) For the next revision of this PR, please revise the README file to add a mention of this model. Also, please add a block comment at the top of the file that states what it does, and credits yourself in whatever way you want to be credited, and gives a date for when the model was written, stuff like that. A header comment that describes the file with credits. Sound good? Thanks a bunch for doing this! It's a good idea, and I'm surprised nobody (myself included) has done it before now!

@@ -0,0 +1,167 @@
initialize() {
initializeSLiMOptions(nucleotideBased=T);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For demonstration purposes, probably no reason for this to be nucleotide-based, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(But if you want it to be, it seems harmless...)

initializeSLiMOptions(nucleotideBased=T);
defineConstant("L", 1010000); // Genome length
defineConstant("DUP_LENGTH", 10000); // Duplication size
defineConstant("DUP_START", L - DUP_LENGTH); // Start position of duplication
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess so that L isn't implicitly hard-coded with DUP_LENGTH built into it I'd suggest (1) make L_BASE be 1000000, (2) make DUP_START be equal to L_BASE, and then (3) make L be L_BASE + DUP_LENGTH. Something like that.

defineConstant("DUP_START", L - DUP_LENGTH); // Start position of duplication
defineConstant("N", 500); // Population size

initializeTreeSeq();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For demo purposes I'd also remove this. (I realize you probably want this in your model, I'm reviewing for SLiM-Extras purposes right now.) Tree-seq will probably be fairly confused by the gene duplication stuff anyway. If you intend to also submit an analysis script in Python that is smart enough to do the right thing with the way that the duplication events are recorded, then that's awesome, and in that case, leave tree-seq in. Otherwise, probably take it out and keep the recipe as simple as possible.

initializeTreeSeq();

// Generate random nucleotides for the first L - DUP_LENGTH bases
randomPart = sample(c("A", "T", "C", "G"), L - DUP_LENGTH - 1, replace=T);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another place where L_BASE would be useful. Also, note that there is a randomNucleotides() function built into SLiM; I think you can use it here, yes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(But if you take nucleotides out of the demo model, then this goes away anyway)

m2.convertToSubstitution = F;

// Define genomic element
initializeGenomicElementType("g1", m1, 1.0, mmJukesCantor(1e-7));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you know this or not, but 1e-7 here is the alpha parameter to the JK model, and the realized mutation rate is 3*alpha, so your realized mutation rate is 3e-7. Is that what you want? (Don't blame me, blame Jukes and Cantor :->)

}
if (gm1 & gm2) {
//homozygot duplication
return F;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct; you have not modified the breakpoints in this code branch.

return F;
}
else {
// heterozygote duplicated: resample to get an even # of breakpoints
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, here one parental genome is duplicated and the other is not. So presumably recombination between them is not allowed? So you just want to filter out breakpoints in the duplication region, just like in case 1. If the copy strand, when you reach DUP_START, is on the non-duplicate strand you'll make a non-duplicate gamete; if it is on the duplicate strand, you'll make a duplicate gamete. That should just work, I think? So the code here seems fine, except for needing the same change that I commented on above for case 1.

But you have this comment above, "resample to get an even # of breakpoints". I don't know why you want to do that, and it doesn't look like the code here tries to do that. Please clarify.

// If no breakpoints remain after filtering, return F (no recombination)
if (length(breakpoints) == 0) {
return F;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove these lines, as above; incorrect and unnecessary

randomPart = sample(c("A", "T", "C", "G"), L - DUP_LENGTH - 1, replace=T);

// Define the segment to duplicate (last DUP_LENGTH bases of randomPart)
duplicatedPart = randomPart[(length(randomPart) - DUP_LENGTH):length(randomPart) - 1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this do exactly what you want it to do? The tricky thing is that, because of operator precedence, A:B-1 generates a sequence from A-1 to B-1, not a sequence from A to B-1 as people often expect. If this code does do what you intend it to do, then it would be a good idea to add parentheses to make that clear, as (A:B)-1, so the reader knows it isn't an error. The parens are unnecessary, and maybe you know what you're doing, but the precendence of : is such a common source of bugs that it is really better to make it very explicitly clear that you're doing what you intend to do. Check this carefully.

// Initialize the ancestral sequence with the full sequence
initializeAncestralNucleotides(fullSequence);


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trim blank lines down to one or maybe 2 if you want section separation

@bhaller bhaller marked this pull request as draft December 21, 2024 01:41
@bhaller
Copy link
Contributor

bhaller commented Dec 21, 2024

Hi again. Pondering this, I realized there's an issue we hadn't thought about: fixed mutations at the time of duplication. This is particularly apparent with nucleotides, but it's an issue even without nucleotides too, really. Suppose the initial state of the model, with nucleotides is AAAA|AAAA, showing a duplicated region just four bases long, and ignoring all the bases to the left of the original region that gets duplicated. So AAAA is the original region, and your code replicates that in the ancestral sequence so we have AAAA|AAAA. Now we simulate the burn-in, and at some point a mutation in the original region fixes, so now we have AATA|AAAA. The T mutation has fixed, so it has been removed and turned into a substitution object. Now the duplication event occurs, and mutations get copied – but the T does not get copied, because it is not a mutation. What to do about this?

The easy solution is to set convertToSubstitution to F for m1 also. Then the T mutation is still a mutation, and so it gets copied to the duplicated region correctly. That can make models very slow, though; they get bogged down with mutations that have fixed and are present in every genome. So a better solution would be preferable. I think it would work for your duplication event code to first duplicate substitutions within the original region. In other words, first get sim.substitutions, select the ones in the original region, and create new mutations for them in the duplicated region, just the way you do for mutations now. Then – it has to be after the substitutions – do the same for the mutations in the original region, as you do now.

This works nicely in the nucleotide-based model because nucleotide-based mutations don't stack, so if a particular position has had successive fixation events, and maybe now has a segregating mutation on top of those past fixations, adding new mutations into the duplicated region in the correct order will leave you in the correct state. In a non-nucleotide model the mutations you add would stack, instead, which would not be what you'd want. You'd want to change the stacking policy, or otherwise jump through some hoops to address that problem. Setting a stacking policy of 'l' (that's a lowercase L) on the m2 mutation type, in a non-nucleotide model, would do the trick and seems perfectly reasonable.

Anyhow, those are some additional thoughts. It would be good to figure out a way to test the model for correctness. In the nucleotide-based version (and the more I ponder these things, the more I think you ought to keep the model nucleotide-based, as the idea just works much more smoothly in that paradigm), you could do a check, at the end of the duplication event code, using the genome method nucleotides(). The nucleotide sequence for the original region versus the duplicated region ought to be identical, after the duplication. If they're not, throw an error.

Sound good? I hope all this is making sense to you.

This is an updated version of my duplication model in SLiM
@DanteWensby DanteWensby marked this pull request as ready for review January 3, 2025 14:25
problem with losing substitutions after restarting fixed by saving after duplication event have taken place
@DanteWensby DanteWensby requested a review from bhaller January 14, 2025 09:13
@bhaller
Copy link
Contributor

bhaller commented Jan 14, 2025

Hi @DanteWensby! I'm swamped as usual, but will try to get to this soon. Are you impatient to have it reviewed, or are you busy with other things too?

@DanteWensby
Copy link
Author

DanteWensby commented Jan 15, 2025

Hi @DanteWensby! I'm swamped as usual, but will try to get to this soon. Are you impatient to have it reviewed, or are you busy with other things too?
@bhaller
I am busy with other things too, so no need to reviewee any time soon. I'm sorry if I made the impression that i was impatient, i just did some minor adjustments. :)

@bhaller
Copy link
Contributor

bhaller commented Jan 15, 2025

@DanteWensby no worries, you didn't seem impatient. :-> Did you address my comments from the previous review? Or is this revision for a different purpose?

@DanteWensby
Copy link
Author

@bhaller Yes it was to address your comments, hopefully to a satisfactory extent. as well as some minor adjustments

@bhaller
Copy link
Contributor

bhaller commented Jan 15, 2025

OK great, I'll get to it soon-ish. Let me know if you get more impatient that you presently are. :->

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants