-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create Duplication_model.slim #33
base: master
Are you sure you want to change the base?
Conversation
First draft of SLiM model of single duplication. Where part of the genome is allocated to house the duplication once it is introduced, since modification of genome length is not possible in SLiM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Dante! Sorry for the delay in getting this done. Code review like this is a bit time-consuming so I needed a good block of undistracted time to do it. Overall the model looks great; you've done a nice job of implementing the idea. I think there are a handful of minor errors, and some cleanup and commenting to be done. Once you've pushed fixes from this review, I'll review it again including actually running the model and observing it in SLiMgui, etc., to make sure it seems to be working as intended. (You should be doing that too, of course! :->) For the next revision of this PR, please revise the README file to add a mention of this model. Also, please add a block comment at the top of the file that states what it does, and credits yourself in whatever way you want to be credited, and gives a date for when the model was written, stuff like that. A header comment that describes the file with credits. Sound good? Thanks a bunch for doing this! It's a good idea, and I'm surprised nobody (myself included) has done it before now!
@@ -0,0 +1,167 @@ | |||
initialize() { | |||
initializeSLiMOptions(nucleotideBased=T); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For demonstration purposes, probably no reason for this to be nucleotide-based, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(But if you want it to be, it seems harmless...)
models/Duplication_model.slim
Outdated
initializeSLiMOptions(nucleotideBased=T); | ||
defineConstant("L", 1010000); // Genome length | ||
defineConstant("DUP_LENGTH", 10000); // Duplication size | ||
defineConstant("DUP_START", L - DUP_LENGTH); // Start position of duplication |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess so that L isn't implicitly hard-coded with DUP_LENGTH built into it I'd suggest (1) make L_BASE be 1000000, (2) make DUP_START be equal to L_BASE, and then (3) make L be L_BASE + DUP_LENGTH. Something like that.
models/Duplication_model.slim
Outdated
defineConstant("DUP_START", L - DUP_LENGTH); // Start position of duplication | ||
defineConstant("N", 500); // Population size | ||
|
||
initializeTreeSeq(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For demo purposes I'd also remove this. (I realize you probably want this in your model, I'm reviewing for SLiM-Extras purposes right now.) Tree-seq will probably be fairly confused by the gene duplication stuff anyway. If you intend to also submit an analysis script in Python that is smart enough to do the right thing with the way that the duplication events are recorded, then that's awesome, and in that case, leave tree-seq in. Otherwise, probably take it out and keep the recipe as simple as possible.
models/Duplication_model.slim
Outdated
initializeTreeSeq(); | ||
|
||
// Generate random nucleotides for the first L - DUP_LENGTH bases | ||
randomPart = sample(c("A", "T", "C", "G"), L - DUP_LENGTH - 1, replace=T); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another place where L_BASE would be useful. Also, note that there is a randomNucleotides() function built into SLiM; I think you can use it here, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(But if you take nucleotides out of the demo model, then this goes away anyway)
models/Duplication_model.slim
Outdated
m2.convertToSubstitution = F; | ||
|
||
// Define genomic element | ||
initializeGenomicElementType("g1", m1, 1.0, mmJukesCantor(1e-7)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you know this or not, but 1e-7 here is the alpha parameter to the JK model, and the realized mutation rate is 3*alpha, so your realized mutation rate is 3e-7. Is that what you want? (Don't blame me, blame Jukes and Cantor :->)
} | ||
if (gm1 & gm2) { | ||
//homozygot duplication | ||
return F; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct; you have not modified the breakpoints in this code branch.
models/Duplication_model.slim
Outdated
return F; | ||
} | ||
else { | ||
// heterozygote duplicated: resample to get an even # of breakpoints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, here one parental genome is duplicated and the other is not. So presumably recombination between them is not allowed? So you just want to filter out breakpoints in the duplication region, just like in case 1. If the copy strand, when you reach DUP_START, is on the non-duplicate strand you'll make a non-duplicate gamete; if it is on the duplicate strand, you'll make a duplicate gamete. That should just work, I think? So the code here seems fine, except for needing the same change that I commented on above for case 1.
But you have this comment above, "resample to get an even # of breakpoints". I don't know why you want to do that, and it doesn't look like the code here tries to do that. Please clarify.
models/Duplication_model.slim
Outdated
// If no breakpoints remain after filtering, return F (no recombination) | ||
if (length(breakpoints) == 0) { | ||
return F; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove these lines, as above; incorrect and unnecessary
models/Duplication_model.slim
Outdated
randomPart = sample(c("A", "T", "C", "G"), L - DUP_LENGTH - 1, replace=T); | ||
|
||
// Define the segment to duplicate (last DUP_LENGTH bases of randomPart) | ||
duplicatedPart = randomPart[(length(randomPart) - DUP_LENGTH):length(randomPart) - 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this do exactly what you want it to do? The tricky thing is that, because of operator precedence, A:B-1 generates a sequence from A-1 to B-1, not a sequence from A to B-1 as people often expect. If this code does do what you intend it to do, then it would be a good idea to add parentheses to make that clear, as (A:B)-1, so the reader knows it isn't an error. The parens are unnecessary, and maybe you know what you're doing, but the precendence of : is such a common source of bugs that it is really better to make it very explicitly clear that you're doing what you intend to do. Check this carefully.
models/Duplication_model.slim
Outdated
// Initialize the ancestral sequence with the full sequence | ||
initializeAncestralNucleotides(fullSequence); | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trim blank lines down to one or maybe 2 if you want section separation
Hi again. Pondering this, I realized there's an issue we hadn't thought about: fixed mutations at the time of duplication. This is particularly apparent with nucleotides, but it's an issue even without nucleotides too, really. Suppose the initial state of the model, with nucleotides is AAAA|AAAA, showing a duplicated region just four bases long, and ignoring all the bases to the left of the original region that gets duplicated. So AAAA is the original region, and your code replicates that in the ancestral sequence so we have AAAA|AAAA. Now we simulate the burn-in, and at some point a mutation in the original region fixes, so now we have AATA|AAAA. The T mutation has fixed, so it has been removed and turned into a substitution object. Now the duplication event occurs, and mutations get copied – but the T does not get copied, because it is not a mutation. What to do about this? The easy solution is to set convertToSubstitution to F for m1 also. Then the T mutation is still a mutation, and so it gets copied to the duplicated region correctly. That can make models very slow, though; they get bogged down with mutations that have fixed and are present in every genome. So a better solution would be preferable. I think it would work for your duplication event code to first duplicate substitutions within the original region. In other words, first get sim.substitutions, select the ones in the original region, and create new mutations for them in the duplicated region, just the way you do for mutations now. Then – it has to be after the substitutions – do the same for the mutations in the original region, as you do now. This works nicely in the nucleotide-based model because nucleotide-based mutations don't stack, so if a particular position has had successive fixation events, and maybe now has a segregating mutation on top of those past fixations, adding new mutations into the duplicated region in the correct order will leave you in the correct state. In a non-nucleotide model the mutations you add would stack, instead, which would not be what you'd want. You'd want to change the stacking policy, or otherwise jump through some hoops to address that problem. Setting a stacking policy of 'l' (that's a lowercase L) on the m2 mutation type, in a non-nucleotide model, would do the trick and seems perfectly reasonable. Anyhow, those are some additional thoughts. It would be good to figure out a way to test the model for correctness. In the nucleotide-based version (and the more I ponder these things, the more I think you ought to keep the model nucleotide-based, as the idea just works much more smoothly in that paradigm), you could do a check, at the end of the duplication event code, using the genome method nucleotides(). The nucleotide sequence for the original region versus the duplicated region ought to be identical, after the duplication. If they're not, throw an error. Sound good? I hope all this is making sense to you. |
This is an updated version of my duplication model in SLiM
problem with losing substitutions after restarting fixed by saving after duplication event have taken place
Hi @DanteWensby! I'm swamped as usual, but will try to get to this soon. Are you impatient to have it reviewed, or are you busy with other things too? |
|
@DanteWensby no worries, you didn't seem impatient. :-> Did you address my comments from the previous review? Or is this revision for a different purpose? |
@bhaller Yes it was to address your comments, hopefully to a satisfactory extent. as well as some minor adjustments |
OK great, I'll get to it soon-ish. Let me know if you get more impatient that you presently are. :-> |
First draft of SLiM model of single duplication. Where part of the genome is allocated to house the duplication once it is introduced, since modification of genome length is not possible in SLiM