Create Duplication_model.slim #33

DanteWensby · 2024-12-19T13:36:37Z

First draft of a SLiM model of a single duplication, in which part of the genome is allocated to house the duplication once it is introduced, since the genome length cannot be modified in SLiM.

bhaller

Hi Dante! Sorry for the delay in getting this done; a code review like this is a bit time-consuming, so I needed a good block of undistracted time for it. Overall the model looks great; you've done a nice job of implementing the idea. I think there are a handful of minor errors, and some cleanup and commenting to be done. Once you've pushed fixes from this review, I'll review it again, including actually running the model and observing it in SLiMgui, to make sure it seems to be working as intended. (You should be doing that too, of course! :->) For the next revision of this PR, please revise the README file to mention this model. Also, please add a header block comment at the top of the file that states what the model does, credits yourself in whatever way you want to be credited, gives the date the model was written, and so on. Sound good? Thanks a bunch for doing this! It's a good idea, and I'm surprised nobody (myself included) has done it before now!

bhaller · 2024-12-21T00:13:37Z

models/Duplication_model.slim

@@ -0,0 +1,167 @@
+initialize() {
+    initializeSLiMOptions(nucleotideBased=T);


For demonstration purposes, there's probably no reason for this to be nucleotide-based, right?

(But if you want it to be, it seems harmless...)

bhaller · 2024-12-21T00:16:03Z

models/Duplication_model.slim

+    initializeSLiMOptions(nucleotideBased=T);
+    defineConstant("L", 1010000);              // Genome length
+    defineConstant("DUP_LENGTH", 10000);       // Duplication size
+    defineConstant("DUP_START", L - DUP_LENGTH);  // Start position of duplication


Hmm. So that L doesn't implicitly have DUP_LENGTH hard-coded into it, I'd suggest (1) defining L_BASE as 1000000, (2) making DUP_START equal to L_BASE, and then (3) making L equal to L_BASE + DUP_LENGTH. Something like that.
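A quick Python check of the suggested layout, using the values from the diff. This only verifies the arithmetic (the model itself defines these with defineConstant() in Eidos), and it assumes 0-based positions, as SLiM uses, so the last valid position is L - 1.

```python
# Constants as suggested in the review (values taken from the diff above).
L_BASE = 1_000_000          # length of the ordinary, non-duplicated genome
DUP_LENGTH = 10_000         # size of the duplication
DUP_START = L_BASE          # reserved region begins right after the base genome
L = L_BASE + DUP_LENGTH     # total chromosome length

# Positions are 0-based, so the reserved region spans DUP_START..L-1.
last_position = L - 1
reserved = range(DUP_START, L)

print(L, DUP_START, last_position, len(reserved))
```

The values come out identical to the original definitions (DUP_START is still L - DUP_LENGTH); the difference is only that L is now derived rather than hard-coded.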

bhaller · 2024-12-21T00:18:06Z

models/Duplication_model.slim

+    defineConstant("DUP_START", L - DUP_LENGTH);  // Start position of duplication
+    defineConstant("N", 500);                  // Population size
+
+    initializeTreeSeq();


For demo purposes I'd also remove this. (I realize you probably want this in your model, I'm reviewing for SLiM-Extras purposes right now.) Tree-seq will probably be fairly confused by the gene duplication stuff anyway. If you intend to also submit an analysis script in Python that is smart enough to do the right thing with the way that the duplication events are recorded, then that's awesome, and in that case, leave tree-seq in. Otherwise, probably take it out and keep the recipe as simple as possible.

bhaller · 2024-12-21T00:19:00Z

models/Duplication_model.slim

+    initializeTreeSeq();
+
+    // Generate random nucleotides for the first L - DUP_LENGTH bases
+    randomPart = sample(c("A", "T", "C", "G"), L - DUP_LENGTH - 1, replace=T);


This is another place where L_BASE would be useful. Also, note that there is a randomNucleotides() function built into SLiM; I think you can use it here, yes?
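The ancestral-sequence construction, rewritten around L_BASE, can be sketched in Python. This is not SLiM code: random.choices stands in for SLiM's randomNucleotides(), the helper name build_ancestral is hypothetical, and it assumes, following the code comment in the diff, that the segment to duplicate is the last DUP_LENGTH bases of the base region, copied into the reserved region starting at DUP_START. Toy lengths keep the result easy to inspect.

```python
import random

def build_ancestral(l_base, dup_length, seed=None):
    """Return an ancestral sequence of length l_base + dup_length whose
    reserved tail is a copy of the last dup_length bases of the base region."""
    rng = random.Random(seed)
    base = rng.choices("ACGT", k=l_base)          # stand-in for randomNucleotides(l_base)
    original = base[l_base - dup_length:l_base]   # exactly dup_length bases
    return "".join(base + original)

L_BASE, DUP_LENGTH = 20, 4
DUP_START = L_BASE
seq = build_ancestral(L_BASE, DUP_LENGTH, seed=1)
print(seq[DUP_START - DUP_LENGTH:DUP_START], seq[DUP_START:])
```

Because every length is derived from L_BASE and DUP_LENGTH, there is no stray "- 1" to keep track of.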

(But if you take nucleotides out of the demo model, then this goes away anyway.)

bhaller · 2024-12-21T00:22:09Z

models/Duplication_model.slim

+    m2.convertToSubstitution = F;
+
+    // Define genomic element
+    initializeGenomicElementType("g1", m1, 1.0, mmJukesCantor(1e-7));


Not sure if you know this or not, but 1e-7 here is the alpha parameter of the Jukes-Cantor (JC) model, and the realized mutation rate is 3*alpha, so your realized mutation rate is 3e-7. Is that what you want? (Don't blame me, blame Jukes and Cantor :->)
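To see where the factor of 3 comes from: in the Jukes-Cantor model each base mutates to each of the other three bases at rate alpha, so the total rate of leaving any base is 3*alpha. A small Python sketch with plain lists (no SLiM involved; the helper names are hypothetical) builds that 4x4 matrix and sums each row:

```python
def jukes_cantor_matrix(alpha):
    """4x4 per-base mutation rates: alpha to each other base, 0 on the diagonal."""
    return [[0.0 if i == j else alpha for j in range(4)] for i in range(4)]

def alpha_for_rate(mu):
    """alpha to pass in if the intended realized per-base rate is mu."""
    return mu / 3

alpha = 1e-7
m = jukes_cantor_matrix(alpha)
# Realized per-base mutation rate = total rate of leaving each base (row sum).
row_totals = [sum(row) for row in m]
print(row_totals)
```

So if the intended realized rate is 1e-7, alpha would be 1e-7 / 3.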

bhaller · 2024-12-21T01:20:06Z

models/Duplication_model.slim

+    }
+    if (gm1 & gm2) {
+        //homozygot duplication
+        return F;


This is correct; you have not modified the breakpoints in this code branch.

bhaller · 2024-12-21T01:25:13Z

models/Duplication_model.slim

+        return F;
+    }
+        else {
+        // heterozygote duplicated: resample to get an even # of breakpoints


So, here one parental genome is duplicated and the other is not, so presumably recombination between them is not allowed in the duplication region? Then you just want to filter out breakpoints in the duplication region, just like in case 1. If, when you reach DUP_START, the copy strand is on the non-duplicate strand, you'll make a non-duplicate gamete; if it is on the duplicate strand, you'll make a duplicate gamete. That should just work, I think? So the code here seems fine, except that it needs the same change I commented on above for case 1.

But you have the comment above, "resample to get an even # of breakpoints". I don't know why you would want to do that, and the code here doesn't seem to attempt it. Please clarify.
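The heterozygote reasoning can be made concrete with a small Python model of gamete formation. This is not SLiM's recombination() callback API, and the function names are hypothetical. Breakpoints inside the duplication region are dropped (treating a breakpoint exactly at DUP_START as inside the region is an assumption), copying starts on first_strand (a parameter here) and switches strand at each remaining breakpoint, and whichever strand is being copied at DUP_START decides whether the gamete carries the duplication.

```python
DUP_START = 1_000_000
L = 1_010_000

def filter_breakpoints(breakpoints, dup_start=DUP_START, end=L):
    """Drop breakpoints that fall inside the duplication region [dup_start, end)."""
    return [b for b in breakpoints if not (dup_start <= b < end)]

def strand_at(position, breakpoints, first_strand=0):
    """Index (0 or 1) of the parental strand being copied at `position`.
    A breakpoint at b means copying switches strands starting at position b."""
    switches = sum(1 for b in breakpoints if b <= position)
    return (first_strand + switches) % 2

def gamete_has_duplication(breakpoints, duplicated_strand, first_strand=0):
    """True if the strand copied at DUP_START is the one carrying the duplication."""
    kept = filter_breakpoints(breakpoints)
    return strand_at(DUP_START, kept, first_strand) == duplicated_strand

# Parent's strand 1 carries the duplication.
print(gamete_has_duplication([500_000], duplicated_strand=1))
print(gamete_has_duplication([], duplicated_strand=1))
```

Since no breakpoint survives inside the region, the whole duplicated block is inherited intact from a single strand, which is the "should just work" behavior described above.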

bhaller · 2024-12-21T01:25:31Z

models/Duplication_model.slim

+        // If no breakpoints remain after filtering, return F (no recombination)
+        if (length(breakpoints) == 0) {
+            return F;
+        }


Remove these lines, as above; they are incorrect and unnecessary.
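One way to see why the early `return F` is wrong: a recombination() callback's return value reports whether it modified the breakpoints, and filtering that removes every breakpoint is still a modification, so it should be reported with T. A Python sketch of that contract (hypothetical helper, not SLiM code), where the flag depends on whether anything was removed rather than on whether anything remains:

```python
def filter_and_flag(breakpoints, dup_start, end):
    """Return (modified, kept), mirroring a recombination() callback:
    `modified` is True whenever any breakpoint was removed, even if none remain."""
    kept = [b for b in breakpoints if not (dup_start <= b < end)]
    modified = len(kept) != len(breakpoints)
    return modified, kept

# All breakpoints fall in the duplication region: nothing remains, but the
# list was changed, so the callback must report T, not F.
print(filter_and_flag([1_002_000, 1_007_000], 1_000_000, 1_010_000))
```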

bhaller · 2024-12-21T01:30:37Z

models/Duplication_model.slim

+    randomPart = sample(c("A", "T", "C", "G"), L - DUP_LENGTH - 1, replace=T);
+
+    // Define the segment to duplicate (last DUP_LENGTH bases of randomPart)
+    duplicatedPart = randomPart[(length(randomPart) - DUP_LENGTH):length(randomPart) - 1];


Does this do exactly what you want it to do? The tricky thing is that, because of operator precedence, A:B-1 generates a sequence from A-1 to B-1, not a sequence from A to B-1 as people often expect. If this code does do what you intend, it would be a good idea to add parentheses to make that clear, as (A:B)-1, so the reader knows it isn't an error. The parentheses are unnecessary, and maybe you know what you're doing, but the precedence of : is such a common source of bugs that it is much better to make it explicitly clear that you're doing what you intend. Check this carefully.
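To make the precedence concrete, here is a Python emulation of Eidos's inclusive a:b operator with toy values (n = 10 standing in for length(randomPart), DUP_LENGTH = 4). Eidos indexing is 0-based, so the last DUP_LENGTH elements are indices n-DUP_LENGTH through n-1. The helper eidos_range is hypothetical and only handles a <= b.

```python
def eidos_range(a, b):
    """Mimic Eidos a:b for integers a <= b (inclusive at both ends)."""
    return list(range(a, b + 1))

n, DUP_LENGTH = 10, 4   # toy values: length(randomPart) and DUP_LENGTH

# What the diff's index expression computes, since ':' binds tighter than '-':
as_written = [i - 1 for i in eidos_range(n - DUP_LENGTH, n)]   # (A:B)-1
# What "the last DUP_LENGTH elements" would be with 0-based indexing:
last_dup = eidos_range(n - DUP_LENGTH, n - 1)                   # A:(B-1)

print(as_written, len(as_written))
print(last_dup, len(last_dup))
```

Under this emulation, the expression as written selects DUP_LENGTH + 1 indices, starting one element earlier than the last DUP_LENGTH.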

bhaller · 2024-12-21T01:31:11Z

models/Duplication_model.slim

+    // Initialize the ancestral sequence with the full sequence
+    initializeAncestralNucleotides(fullSequence);
+
+


Trim blank lines down to one, or maybe two if you want section separation.

bhaller · 2024-12-21T02:26:59Z

Hi again. Pondering this, I realized there's an issue we hadn't thought about: mutations that have already fixed at the time of duplication. This is particularly apparent with nucleotides, but it's really an issue even without nucleotides. Suppose the initial state of the model, with nucleotides, is AAAA|AAAA, showing a duplicated region just four bases long and ignoring all the bases to the left of the original region that gets duplicated. So AAAA is the original region, and your code replicates it in the ancestral sequence so we have AAAA|AAAA. Now we simulate the burn-in, and at some point a mutation in the original region fixes, so now we have AATA|AAAA. The T mutation has fixed, so it has been removed and turned into a substitution object. Now the duplication event occurs and mutations get copied, but the T does not get copied, because it is no longer a mutation. What to do about this?

The easy solution is to set convertToSubstitution to F for m1 as well. Then the T mutation remains a mutation, and so it gets copied to the duplicated region correctly. That can make models very slow, though, since they get bogged down with fixed mutations that are present in every genome. So a better solution would be preferable. I think it would work for your duplication event code to first duplicate the substitutions within the original region. In other words, first get sim.substitutions, select the ones in the original region, and create new mutations for them in the duplicated region, just as you do for mutations now. Then (and it has to be after the substitutions) do the same for the mutations in the original region, as you do now.
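The ordering argument can be checked with a toy Python model of one genome's nucleotides. Each fixed substitution is a (position, nucleotide, fixation_tick) record and each segregating mutation a (position, nucleotide) record, and a later record at a position overwrites an earlier one, as nucleotide-based mutations do. The record format and helper names are hypothetical; in the SLiM model the records would come from sim.substitutions and the genomes' mutations. Adding OFFSET = DUP_START - ORIG_START maps the original region onto the reserved one.

```python
ORIG_START, DUP_LENGTH = 0, 4
DUP_START = ORIG_START + DUP_LENGTH   # toy layout: reserved region right after the original
OFFSET = DUP_START - ORIG_START

def apply(seq, records):
    """Apply (position, nucleotide) records in order; a later record overwrites an earlier one."""
    seq = list(seq)
    for pos, nuc in records:
        seq[pos] = nuc
    return "".join(seq)

def in_original(pos):
    return ORIG_START <= pos < ORIG_START + DUP_LENGTH

ancestral = "AAAA" + "AAAA"                    # original region | reserved copy
substitutions = [(2, "G", 50), (2, "T", 10)]   # deliberately unsorted by fixation tick
mutations = [(2, "C"), (0, "G")]               # segregating in this genome

subs_in_order = [(p, n) for p, n, _ in sorted(substitutions, key=lambda s: s[2])]

# State just before the duplication event: only the original region has changed.
before = apply(apply(ancestral, subs_in_order), mutations)

def duplicate(seq, subs_first=True):
    """Copy original-region records into the reserved region in the chosen order."""
    subs = [(p + OFFSET, n) for p, n in subs_in_order if in_original(p)]
    muts = [(p + OFFSET, n) for p, n in mutations if in_original(p)]
    return apply(seq, subs + muts if subs_first else muts + subs)

good = duplicate(before)                    # substitutions, then mutations
bad = duplicate(before, subs_first=False)   # mutations first: wrong
print(before, good, bad)
```

With substitutions first, both halves read GACA; copying the mutations first lets the older G substitution overwrite the segregating C at position 2 of the copy.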

This works nicely in the nucleotide-based model because nucleotide-based mutations don't stack: if a particular position has had successive fixation events, and perhaps now has a segregating mutation on top of those past fixations, adding new mutations into the duplicated region in the correct order will leave you in the correct state. In a non-nucleotide model the mutations you add would stack instead, which is not what you'd want. You'd want to change the stacking policy, or otherwise jump through some hoops to address that problem. Setting a stacking policy of 'l' (that's a lowercase L) on the m2 mutation type in a non-nucleotide model would do the trick and seems perfectly reasonable.
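A toy Python illustration of why the stacking policy matters when copied records land on the same position. It mimics two of SLiM's mutationStackPolicy values, "s" (stack: keep every mutation) and "l" (last: the new mutation replaces the existing one), assuming all the mutations involved share one stacking group; it is not SLiM's implementation, and the mutation labels are made up.

```python
def add_mutation(genome, pos, mut_id, policy):
    """Add mut_id at pos in a toy genome (dict: position -> list of mutation ids)."""
    existing = genome.setdefault(pos, [])
    if policy == "s":        # stack: keep everything already there
        existing.append(mut_id)
    elif policy == "l":      # last: the new mutation replaces what was there
        existing[:] = [mut_id]
    else:
        raise ValueError(f"unsupported policy {policy!r}")
    return genome

# Copies added in the order recommended above: old substitutions, then the segregating one.
copies = [(6, "sub_tick10"), (6, "sub_tick50"), (6, "seg_mut")]

stacked, last = {}, {}
for pos, mid in copies:
    add_mutation(stacked, pos, mid, "s")
    add_mutation(last, pos, mid, "l")
print(stacked, last)
```

With "l", adding the copies in order leaves only the most recent state at each position, matching the non-stacking nucleotide behavior.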

Anyhow, those are some additional thoughts. It would be good to figure out a way to test the model for correctness. In the nucleotide-based version (and the more I ponder these things, the more I think you ought to keep the model nucleotide-based, as the idea just works much more smoothly in that paradigm), you could do a check, at the end of the duplication event code, using the genome method nucleotides(). The nucleotide sequence for the original region versus the duplicated region ought to be identical, after the duplication. If they're not, throw an error.
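The suggested consistency check, sketched in Python on plain strings. In the model itself the strings would come from the genome method nucleotides(), and the error would be raised in Eidos (for example with stop()); the function name and arguments here are hypothetical, and the check only makes sense for genomes that actually received the duplication.

```python
def check_duplication(genome_nucleotides, orig_start, dup_start, dup_length):
    """Raise if the duplicated region does not match the original region.
    genome_nucleotides is the genome's full nucleotide string (0-based positions)."""
    original = genome_nucleotides[orig_start:orig_start + dup_length]
    duplicate = genome_nucleotides[dup_start:dup_start + dup_length]
    if original != duplicate:
        raise RuntimeError(
            f"duplication mismatch: original {original!r} vs duplicate {duplicate!r}")
    return True

print(check_duplication("GACAGACA", 0, 4, 4))
```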

Sound good? I hope all this is making sense to you.

This is an updated version of my duplication model in SLiM

Problem with losing substitutions after restarting fixed by saving after the duplication event has taken place.

bhaller · 2025-01-14T20:55:33Z

Hi @DanteWensby! I'm swamped as usual, but will try to get to this soon. Are you impatient to have it reviewed, or are you busy with other things too?

DanteWensby · 2025-01-15T08:43:50Z

@bhaller I am busy with other things too, so there's no need to review it any time soon. I'm sorry if I gave the impression that I was impatient; I just made some minor adjustments. :)

bhaller · 2025-01-15T12:29:22Z

@DanteWensby no worries, you didn't seem impatient. :-> Did you address my comments from the previous review? Or is this revision for a different purpose?

DanteWensby · 2025-01-15T12:44:25Z

@bhaller Yes, it was to address your comments, hopefully to a satisfactory extent, as well as to make some minor adjustments.

bhaller · 2025-01-15T12:49:34Z

OK great, I'll get to it soon-ish. Let me know if you get more impatient than you presently are. :->

Create Duplication_model.slim

a78a17b

bhaller reviewed Dec 21, 2024

bhaller marked this pull request as draft December 21, 2024 01:41

Update Duplication_model.slim

576a1dd

DanteWensby marked this pull request as ready for review January 3, 2025 14:25

DanteWensby added 2 commits January 14, 2025 10:07

Update Duplication_model.slim

ed9f6b6

Update Duplication_model.slim

4f3ab00

DanteWensby requested a review from bhaller January 14, 2025 09:13
