selective sweep tests in inbred lines #105

fersall · 2023-10-30T16:47:34Z

Hello, I am doing a whole genome scan for signals of positive selection in two populations of inbred lines. I have performed the xp-ehh and xp-nsl tests. The jobs did finish without any error except these two types of warning messages in the log file.

WARNING: Reached chromosome edge before EHH decayed below 0.05. Skipping calculation at position 5666 id: .

WARNING: Reached a gap of 516027bp > 200000bp. Skipping calculation at position 1680607 id: .

Since these samples are inbred lines the long haplotypes distances may be very large. Currently I am using default values for --cutoff and for --max-gap. So, my question is whether should I increase the value for those parameters in order to solve that issue?

Thanks

The text was updated successfully, but these errors were encountered:

szpiech · 2023-10-31T16:06:49Z

Hello, You could certainly increase both of these parameters, and you should encounter fewer of these warnings. I usually hesitate to increase the --max-gap parameter unless you have generally sparse data. Perhaps this is the case for you, since you mention you're using highly inbred lines I imagine there are fewer variants called (and they are probably farther apart). You could theoretically also use --trunc-ok, which would have the effect of integrating the scores at the sites that otherwise encounter those warnings, but I typically don't recommend this option unless you are evaluating short simulated regions or you absolutely need a score at an otherwise filtered site. Hope this helps, Zachary

…

On Mon, Oct 30, 2023 at 12:47 PM fersall ***@***.***> wrote: Hello, I am using doing a whole genome scan for signals of positive selection in two populations of inbred lines. I have performed the xp-ehh and xp-nsl tests. The jobs did finish without any error except these two types of warning messages in the log file. WARNING: Reached chromosome edge before EHH decayed below 0.05. Skipping calculation at position 5666 id: . WARNING: Reached a gap of 516027bp > 200000bp. Skipping calculation at position 1680607 id: . Since these samples are inbred lines the long haplotypes distances may be very large. Currently I am using default values for --cutoff and for --max-gap. So, my question is whether should I increase the value for those parameters in order to solve that issue? Thanks — Reply to this email directly, view it on GitHub <#105>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABAKRQRB3UXPBI4XPUV5Y3LYB7K3FAVCNFSM6AAAAAA6WKXI56VHI2DSMVQWIX3LMV43ASLTON2WKOZRHE3DQOBQHAYTAOA> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

fersall · 2023-11-01T15:53:21Z

Thanks a lot for your comments Zachary! I did increase the --cutoff to 0.10 and certainly I reduced the number of those warnings from 432 to 431.
I did not change --max-gap parameter since my genome wide SNP data set is quite dense, ~800K filtered by position to keep a minimal distance of 1K between each marker.
Also, my XPEHH values range from -0.67 to 0.50 across the genome (please see my attached plot)

. In the literature it is considered as signature of positive selection those values higher than 2. So, this short range (of xpehh values). Is it normal? Or could it be due to lack of power of the test to detect signal of selection on my data?

Best

Fernando

fersall · 2023-11-01T15:55:05Z

Sorry, closed the post my mistake!

fersall · 2023-11-01T21:39:40Z

Hello again, I just realized that the built in program norm has to be executed after running --xpehh. I understood from the manual that the results were already standardized with that program. So, I just ran it and my results looks much better now! Is it a general rule to consider a positive signature of selection those values higher than 2? Or is it mostly arbitrary?

Also, I understand that it is much powerful to look for windows of consecutive SNPs with higher EHH scores rather than consider each SNP individually. Is there a way to achieve that in Selscan?

Thanks a lot for you help and I am sorry for asking many newbie questions!

szpiech · 2023-11-03T13:53:48Z

Hi, Glad you found norm! So with norm you can also use the --bp-wins flag to examine clusters of extreme scores either <-2 or >2. I wouldn't take a score ~2 alone very seriously, but if you found 90% of scores in a region

2 this is more compelling. Since you're using an XP stat, there will be

two "directions" to consider, putative selection in the "ref" population (negative scores) and putative selection in the "non-ref" population (positive scores). Let me know if you have any issues. Zachary

…

On Wed, Nov 1, 2023 at 5:39 PM fersall ***@***.***> wrote: Hello again, I just realized that the built in program norm has to be executed after running --xpehh. I understood from the manual that the results were already standardized with that program. So, I just ran it and my results looks much better now! Is it a general rule to consider a positive signature of selection those values higher than 2? Or is it mostly arbitrary? Also, I understand that it is much powerful to look for windows of consecutive SNPs with higher EHH scores rather than consider each SNP individually. Is there a way to achieve that in Selscan? Thanks a lot for you help and I am sorry for asking many newbie questions! — Reply to this email directly, view it on GitHub <#105 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABAKRQWRXGZGRIKDL7ZMWLLYCK6SNAVCNFSM6AAAAAA6WKXI56VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOBZG4ZDQMRZHE> . You are receiving this because you commented.Message ID: ***@***.***>

fersall · 2023-11-07T14:33:34Z

Hi Zachary, I do have a question regarding the 2 directions to consider that you mentioned. First I want to give you a little more context. My "ref" population includes old lines (lets say founders lines), while the "non-ref" one includes modern lines. In case of putative selection in the "ref" population (negative scores). One would expect that the focal allele would increase in frequency in the "non-ref" population because of they were selected for their beneficial effect on that population. While in the case of putative selection in the "non-ref" population (positive scores) it would be the opposite. Am I correct?

szpiech · 2023-11-07T16:39:56Z

Hello, I think you made a typo, so I will fix it for clarity below. In case of putative selection in the "ref" population (negative scores). One would expect that the focal allele would increase in frequency in the *"ref"* population because they were selected for their beneficial effect on that population. While in the case of putative selection in the "non-ref" population (positive scores) it would be the opposite. So, with this typo fixed, yes this is correct.

…

-Zachary

On Tue, Nov 7, 2023 at 9:33 AM fersall ***@***.***> wrote: Hi Zachary, I do have a question regarding the 2 directions to consider that you mentioned. First I want to give you a little more context. My "ref" population include old lines (lets say founders lines), while the "non-ref" one include modern lines. In case of putative selection in the "ref" population (negative scores). One would expect that the focal allele would increase in frequency in the "non-ref" population because of they were selected for their beneficial effect on that population. While in the case of putative selection in the "non-ref" population (positive scores) it would be the opposite. Am I correct? — Reply to this email directly, view it on GitHub <#105 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABAKRQVRHUEDZ24ZF2VQG7LYDJBEVAVCNFSM6AAAAAA6WKXI56VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOJYGY3DKNBSGY> . You are receiving this because you commented.Message ID: ***@***.***>

fersall · 2023-11-20T22:26:53Z

Hi Zachary,
I run this command norm --xpehh --bp-win --winsize 100000 --files *.xpehh.out and I got the output

1	100001	0	-1	-1	-1	-1	NA	NA
100001	200001	0	-1	-1	-1	-1	NA	NA
200001	300001	34	0	0.0588235	100	5	-0.544703	-2.06226
300001	400001	32	0	0.53125	100	5	-0.573806	-2.86754
400001	500001	32	0	0	100	100	0.00109769	-0.850957
500001	600001	40	0	0	100	100	-0.287803	-1.1771

I found out in one of the questions you answered previously that the headers of each column correspond to:

<win start> <win end> <# scores in win> <frac scores gt threshold> <frac scores lt threshold> <approx percentile for gt threshold wins> <approx percentile for lt threshold wins> <max score> <min score>

And also, that <frac scores gt threshold> corresponds to "the fraction of XP-EHH scores =>2". But I am not clear what is <frac scores lt threshold>, Do they correspond to the reference and non-ref pop, respectively? I am sorry if that is a dumb question! I just want to make sure I am interpreting the results correctly.

Thanks

szpiech · 2023-11-21T15:57:04Z

Hi, No worries, this one is <=-2. Zachary Le lun. 20 nov. 2023 à 5:27 PM, fersall ***@***.***> a écrit :

…

Hi Zachary, I run this command norm --xpehh --bp-win --winsize 100000 --files *.xpehh.out and I got the output 1 100001 0 -1 -1 -1 -1 NA NA 100001 200001 0 -1 -1 -1 -1 NA NA 200001 300001 34 0 0.0588235 100 5 -0.544703 -2.06226 300001 400001 32 0 0.53125 100 5 -0.573806 -2.86754 400001 500001 32 0 0 100 100 0.00109769 -0.850957 500001 600001 40 0 0 100 100 -0.287803 -1.1771 I found out in one of your questions answered previously that the headers of each column correspond to: <win start> <win end> <# scores in win> <frac scores gt threshold> <frac scores lt threshold> <approx percentile for gt threshold wins> <approx percentile for lt threshold wins> <max score> <min score> And also, that <frac scores *gt* threshold> corresponds to "the fraction of XP-EHH scores =>2". But I am not clear what is <frac scores *lt* threshold>, Do they correspond to the reference and non-ref pop, respectively? I am sorry if that is a dumb question! I just want to make sure I am interpreting the results correctly. Thanks — Reply to this email directly, view it on GitHub <#105 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABAKRQRIVOK4NZDGFKQTTX3YFPKLRAVCNFSM6AAAAAA6WKXI56VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMJZHEYDIOJQGQ> . You are receiving this because you commented.Message ID: ***@***.***>

fersall closed this as completed Nov 1, 2023

fersall reopened this Nov 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

selective sweep tests in inbred lines #105

selective sweep tests in inbred lines #105

fersall commented Oct 30, 2023 •

edited

Loading

szpiech commented Oct 31, 2023 via email

fersall commented Nov 1, 2023 •

edited

Loading

fersall commented Nov 1, 2023

fersall commented Nov 1, 2023

szpiech commented Nov 3, 2023 via email

fersall commented Nov 7, 2023 •

edited

Loading

szpiech commented Nov 7, 2023 via email

fersall commented Nov 20, 2023 •

edited

Loading

szpiech commented Nov 21, 2023 via email

selective sweep tests in inbred lines #105

selective sweep tests in inbred lines #105

Comments

fersall commented Oct 30, 2023 • edited Loading

szpiech commented Oct 31, 2023 via email

fersall commented Nov 1, 2023 • edited Loading

fersall commented Nov 1, 2023

fersall commented Nov 1, 2023

szpiech commented Nov 3, 2023 via email

fersall commented Nov 7, 2023 • edited Loading

szpiech commented Nov 7, 2023 via email

fersall commented Nov 20, 2023 • edited Loading

szpiech commented Nov 21, 2023 via email

fersall commented Oct 30, 2023 •

edited

Loading

fersall commented Nov 1, 2023 •

edited

Loading

fersall commented Nov 7, 2023 •

edited

Loading

fersall commented Nov 20, 2023 •

edited

Loading