selective sweep tests in inbred lines #105
Comments
Hello,
You could certainly increase both of these parameters, and you should
encounter fewer of these warnings. I usually hesitate to increase the
--max-gap parameter unless you have generally sparse data. Perhaps this is
the case for you: since you mention you're using highly inbred lines, I
imagine there are fewer variants called (and they are probably farther
apart).
You could theoretically also use --trunc-ok, which would have the effect of
integrating the scores at the sites that otherwise encounter those
warnings, but I typically don't recommend this option unless you are
evaluating short simulated regions or you absolutely need a score at an
otherwise filtered site.
Hope this helps,
Zachary
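As a rough aid for judging whether the data are sparse enough to justify raising --max-gap, the two warning types can be tallied from the log. The sketch below is only an illustration: it assumes each warning sits on a single line in exactly the format quoted in this thread, and `summarize_warnings` is a hypothetical helper, not part of selscan.

```python
import re

# Patterns follow the two warning formats quoted in this thread (assumed stable).
EDGE_RE = re.compile(
    r"WARNING: Reached chromosome edge before EHH decayed below (\d*\.?\d+)\. "
    r"Skipping calculation at position (\d+)"
)
GAP_RE = re.compile(
    r"WARNING: Reached a gap of (\d+)bp > (\d+)bp\. "
    r"Skipping calculation at position (\d+)"
)


def summarize_warnings(lines):
    """Count edge and gap warnings and collect the reported gap sizes (bp)."""
    summary = {"edge": 0, "gap": 0, "gap_sizes": [], "edge_positions": []}
    for line in lines:
        gap = GAP_RE.search(line)
        if gap:
            summary["gap"] += 1
            summary["gap_sizes"].append(int(gap.group(1)))
            continue
        edge = EDGE_RE.search(line)
        if edge:
            summary["edge"] += 1
            summary["edge_positions"].append(int(edge.group(2)))
    return summary


log_lines = [
    "WARNING: Reached chromosome edge before EHH decayed below 0.05. "
    "Skipping calculation at position 5666 id: .",
    "WARNING: Reached a gap of 516027bp > 200000bp. "
    "Skipping calculation at position 1680607 id: .",
]
summary = summarize_warnings(log_lines)
print(summary["edge"], summary["gap"], max(summary["gap_sizes"]))
```

If most skipped sites come from a handful of very large gaps, raising --max-gap would rescue few sites. Many gaps just above the limit would suggest the data really are sparse.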
|
Sorry, I closed the post by mistake! |
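Hello again, I just realized that the built-in program norm has to be executed after running --xpehh. I understood from the manual that the results were already standardized with that program. So, I just ran it and my results look much better now! Is it a general rule to consider values higher than 2 a positive signature of selection? Or is it mostly arbitrary? Also, I understand that it is much more powerful to look for windows of consecutive SNPs with high EHH scores rather than to consider each SNP individually. Is there a way to achieve that in selscan? Thanks a lot for your help, and I am sorry for asking so many newbie questions! |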
Hi,
Glad you found norm! So with norm you can also use the --bp-win flag to
examine clusters of extreme scores, either <-2 or >2. I wouldn't take a
score of ~2 alone very seriously, but if you found that 90% of the scores
in a region were >2, this is more compelling. Since you're using an XP stat,
there will be two "directions" to consider: putative selection in the "ref"
population (negative scores) and putative selection in the "non-ref"
population (positive scores).
Let me know if you have any issues.
Zachary
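To make the "fraction of extreme scores in a region" idea concrete, here is a minimal sketch that bins normalized scores into fixed-size windows and reports the fraction at or beyond the threshold in each direction. It is not norm's implementation. The `(position, score)` input, the half-open window boundaries `[start, start + winsize)`, and the inclusive comparisons are all assumptions made for illustration, and `window_fractions` is a hypothetical name.

```python
from collections import defaultdict


def window_fractions(sites, winsize=100000, threshold=2.0):
    """Bin (position_bp, normalized_score) pairs into windows starting at 1.

    Returns one dict per non-empty window with the fraction of scores
    >= threshold (frac_gt) and <= -threshold (frac_lt).
    """
    buckets = defaultdict(list)
    for pos, score in sites:
        # Positions 1..winsize fall in the window starting at 1, and so on.
        start = ((pos - 1) // winsize) * winsize + 1
        buckets[start].append(score)

    rows = []
    for start in sorted(buckets):
        scores = buckets[start]
        n = len(scores)
        rows.append({
            "start": start,
            "end": start + winsize,
            "n": n,
            "frac_gt": sum(s >= threshold for s in scores) / n,
            "frac_lt": sum(s <= -threshold for s in scores) / n,
        })
    return rows


# Made-up scores, purely for demonstration.
example_sites = [
    (150, 2.5), (90000, 2.1), (99999, -0.3),
    (100001, -2.4), (150000, -2.0), (180000, 0.1), (199999, 1.9),
]
for row in window_fractions(example_sites):
    print(row)
```

Unlike the norm output, this sketch simply omits empty windows instead of printing placeholder rows for them.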
On Wed, Nov 1, 2023 at 5:39 PM fersall ***@***.***> wrote:
Hello again, I just realized that the built-in program norm has to be
executed after running --xpehh. I understood from the manual that the
results were already standardized with that program. So, I just ran it and
my results look much better now! Is it a general rule to consider values
higher than 2 a positive signature of selection? Or is it mostly
arbitrary?
Also, I understand that it is much more powerful to look for windows of
consecutive SNPs with high EHH scores rather than to consider each SNP
individually. Is there a way to achieve that in selscan?
Thanks a lot for your help, and I am sorry for asking so many newbie questions!
|
Hi Zachary, I do have a question regarding the two directions to consider that you mentioned. First, I want to give you a little more context. My "ref" population includes old lines (let's say founder lines), while the "non-ref" one includes modern lines. In the case of putative selection in the "ref" population (negative scores), one would expect the focal allele to increase in frequency in the "non-ref" population because it was selected for its beneficial effect on that population, while in the case of putative selection in the "non-ref" population (positive scores) it would be the opposite. Am I correct? |
Hello,
I think you made a typo, so I will fix it for clarity below.
In case of putative selection in the "ref" population (negative scores).
One would expect that the focal allele would increase in frequency in the
*"ref"* population because they were selected for their beneficial effect
on that population. While in the case of putative selection in the
"non-ref" population (positive scores) it would be the opposite.
So, with this typo fixed, yes this is correct.
-Zachary
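The sign convention confirmed above can be written down as a tiny helper. This is a sketch only: `selection_direction` is a hypothetical name, and the cutoff of 2 with inclusive comparisons is an assumption taken from the thresholds discussed in this thread, not a fixed rule.

```python
def selection_direction(norm_score, threshold=2.0):
    """Map a normalized XP score to the population with putative selection.

    Negative extreme -> "ref" population; positive extreme -> "non-ref".
    Returns None when the score is not extreme in either direction.
    """
    if norm_score <= -threshold:
        return "ref"
    if norm_score >= threshold:
        return "non-ref"
    return None


# With founder lines as "ref" and modern lines as "non-ref":
print(selection_direction(-2.9))  # putative selection in the founder lines
print(selection_direction(2.3))   # putative selection in the modern lines
print(selection_direction(0.4))   # not extreme
```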
|
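Hi Zachary,
I ran this command: norm --xpehh --bp-win --winsize 100000 --files *.xpehh.out and I got the output
1 100001 0 -1 -1 -1 -1 NA NA
100001 200001 0 -1 -1 -1 -1 NA NA
200001 300001 34 0 0.0588235 100 5 -0.544703 -2.06226
300001 400001 32 0 0.53125 100 5 -0.573806 -2.86754
400001 500001 32 0 0 100 100 0.00109769 -0.850957
500001 600001 40 0 0 100 100 -0.287803 -1.1771
I found out in one of the questions you answered previously that the headers of each column correspond to:
<win start> <win end> <# scores in win> <frac scores gt threshold> <frac scores lt threshold> <approx percentile for gt threshold wins> <approx percentile for lt threshold wins> <max score> <min score>
And also, that <frac scores gt threshold> corresponds to "the fraction of XP-EHH scores >=2". But I am not clear on what <frac scores lt threshold> is. Do they correspond to the reference and non-ref pop, respectively? I am sorry if that is a dumb question! I just want to make sure I am interpreting the results correctly. Thanks |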
Hi,
No worries, this one is <=-2.
Zachary
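For reading the windowed output programmatically, a small parser along these lines may help. It is a sketch that assumes the nine whitespace-separated columns listed in this thread, in that order. The field names and `parse_window_line` are hypothetical, and treating -1 and NA as placeholders for empty windows is an inference from the sample rows, not documented behavior.

```python
from typing import NamedTuple, Optional


class WindowRow(NamedTuple):
    start: int
    end: int
    n_scores: int
    frac_gt: float               # fraction of scores >= 2
    frac_lt: float               # fraction of scores <= -2
    pct_gt: float
    pct_lt: float
    max_score: Optional[float]   # None when the file says NA
    min_score: Optional[float]


def _maybe_float(text):
    return None if text == "NA" else float(text)


def parse_window_line(line):
    """Parse one row of the windowed output into named fields."""
    f = line.split()
    if len(f) != 9:
        raise ValueError(f"expected 9 columns, got {len(f)}")
    return WindowRow(
        int(f[0]), int(f[1]), int(f[2]),
        float(f[3]), float(f[4]), float(f[5]), float(f[6]),
        _maybe_float(f[7]), _maybe_float(f[8]),
    )


row = parse_window_line("300001 400001 32 0 0.53125 100 5 -0.573806 -2.86754")
# 0.53125 * 32 = 17 sites with scores <= -2 in this window.
print(row.frac_lt, round(row.frac_lt * row.n_scores))
```

Read this way, the window from 300001 to 400001 has no scores at or above 2 and just over half of its scores at or below -2, so the signal points toward the "ref" population.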
On Mon, Nov 20, 2023 at 5:27 PM, fersall ***@***.***> wrote:
Hi Zachary,
I ran this command norm --xpehh --bp-win --winsize 100000 --files
*.xpehh.out and I got the output
1 100001 0 -1 -1 -1 -1 NA NA
100001 200001 0 -1 -1 -1 -1 NA NA
200001 300001 34 0 0.0588235 100 5 -0.544703 -2.06226
300001 400001 32 0 0.53125 100 5 -0.573806 -2.86754
400001 500001 32 0 0 100 100 0.00109769 -0.850957
500001 600001 40 0 0 100 100 -0.287803 -1.1771
I found out in one of the questions you answered previously that the headers
of each column correspond to:
<win start> <win end> <# scores in win> <frac scores gt threshold> <frac
scores lt threshold> <approx percentile for gt threshold wins> <approx
percentile for lt threshold wins> <max score> <min score>
And also, that <frac scores *gt* threshold> corresponds to "the fraction
of XP-EHH scores =>2". But I am not clear what is <frac scores *lt*
threshold>, Do they correspond to the reference and non-ref pop,
respectively? I am sorry if that is a dumb question! I just want to make
sure I am interpreting the results correctly.
Thanks
|
Hello, I am doing a whole genome scan for signals of positive selection in two populations of inbred lines. I have performed the xp-ehh and xp-nsl tests. The jobs did finish without any error except these two types of warning messages in the log file.
WARNING: Reached chromosome edge before EHH decayed below 0.05. Skipping calculation at position 5666 id: .
WARNING: Reached a gap of 516027bp > 200000bp. Skipping calculation at position 1680607 id: .
Since these samples are inbred lines, the long-haplotype distances may be very large. Currently I am using the default values for --cutoff and --max-gap. So, my question is whether I should increase the values of those parameters in order to solve that issue? Thanks