selective sweep tests in inbred lines #105

Open
fersall opened this issue Oct 30, 2023 · 9 comments

Comments


fersall commented Oct 30, 2023

Hello, I am doing a whole-genome scan for signals of positive selection in two populations of inbred lines. I have run the XP-EHH and XP-nSL tests. The jobs finished without errors, but the log file contains these two types of warning messages:

WARNING: Reached chromosome edge before EHH decayed below 0.05. Skipping calculation at position 5666 id: .

WARNING: Reached a gap of 516027bp > 200000bp. Skipping calculation at position 1680607 id: .

Since these samples are inbred lines, the haplotypes may be very long. I am currently using the default values for --cutoff and --max-gap. Should I increase the values of those parameters to resolve these warnings?
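To see how many sites are skipped for each reason before changing any parameter, the log can be tallied by warning type. The following is a minimal Python sketch, not part of selscan; the regular expressions are written from the two warning lines quoted above, and the function name `summarize_warnings` is made up for illustration.

```python
import re
from collections import Counter

# Patterns written from the two warning lines quoted in this comment.
EDGE_RE = re.compile(
    r"WARNING: Reached chromosome edge before EHH decayed below ([\d.]+)\. "
    r"Skipping calculation at position (\d+)"
)
GAP_RE = re.compile(
    r"WARNING: Reached a gap of (\d+)bp > (\d+)bp\. "
    r"Skipping calculation at position (\d+)"
)


def summarize_warnings(lines):
    """Count skipped sites by warning type and track the largest gap reported."""
    counts = Counter()
    max_gap = 0
    for line in lines:
        if EDGE_RE.search(line):
            counts["chromosome_edge"] += 1
            continue
        match = GAP_RE.search(line)
        if match:
            counts["max_gap"] += 1
            max_gap = max(max_gap, int(match.group(1)))
    return counts, max_gap


# The two warning lines from this comment, used as a tiny example log.
example_log = [
    "WARNING: Reached chromosome edge before EHH decayed below 0.05. "
    "Skipping calculation at position 5666 id: .",
    "WARNING: Reached a gap of 516027bp > 200000bp. "
    "Skipping calculation at position 1680607 id: .",
]
counts, max_gap = summarize_warnings(example_log)
print(dict(counts), max_gap)
```

For a real run, the same function can be given an open log file handle instead of the example list. The largest reported gap shows how far --max-gap would have to be raised for those sites to be computed at all.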

Thanks

Owner

szpiech commented Oct 31, 2023 via email

Author

fersall commented Nov 1, 2023

Thanks a lot for your comments, Zachary! I increased --cutoff to 0.10, which reduced the number of those warnings from 432 to 431.
I did not change the --max-gap parameter, since my genome-wide SNP data set is quite dense: ~800K SNPs, filtered by position to keep a minimum distance of 1 kb between markers.
Also, my XP-EHH values range from -0.67 to 0.50 across the genome (please see my attached plot, fem_allChr_xp-ehh_4r2_ed). In the literature, values higher than 2 are considered a signature of positive selection. Is such a narrow range of XP-EHH values normal, or could it be due to a lack of power of the test to detect signals of selection in my data?

Best

Fernando

@fersall fersall closed this as completed Nov 1, 2023
@fersall fersall reopened this Nov 1, 2023
Author

fersall commented Nov 1, 2023

Sorry, I closed the post by mistake!

Author

fersall commented Nov 1, 2023

Hello again, I just realized that the built-in program norm has to be executed after running selscan with --xpehh. I had understood from the manual that the results were already standardized. I have now run it, and my results look much better! Is it a general rule to treat values higher than 2 as a signature of positive selection, or is that threshold mostly arbitrary?
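For intuition about what a threshold of 2 means on standardized scores, here is a minimal Python sketch. It assumes the standardization is a plain genome-wide z-score (subtract the mean, divide by the standard deviation); the exact procedure used by norm may differ, and the raw values below are made-up toy numbers, not my data.

```python
import statistics


def standardize(scores):
    """Return z-scores: (score - mean) / standard deviation of all scores."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("all scores are identical; cannot standardize")
    return [(x - mean) / sd for x in scores]


def flag_extreme(z_scores, threshold=2.0):
    """Return the indices of scores at least `threshold` SDs from the mean."""
    return [i for i, z in enumerate(z_scores) if abs(z) >= threshold]


# Toy unstandardized values (illustrative only).
raw = [0.1, -0.2, 0.0, 0.05, -0.1, 0.5, -0.67, 0.02, -0.03, 0.08]
z = standardize(raw)
print(flag_extreme(z))
```

Under that assumption, a standardized score of 2 lies two standard deviations from the genome-wide mean, which is why unstandardized values in a narrow range such as -0.67 to 0.50 cannot be compared directly with a cutoff of 2.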

Also, I understand that it is more powerful to look for windows of consecutive SNPs with high EHH scores than to consider each SNP individually. Is there a way to do that in selscan?

Thanks a lot for your help, and I am sorry for asking so many newbie questions!

Owner

szpiech commented Nov 3, 2023 via email

Author

fersall commented Nov 7, 2023

Hi Zachary, I have a question about the two directions you mentioned. First, a little more context: my "ref" population consists of old lines (let's say founder lines), while the "non-ref" population consists of modern lines. In the case of putative selection in the "ref" population (negative scores), one would expect the focal allele to increase in frequency in the "non-ref" population, because it was selected for its beneficial effect on that population. In the case of putative selection in the "non-ref" population (positive scores), it would be the opposite. Am I correct?

Owner

szpiech commented Nov 7, 2023 via email

Author

fersall commented Nov 20, 2023

Hi Zachary,
I ran the command norm --xpehh --bp-win --winsize 100000 --files *.xpehh.out and got the following output:

1	100001	0	-1	-1	-1	-1	NA	NA
100001	200001	0	-1	-1	-1	-1	NA	NA
200001	300001	34	0	0.0588235	100	5	-0.544703	-2.06226
300001	400001	32	0	0.53125	100	5	-0.573806	-2.86754
400001	500001	32	0	0	100	100	0.00109769	-0.850957
500001	600001	40	0	0	100	100	-0.287803	-1.1771

I found in one of your previous answers that the column headers are:

<win start> <win end> <# scores in win> <frac scores gt threshold> <frac scores lt threshold> <approx percentile for gt threshold wins> <approx percentile for lt threshold wins> <max score> <min score>

I also found that <frac scores gt threshold> corresponds to "the fraction of XP-EHH scores =>2". But it is not clear to me what <frac scores lt threshold> is. Do the two columns correspond to the reference and non-ref populations, respectively? I am sorry if that is a dumb question! I just want to make sure I am interpreting the results correctly.
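To make the columns easier to read, the window output can be loaded with named fields. This is a minimal Python sketch that uses the nine header names quoted above; the field names and the helper `parse_window_line` are my own shorthand, and treating "NA" as a missing value is an assumption based on the empty windows in the output.

```python
from typing import NamedTuple, Optional


class Window(NamedTuple):
    start: int                  # <win start>
    end: int                    # <win end>
    n_scores: int               # <# scores in win>
    frac_gt: float              # <frac scores gt threshold>
    frac_lt: float              # <frac scores lt threshold>
    pct_gt: float               # <approx percentile for gt threshold wins>
    pct_lt: float               # <approx percentile for lt threshold wins>
    max_score: Optional[float]  # <max score>, None when reported as NA
    min_score: Optional[float]  # <min score>, None when reported as NA


def _score(field):
    """Convert a score field to float, mapping 'NA' to None."""
    return None if field == "NA" else float(field)


def parse_window_line(line):
    """Parse one whitespace-separated line of the window output."""
    f = line.split()
    if len(f) != 9:
        raise ValueError(f"expected 9 columns, got {len(f)}")
    return Window(int(f[0]), int(f[1]), int(f[2]), float(f[3]), float(f[4]),
                  float(f[5]), float(f[6]), _score(f[7]), _score(f[8]))


# The six rows shown above.
rows = """
1 100001 0 -1 -1 -1 -1 NA NA
100001 200001 0 -1 -1 -1 -1 NA NA
200001 300001 34 0 0.0588235 100 5 -0.544703 -2.06226
300001 400001 32 0 0.53125 100 5 -0.573806 -2.86754
400001 500001 32 0 0 100 100 0.00109769 -0.850957
500001 600001 40 0 0 100 100 -0.287803 -1.1771
"""
windows = [parse_window_line(line) for line in rows.strip().splitlines()]

# Windows with no scores carry -1 and NA in this output, so skip them.
non_empty = [w for w in windows if w.n_scores > 0]
for w in non_empty:
    print(w.start, w.end, w.n_scores, w.frac_gt, w.frac_lt, w.min_score)
```

Read this way, the window 300001-400001 has 32 scores, a frac_lt of 0.53125 and a minimum score of -2.86754.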

Thanks

Owner

szpiech commented Nov 21, 2023 via email
