Skip to content

Commit

Permalink
cleanup
Browse files Browse the repository at this point in the history
  • Loading branch information
szpiech committed Mar 22, 2014
1 parent 2b70228 commit 3b6162c
Show file tree
Hide file tree
Showing 9 changed files with 373 additions and 145 deletions.
121 changes: 94 additions & 27 deletions README
Original file line number Diff line number Diff line change
@@ -1,52 +1,119 @@
selscan is an implementation of some haplotype based scans for selection including iHS and XP-EHH.
selscan -- a program to calculate EHH-based scans for positive selection in
genomes.

A typical iHS scan would be run something like:
Copyright (C) 2014 Zachary A Szpiech

./selscan --ihs --alt --hap <haplotypes> --map <mapfile> --out <outfile>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

Other options are below, currently listed without much explanation. Some options have not been tested, but iHS and XPEHH are working correctly.
Source code and binaries can be found at
https://www.github.com/szpiech/selscan

--alt <bool> If --ihs is set, this flag will calculate the exact homozygosity of the sample instead of using allele frequencies.
If --sumsq or --sqsum or --sumfreq is set, this flag will change ratio from H_c/H to H_c/(H_r + 1).
selscan currently implements EHH, iHS, and XP-EHH.

--cutoff <double> The EHH decay cutoff.
Citations:

--filter <double> If a site has a MAF below this value, the program will not use it as a core snp.
ZA Szpiech and RD Hernandez (201X) selscan: an efficient multi-threaded program
to calculate EHH-based scans for positive selection. Journal, xx: xx-xx.
PC Sabeti, et al. (2007) Genome-wide detection and characterization of positive
selection in human populations. Nature, 449: 913–918.
BF Voight, et al. (2006) A map of recent positive selection in the human
genome. PLoS Biology, 4: e72.
PC Sabeti, et al. (2002) Detecting recent positive selection in the human
genome from haplotype structure. Nature, 419: 832–837.

--gap-scale <int> Gap scale parameter. If a gap is encountered between two snps > GAP_SCALE and < MAX_GAP, then the genetic distance is scaled by GAP_SCALE/GAP.
To calculate EHH:

--garud <bool> Set this flag to calculate the Garud et al. statistic.
./selscan --ehh <locusID> --hap <haplotypes> --map <mapfile> --out <outfile>

--hap <string> A hapfile with one row per individual, and one column per variant. Variants should be coded 0/1/-9.
Output file: <outfile>.ehh.<locusID>[.alt].out
Format: <physicalPos> <geneticPos> <'1' EHH> <'0' EHH>

--hapfreq <double> The rare haplotype frequency cutoff.
To calculate iHS:

--help <bool> Prints this help dialog.
./selscan --ihs --hap <haplotypes> --map <mapfile> --out <outfile>

--ihp <bool> Set this flag to calculate iHP.
Output file: <outfile>.ihs[.alt].out
Format: <locusID> <physicalPos> <'1' freq> <ihh1> <ihh0> <unstandardized iHS>

--ihs <bool> Set this flag to calculate iHS.
To calculate XP-EHH:

--map <string> A mapfile with one row per variant site. Formatted <chr#> <locusID> <genetic pos> <physical pos>.
./selscan --xpehh --hap <pop1 haplotypes> --ref <pop2 haplotypes> --map <mapfile> --out <outfile>

--max-gap <int> Maximum allowed gap between two snps.
Output file: <outfile>.ihs[.alt].out
Format: <locusID> <physicalPos> <geneticPos> <popA '1' freq> <ihhA> <popB '1' freq> <ihhB> <unstandardized XPEHH>

--out <string> The basename for all output files.
Examples:

--query <string> Query a specific locus instead of scanning the whole dataset.
selscan --ihs --hap example.hap --map example.map --out example
selscan --ehh Locus4077 --hap example.hap --map example.map --out example
selscan --xpehh --hap example2.pop1.hap --ref example2.pop2.hap --map example2.map --out example2

--qwin <int> The length of the query window in each direction from the query locus.
----------Command Line Arguments----------

--ref <string> A hapfile with one row per individual, and one column per variant. Variants should be coded 0/1/-9. This is the reference population for XP-EHH
calculations. Ignored otherwise.
--hap <string>: A hapfile with one row per individual, and one column per
variant. Variants should be coded 0/1/-9.
Default: __hapfile1

--sqsum <bool> Set this flag to calculate the square of the summed allele frequencies * ratio.
--map <string>: A mapfile with one row per variant site.
Formatted <chr#> <locusID> <genetic pos> <physical pos>.
Default: __mapfile

--sumfreq <bool> Set this flag to calculate the sum of the allele frequencies * ratio.
--ref <string>: A hapfile with one row per individual, and one column per
variant. Variants should be coded 0/1/-9. This is the 'reference'
population for XP-EHH calculations. Ignored otherwise.
Default: __hapfile2

--sumsq <bool> Set this flag to calculate the sum of squared allele frequencies * ratio.
--out <string>: The basename for all output files.
Default: outfile

--threads <int> The number of threads to spawn during the calculation. Partitions locus calculations across threads.
--ehh <string>: Calculate EHH of the '1' and '0' haplotypes at the specified
locus. Output: <physical dist> <genetic dist> <'1' EHH> <'0' EHH>
Default: __NO_LOCUS__

--xpehh <bool> Set this flag to calculate XP-EHH.
--ihs <bool>: Set this flag to calculate iHS.
Default: false

--xpehh <bool>: Set this flag to calculate XP-EHH.
Default: false

--ehh-win <int>: When calculating EHH, this is the length of the window in bp
in each direction from the query locus.
Default: 100000

--cutoff <double>: The EHH decay cutoff.
Default: 0.05

--gap-scale <int>: Gap scale parameter in bp.
If a gap is encountered between two snps > GAP_SCALE and < MAX_GAP, then
the genetic distance is scaled by GAP_SCALE/GAP.
Default: 20000

--maf <double>: If a site has a MAF below this value, the program will not use
it as a core snp.
Default: 0.05

--max-gap <int>: Maximum allowed gap in bp between two snps.
Default: 200000

--threads <int>: The number of threads to spawn during the calculation.
Partitions loci across threads.
Default: 1

--alt <bool>: Set this flag to calculate homozygosity based on haplotype
frequencies in the observed data.
Default: false

--help <bool>: Prints this help dialog.
Default: false
26 changes: 17 additions & 9 deletions src/norm.cpp
Original file line number Diff line number Diff line change
@@ -1,12 +1,20 @@
/*
* Normalize iHS scores across multiple files (i.e. chromosomes)
* Simple CLI: ./norm <bins> <outfile1> ... <outfile22>
* First:
* Read in all data needed only for calculating E[X] and E[X^2] of the whole data
* Second:
* Load each file separately and normalize, output results by simply adding a column with the normed score
*/

/* norm -- a program for downstream analysis of iHS scores calculated by selscan
Copyright (C) 2014 Zachary A Szpiech
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/

#include <iostream>
#include <fstream>
Expand Down
67 changes: 59 additions & 8 deletions src/param_t.cpp
Original file line number Diff line number Diff line change
@@ -1,14 +1,37 @@
/* param_t -- an interface for command line processing
Copyright (C) 2014 Zachary A Szpiech
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/

#include "param_t.h"
#include <cstdlib>
#include <cstdio>

using namespace std;

bool param_t::addFlag(string flag, bool value, string label, string description)
{
if (help.count(flag) == 0)
{
string buffer;
if(value) buffer = "true";
else buffer = "false";
argb[flag] = value;
help[flag] = "<bool> " + description;
help[flag] = "<bool>: " + description + "\n\tDefault: " + buffer;
labels[flag] = label;
}
else
{
Expand All @@ -23,8 +46,13 @@ bool param_t::addFlag(string flag, double value, string label, string descriptio
{
if (help.count(flag) == 0)
{
string buffer;
char charBuffer[100];
sprintf(charBuffer, "%.2f", value);
buffer = charBuffer;
argd[flag] = value;
help[flag] = "<double> " + description;
help[flag] = "<double>: " + description + "\n\tDefault: " + buffer;
labels[flag] = label;
}
else
{
Expand All @@ -39,8 +67,13 @@ bool param_t::addFlag(string flag, int value, string label, string description)
{
if (help.count(flag) == 0)
{
string buffer;
char charBuffer[100];
sprintf(charBuffer, "%d", value);
buffer = charBuffer;
argi[flag] = value;
help[flag] = "<int> " + description;
help[flag] = "<int>: " + description + "\n\tDefault: " + buffer;
labels[flag] = label;
}
else
{
Expand All @@ -55,8 +88,13 @@ bool param_t::addFlag(string flag, char value, string label, string description)
{
if (help.count(flag) == 0)
{
string buffer;
char charBuffer[100];
sprintf(charBuffer, "%c", value);
buffer = charBuffer;
argch[flag] = value;
help[flag] = "<char> " + description;
help[flag] = "<char>: " + description + "\n\tDefault: " + buffer;
labels[flag] = label;
}
else
{
Expand All @@ -72,7 +110,8 @@ bool param_t::addFlag(string flag, string value, string label, string descriptio
if (help.count(flag) == 0)
{
args[flag] = value;
help[flag] = "<string> " + description;
help[flag] = "<string>: " + description + "\n\tDefault: " + value;
labels[flag] = label;
}
else
{
Expand All @@ -93,7 +132,8 @@ bool param_t::addListFlag(string flag, string value, string label, string descr
if (help.count(flag) == 0)
{
listargs[flag].push_back(value);
help[flag] = "<string> " + description;
help[flag] = "<string1> ... <stringN>: " + description + "\n\tDefault: " + value;
labels[flag] = label;
}
else
{
Expand All @@ -113,11 +153,16 @@ void param_t::printHelp()
{
map<string, string>::iterator it;

cerr << "\n++++++++++Command Line Arguments++++++++++\n\n";
cerr << preamble << endl;

cerr << "----------Command Line Arguments----------\n\n";

for (it = help.begin(); it != help.end(); it++)
{
cerr << (*it).first << " " << (*it).second << "\n";
if (labels[(*it).first].compare("SILENT") != 0)
{
cerr << (*it).first << " " << (*it).second << "\n\n";
}
}

return;
Expand Down Expand Up @@ -352,3 +397,9 @@ vector<string> param_t::getStringListFlag(string flag)

cerr << "ERROR: There are no string list flags named " << flag << "\n";
}

void param_t::setPreamble(string str)
{
preamble = str;
return;
}
23 changes: 23 additions & 0 deletions src/param_t.h
Original file line number Diff line number Diff line change
@@ -1,3 +1,20 @@
/* param_t -- an interface for command line processing
Copyright (C) 2014 Zachary A Szpiech
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/
#ifndef __PARAM_T_H__
#define __PARAM_T_H__

Expand Down Expand Up @@ -37,6 +54,8 @@ class param_t

vector<string> getStringListFlag(string flag);

void setPreamble(string str);

param_t();


Expand All @@ -53,9 +72,13 @@ class param_t
map<string, string> help;
map<string, bool> isSet;

map<string, string> labels;

bool goodDouble(string str);
bool goodInt(string str);
bool goodChar(string str);

string preamble;
};

#endif
17 changes: 17 additions & 0 deletions src/selscan-data.cpp
Original file line number Diff line number Diff line change
@@ -1,3 +1,20 @@
/* selscan -- a program to calculate EHH-based scans for positive selection in genomes
Copyright (C) 2014 Zachary A Szpiech
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/
#include "selscan-data.h"

//reads in map data and also does basic checks on integrity of format
Expand Down
18 changes: 18 additions & 0 deletions src/selscan-data.h
Original file line number Diff line number Diff line change
@@ -1,3 +1,21 @@
/* selscan -- a program to calculate EHH-based scans for positive selection in genomes
Copyright (C) 2014 Zachary A Szpiech
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/

#ifndef __XP_IHH_DATA_H__
#define __XP_IHH_DATA_H__
#include <string>
Expand Down
Loading

0 comments on commit 3b6162c

Please sign in to comment.