You can access comprehensive documentation for using gwas-sumstat-tools at this link: GWAS SumStats Tools Documentation.
There are four commands, read
, format
validate
and gen_meta
(gen_meta
function is currently only accessible to internal GWAS catalog users.)
read
is for:
- Previewing a data file: no options
- Extracting the field headers:
-h
- Extracting all the metadata:
-M
- Extacting specific field, value pairs from the metada:
-m <field name>
format
is for:
- Converting sumstats data file to the standard format - gwas-ssf. This is not guaranteed to return a valid standard file, because manadatory data fields could be missing in the input.
- Generate a configuration file, which serves as a blueprint for the formatting options.
- Test the configuration file on the first five rows of the input file.
- Apply the configuration file to the entire input file and generate formatted output file
[!NOTE] It is memory efficient and will take approx. 30s per 1 million records
gen_meta
is for:
- Generate metadata for a data file:
-m
- Read metadata in from existing file:
--meta-in <file>
- Create metadata from the GWAS Catalog (internal use, requires authenticated API):
-g
- Edit/add the values to the metadata:
-e
with--<FIELD>=<VALUE>
- Read metadata in from existing file:
validate
is for:
- Validating a summary statistic file using a dynamically generated schema
- python >= 3.9 and <3.12
$ pip3 install gwas-sumstats-tools
The following Docker command is the equivalent to running gwas-ssf
.
$ docker run -it -v ${PWD}:/application ebispot/gwas-sumstats-tools:latest
Just append any subcommands or arguments e.g.:
$ docker run -it -v ${PWD}:/application ebispot/gwas-sumstats-tools:latest validate
$ gwas-ssf [OPTIONS] COMMAND [ARGS]...
Options:
--help
: Show this message and exit.
Commands:
validate
: Validate a sumstats fileformat
: Format a sumstats filegen_meta
: generate meta-yaml fileread
: Read a sumstats file
Validate a sumstats file
Usage:
$ gwas-ssf validate [OPTIONS] FILENAME
Arguments:
FILENAME
: Input sumstats file. Must be TSV (may be gzipped) [required]
Options:
-e, --errors-out
: Output erros to a csv file, .err.csv.gz-z, --p-zero
: Force p-values of zero to be allowable. Takes precedence over inferred value (-i)-m, --min-rows
: Minimum rows acceptable for the file [default: 100000]-i, --infer-from-metadata
: Infer validation options from the metadata file -meta.yaml. E.g. a populated field for analysis software makes p-values of zero allowable.--help
: Show this message and exit.
Read (preview) a sumstats file
Usage:
$ gwas-ssf read [OPTIONS] FILENAME
Arguments:
FILENAME
: Input sumstats file [required]
Options:
-h, --get-header
: Just return the headers of the file [default: False]--meta-in PATH
: Specify a metadata file to read in, defaulting to -meta.yaml-M, --get-all-metadata
: Return all metadata [default: False]-m, --get-metadata TEXT
: Get metadata for the specified fields e.g. `-m genomeAssembly -m isHarmonised--help
: Show this message and exit.
Format a sumstats file and creating a new one. Add/edit metadata.
Usage:
$ gwas-ssf format [OPTIONS] FILENAME
Arguments:
FILENAME
: Input sumstats file. Must be TSV or CSV and may be gzipped [required]
Options:
- Options for reading the input file
-d, --delimiter Text
: Specify the delimiter in the file, if not specified, we can automatically detect the delimiter as whitespace if your file is *.txt, comma if your file is *.csv, or tab if your file is *.tsv.gz. Otherwise, please specify the delimiter which can help to recognise the column correctly-r, --remove_comments Text
: Remove the lines starts with the given character
- Options for generating configuration file
-g, --generate_config Boolean
: To generate the configuration file for the file needed to be formatted--config_out Path
:Specify the configure JSON output file
- Options for applying configuration file
-o, --ss-out PATH
: Output sumstats file-a, --apply_config Boolean
: Apply the given configuration file to the file-t, -test_config Boolean
: Test the given configuration file to the first 5 rows of the file--config_in Path
: Specify a configure JSON file to read in-f, --analysis_software Text
: Specify the analysis software used for generating the summary statistics data-s, --minimal2standard
: Try to convert a valid, minimally formatted file to the standard format.This assumes the file at least hasp_value
combined with rsid invariant_id
field orchromosome
andbase_pair_location
. Validity of the new file is not guaranteed because mandatory data could be missing from the original file. [default: False]
- Options for batch applying configuration file
-b, --batch_apply Boolean
: Apply configuration files to a batch of summary statistics files--lsf Boolean
:Running the batch process via submitting jobs via LSF--slurm Boolean
:Running the batch process via submitting job via Slurm
Generate a meta-yaml file for the existing sumstats file OR edit the existing meta-yaml file.
Usage:
$ gwas-ssf gen_meta [OPTIONS] FILENAME
Example:
# Generate a meta-yaml file from GWAS API (-g) with customised fields (-e --file_type=pre-gwas-ssf) for GCST90278188.tsv files
$ gwas-ssf gen_meta --meta-out GCST90278188.tsv-meta.yaml -g GCST90278188.tsv -e --file_type=pre-gwas-ssf
Arguments:
FILENAME
: Input sumstats file. Must be TSV or CSV and may be gzipped [required]
Options:
--meta-out PATH
: Specify the metadata output file-g, --meta-gwas
: Populate metadata from GWAS Catalog [default: False]-e, --meta-edit
: Enable metadata edit mode. Then provide params to edit in the--<FIELD>=<VALUE>
format e.g.--GWASID=GCST123456
to edit/add that value [default: False]--help
: Show this message and exit.
This repository uses poetry for dependency and packaging management.
To run the tests:
- install poetry
git clone https://github.com/EBISPOT/gwas-sumstats-tools.git
cd gwas-sumstats-tools
python3 -m venv env
pip install poetry
poetry install
poetry run pytest -s
To make a change: branch from master -> PR to master -> poetry version -> git add pyproject.toml -> git commit -> git tag -> git push origin master --tags If all the tests pass, this will publish to pypi.
A simple toolkit for reading and formatting GWAS sumstats files from the GWAS Catalog. Built with: