Merge pull request #184 from timothymillar/atomize
Beta v0.10.0
timothymillar authored Sep 18, 2024
2 parents d05d748 + 4abb69c commit 29935f7
Showing 108 changed files with 8,423 additions and 960 deletions.
9 changes: 4 additions & 5 deletions .github/workflows/python-package.yml
@@ -5,17 +5,17 @@ name: Python package

 on:
   push:
-    branches: [ master ]
+    branches: [ master, "call-pedigree"]
   pull_request:
-    branches: [ master ]
+    branches: [ master, "call-pedigree"]

 jobs:
   build:

     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.8", "3.9", "3.10", "3.11"]
+        python-version: ["3.10", "3.11"]

     steps:
     - uses: actions/checkout@v2
@@ -32,8 +32,7 @@ jobs:
       uses: pre-commit/[email protected]
     - name: Build and install mchap
       run: |
-        python setup.py sdist
-        pip install dist/mchap-*.tar.gz
+        pip install .
     - name: Test with pytest (bounds checked)
       env:
         NUMBA_BOUNDSCHECK: 1
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,24 @@
 ## Unreleased


+## Beta v0.10.0
+
+New Features:
+- New experimental `atomize` tool for splitting haplotypes into basis SNVs #72.
+- New experimental `call-pedigree` tool for pedigree informed genotype calling.
+- Optionally specify just the `INFO` or `FORMAT` variant of an optional VCF field.
+- Use `setuptools_scm` for versioning #179.
+
+VCF Changes:
+- Renamed `PHQ` and `PHPM` to `SQ` and `SPM` for clarity.
+- Added `INFO/UAN` field for number of unique alleles called #174.
+- Added `INFO/MCI` field for proportion of samples with Markov Chain incongruence.
+- Added optional fields #174:
+  * `INFO/AOPSUM` (sum of `FORMAT/AOP`).
+  * `INFO/ACP` and `FORMAT/ACP`.
+  * `INFO/SNVDP` and `FORMAT/SNVDP`.
+
+
 ## Beta v0.9.3

 Bug Fixes:
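The changelog defines `INFO/AOPSUM` as the sum of `FORMAT/AOP`, `INFO/UAN` as the number of unique alleles called and `INFO/MCI` as the proportion of samples with Markov chain incongruence. A minimal sketch of those three summaries, assuming per-sample values are already available as plain Python lists, that called genotypes are tuples of allele indices with `-1` marking a missing allele, and that the function names are hypothetical (they are not part of MCHap's API):

```python
from typing import Sequence


def info_aopsum(format_aop: Sequence[Sequence[float]]) -> list[float]:
    """Sum per-sample FORMAT/AOP values into one INFO/AOPSUM value per allele."""
    n_alleles = len(format_aop[0])
    return [sum(sample[a] for sample in format_aop) for a in range(n_alleles)]


def info_uan(genotypes: Sequence[Sequence[int]]) -> int:
    """Count distinct alleles across called genotypes, ignoring missing (-1) calls."""
    return len({a for genotype in genotypes for a in genotype if a >= 0})


def info_mci(incongruent: Sequence[bool]) -> float:
    """Proportion of samples whose MCMC chains were flagged as incongruent."""
    return sum(incongruent) / len(incongruent)


# Two samples, three alleles (REF plus two haplotype ALTs).
print(info_aopsum([[1.0, 0.5, 0.0], [1.0, 1.0, 0.25]]))  # [2.0, 1.5, 0.25]
print(info_uan([(0, 0, 1, 1), (0, 2, 2, -1)]))  # 3
print(info_mci([False, True, False, False]))  # 0.25
```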
14 changes: 13 additions & 1 deletion README.rst
@@ -66,7 +66,17 @@ frequencies (estimated from the mean of individual frequencies), but no genotype
 Example notebook
 ----------------

-An `example notebook`_ demonstrating genotype calling with MCHap in a bi-parental population.
+See the `example notebook`_ demonstrating genotype calling with MCHap in a bi-parental population.

+Experimental features
+---------------------
+
+\:warning: **WARNING: The following tools are highly experimental!!!** :warning:
+
+- ``mchap call-pedigree``: for pedigree informed genotype calling.
+- ``mchap atomize``: for converting micro-haplotype calls to phased sets of SNVs.
+
+See the `experimental notebook`_ demonstrating the `call-pedigree` tool as presented at the 2024 `Tools for Polyploids`_ workshop.
+
 Funding
 -------
@@ -80,3 +90,5 @@ The development of MCHap was partially funded by the "Tools for Polyploids" Spec
 .. _`MCHap assemble documentation`: docs/assemble.rst
 .. _`MCHap call documentation`: docs/call.rst
 .. _`example notebook`: docs/example/bi-parental.ipynb
+.. _`experimental notebook`: docs/example/bi-parental-pedigree.ipynb
+.. _`Tools for Polyploids`: https://www.polyploids.org/
25 changes: 17 additions & 8 deletions cli-assemble-help.txt
@@ -115,11 +115,21 @@ options:
                         The chosen field determines the sample ids required in
                         other input files e.g. the --sample-list argument.
   --report [REPORT ...]
-                        Extra fields to report within the output VCF: AFPRIOR
-                        = prior allele frequencies; AFP = posterior mean
-                        allele frequencies; AOP = posterior probability of
-                        allele occurring at any copy number; GP = genotype
-                        posterior probabilities; GL = genotype likelihoods.
+                        Extra fields to report within the output VCF. The
+                        INFO/FORMAT prefix may be omitted to return both
+                        variations of the named field. Options include:
+                        INFO/AFPRIOR = Prior allele frequencies; INFO/ACP =
+                        Posterior allele counts; INFO/AFP = Posterior mean
+                        allele frequencies; INFO/AOP = Posterior probability
+                        of allele occurring across all samples; INFO/AOPSUM =
+                        Posterior estimate of the number of samples containing
+                        an allele; INFO/SNVDP = Read depth at each SNV
+                        position; FORMAT/ACP: Posterior allele counts;
+                        FORMAT/AFP: Posterior mean allele frequencies;
+                        FORMAT/AOP: Posterior probability of allele occurring;
+                        FORMAT/GP: Genotype posterior probabilities;
+                        FORMAT/GL: Genotype likelihoods; FORMAT/SNVDP: Read
+                        depth at each SNV position
   --cores CORES         Number of cpu cores to use (default = 1).
   --mcmc-chains MCMC_CHAINS
                         Number of independent MCMC chains per assembly
@@ -133,9 +143,8 @@ options:
   --mcmc-seed MCMC_SEED
                         Random seed for MCMC (default = 42).
   --mcmc-chain-incongruence-threshold MCMC_CHAIN_INCONGRUENCE_THRESHOLD
-                        Posterior phenotype probability threshold for
-                        identification of incongruent posterior modes (default
-                        = 0.60).
+                        Posterior probability threshold for identification of
+                        incongruent posterior modes (default = 0.60).
   --mcmc-fix-homozygous MCMC_FIX_HOMOZYGOUS
                         Fix alleles that are homozygous with a probability
                         greater than or equal to the specified value (default
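The new `--report` help text says the `INFO`/`FORMAT` prefix may be omitted to request both variants of a named field. A minimal sketch of that expansion rule, using only the field names listed in the help text; `expand_report_fields` is a hypothetical helper, not MCHap's own argument handling:

```python
# Field names as listed in the --report help text of this commit.
INFO_FIELDS = {"AFPRIOR", "ACP", "AFP", "AOP", "AOPSUM", "SNVDP"}
FORMAT_FIELDS = {"ACP", "AFP", "AOP", "GP", "GL", "SNVDP"}
KNOWN = {"INFO": INFO_FIELDS, "FORMAT": FORMAT_FIELDS}


def expand_report_fields(requested: list[str]) -> list[str]:
    """Return fully qualified field names, expanding bare names to every variant."""
    expanded: list[str] = []
    for name in requested:
        if "/" in name:
            # Explicit variant such as "FORMAT/GP": accept it only if it exists.
            kind, field = name.split("/", 1)
            matches = [name] if field in KNOWN.get(kind, set()) else []
        else:
            # Bare name such as "AFP": keep each variant (INFO first) defining it.
            matches = [f"{kind}/{name}" for kind, fields in KNOWN.items() if name in fields]
        if not matches:
            raise ValueError(f"unknown report field: {name!r}")
        for qualified in matches:
            if qualified not in expanded:  # drop duplicates, keep first-seen order
                expanded.append(qualified)
    return expanded


print(expand_report_fields(["AFP", "FORMAT/GP"]))
# ['INFO/AFP', 'FORMAT/AFP', 'FORMAT/GP']
```

A bare name that exists in only one variant, such as `GL` or `AOPSUM`, expands to that single field. Whether MCHap orders or deduplicates the requested fields the same way is not stated in this commit.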
15 changes: 15 additions & 0 deletions cli-atomize-help.txt
@@ -0,0 +1,15 @@
+usage: Split MCHap haplotype calls into phased blocks of basis SNVs.
+       [-h] haplotypes
+
+positional arguments:
+  haplotypes  VCF file containing haplotype variants to be atomized. This file
+              must contain INFO/SNVPOS. The INFO/DP and FORMAT/DP fields will
+              be calculated from FORMAT/SNVDP if present in the input VCF
+              file. The INFO/ACP and FORMAT/DS fields will be calculated from
+              FORMAT/ACP or FORMAT/AFP if either is present in the input VCF
+              file. Note that the FORMAT/ACP or FORMAT/AFP fields from the
+              input VCF file will be normalized in the event that they do not
+              sum to ploidy or one respectively.
+
+options:
+  -h, --help  show this help message and exit
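The `atomize` help text states that `FORMAT/ACP` is normalized to sum to ploidy (or `FORMAT/AFP` to one) before per-SNV fields such as `FORMAT/DS` are derived. The sketch below shows one way to turn haplotype allele counts into per-SNV alternate-base dosages for a single sample. It assumes allele sequences are listed REF first, that `INFO/SNVPOS` holds 1-based positions within those sequences, and that a base's dosage is the summed count of haplotypes carrying it; `normalize` and `atomize_dosage` are hypothetical names, not MCHap functions.

```python
def normalize(values: list[float], total: float) -> list[float]:
    """Rescale non-negative values so that they sum to `total`."""
    current = sum(values)
    if current <= 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v * total / current for v in values]


def atomize_dosage(
    alleles: list[str], snvpos: list[int], acp: list[float], ploidy: int
) -> list[dict[str, float]]:
    """Per-SNV dosage of each non-reference base for one sample.

    alleles: haplotype allele sequences with the reference allele first (assumed).
    snvpos: 1-based SNV positions within those sequences (assumed INFO/SNVPOS layout).
    acp: posterior allele counts (FORMAT/ACP) in the same order as `alleles`.
    """
    counts = normalize(acp, ploidy)  # enforce sum(counts) == ploidy
    reference = alleles[0]
    dosages = []
    for pos in snvpos:
        i = pos - 1
        snv: dict[str, float] = {}
        for allele, count in zip(alleles, counts):
            base = allele[i]
            if base != reference[i]:
                # Every haplotype copy carrying this base adds to its dosage.
                snv[base] = snv.get(base, 0.0) + count
        dosages.append(snv)
    return dosages


# Tetraploid sample: counts 1, 2, 1 for three haplotypes; SNVs at offsets 2 and 4.
print(atomize_dosage(["ACGT", "AGGT", "AGGA"], [2, 4], [1.0, 2.0, 1.0], ploidy=4))
# [{'G': 3.0}, {'A': 1.0}]
```

Under these assumptions the same function could be fed `FORMAT/AFP`, since rescaling it to sum to ploidy gives the same counts as normalizing it to one and multiplying by ploidy.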
20 changes: 15 additions & 5 deletions cli-call-exact-help.txt
@@ -102,9 +102,19 @@ options:
                         The chosen field determines the sample ids required in
                         other input files e.g. the --sample-list argument.
   --report [REPORT ...]
-                        Extra fields to report within the output VCF: AFPRIOR
-                        = prior allele frequencies; AFP = posterior mean
-                        allele frequencies; AOP = posterior probability of
-                        allele occurring at any copy number; GP = genotype
-                        posterior probabilities; GL = genotype likelihoods.
+                        Extra fields to report within the output VCF. The
+                        INFO/FORMAT prefix may be omitted to return both
+                        variations of the named field. Options include:
+                        INFO/AFPRIOR = Prior allele frequencies; INFO/ACP =
+                        Posterior allele counts; INFO/AFP = Posterior mean
+                        allele frequencies; INFO/AOP = Posterior probability
+                        of allele occurring across all samples; INFO/AOPSUM =
+                        Posterior estimate of the number of samples containing
+                        an allele; INFO/SNVDP = Read depth at each SNV
+                        position; FORMAT/ACP: Posterior allele counts;
+                        FORMAT/AFP: Posterior mean allele frequencies;
+                        FORMAT/AOP: Posterior probability of allele occurring;
+                        FORMAT/GP: Genotype posterior probabilities;
+                        FORMAT/GL: Genotype likelihoods; FORMAT/SNVDP: Read
+                        depth at each SNV position
   --cores CORES         Number of cpu cores to use (default = 1).
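Because `mchap call-exact` works with an explicit posterior over genotypes, the per-sample allele summaries offered by `--report` can be related to `FORMAT/GP` through their usual definitions: expected copy number (`ACP`), expected copy number divided by ploidy (`AFP`) and the probability of carrying at least one copy (`AOP`). A sketch assuming genotypes are tuples of allele indices aligned with the `GP` values; `allele_summaries` is a hypothetical helper and is not taken from MCHap's source.

```python
def allele_summaries(
    genotypes: list[tuple[int, ...]], gp: list[float], n_alleles: int
) -> tuple[list[float], list[float], list[float]]:
    """Return (ACP, AFP, AOP) for one sample from its genotype posterior."""
    ploidy = len(genotypes[0])
    acp = [0.0] * n_alleles  # expected copies of each allele
    aop = [0.0] * n_alleles  # probability of carrying at least one copy
    for genotype, prob in zip(genotypes, gp):
        for allele in range(n_alleles):
            copies = genotype.count(allele)
            acp[allele] += prob * copies
            if copies > 0:
                aop[allele] += prob
    afp = [count / ploidy for count in acp]  # posterior mean allele frequency
    return acp, afp, aop


# Diploid, two alleles: posterior over genotypes 0/0, 0/1 and 1/1.
print(allele_summaries([(0, 0), (0, 1), (1, 1)], [0.25, 0.5, 0.25], n_alleles=2))
# ([1.0, 1.0], [0.5, 0.5], [0.75, 0.75])
```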
25 changes: 17 additions & 8 deletions cli-call-help.txt
@@ -106,11 +106,21 @@ options:
                         The chosen field determines the sample ids required in
                         other input files e.g. the --sample-list argument.
   --report [REPORT ...]
-                        Extra fields to report within the output VCF: AFPRIOR
-                        = prior allele frequencies; AFP = posterior mean
-                        allele frequencies; AOP = posterior probability of
-                        allele occurring at any copy number; GP = genotype
-                        posterior probabilities; GL = genotype likelihoods.
+                        Extra fields to report within the output VCF. The
+                        INFO/FORMAT prefix may be omitted to return both
+                        variations of the named field. Options include:
+                        INFO/AFPRIOR = Prior allele frequencies; INFO/ACP =
+                        Posterior allele counts; INFO/AFP = Posterior mean
+                        allele frequencies; INFO/AOP = Posterior probability
+                        of allele occurring across all samples; INFO/AOPSUM =
+                        Posterior estimate of the number of samples containing
+                        an allele; INFO/SNVDP = Read depth at each SNV
+                        position; FORMAT/ACP: Posterior allele counts;
+                        FORMAT/AFP: Posterior mean allele frequencies;
+                        FORMAT/AOP: Posterior probability of allele occurring;
+                        FORMAT/GP: Genotype posterior probabilities;
+                        FORMAT/GL: Genotype likelihoods; FORMAT/SNVDP: Read
+                        depth at each SNV position
   --cores CORES         Number of cpu cores to use (default = 1).
   --mcmc-chains MCMC_CHAINS
                         Number of independent MCMC chains per assembly
@@ -124,6 +134,5 @@ options:
   --mcmc-seed MCMC_SEED
                         Random seed for MCMC (default = 42).
   --mcmc-chain-incongruence-threshold MCMC_CHAIN_INCONGRUENCE_THRESHOLD
-                        Posterior phenotype probability threshold for
-                        identification of incongruent posterior modes (default
-                        = 0.60).
+                        Posterior probability threshold for identification of
+                        incongruent posterior modes (default = 0.60).
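The reworded `--mcmc-chain-incongruence-threshold` option describes a posterior probability threshold for identifying incongruent posterior modes across independent MCMC chains. This commit does not spell out the exact rule, so the sketch below is one plausible reading and should be treated as an assumption: each chain's most frequently sampled genotype counts as a confident mode if its estimated probability reaches the threshold, and the chains are flagged as incongruent when they report different confident modes. `chains_incongruent` is a hypothetical helper.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def chains_incongruent(
    traces: Sequence[Sequence[Hashable]], threshold: float = 0.60
) -> bool:
    """Illustrative check for incongruent posterior modes across MCMC chains.

    Each trace is the list of genotypes sampled by one chain; genotypes should be
    canonical (e.g. sorted tuples) so that equal genotypes compare equal.
    """
    confident_modes = set()
    for trace in traces:
        # Posterior mode of this chain and its estimated probability.
        mode, count = Counter(trace).most_common(1)[0]
        if count / len(trace) >= threshold:
            confident_modes.add(mode)
    # Incongruent if at least two chains confidently support different modes.
    return len(confident_modes) > 1


chain_a = [(0, 1)] * 8 + [(0, 0)] * 2  # mode (0, 1) with probability 0.8
chain_b = [(0, 0)] * 7 + [(0, 1)] * 3  # mode (0, 0) with probability 0.7
print(chains_incongruent([chain_a, chain_b]))  # True
print(chains_incongruent([chain_a, chain_b], threshold=0.75))  # False
```

Averaging such a per-sample flag over all samples would give a per-variant proportion in the spirit of the new `INFO/MCI` field.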