Oneamplicon (#2) Added input data and minor fixes
* update info for 1amp

* Add newest reference dataset without UTR

* add new initdir

* fix calculation
talnor authored Feb 24, 2022
1 parent 6d05945 commit 774f217
Showing 404 changed files with 259,704 additions and 13 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -33,7 +33,7 @@ detail [here](data/README.md).
Shiver initialisation directories are included in this repository. Information on these is
available [here](data/README.md). To create your own initialisation directory, run the following command:
```
nextflow run main.nf --init -profile slurm,singularity --primers <primers.fasta> --adapters <adapters.fasta> --config <shiver_config.sh> --references <references.fasta>
nextflow run main.nf --init -profile slurm,singularity --primers <primers.fasta> --adapters <adapters.fasta> --config <shiver_config.sh> --references <references.fasta> --outdir <outdir>
```

## Usage
6 changes: 3 additions & 3 deletions bin/calculate_eti.py
@@ -44,7 +44,7 @@
def get_region_data(base_frequency_file, pos_start, start_base, pos_fin, step):
    """Return base count data for region of interest"""
    with open(base_frequency_file, "r") as f:
        column_names = [
        headers = [
            "HXB2_position",
            "Reference_position",
            "Reference_Base",
@@ -66,7 +66,7 @@ def get_region_data(base_frequency_file, pos_start, start_base, pos_fin, step):
            "gap": float,
            "N": float,
        }
        all_data = pd.read_csv(f, header=0, names=column_names)
        all_data = pd.read_csv(f, header=0, names=headers)
        all_data.astype(columns)
        positions_of_interest = range(pos_start + start_base - 1, pos_fin + 1, step)
        pol = all_data[all_data["HXB2_position"].isin(str(i) for i in positions_of_interest)]
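
The position selection in `get_region_data` can be illustrated in isolation. This is a minimal sketch, assuming `start_base` is a 1-based offset within each `step`-sized window (for example a codon position when `step` is 3). The helper name and the coordinates are hypothetical and not taken from the repository. The positions are also converted to strings, because the `isin` filter compares against `str(i)`.

```python
def positions_of_interest(pos_start, start_base, pos_fin, step):
    """Mirror the range() expression used in get_region_data."""
    return range(pos_start + start_base - 1, pos_fin + 1, step)


# Hypothetical region: positions 100..108, third base of each triplet.
selected = list(positions_of_interest(pos_start=100, start_base=3, pos_fin=108, step=3))

# The filter in get_region_data compares positions as strings.
selected_as_str = [str(i) for i in selected]
print(selected_as_str)
```

Note that `pos_fin + 1` makes the end position inclusive, so the last base of the region is kept whenever it falls on the stride.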
@@ -130,7 +130,7 @@ def remove_low_coverage_positions(pol, min_cov):

def calculate_theta(pol, diversity_threshold):
    """Calculate theta value for each position"""
    major_base_frequency = pol[["A", "C", "G", "T"]].max(axis=1)
    major_base_frequency = pol[["freq_A", "freq_C", "freq_G", "freq_T"]].max(axis=1)
    pol["freq_major_base"] = major_base_frequency.values
    theta_array = np.where((1 - pol["freq_major_base"]) > diversity_threshold, 1, 0)
    pol["theta"] = theta_array.tolist()
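
The switch from the `A`/`C`/`G`/`T` columns to the `freq_*` columns matters because the rule `1 - major > threshold` is only meaningful for fractions. The following is a minimal sketch in plain Python rather than pandas, assuming the un-prefixed columns hold raw read counts (the diff does not show the column definitions in full, so this is an assumption). The helper name, the counts, and the threshold are hypothetical.

```python
def theta_for_position(base_values, diversity_threshold):
    """Return 1 if the non-major fraction exceeds the threshold, else 0."""
    major = max(base_values)
    return 1 if (1 - major) > diversity_threshold else 0


# Hypothetical position covered by 600 A reads and 400 C reads.
counts = [600, 400, 0, 0]
total = sum(counts)
freqs = [c / total for c in counts]

# With raw counts, 1 - 600 is negative, so the position is never flagged.
theta_from_counts = theta_for_position(counts, diversity_threshold=0.01)

# With frequencies, 1 - 0.6 = 0.4 exceeds the threshold, so it is flagged.
theta_from_freqs = theta_for_position(freqs, diversity_threshold=0.01)

print(theta_from_counts, theta_from_freqs)
```

Under that assumption, the old column selection would mark every covered position as non-diverse, which is consistent with the "fix calculation" entry in the commit message.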
54 changes: 46 additions & 8 deletions data/README.md
@@ -41,6 +41,7 @@ Below follows a description of the files that are made available in this directo
| ----------- | ----------- | ----------- |
| 1 | original_config.sh | Default settings used in Shiver |
| 2 | shiver_config_BQ30_notrimming.sh | TIME-study settings |
| 3 | config_BQ30.sh | Older settings |

#### Configuration file 2

@@ -53,16 +54,35 @@ The following options in Shiver are altered. For the full list of options see th
| mpileupOptions | --min-BQ 30 | --min-BQ 5 | Higher quality threshold for individual bases |
| deduplicate | true | false | Remove read pairs marked as duplicates? This can cause loss of diversity in the reads due to true biological variation as well as sequencing error. |

#### Configuration file 3

The following options in Shiver are altered. For the full list of options see the default config.

| Parameter | Value | Default | Description |
| ----------- | ----------- | ----------- | ----------- |
| mpileupOptions | --min-BQ 30 | --min-BQ 5 | Higher quality threshold for individual bases |
| deduplicate | true | false | Remove read pairs marked as duplicates? This can cause loss of diversity in the reads due to true biological variation as well as sequencing error. |


### Shiver init directory
| Version | InitDir | Description |
| ----------- | ----------- | ----------- |
| 1 | InitDirShiver190405_BQ30 | |
| 1 | InitDirShiver220128_BQ30_1amp | 1 amplicon primers, 2020 references |
| 1 | InitDirShiver220223_BQ30_1amp | 1 amplicon primers, 2020 references, no UTRs |
| 2 | InitDirShiver220128_BQ30_1amp | 1 amplicon primers, 2020 references |
| 3 | InitDirShiver190405_BQ30 | |
| 4 | InitDirShiver191022_BQ30_PANHIV | 1 amplicon primers, 2018 references |

#### Shiver init directory 1

**Name**: InitDirShiver190405_BQ30
**Created**: 2019-04-05
**Name**: InitDirShiver220223_BQ30_1amp
**Created**: 2022-02-23

| Content | Description |
| ----------- | ----------- |
| Primer | primers_1_amplicon_PCR1_190620.fasta |
| Adapter | NexteraPE-PE.fa |
| Configurations | shiver_config_BQ30_notrimming.sh |
| References | HIV1_COM_2020_547-9592_DNA.fasta |

#### Shiver init directory 2

@@ -71,17 +91,35 @@ The following options in Shiver are altered. For the full list of options see th

| Content | Description |
| ----------- | ----------- |
| Primer | primers_1_amplicon_PCR1_190620.fasta
| Primer | primers_1_amplicon_PCR1_190620.fasta |
| Adapter | NexteraPE-PE.fa |
| Configurations | shiver_config_BQ30_notrimming.sh |
| References | HIV1_COM_2020_genome_DNA.fasta' |
| References | HIV1_COM_2020_genome_DNA.fasta |

#### Shiver init directory 3

**Name**: InitDirShiver190405_BQ30
**Created**: 2019-04-05

#### Shiver init directory 4

**Name**: InitDirShiver191022_BQ30_PANHIV
**Created**: 2019-10-22

| Content | Description |
| ----------- | ----------- |
| Primer | primers_1_amplicon_PCR1-2_190620.fasta |
| Adapter | NexteraPE-PE.fa |
| Configurations | config_BQ30.sh |
| References | HIV1_COM_2017_547-9592_DNA_2018Compendium.fasta |

### References to use in Shiver alignments
Reference compendiums with representative genomes can be downloaded from the
[LANL HIV database](http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html).

| Reference file | Description |
| ------------------------------ | -------------------------------------------------------------------- |
| HIV1_COM_2020_genome_DNA.fasta | Representative genome alignment with references from 2020 and earlier |

| HIV1_COM_2020_genome_DNA.fasta | Representative genome alignment with references from 2020 and earlier. |
| HIV1_COM_2020_547-9592_DNA.fasta | Representative genome alignment with references from 2020 and earlier. Genomic positions 547-9592 included. |
| HIV1_COM_2017_547-9592_DNA_2018Compendium.fasta | Representative genome alignment with references from 2018 and earlier. Genomic positions 547-9592 included. |
