Update Dev to v1.2.0 (#59)

* Create codeql-analysis.yml (#25)
* Create codeql.yml (#29)
* Dthoward96 workflow correction (#30)
* Delete .github/workflows/codeql-analysis.yml
* Delete .github/workflows/python-package-mamba.yml
* Add files via upload (#31)
* bug fix process.py (#32)
  Missing a return for config file function and correcting an error message to print the correct df
* Create GHCR_docker.yml (#33)
* Create GHCR_docker.yml
* Update GHCR_docker.yml
  Correct changes for master branch
* FTP folder bug fix submit.py (#34)
  Some FTP accounts have the folder structure /submit/Production/ instead of /Production/. This fix automatically corrects for this difference in folder structure.
* update template metadata required fields + check submitting databases are valid
* Update process.py
  removed the gs-sequence_name specified for flu
* Create docker_test_build.yml (#41)
  automatic test builds dockerfile on pull request. This will prevent merging to master if Dockerfile fails to build correctly.
* Dthoward96 org id patch (#42)
* Update cov_config.yaml
  remove org_id from examples
* Update flu_config.yaml
  remove org_id from examples
* Update create.py
  remove org_id from xml creation
* Dthoward96 bugfix (#45)
* Update report.py
  Bug fix to allow for other submit folders in FTP of ncbi
* Update process.py
  bug fix for capitalization of folder name
* Add files via upload (#49)
  Changes requirement from only isolate to require either strain or isolate for BioSample and GenBank
* create.py duplicate strain name bug (#50)
  Fixes issue that creates duplicate strain columns when using src-strain
* Bug fix process.py (#51)
  Bug fix for upload log. When only one database was submitted, it would convert the database name into a list and error out.
* Bug fix for gisaid name overwriting genbank name for fasta file (#53)
* Bug fix for gisaid name overwriting genbank name for fasta file
* bug fix fasta file creation (#55)
  Fix issue where it requires fasta file for non-fasta submission
* Bug fix create.py (#56)
  Correct issue where organism and collection date were being added to the comment.cmt file for genbank submissions
* Table2asn bug fixes (#57)
* Table2asn bug fixes
  Resolve issue with table2asn not allowing multiple sequences in the fasta file and for the table2asn sendmail function not properly grabbing the sqn file.
* Update submit.py
* V1.2.0 Update (#58)
* Version updates
* Env update
* Delete .github/workflows/python-package-mamba.yml
  removing test yml
* pandera schema update
* Delete gisaid_cli/poxCLI directory
* bug fixes
* Delete FLU_test directory
* Delete OTHER_species directory
* Delete POX_species directory
* mypy validation added
* mypy integration
* Shiny Update
* Seqsender v1.2.0 website updates
* Update README.md
* Update README.md
* shiny website updates
* Seqsender shiny updates
* V1.2.0 Prod Update

---------

Co-authored-by: rchau88 <[email protected]>
Co-authored-by: snu3 <[email protected]>
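
The FTP folder fix (#34) is described only in prose, so here is a minimal sketch of the idea: choose the upload root depending on whether the account exposes `/Production/` directly or nests it under `/submit/`. This is not the code from `submit.py`. The function name `resolve_submit_root`, its parameters, and the error message are hypothetical, and the directory listing is assumed to come from something like `ftplib.FTP.nlst()`.

```python
from typing import Iterable


def resolve_submit_root(top_level_dirs: Iterable[str],
                        submission_type: str = "Production") -> str:
    """Return the remote folder that submissions should be uploaded into.

    top_level_dirs: names found at the root of the FTP account
                    (hypothetical input, e.g. the result of FTP.nlst()).
    submission_type: target folder name, "Production" or "Test".
    """
    # Normalise entries such as "/submit/" or "submit" to a bare name.
    names = {name.strip("/") for name in top_level_dirs}

    # Layout 1: the target folder sits directly at the account root.
    if submission_type in names:
        return f"/{submission_type}"

    # Layout 2: the target folder is nested under /submit/.
    if "submit" in names:
        return f"/submit/{submission_type}"

    raise ValueError(
        f"Neither /{submission_type} nor /submit was found at the FTP root"
    )
```

A caller would pass the root listing once after login and then change into the returned path before uploading, so the rest of the upload logic does not need to know which layout the account uses.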
CDCgov · Aug 6, 2024 · 2d57f7f
1 parent 6d76293
Showing 1,397 changed files with 563,525 additions and 33,834 deletions.
diff --git a/.github/workflows/GHCR_docker.yml b/.github/workflows/GHCR_docker.yml
@@ -0,0 +1,42 @@
+name: Create and publish docker image to GHCR
+
+on:
+  push:
+    branches: [ "master" ]
+
+env:
+  REGISTRY: ghcr.io
+  IMAGE_NAME: ${{ github.repository }}
+
+jobs:
+  build-and-push-image:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v3
+
+      - name: Log into container registry
+        uses: docker/login-action@343f7c4344506bcbf9b4de18042ae17996df046d
+        with:
+          registry: ${{ env.REGISTRY }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Extract Docker metadata
+        id: meta
+        uses: docker/metadata-action@96383f45573cb7f253c731d3b3ab81c87ef81934
+        with:
+          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+          tags: type=ref,event=branch
+
+      - name: Build and push Docker image
+        uses: docker/build-push-action@0565240e2d4ab88bba5387d719585280857ece09
+        with:
+          context: .
+          push: true
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
diff --git a/.github/workflows/codeql-analysis.yml b/.github/workflows/codeql.yml
@@ -13,60 +13,72 @@ name: "CodeQL"
 
 on:
   push:
-    branches: [ master ]
+    branches: [ "master" ]
   pull_request:
-    # The branches below must be a subset of the branches above
-    branches: [ master ]
+    branches: [ "master" ]
   schedule:
-    - cron: '40 12 * * 5'
+    - cron: '43 3 * * 5'
 
 jobs:
   analyze:
     name: Analyze
-    runs-on: ubuntu-latest
+    # Runner size impacts CodeQL analysis time. To learn more, please see:
+    #   - https://gh.io/recommended-hardware-resources-for-running-codeql
+    #   - https://gh.io/supported-runners-and-hardware-resources
+    #   - https://gh.io/using-larger-runners
+    # Consider using larger runners for possible analysis time improvements.
+    runs-on: ${{ (matrix.language == 'swift' && 'macos-latest') || 'ubuntu-latest' }}
+    timeout-minutes: ${{ (matrix.language == 'swift' && 120) || 360 }}
     permissions:
+      # required for all workflows
+      security-events: write
+
+      # only required for workflows in private repositories
       actions: read
       contents: read
-      security-events: write
 
     strategy:
       fail-fast: false
       matrix:
         language: [ 'python' ]
-        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
+        # CodeQL supports [ 'c-cpp', 'csharp', 'go', 'java-kotlin', 'javascript-typescript', 'python', 'ruby', 'swift' ]
+        # Use only 'java-kotlin' to analyze code written in Java, Kotlin or both
+        # Use only 'javascript-typescript' to analyze code written in JavaScript, TypeScript or both
         # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support
 
     steps:
     - name: Checkout repository
-      uses: actions/checkout@v3
+      uses: actions/checkout@v4
 
     # Initializes the CodeQL tools for scanning.
     - name: Initialize CodeQL
-      uses: github/codeql-action/init@v2
+      uses: github/codeql-action/init@v3
       with:
         languages: ${{ matrix.language }}
         # If you wish to specify custom queries, you can do so here or in a config file.
         # By default, queries listed here will override any specified in a config file.
         # Prefix the list here with "+" to use these queries and those in the config file.
-        
-        # Details on CodeQL's query packs refer to : https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
+
+        # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
         # queries: security-extended,security-and-quality
 
-        
-    # Autobuild attempts to build any compiled languages  (C/C++, C#, or Java).
+
+    # Autobuild attempts to build any compiled languages (C/C++, C#, Go, Java, or Swift).
     # If this step fails, then you should remove it and run the build manually (see below)
     - name: Autobuild
-      uses: github/codeql-action/autobuild@v2
+      uses: github/codeql-action/autobuild@v3
 
     # ℹ️ Command-line programs to run using the OS shell.
     # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
 
-    #   If the Autobuild fails above, remove it and uncomment the following three lines. 
+    #   If the Autobuild fails above, remove it and uncomment the following three lines.
     #   modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.
 
     # - run: |
-    #   echo "Run, Build Application using script"
-    #   ./location_of_script_within_repo/buildscript.sh
+    #     echo "Run, Build Application using script"
+    #     ./location_of_script_within_repo/buildscript.sh
 
     - name: Perform CodeQL Analysis
-      uses: github/codeql-action/analyze@v2
+      uses: github/codeql-action/analyze@v3
+      with:
+        category: "/language:${{matrix.language}}"
diff --git a/.github/workflows/docker_test_build.yml b/.github/workflows/docker_test_build.yml
@@ -0,0 +1,18 @@
+name: Build test Docker image
+
+on:
+  push:
+    branches: [ "master" ]
+  pull_request:
+    branches: [ "master" ]
+
+jobs:
+
+  build:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v3
+    - name: Build the Docker image
+      run: docker build . --file Dockerfile 
diff --git a/.github/workflows/python-package-mamba.yml b/.github/workflows/python-package-mamba.yml
diff --git a/README.Rmd b/README.Rmd
@@ -26,13 +26,23 @@ github_pages_url <- description$GITHUB_PAGES
 
 <p style="font-size: 16px;"><em>Public Database Submission Pipeline</em></p>
 
-**Beta Version**: `r version`. This pipeline is currently in Beta testing, and issues could appear during submission. Please use it at your own risk. Feedback and suggestions are welcome! 
+**Beta Version**: v1.2.0. This pipeline is currently in Beta testing, and issues could appear during submission. Please use it at your own risk. Feedback and suggestions are welcome! 
 
 **General Disclaimer**: This repository was created for use by CDC programs to collaborate on public health related projects in support of the [CDC mission](https://www.cdc.gov/about/organization/mission.htm).  GitHub is not hosted by the CDC, but is a third party website used by CDC and its partners to share information and collaborate on software. CDC use of GitHub does not imply an endorsement of any one particular service, product, or enterprise.
 
+# [Documentation](https://dthoward96.github.io/seqsender_test_website/)
+
 ## Overview
 
-``r program`` is a Python program that is developed to automate the process of generating necessary submission files and batch uploading them to <ins>NCBI archives</ins> (such as **BioSample**, **SRA**, and **Genbank**) and <ins>GISAID databases</ins> (e.g. **EpiFlu** and **EpiCoV**). Presently, the pipeline is capable of uploading **Influenza A Virus** (FLU) and **SARS-COV-2** (COV) data. However, the dynamic nature of this pipeline can allow for additional uploads of other organisms in future updates or requests.
+``r program`` is a Python program that is developed to automate the process of generating necessary submission files and batch uploading them to <ins>NCBI archives</ins> (such as **BioSample**, **SRA**, and **Genbank**) and <ins>GISAID databases</ins> (e.g. **EpiFlu**, **EpiCoV**, **EpiPox**, **EpiArbo**). Presently, the pipeline is capable of uploading **Influenza A Virus** (FLU), **SARS-COV-2** (COV), **Monkeypox** (POX), **Arbovirus** (ARBO), and a wide variety of other organisms. If you'd like to have ``r program`` support your virus create a issue.
+
+## Contacts
+
+| Role       | Contact |
+| ---------- | ------- |
+| Creator    | [Dakota Howard](https://github.com/dthoward96), [Reina Chau](https://github.com/rchau88) |
+| Maintainer | [Dakota Howard](https://github.com/dthoward96) |
+| Back-Up    | [Reina Chau](https://github.com/rchau88), [Brian Lee](https://github.com/leebrian) |
 
 ## Prerequisites
 
@@ -48,15 +58,15 @@ github_pages_url <- description$GITHUB_PAGES
 
 4. Refer to this page for information regarding requirements for GenBank submissions via FTP only. This page applies only for COVID and Influenza [NCBI GenBank FTP Submissions](https://submit.ncbi.nlm.nih.gov/sarscov2/genbank/#step5) For further questions contact <a href="mailto:[email protected]">[email protected]</a> to discuss requirements for submissions.
 
-5. Coordinate a NCBI namespace name (**spuid_namespace**) that will be used with Submitter Provided Unique Identifiers (**spuid**) in the submission. The liaison of **spuid_namespace** and **spuid** is used to report back assigned accessions as well as for cross-linking objects within submission. The values of **spuid_namespace** are up to the submitter to decide but they must be unique and well-coordinated prior to make a submission. For more information about these two fields, see [BioSample](`r github_pages_url`/articles/biosample_submission.html#metadata) / [SRA](`r github_pages_url`/articles/sra_submission.html#metadata) / [GENBANK](`r github_pages_url`/articles/genbank_submission.html#metadata) metadata requirements.
+5. Coordinate a NCBI namespace name (**spuid_namespace**) that will be used with Submitter Provided Unique Identifiers (**spuid**) in the submission. The liaison of **spuid_namespace** and **spuid** is used to report back assigned accessions as well as for cross-linking objects within submission. The values of **spuid_namespace** are up to the submitter to decide but they must be unique and well-coordinated prior to make a submission.
 
 - **GISAID Submissions**
 
-``r program`` makes use of GISAID's Command Line Interface tools to bulk uploading meta- and sequence-data to GISAID databases. Presently, the pipeline only allows upload to EpiFlu (**Influenza A Virus**) and EpiCoV (**SARS-COV-2**) databases. Before uploading, submitter needs to 
+``r program`` makes use of GISAID's Command Line Interface tools to bulk uploading meta- and sequence-data to GISAID databases. Presently, the pipeline supports upload to EpiFlu (**Influenza A Virus**), EpiCoV (**SARS-COV-2**), EpiPox (**Monkeypox**), and EpiArbo (**Arbovirus**). Before uploading, submitter needs to 
 
 1. Have a GISAID account. To sign up, visit [GISAID Platform](https://gisaid.org/). 
 
-2. Request a client-ID for EpiFlu or EpiCoV database in order to use its CLI tool. The CLI utilizes the client-ID along with the username and password to authenticate the database prior to make a submission. To obtain a client-ID, please email <a href="mailto:[email protected]" >[email protected]</a> to request. _**Important note**: If submitter would like to upload a "test" submission first to familiarize themselves with the submission process prior to make a real submission, one should additionally request a test client-id to perform such submissions._
+2. Request a client-ID for your specified Epi(Flu/CoV/Pox/Arbo) database in order to use its CLI tool. The CLI utilizes the client-ID along with the username and password to authenticate the database prior to make a submission. To obtain a client-ID, please email <a href="mailto:[email protected]" >[email protected]</a> to request. _**Important note**: If submitter would like to upload a "test" submission first to familiarize themselves with the submission process prior to make a real submission, one should additionally request a test client-id to perform such submissions._
 
 3. Download the <a href="`r github_pages_url`/articles/images/fluCLI_download.png" target="_blank">EpiFlu</a> or <a href="`r github_pages_url`/articles/images/covCLI_download.png" target="_blank">EpiCoV</a> CLI from the **GISAID platform** and stored them in the destination of choice prior to perform a batch upload.
 
@@ -65,33 +75,9 @@ Here is a quick look of where to store the downloaded **GISAID CLI** package.
 ![](man/figures/gisaid_cli_dir.png)
 
 
+## Code Attributions
 
-## Requirement Files
-
-Before submitters can perform a batch submission using ``r program``, they must make sure the requirement files (such as *config.yaml*, *metadata.csv*, *sequence.fasta*, *raw reads*, etc.) are already prepared and stored in a submission directory of choice.
-
-(a) To prep for FLU submissions, select one of the databases below to get started:
-
-> <a href="`r github_pages_url`/articles/biosample_submission.html" target="_blank">BioSample</a> <br>
-> <a href="`r github_pages_url`/articles/sra_submission.html" target="_blank">SRA</a> <br>
-> <a href="`r github_pages_url`/articles/genbank_submission.html" target="_blank">Genbank</a> <br>
-> <a href="`r github_pages_url`/articles/gisaid_flu_submission.html" target="_blank">GISAID</a> <br>
-<!-- > <a href="`r github_pages_url`/articles/multiple_databases_flu_submission.html" target="_blank">Multiple databases</a> -->
-
-(b) To prep for COV submissions, select one of the databases below to get started:
-
-> <a href="`r github_pages_url`/articles/biosample_submission.html" target="_blank">BioSample</a> <br>
-> <a href="`r github_pages_url`/articles/sra_submission.html" target="_blank">SRA</a> <br>
-> <a href="`r github_pages_url`/articles/genbank_submission.html" target="_blank">Genbank</a> <br>
-> <a href="`r github_pages_url`/articles/gisaid_cov_submission.html" target="_blank">GISAID</a> <br>
-<!-- > <a href="`r github_pages_url`/articles/multiple_databases_cov_submission.html" target="_blank">Multiple databases</a> -->
-
-## Quick Start
-
-- [How to run seqsender locally](`r github_pages_url`/articles/local_installation.html)
-- [How to run seqsender with Docker](`r github_pages_url`/articles/docker_installation.html)
-- [How to run seqsender with Compose](`r github_pages_url`/articles/compose_installation.html)
-- [How to run seqsender with Singularity](`r github_pages_url`/articles/singularity_installation.html)
+Dakota Howard and Reina Chau for majority of the code base with input and testing from [colleagues](`r github_pages_url`/authors.html). 
 
 ## Public Domain Standard Notice