Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Develop #90

Merged
merged 118 commits into from
Mar 22, 2024
Merged

Develop #90

merged 118 commits into from
Mar 22, 2024

Conversation

alubneuski
Copy link
Contributor

No description provided.

rbreslow and others added 30 commits March 9, 2022 10:07
In addition, scrub hardcoded credentials. Not importing everything from the EC2
instance, since there are files with PII and other helper scripts. We can bring
those back in one at a time.

Co-authored-by: Ariana <[email protected]>
Create a reproducible and portable development environment
Add pull request template and contributing guidelines
Store processed UUIDs with SQLAlchemy
Also, log more when downloading Orthanc studies and reuse already-downloaded ZIP files.
Sources cbtn-all CSV from s3, changed inputs to affected functions to the pandas df instead of the local file path. Update to use boto3 with default profile
Update the source of cbtn-all CSV to be from s3
* Changed deprecated "fw import" command to "fw ingest"
* Removed all references to JPGs (no longer needed)
* Added copy of NIfTIs-to-check to S3
Update study description modification in structure_nifti_files in
order to handle edge case.
The Flywheel CLI is now available in Nixpkgs, so we don't need to grab
it via a special input.
Update to use fw ingest command for Flywheel upload
I made the following changes to support the eventual deployment of the
ETL on AWS:

- Update `download_unpack_copy` to expect a single UUID as input rather
  than a list
- Add `--raw` option to the `check` command that prints a newline-
  delimited list of UUIDs to process
- Separate the validate, upload-to-Flywheel, and backup NIfTI steps into
  subcommands
- Rename `run_pipeline` command → `run`
- Update `run` to expect a list of UUIDs as input rather than generating
  the list on its own

Also, I removed the `delete_local` command because the working
directories on EC2 will be ephemeral, and it seems simple enough to run
`rm -r $your_program_directory` on your local machine.
…tionality

Split up the functionality of run_pipeline
To produce a container image using Nix, I had to re-package the Python
project in a way that satisfied Nix. I found myself falling down a
rabbit hole, re-packaging the Python project using Poetry and trying to
use poetry2nix to turn the project into a Nix derivation. We use a lot
of Python dependencies that depend on C libraries (like `pyparsing`,
`python-magic`, `pillow`, etc.), and I had to patch each one to fit the
Nix way of making shared libraries available to packages.

This research was OK because we intentionally baked in some time for
exploring Nix, but I began to feel like my wheels were spinning, and I
wouldn’t arrive at a workable solution in time for our early May
deadline.

I decided to pivot to a more traditional approach using Docker and
Docker Compose to manage the development environment and produce the
container image.

This approach feels a lot more straightforward. As a result, I can
deliver faster, and we better minimize the divergence between
development and production. However, we lose the reproducible builds and
native performance of Nix. Also, we now have to deal with some remote
debugger trickery with PyCharm.

I made the following changes to support this switch:

- Replace `flake.nix` with a `Dockerfile` for building a container image
  containing all of the CLI’s dependencies (e.g., the AWS CLI, dcm2niix,
  and the Flywheel CLI).
- Package the Python project using `setuptools` to wrap the application
  in an `image-deid-etl` executable.
- Add service definitions for `docker-compose.yml` that mimic how we’d
  run the container on AWS.
- Wrap the `upload2fw` step in a fake user home directory so that the
  Flywheel CLI can consume credentials from the environment.
- Install the `psycopg2` and PostgreSQL client libraries and update the
  database schema to support PostgreSQL.
Switch to a container-based development environment
- Bootstrap a Terraform project that consumes the VPC created by CHOP IS
  and creates an RDS instance and Bastion host.
- Utilize our Terraform container image and the `infra` wrapper script
  to enable consistent and portable Terraform execution.
afamiliar and others added 29 commits September 28, 2022 17:28
Handle broken DICOMS due to JPEG2000 compression
…sion-label

Keep sessions acquired on same day separate by appending study-time to Flywheel session label
Catch sessions with no valid NIfTIs (even if has valid DICOMs)
…andling

Add catch for invalid input UUIDs
Add Step Functions workflow that wraps the ETL
in generating session label for a study & moving nifti files to output folder
remove dependency on 'accession number' field
📌 Update rollbar/github-deploy-action 2.1.1 -> 2.1.2
@alubneuski alubneuski merged commit 566c170 into master Mar 22, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants