feat(pg-dd): add subcommands for loading, dumping, uploading and downloading
mmalenic committed Nov 19, 2024
1 parent d337dc6 commit 3395e7f
Showing 12 changed files with 281 additions and 101 deletions.
1 change: 0 additions & 1 deletion lib/workload/stateless/stacks/pg-dd/.dockerignore
@@ -2,7 +2,6 @@ deploy
.env
.env.example
.gitignore
-Makefile
README.md
data
.ruff_cache
3 changes: 2 additions & 1 deletion lib/workload/stateless/stacks/pg-dd/.env.example
@@ -5,4 +5,5 @@ PG_DD_DATABASE_METADATA_MANAGER=metadata_manager
PG_DD_DATABASE_SEQUENCE_RUN_MANAGER=sequence_run_manager
PG_DD_DATABASE_WORKFLOW_MANAGER=workflow_manager
PG_DD_DATABASE_FILEMANAGER=filemanager
-PG_DD_DATABASE_FILEMANAGER_SQL="select * from s3_object"
+PG_DD_DATABASE_FILEMANAGER_SQL_DUMP="select * from s3_object order by sequencer limit 10000"
+PG_DD_DATABASE_FILEMANAGER_SQL_LOAD="s3_object"
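
The naming scheme in `.env.example` can be illustrated with a small parser. This is a sketch under assumptions, not pg-dd's implementation: `read_databases` and `copy_statements` are hypothetical helpers, the `_SQL_DUMP` value is treated as a single statement, and Postgres `COPY` is assumed as the CSV mechanism. It suggests why a load table has to be named explicitly: a CSV produced by an arbitrary query does not record which table its rows belong to.

```python
PREFIX = "PG_DD_DATABASE_"
# Suffixes that attach extra settings to a database entry.
SUFFIXES = {"_SQL_DUMP": "sql_dump", "_SQL_LOAD": "sql_load"}


def read_databases(env: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group PG_DD_DATABASE_* variables by their <DATABASE_NAME> part."""
    databases: dict[str, dict[str, str]] = {}
    for key, value in env.items():
        if not key.startswith(PREFIX):
            continue
        name = key[len(PREFIX):]
        field = "database"  # a bare PG_DD_DATABASE_<NAME> holds the database name
        for suffix, suffix_field in SUFFIXES.items():
            if name.endswith(suffix):
                name, field = name[: -len(suffix)], suffix_field
                break
        databases.setdefault(name, {})[field] = value
    return databases


def copy_statements(sql_dump: str, sql_load: str) -> tuple[str, str]:
    """Build COPY statements for dumping a query and loading a table (assumed mechanism)."""
    dump = f"COPY ({sql_dump}) TO STDOUT WITH (FORMAT csv, HEADER)"
    load = f"COPY {sql_load} FROM STDIN WITH (FORMAT csv, HEADER)"
    return dump, load


if __name__ == "__main__":
    env = {
        "PG_DD_DATABASE_FILEMANAGER": "filemanager",
        "PG_DD_DATABASE_FILEMANAGER_SQL_DUMP": "select * from s3_object order by sequencer limit 10000",
        "PG_DD_DATABASE_FILEMANAGER_SQL_LOAD": "s3_object",
    }
    config = read_databases(env)["FILEMANAGER"]
    for statement in copy_statements(config["sql_dump"], config["sql_load"]):
        print(statement)
```

In practice `env` would be `dict(os.environ)`; a plain dictionary keeps the sketch easy to try without touching the real environment.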
3 changes: 2 additions & 1 deletion lib/workload/stateless/stacks/pg-dd/.gitignore
@@ -1,3 +1,4 @@
.env
data
-.ruff_cache
+.ruff_cache
+response.json
15 changes: 15 additions & 0 deletions lib/workload/stateless/stacks/pg-dd/Dockerfile
@@ -0,0 +1,15 @@
# This Dockerfile is intended to be used as part of a Docker Compose setup.
# When running this microservice from the Docker Compose root, this Dockerfile
# will build the image, install dependencies, and start the server

FROM public.ecr.aws/docker/library/python:3.12

ARG POETRY_VERSION=1.8
RUN pip install "poetry==${POETRY_VERSION}"

WORKDIR /app

COPY . .
RUN poetry install --no-root

# Run the Makefile's cli target.
ENTRYPOINT ["make", "cli"]
4 changes: 2 additions & 2 deletions lib/workload/stateless/stacks/pg-dd/Makefile
@@ -9,8 +9,8 @@ lint: install
check: lint
	@poetry run ruff check .

-local: install
-	@poetry run local
+cli: install
+	@poetry run cli

clean:
	rm -rf data && rm -rf .ruff_cache
21 changes: 11 additions & 10 deletions lib/workload/stateless/stacks/pg-dd/README.md
@@ -17,15 +17,16 @@ rows of the filemanager database.

This function can be configured by setting the following environment variables, see [.env.example][env-example] for an example:

-| Name | Description | Type |
-|--------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------|
-| `PG_DD_URL` | The database URL to dump databases from. | Postgres connection string |
-| `PG_DD_SECRET` | The secret name or ARN to fetch the database URL from. This is only used in the Lambda function, and overrides `PG_DD_URL`. | `string` |
-| `PG_DD_DATABASE_<DATABASE_NAME>` | A name of the database to dump records from where `<DATABASE_NAME>` represents the target database. Specify this multiple times to use dump from multiple databases. | `string` |
-| `PG_DD_DATABASE_<DATABASE_NAME>_SQL` | Custom SQL code to execute when dumping database records for `<DATABASE_NAME>`. This is optional, and by default all records from all tables are dumped. Specify this is a list of SQL statements to generate a corresponding CSV file. | `string[]` or undefined |
-| `PG_DD_BUCKET` | The bucket to dump data to. This is required when deploying the Lambda function. | `string` or undefined |
-| `PG_DD_PREFIX` | The bucket prefix to use when writing to a bucket. This is optional. | `string` or undefined |
-| `PG_DD_DIR` | The local filesystem directory to dump data to when running this command locally. This is not used on the deployed Lambda function. | filesystem directory or undefined |
+| Name | Description | Type |
+|-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------|
+| `PG_DD_URL` | The database URL to dump databases from. | Postgres connection string |
+| `PG_DD_SECRET` | The secret name or ARN to fetch the database URL from. This is only used in the Lambda function, and overrides `PG_DD_URL`. | `string` |
+| `PG_DD_DATABASE_<DATABASE_NAME>` | A name of the database to dump records from where `<DATABASE_NAME>` represents the target database. Specify this multiple times to dump from multiple databases. | `string` |
+| `PG_DD_DATABASE_<DATABASE_NAME>_SQL_DUMP` | Custom SQL code to execute when dumping database records for `<DATABASE_NAME>`. This is optional, and by default all records from all tables are dumped. Specify this as a list of SQL statements to generate a corresponding CSV file. | `string[]` or undefined |
+| `PG_DD_DATABASE_<DATABASE_NAME>_SQL_LOAD` | The name of the table to load into for `<DATABASE_NAME>`. This is required if loading data after dumping with `PG_DD_DATABASE_<DATABASE_NAME>_SQL_DUMP` to specify the table to load data into. | `string[]` or undefined |
+| `PG_DD_BUCKET` | The bucket to dump data to. This is required when deploying the Lambda function. | `string` or undefined |
+| `PG_DD_PREFIX` | The bucket prefix to use when writing to a bucket. This is optional. | `string` or undefined |
+| `PG_DD_DIR` | The local filesystem directory to dump data to when running this command locally. This is not used on the deployed Lambda function. | filesystem directory or undefined |

## Local development

@@ -34,7 +35,7 @@ This project uses [poetry] to manage dependencies.
The pg-dd command can be run locally to dump data to a directory:

```
-make local
+make cli
```

Run the linter and formatter:
3 changes: 2 additions & 1 deletion lib/workload/stateless/stacks/pg-dd/deploy/stack.ts
@@ -90,7 +90,7 @@ export class PgDDStack extends Stack {
index: 'pg_dd/handler.py',
runtime: Runtime.PYTHON_3_12,
architecture: Architecture.ARM_64,
-timeout: Duration.minutes(15),
+timeout: Duration.minutes(5),
memorySize: 1024,
vpc: this.vpc,
vpcSubnets: {
@@ -108,6 +108,7 @@ export class PgDDStack extends Stack {
PG_DD_DATABASE_SEQUENCE_RUN_MANAGER: 'sequence_run_manager',
PG_DD_DATABASE_WORKFLOW_MANAGER: 'workflow_manager',
PG_DD_DATABASE_FILEMANAGER: 'filemanager',
+PG_DD_DATABASE_FILEMANAGER_SQL: 'select * from s3_object order by sequencer limit 10000',
...(props.prefix && { PG_DD_PREFIX: props.prefix }),
},
});
43 changes: 43 additions & 0 deletions lib/workload/stateless/stacks/pg-dd/pg_dd/cli.py
@@ -0,0 +1,43 @@
from pg_dd.pg_dd import PgDDLocal, PgDDS3
import click


@click.group()
def cli():
    pass


@cli.command()
def download():
    """
    Download S3 CSV dumps to the local directory.
    """
    PgDDS3().download_local()


@cli.command()
def upload():
    """
    Uploads local CSV dumps to S3.
    """
    PgDDS3().write_to_bucket()


@cli.command()
def dump():
    """
    Dump from the local database to CSV files.
    """
    PgDDLocal().write_to_dir()


@cli.command()
def load():
    """
    Load local CSV files into the database
    """
    PgDDLocal().load_to_database()


if __name__ == "__main__":
    cli()
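
The four-subcommand layout of `cli.py` can also be sketched with the standard library's `argparse`, which may help readers who want to try the dispatch pattern without installing click. This is a stand-in, not the commit's code: `build_parser`, `run` and `HANDLERS` are hypothetical names, and the handlers only return a label for the call the real command makes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the four subcommands named in cli.py."""
    parser = argparse.ArgumentParser(prog="cli")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # Help strings are copied from the docstrings in cli.py.
    commands = {
        "download": "Download S3 CSV dumps to the local directory.",
        "upload": "Uploads local CSV dumps to S3.",
        "dump": "Dump from the local database to CSV files.",
        "load": "Load local CSV files into the database",
    }
    for name, help_text in commands.items():
        subparsers.add_parser(name, help=help_text)
    return parser


# Hypothetical stubs: each returns a label instead of touching S3 or a database.
HANDLERS = {
    "download": lambda: "PgDDS3().download_local()",
    "upload": lambda: "PgDDS3().write_to_bucket()",
    "dump": lambda: "PgDDLocal().write_to_dir()",
    "load": lambda: "PgDDLocal().load_to_database()",
}


def run(argv: list[str]) -> str:
    """Parse argv and return the label of the call that would be made."""
    args = build_parser().parse_args(argv)
    return HANDLERS[args.command]()


if __name__ == "__main__":
    print(run(["dump"]))
```

With this layout, `make cli` followed by a subcommand name would correspond to one entry in `HANDLERS`; an unknown subcommand makes `argparse` exit with a usage error.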
14 changes: 0 additions & 14 deletions lib/workload/stateless/stacks/pg-dd/pg_dd/local.py

This file was deleted.
