filemanager: deploy changes and fixes #79

mmalenic · 2023-12-20T02:07:06Z

Changes

Refactored deployment into constructs:
- Add RDS Serverless v2. You were right in using DatabaseCluster for this, @brainstorm: "(rds): support for Aurora Serverless V2" (aws/aws-cdk#20197) is now fixed and cluster instances can be created with ClusterInstance.serverlessV2. @andrewpatto, this is related to https://github.com/elsa-data/aws-infrastructure/blob/be2f0693df6bec1d90168d5441f910ce44094262/packages/stack/rds/serverless-base-database.ts#L58-L69
- Added database migration within CDK, similar to https://github.com/aws-samples/amazon-rds-init-cdk. This has the advantage of applying migrations within CDK and rejecting the deployment if a migration fails.
- Make sccache optional and use env variables for fetching the binary to make it more cross-platform.
Implement filemanager-migrate-lambda crate for database migrations using sqlx.
Make S3 event parsing less strict:
- Size and e_tag should be optional. Size is not present on s3:ObjectRemoved:* events.
- Avoid errors for unknown event types when deserializing.
The default storage class when calling HeadObject is Standard if it is not specified in the output.
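
The looser S3 event parsing and the storage class default described above could look roughly like the sketch below. It is written in TypeScript purely for illustration; the filemanager itself is a Rust crate, and `parseRecord`, `ParsedEvent` and `storageClassOrDefault` are hypothetical names, not code from this PR. The record fields (`eventName`, `s3.bucket.name`, `s3.object.key`, `size`, `eTag`) are assumed to follow the S3 event notification message format.

```typescript
// Hypothetical sketch of lenient S3 event record parsing (names are illustrative).
type EventKind = "Created" | "Removed" | "Other";

interface ParsedEvent {
  kind: EventKind;
  bucket: string;
  key: string;
  size?: number; // not present on ObjectRemoved events
  eTag?: string; // treated as optional as well
}

// Classify the event name, mapping anything unrecognised to "Other" rather than failing.
function eventKind(eventName: string): EventKind {
  if (/^(s3:)?ObjectCreated:/.test(eventName)) return "Created";
  if (/^(s3:)?ObjectRemoved:/.test(eventName)) return "Removed";
  return "Other";
}

// Parse one record; only the bucket name and key are required.
function parseRecord(raw: any): ParsedEvent | undefined {
  const bucket = raw?.s3?.bucket?.name;
  const key = raw?.s3?.object?.key;
  if (typeof bucket !== "string" || typeof key !== "string") return undefined;

  const event: ParsedEvent = { kind: eventKind(String(raw.eventName ?? "")), bucket, key };
  const { size, eTag } = raw.s3.object;
  if (typeof size === "number") event.size = size;
  if (typeof eTag === "string") event.eTag = eTag;
  return event;
}

// HeadObject omits the storage class for Standard objects, so fall back to it.
function storageClassOrDefault(storageClass?: string): string {
  return storageClass ?? "STANDARD";
}
```

With this shape, an `ObjectRemoved:Delete` record without `size` or `eTag` still parses, and an unfamiliar event name ends up as `"Other"` instead of causing a deserialization error.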

andrewpatto · 2023-12-20T02:14:01Z

@mmalenic can you PR the DatabaseCluster fix onto the elsa infrastructure (if easy to do).. no rush

mmalenic · 2023-12-20T02:22:08Z

> @mmalenic can you PR the DatabaseCluster fix onto the elsa infrastructure (if easy to do).. no rush

Yep, can do.

andrewpatto · 2023-12-20T05:18:24Z

lib/workload/stateful/filemanager/deploy/lib/filemanager_stack.ts

+    });
+    testBucket.addEventNotification(EventType.OBJECT_CREATED, new SqsDestination(queue));
+    testBucket.addEventNotification(EventType.OBJECT_REMOVED, new SqsDestination(queue));
+


Not for this PR, but what will be the general story about subscribing to buckets? Presumably the file manager is not going to be responsible for creating all the BYOB buckets (I mean they probably exist completely separate to orcabus).

Yes this will need to be different in the future. I'm not sure where the best spot to put this is though. From the filemanager's perspective it just needs access to the SQS queue.

aws/aws-cdk#2004

andrewpatto · 2023-12-20T05:21:37Z

lib/workload/stateful/filemanager/database/migrations/0001_add_object_table.sql

@@ -7,13 +7,13 @@ create table object (
    -- The name of the object.
    key varchar(1024) not null,
    -- The size of the object.
-    size int not null,
+    size int default null,
    -- A unique identifier for the object, if it is present.
    hash varchar(255) default null,


I know it isn't in this PR, but I am a bit suspicious about this modelling.
I think you'll end up in scenarios with multiple hashes, possibly needing to know the difference between them (so you can pass on a SHA-256 hash downstream, say).

Also, this is a unique identifier for the "content" of the object..

Yes... I'm imagining some scenario where someone wants a SHA-256 rather than just a check to see if this object is the same? I think a simple solution is to just have multiple fields for some common checksum types, e.g. SHA-256, MD5, etc. (or have a type field which indicates the checksum algorithm). This also maps nicely to the HeadObject outputs. Although I don't know if it's promised what kind of checksum the AWS ETag is, I think it's MD5?

The ETag is an MD5 until the object gets above a certain size (which is a given for bioinformatics), at which point it is not an MD5. Best to define it as its own special type, "AWS-ETAG".
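
The "type field" idea from this thread could be modelled along the following lines. This is only a sketch: `Checksum`, `ChecksumKind`, `checksumFromETag` and `isMultipartETag` are hypothetical names, and the check assumes the usual S3 behaviour where ETags arrive wrapped in double quotes and multipart-upload ETags end in `-<part count>`.

```typescript
// Hypothetical sketch: checksums tagged with the algorithm that produced them.
type ChecksumKind = "SHA-256" | "MD5" | "AWS-ETAG";

interface Checksum {
  kind: ChecksumKind;
  value: string;
}

// S3 returns ETags wrapped in double quotes; strip them before storing.
function stripQuotes(eTag: string): string {
  return eTag.replace(/^"|"$/g, "");
}

// Multipart-upload ETags look like "<32 hex chars>-<part count>" and are not an MD5 of the object.
function isMultipartETag(eTag: string): boolean {
  return /^[0-9a-f]{32}-\d+$/i.test(stripQuotes(eTag));
}

// Always record the ETag under its own kind instead of guessing that it is an MD5.
function checksumFromETag(eTag: string): Checksum {
  return { kind: "AWS-ETAG", value: stripQuotes(eTag) };
}
```

Keeping `AWS-ETAG` as a separate kind means a downstream consumer asking for a SHA-256 or MD5 never receives an ETag by accident, while `isMultipartETag` remains available where the distinction matters.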

brainstorm

Thanks for introducing the migrations as a lambda, didn't think about that at all.

Let's discuss the other couple of issues/comments from Andrew. I think they should be tackled separately and are important to get right.

brainstorm · 2024-01-02T01:21:22Z

lib/workload/stateful/filemanager/deploy/lib/filemanager_stack.ts

So @andrewpatto, if I understand that issue correctly, you'd re-define the CDK stack manually each time a new bucket is introduced?

I was thinking of perhaps getting rid of the S3 bucket notification definitions in CDK altogether and doing it more dynamically after the stack is deployed. Otherwise we'll have to re-deploy for each new bucket that filemanager needs to watch; not sure that'll be practical 🤔
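
One way the "more dynamic" approach could be sketched is a small function that merges the filemanager queue into a bucket's existing notification configuration. The names `withFilemanagerQueue` and `FILEMANAGER_EVENTS` are hypothetical. The object shape is assumed to mirror the queue part of the S3 notification configuration, and applying the result (for example through the S3 PutBucketNotificationConfiguration operation, which replaces the whole configuration) is deliberately left out.

```typescript
// Queue-related part of an S3 bucket notification configuration.
// A real configuration can also hold topic, Lambda and EventBridge settings,
// which would need to be carried over too; this sketch models queues only.
interface QueueConfiguration {
  Id?: string;
  QueueArn: string;
  Events: string[];
}

interface NotificationConfiguration {
  QueueConfigurations: QueueConfiguration[];
}

// The same two event families the stack subscribes the test bucket to.
const FILEMANAGER_EVENTS = ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"];

// Return a configuration that includes the filemanager queue.
// Idempotent: if the queue is already subscribed, the input is returned unchanged.
function withFilemanagerQueue(
  existing: NotificationConfiguration,
  queueArn: string
): NotificationConfiguration {
  const alreadySubscribed = existing.QueueConfigurations.some((c) => c.QueueArn === queueArn);
  if (alreadySubscribed) return existing;

  return {
    QueueConfigurations: [
      ...existing.QueueConfigurations,
      { Id: "filemanager", QueueArn: queueArn, Events: [...FILEMANAGER_EVENTS] },
    ],
  };
}
```

Merging rather than overwriting matters for BYOB buckets that already have notifications of their own. The queue's access policy would also need to allow S3 to send messages from each new bucket, which is outside this sketch.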

brainstorm · 2024-01-02T01:22:32Z

lib/workload/stateful/filemanager/database/migrations/0001_add_object_table.sql

Hm, sounds like we'll have to tackle some sort of computationally lightweight hashing scheme (via some reliably unique subsampling perhaps?)... totally part of another PR, agree.
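
To make the subsampling idea concrete, here is a rough sketch of a fingerprint built from the object size plus a few evenly spaced chunks. Everything in it is an assumption for illustration: `sampledFingerprint` and `fnv1a` are hypothetical names, FNV-1a is used only to keep the example dependency-free (a real scheme would more likely feed ranged reads into a cryptographic hash), and the data is an in-memory array rather than an S3 object.

```typescript
// FNV-1a (32-bit), a small non-cryptographic hash; `seed` lets calls be chained.
function fnv1a(bytes: Uint8Array, seed: number = 0x811c9dc5): number {
  return bytes.reduce((h, b) => Math.imul(h ^ b, 0x01000193) >>> 0, seed >>> 0);
}

// Fingerprint = "<size>:<hash of sampled chunks>".
// Samples `samples` chunks of `chunkSize` bytes, spread from the start to the end.
function sampledFingerprint(data: Uint8Array, chunkSize: number, samples: number): string {
  let h = 0x811c9dc5;
  if (data.length <= chunkSize * samples) {
    // Small input: hashing everything is cheaper than sampling.
    h = fnv1a(data, h);
  } else {
    const lastStart = data.length - chunkSize;
    for (let s = 0; s < samples; s++) {
      const start = samples === 1 ? 0 : Math.floor((lastStart * s) / (samples - 1));
      h = fnv1a(data.subarray(start, start + chunkSize), h);
    }
  }
  return `${data.length}:${h.toString(16)}`;
}
```

The trade-off is visible in the design: a change inside a sampled chunk changes the fingerprint, but a same-size change that falls between samples goes unnoticed, so a scheme like this can only act as a cheap pre-check and not as a content identifier.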

mmalenic added 22 commits December 15, 2023 07:23

feat(filemanager): cross platform scripts

491960a

refactor(filemanager): make sccache optional

e6629e5

refactor(filemanager-deploy): add database construct

d9b4a42

refactor(filemanager-deploy): add lambda function construct

3343743

refactor(filemanager): use new constructs in stack

46a250a

refactor(filemanager): update default region

3cf928b

fix(filemanager): records field

64ff6cd

feat(filemanager): add basic migration lambda

dc84c20

feat(filemanager): migrate in Lambda function

f824c35

refactor(filemanager): merge migrations to obtain migrator struct

367663f

test(filemanager): migration test

147a893

refactor(filemanager): split lambda function code

c235c90

feat(filemanager): implement migrate lambda function cdk

5c6413d

feat(filemanager): create custom resource construct

32889e5

feat(filemanager): use cdk resource in stack

884be2f

feat(filemanager): make migration optional in stack

a2d661e

refactor(filemanager): allow public database

6749b79

fix(filemanager): make size and e_tag optional and remove size from delete events

a5aae0e

fix(filemanager): head object should not be called on removed events

1f29f7e

fix(filemanager): head object does not return storage class when it is standard

a4ff710

fix(filemanager): storage class can still be unknoen before a head object call

ff9dbb0

fix(filemanager): make object parsing less strict to avoid errors in Lambda function

205adb1

mmalenic requested a review from brainstorm December 20, 2023 02:21

mmalenic requested a review from andrewpatto December 20, 2023 02:32

andrewpatto reviewed Dec 20, 2023

andrewpatto approved these changes Dec 20, 2023

andrewpatto reviewed Dec 20, 2023

mmalenic added the filemanager label (an issue relating to the filemanager) Dec 21, 2023

mmalenic self-assigned this Dec 21, 2023

mmalenic added 2 commits January 2, 2024 08:40

refactor(filemanager): remove aws directory from migrations as its not that useful a distinction for now

8647464

refactor(filemanager): remove any ip address from public database cluster

aa023cf

brainstorm approved these changes Jan 2, 2024

mmalenic merged commit 53370d5 into main Jan 2, 2024
