filemanager: deploy changes and fixes #79
Conversation
@mmalenic can you PR the DatabaseCluster fix onto the elsa infrastructure (if easy to do).. no rush
Yep, can do.
});
testBucket.addEventNotification(EventType.OBJECT_CREATED, new SqsDestination(queue));
testBucket.addEventNotification(EventType.OBJECT_REMOVED, new SqsDestination(queue));
Not for this PR - but what will be the general story about subscribing to buckets? Presumably the file manager is not going to be responsible for creating all the BYOB buckets (I mean they probably exist completely separately to orcabus).
Yes this will need to be different in the future. I'm not sure where the best spot to put this is though. From the filemanager's perspective it just needs access to the SQS queue.
So @andrewpatto, if I understand that issue correctly, you'd re-define the CDK stack manually each time a new bucket is introduced?
I was thinking of perhaps getting rid of S3 bucket notification definitions in CDK altogether and doing it more dynamically after the stack is deployed? Otherwise we'll have to re-deploy for each new bucket that filemanager needs to watch; not sure if that'll be practical 🤔
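As a concrete illustration of the "dynamic" alternative: instead of `addEventNotification` in the CDK stack, a post-deploy step could call the AWS SDK's `PutBucketNotificationConfigurationCommand` for each new bucket. The helper below is only a sketch (the function name and ARN are ours, not filemanager's code); it builds the configuration object that such a call would take, for the same two event patterns the CDK stack subscribes to.

```typescript
// Hypothetical helper: builds the S3 notification configuration that could be
// sent via the AWS SDK v3 PutBucketNotificationConfigurationCommand, so that
// new buckets can be subscribed after deploy instead of redeploying the stack.
type QueueNotification = {
  QueueArn: string;
  Events: string[];
};

function buildQueueNotificationConfig(queueArn: string): {
  QueueConfigurations: QueueNotification[];
} {
  return {
    QueueConfigurations: [
      {
        QueueArn: queueArn,
        // Same events the CDK stack wires up via addEventNotification.
        Events: ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
      },
    ],
  };
}

// Example: pass the result as NotificationConfiguration for any new bucket.
const config = buildQueueNotificationConfig(
  "arn:aws:sqs:ap-southeast-2:123456789012:filemanager-queue"
);
console.log(config.QueueConfigurations[0].Events);
```

Note that `PutBucketNotificationConfiguration` replaces a bucket's entire notification configuration, so a real implementation would need to read and merge any existing configuration first.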
@@ -7,13 +7,13 @@ create table object (
     -- The name of the object.
     key varchar(1024) not null,
     -- The size of the object.
-    size int not null,
+    size int default null,
     -- A unique identifier for the object, if it is present.
     hash varchar(255) default null,
I know it isn't in this PR - but I am a bit suspicious about this modelling.
I think you'll end up in scenarios with multiple hashes (possibly needing to know the difference between them, so you can pass on a SHA-256 hash downstream, say).
Also, this is a unique identifier for the "content" of the object..
Yes... I'm imagining some scenario where someone wants a SHA-256 rather than just a check to see if this object is the same. I think a simple solution is to just have multiple fields for some common checksum types, e.g. SHA-256, MD5, etc. (or have a type field which indicates the checksum algorithm). This also maps nicely to the HeadObject outputs. Although I don't know if it's guaranteed what kind of checksum the AWS ETag is; I think it's MD5?
The ETag is MD5 until the object gets above a certain size (which will definitely happen for bioinformatics data), at which point it is not an MD5. Best to define it as its own special type, "AWS-ETAG".
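To make the "it's not an MD5 above a certain size" point concrete: for multipart uploads, the commonly observed (though not formally documented) S3 ETag is the MD5 of the concatenated binary part-MD5s, suffixed with "-&lt;part count&gt;". The sketch below assumes that rule; the helper names are ours, not an AWS API.

```typescript
import { createHash } from "node:crypto";

// Sketch of the widely observed S3 ETag behaviour: a single-part upload's ETag
// is the plain MD5 of the body, while a multipart upload's ETag is the MD5 of
// the concatenated binary MD5s of each part, plus a "-<part count>" suffix.
function md5(buf: Buffer): Buffer {
  return createHash("md5").update(buf).digest();
}

function multipartETag(parts: Buffer[]): string {
  if (parts.length === 1) {
    // Single-part upload: ETag is just the hex MD5 of the body.
    return md5(parts[0]).toString("hex");
  }
  // Multipart upload: MD5 over the concatenated part digests, "-N" suffix.
  const combined = Buffer.concat(parts.map(md5));
  return `${md5(combined).toString("hex")}-${parts.length}`;
}

// A single-part ETag looks like a plain 32-hex-char MD5; a multipart one
// carries a "-N" suffix, which is why storing it as a dedicated "AWS-ETAG"
// checksum type is safer than assuming MD5.
console.log(multipartETag([Buffer.from("hello")]));
console.log(multipartETag([Buffer.from("part1"), Buffer.from("part2")]));
```

The suffix also means two identical objects can have different ETags if they were uploaded with different part sizes, so the ETag is not reliable for cross-object content comparison.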
Hm, sounds like we'll have to tackle some sort of computationally lightweight hashing scheme (via some reliably unique subsampling perhaps?)... totally part of another PR, agree.
…t that useful a distinction for now
Thanks for introducing the migrations as a lambda, didn't think about that at all.
Let's discuss the other couple of issues/comments from Andrew; I think they should be tackled separately and are important to get right.
Changes

- DatabaseCluster for this @brainstorm. (rds): support for Aurora Serverless V2 aws/aws-cdk#20197 is now fixed and cluster instances can be created with ClusterInstance.serverlessV2. @andrewpatto this is related to https://github.com/elsa-data/aws-infrastructure/blob/be2f0693df6bec1d90168d5441f910ce44094262/packages/stack/rds/serverless-base-database.ts#L58-L69
- filemanager-migrate-lambda crate for database migrations using sqlx.
- s3:ObjectRemoved:* events.
- HeadObject: isStandard if it is not specified in the output.
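For reference, the ClusterInstance.serverlessV2 change enabled by the aws/aws-cdk#20197 fix could look roughly like the fragment below. This is a sketch against aws-cdk-lib v2.82+, not the PR's actual code: the construct IDs, engine version, and capacity bounds are placeholders.

```typescript
// Illustrative config fragment: an Aurora Serverless V2 writer instance,
// which ClusterInstance.serverlessV2 now supports. Assumes `this` is a Stack
// scope and `vpc` is an existing ec2.Vpc; names and values are placeholders.
const cluster = new DatabaseCluster(this, "FilemanagerDatabase", {
  engine: DatabaseClusterEngine.auroraPostgres({
    version: AuroraPostgresEngineVersion.VER_15_2,
  }),
  vpc,
  // Serverless V2 capacity is set on the cluster, in ACUs.
  serverlessV2MinCapacity: 0.5,
  serverlessV2MaxCapacity: 4,
  writer: ClusterInstance.serverlessV2("Writer"),
});
```

The key difference from the older workaround-based setups (like the linked elsa-data serverless-base-database.ts) is that no escape-hatch property overrides are needed; the writer is declared directly as a Serverless V2 instance.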