Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filemanager: deploy changes and fixes #79

Merged
merged 24 commits into from
Jan 2, 2024
Merged

filemanager: deploy changes and fixes #79

merged 24 commits into from
Jan 2, 2024

Conversation

mmalenic
Copy link
Member

Changes

@andrewpatto
Copy link
Member

@mmalenic can you PR the DatabaseCluster fix onto the elsa infrastructure (if easy to do).. no rush

@mmalenic mmalenic requested a review from brainstorm December 20, 2023 02:21
@mmalenic
Copy link
Member Author

@mmalenic can you PR the DatabaseCluster fix onto the elsa infrastructure (if easy to do).. no rush

Yep, can do.

@mmalenic mmalenic requested a review from andrewpatto December 20, 2023 02:32
});
testBucket.addEventNotification(EventType.OBJECT_CREATED, new SqsDestination(queue));
testBucket.addEventNotification(EventType.OBJECT_REMOVED, new SqsDestination(queue));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR - but what will be the general story about subscribing to buckets.. presumably the file manger is not going to be responsible for creating all the BYOB buckets (I mean they probably exist completely separate to orcabus).

Copy link
Member Author

@mmalenic mmalenic Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this will need to be different in the future. I'm not sure where the best spot to put this is though. From the filemanager's perspective it just needs access to the SQS queue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So @andrewpatto, if I understand that issue correctly, you'd re-define the CDK stack manually each time a new bucket is introduced?

I was thinking that perhaps getting rid altogether of S3 bucket notification definitions in CDK and do it more dynamically after the stack is deployed? Otherwise we'll have to re-deploy at each new bucket that filemanager needs to watch, not sure if that'll be practical 🤔

@@ -7,13 +7,13 @@ create table object (
-- The name of the object.
key varchar(1024) not null,
-- The size of the object.
size int not null,
size int default null,
-- A unique identifier for the object, if it is present.
hash varchar(255) default null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it isn't in this PR - but I am a bit suspicious about this modelling.
I think you'll end up in scenarios with multiple hashes (possibly needing to know this difference between them - so you can pass on a SHA-256 hash downstream say)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this is a unique identifier for the "content" of the object..

Copy link
Member Author

@mmalenic mmalenic Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... I'm imagining some scenario where someone wants a SHA-256 rather than just a check to see if this object is the same? I think a simple solution is to just have multiple fields for some common checksum types, e.g. SHA-256, MD5, etc (or have a type fields which indicate the checksum algorithm). This also maps nicely to the HeadObject outputs. Although I don't know if it's promised what kind of checksum the AWS etag is, I think it's MD5?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

etag is MD5 until the object gets above a certain size (which is definite for bioinformatics) - at which point it is not an MD5. Best to define it as its own special type "AWS-ETAG"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, sounds like we'll have to tackle some sort of computationally lightweight hashing scheme (via some reliably unique subsampling perhaps?)... totally part of another PR, agree.

@mmalenic mmalenic added the filemanager an issue relating to the filemanager label Dec 21, 2023
@mmalenic mmalenic self-assigned this Dec 21, 2023
Copy link
Member

@brainstorm brainstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for introducing the migrations as a lambda, didn't think about that at all.

Let's discuss the other couple of issues/comments from Andrew, I think they should be tackled separate and are important to get right.

});
testBucket.addEventNotification(EventType.OBJECT_CREATED, new SqsDestination(queue));
testBucket.addEventNotification(EventType.OBJECT_REMOVED, new SqsDestination(queue));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So @andrewpatto, if I understand that issue correctly, you'd re-define the CDK stack manually each time a new bucket is introduced?

I was thinking that perhaps getting rid altogether of S3 bucket notification definitions in CDK and do it more dynamically after the stack is deployed? Otherwise we'll have to re-deploy at each new bucket that filemanager needs to watch, not sure if that'll be practical 🤔

@@ -7,13 +7,13 @@ create table object (
-- The name of the object.
key varchar(1024) not null,
-- The size of the object.
size int not null,
size int default null,
-- A unique identifier for the object, if it is present.
hash varchar(255) default null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, sounds like we'll have to tackle some sort of computationally lightweight hashing scheme (via some reliably unique subsampling perhaps?)... totally part of another PR, agree.

@mmalenic mmalenic merged commit 53370d5 into main Jan 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
filemanager an issue relating to the filemanager
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants