feat: de-duplicate events at the database level #88
Merged
20 commits
8443bfa feat(filemanager): add basic proc macro outline (mmalenic)
e172dd7 refactor(filemanager): improve equality checking (mmalenic)
0fb20fb Merge branch 'main' of github.com:umccr/orcabus into feat/event-order… (mmalenic)
4f8e9d1 refactor(filemanager): fix test, merge with main (mmalenic)
1eed111 feat(filemanager): add additional sequencer for s3_object (mmalenic)
05882af refactor(filemanager): move e_tag to s3_object, add bucket and key to… (mmalenic)
ff8a2c3 feat(filemanager): add sequencer check constraint to s3_object (mmalenic)
94b15d1 feat(filemanager): add unique constraints (mmalenic)
d865b24 feat(filemanager): remove key and bucket reference from s3_object (mmalenic)
ee325fe refactor(filemanager): update inserts with sequencer values (mmalenic)
3ed4bad refactor(filemanager): move more fields to s3_object, update queries (mmalenic)
8c3c575 test(filemanager): fix tests according to new schema (mmalenic)
c20e946 test(filemanager): defer initializing foreign key and run inserts in … (mmalenic)
80dd346 test(filemanager): duplicate events database test (mmalenic)
727c3d9 test(filemanager): add complex duplicates test (mmalenic)
a04f108 refactor(filemanager): remove macros as its not used (mmalenic)
9bcd3f8 Merge branch 'main' of github.com:umccr/orcabus into feat/event-order… (mmalenic)
ff9a5d4 style(filemanager): formatting (mmalenic)
8908ea3 fix(filemanager): consider version id when de-duplicating as well (mmalenic)
661490c Merge branch 'main' of github.com:umccr/orcabus into feat/event-order… (mmalenic)
@@ -1,3 +1,3 @@
-FROM postgres:16
+FROM postgres:15

 COPY migrations/ /docker-entrypoint-initdb.d/
19 changes: 3 additions & 16 deletions
lib/workload/stateful/filemanager/database/migrations/0001_add_object_table.sql
@@ -1,22 +1,9 @@
 -- A general object table common across all storage types.
 create table object (
     -- The unique id for this object.
-    object_id uuid not null default gen_random_uuid() primary key,
-    -- The bucket location.
-    bucket varchar(255) not null,
-    -- The name of the object.
-    key varchar(1024) not null,
+    object_id uuid not null primary key default gen_random_uuid(),
     -- The size of the object.
-    size int default null,
+    size integer default null,
     -- A unique identifier for the object, if it is present.
-    hash varchar(255) default null,
-    -- When this object was created.
-    created_date timestamptz not null default now(),
-    -- When this object was last modified.
-    last_modified_date timestamptz not null default now(),
-    -- When this object was deleted, a null value means that the object has not yet been deleted.
-    deleted_date timestamptz default null,
-    -- The date of the object and its id combined.
-    portal_run_id varchar(255) not null
-    -- provenance - history of all objects and how they move?
+    checksum text default null
 );
27 changes: 27 additions & 0 deletions
...workload/stateful/filemanager/database/queries/ingester/aws/insert_s3_created_objects.sql
@@ -0,0 +1,27 @@
+-- Bulk insert of s3 objects.
+insert into s3_object (
+    s3_object_id,
+    object_id,
+    bucket,
+    key,
+    created_date,
+    last_modified_date,
+    e_tag,
+    storage_class,
+    version_id,
+    created_sequencer
+)
+values (
+    unnest($1::uuid[]),
+    unnest($2::uuid[]),
+    unnest($3::text[]),
+    unnest($4::text[]),
+    unnest($5::timestamptz[]),
+    unnest($6::timestamptz[]),
+    unnest($7::text[]),
+    unnest($8::storage_class[]),
+    unnest($9::text[]),
+    unnest($10::text[])
+) on conflict on constraint created_sequencer_unique do update
+    set number_duplicate_events = s3_object.number_duplicate_events + 1
+    returning object_id, number_duplicate_events;
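The de-duplication behaviour of the created-objects insert can be illustrated with a small runnable sketch. This is not the PR's implementation: SQLite (through Python's `sqlite3`) stands in for PostgreSQL, the table is cut down to a few columns, and the columns covered by the unique constraint (`bucket`, `key`, `version_id`, `created_sequencer`) are an assumption based on the commit messages, since the definition of `created_sequencer_unique` is not shown in this diff.

```python
import sqlite3

# Assumption: a cut-down s3_object table. The real migration is not part of
# this diff, so the columns in the unique constraint are a guess.
SCHEMA = """
create table s3_object (
    s3_object_id text primary key,
    bucket text not null,
    key text not null,
    version_id text not null,
    created_sequencer text,
    number_duplicate_events integer not null default 0,
    unique (bucket, key, version_id, created_sequencer)
);
"""

# SQLite has no "on conflict on constraint <name>", so the conflict target is
# written as a column list instead of a constraint name.
UPSERT = """
insert into s3_object (s3_object_id, bucket, key, version_id, created_sequencer)
values (?, ?, ?, ?, ?)
on conflict (bucket, key, version_id, created_sequencer) do update
    set number_duplicate_events = number_duplicate_events + 1;
"""


def ingest(conn, events):
    """Insert each event; a repeated event only increments the counter."""
    for event in events:
        conn.execute(UPSERT, event)
    return conn.execute(
        "select key, created_sequencer, number_duplicate_events "
        "from s3_object order by key, created_sequencer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows = ingest(
    conn,
    [
        ("id-1", "bucket", "key", "v1", "0001"),
        ("id-2", "bucket", "key", "v1", "0001"),  # same event delivered twice
        ("id-3", "bucket", "key", "v1", "0002"),  # a later, distinct event
    ],
)
print(rows)
```

In this sketch the second delivery of sequencer `0001` does not create a new row; it only increments `number_duplicate_events` on the existing one, while sequencer `0002` is stored as a separate row with a count of zero.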
31 changes: 31 additions & 0 deletions
...workload/stateful/filemanager/database/queries/ingester/aws/insert_s3_deleted_objects.sql
@@ -0,0 +1,31 @@
+-- Bulk insert of s3 objects.
+insert into s3_object (
+    s3_object_id,
+    object_id,
+    bucket,
+    key,
+    -- We default the created date to a value even if this is a deleted event,
+    -- as we are expecting this to get updated.
+    created_date,
+    deleted_date,
+    last_modified_date,
+    e_tag,
+    storage_class,
+    version_id,
+    deleted_sequencer
+)
+values (
+    unnest($1::uuid[]),
+    unnest($2::uuid[]),
+    unnest($3::text[]),
+    unnest($4::text[]),
+    unnest($5::timestamptz[]),
+    unnest($6::timestamptz[]),
+    unnest($7::timestamptz[]),
+    unnest($8::text[]),
+    unnest($9::storage_class[]),
+    unnest($10::text[]),
+    unnest($11::text[])
+) on conflict on constraint deleted_sequencer_unique do update
+    set number_duplicate_events = s3_object.number_duplicate_events + 1
+    returning object_id, number_duplicate_events;
6 changes: 0 additions & 6 deletions
lib/workload/stateful/filemanager/database/queries/ingester/aws/insert_s3_objects.sql
This file was deleted.
6 changes: 3 additions & 3 deletions
...abase/queries/ingester/update_deleted.sql → ...e/queries/ingester/aws/update_deleted.sql
@@ -1,9 +1,9 @@
--- Update the deleted time of s3 objects.
-update object
+-- Update the deleted time of objects.
+update s3_object
 set deleted_date = data.deleted_time
 from (select
     unnest($1::varchar[]) as key,
     unnest($2::varchar[]) as bucket,
     unnest($3::timestamptz[]) as deleted_time
 ) as data
-where object.key = data.key and object.bucket = data.bucket;
+where s3_object.key = data.key and s3_object.bucket = data.bucket;
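A point worth noting about this update is that it matches on `key` and `bucket` only, so every row sharing that pair receives the deleted time. The sketch below illustrates this under stated assumptions: SQLite stands in for PostgreSQL, the table is reduced to four columns, and `zip` with one `update` per tuple replaces the `update ... from (select unnest(...))` form. The function name `update_deleted` and the sample data are hypothetical, and the three lists are assumed to have equal length.

```python
import sqlite3


def update_deleted(conn, keys, buckets, deleted_times):
    """Set deleted_date for each (key, bucket) pair.

    zip pairs the i-th element of each list, which is the role the parallel
    unnest calls play in the PostgreSQL query.
    """
    conn.executemany(
        "update s3_object set deleted_date = ? where key = ? and bucket = ?",
        [(time, key, bucket) for key, bucket, time in zip(keys, buckets, deleted_times)],
    )


conn = sqlite3.connect(":memory:")
# Assumption: a cut-down table, only the columns needed for the illustration.
conn.execute(
    "create table s3_object (key text, bucket text, version_id text, deleted_date text)"
)
conn.executemany(
    "insert into s3_object (key, bucket, version_id) values (?, ?, ?)",
    [("a", "bucket", "v1"), ("a", "bucket", "v2"), ("b", "bucket", "v1")],
)

update_deleted(conn, ["a"], ["bucket"], ["2024-01-01T00:00:00Z"])
result = conn.execute(
    "select key, version_id, deleted_date from s3_object order by key, version_id"
).fetchall()
print(result)
```

Both versions of key `a` end up with a deleted date here, while key `b` is left untouched.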
13 changes: 4 additions & 9 deletions
lib/workload/stateful/filemanager/database/queries/ingester/insert_objects.sql
@@ -1,12 +1,7 @@
 -- Bulk insert of objects
-insert into object (object_id, bucket, key, size, hash, created_date, last_modified_date, portal_run_id)
+insert into object (object_id, size, checksum)
 values (
     unnest($1::uuid[]),
-    unnest($2::varchar[]),
-    unnest($3::varchar[]),
-    unnest($4::int[]),
-    unnest($5::varchar[]),
-    unnest($6::timestamptz[]),
-    unnest($7::timestamptz[]),
-    unnest($8::varchar[])
-);
+    unnest($2::int[]),
+    unnest($3::text[])
+);
This isn't being used yet, but I think it will be for #73.