Commit
refactor(filemanager): add correct ingest function permissions and rename ingest_id tag
Showing 22 changed files with 118 additions and 105 deletions.
2 changes: 2 additions & 0 deletions
lib/workload/stateless/stacks/filemanager/database/migrations/0002_s3_ingest_id.sql
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
alter table s3_object add column ingest_id uuid; | ||
create index ingest_id_index on s3_object (ingest_id); |
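As a rough illustration of what this migration enables, the sketch below applies the two statements to a throwaway SQLite database and looks up every record that shares an `ingest_id`. SQLite, the base table columns (`s3_object_id`, `bucket`, `object_key`), and the helper `records_for_ingest_id` are assumptions made for illustration only; the real `s3_object` schema and database engine are not shown in this diff.

```python
import sqlite3
import uuid

# The two statements from 0002_s3_ingest_id.sql, applied verbatim.
MIGRATION = """
alter table s3_object add column ingest_id uuid;
create index ingest_id_index on s3_object (ingest_id);
"""


def setup_database() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical minimal s3_object table."""
    conn = sqlite3.connect(":memory:")
    # Assumed columns; the real table is defined in an earlier migration.
    conn.execute(
        "create table s3_object ("
        "s3_object_id integer primary key, bucket text, object_key text)"
    )
    conn.executescript(MIGRATION)
    return conn


def records_for_ingest_id(conn: sqlite3.Connection, ingest_id: uuid.UUID) -> list:
    """Return (bucket, object_key) for all records sharing an ingest_id, oldest first."""
    rows = conn.execute(
        "select bucket, object_key from s3_object "
        "where ingest_id = ? order by s3_object_id",
        (str(ingest_id),),
    )
    return rows.fetchall()


conn = setup_database()
moved = uuid.uuid4()
conn.executemany(
    "insert into s3_object (bucket, object_key, ingest_id) values (?, ?, ?)",
    [
        ("bucket-a", "old/location.txt", str(moved)),  # original record
        ("bucket-b", "new/location.txt", str(moved)),  # same object after a move
        ("bucket-a", "unrelated.txt", None),           # tagging failed or disabled
    ],
)
history = records_for_ingest_id(conn, moved)
```

The index matters because this lookup runs for every ingested record with a known `ingest_id`; without it each lookup would scan the whole table.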
1 change: 0 additions & 1 deletion
lib/workload/stateless/stacks/filemanager/database/migrations/0002_s3_move_id.sql
This file was deleted.
44 changes: 21 additions & 23 deletions
lib/workload/stateless/stacks/filemanager/docs/MOVED_OBJECTS.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,49 +1,47 @@ | ||
# Tracking moved objects | ||
|
||
The filemanager tracks records from S3 events, which do not describe how objects move from one location to another. Using | ||
S3 events alone, it's impossible to tell whether an object that has been deleted in one place and created in another is | ||
S3 events alone, it's not possible to tell whether an object that has been deleted in one place and created in another is | ||
the same object that has been moved, or two different objects. This is because S3 only tracks `Created` or `Deleted` | ||
events. | ||
|
||
To track moved objects, the filemanager has to store additional information on objects, that gets copied when the object | ||
is moved. The design involves using object tagging to store an identifier on all objects that is copied when the | ||
object is moved. This id can be used to track how object moves. | ||
|
||
When records are ingested, the filemanager first checks if the object contains the tag with the id field. If the tag is | ||
present, then the object has been moved, and the new record reuses that id. If not, a new id is generated and the object | ||
is tagged with it. Later, the database can be queried to find all record matching the id. This represents a sequence of moved | ||
objects. | ||
To track moved objects, the filemanager stores additional information in S3 tags, that gets copied when the object | ||
is moved. This allows the filemanager to track how objects move and also allows it to copy attributes when an object | ||
is moved/copied. | ||
|
||
## Tagging process | ||
|
||
The process of tagging is: | ||
|
||
* When an object record is ingested, the filemanager queries `Created` events for tags. | ||
* If the tags can be retrieved, the filemanager looks for a tag called `filemanager_id`. The key name can be | ||
* If the tags can be retrieved, the filemanager looks for a key called `ingest_id`. The key name can be | ||
configured in the environment variable `FILEMANAGER_INGESTER_TAG_NAME`. | ||
* The tag is parsed as a UUID, and stored in the `move_id` column of `s3_object` for that record. | ||
* The tag is parsed as a UUID, and stored in the `ingest_id` column of `s3_object` for that record. | ||
* If the tag does not exist, then a new UUID is generated, and the object is tagged on S3 by calling `PutObjectTagging`. | ||
The new tag is also stored in the `move_id` column. | ||
* The database is also queried for any records with the same `move_id` so that attributes can be copied to the new record. | ||
The new tag is also stored in the `ingest_id` column. | ||
* The database is also queried for any records with the same `ingest_id` so that attributes can be copied to the new record. | ||
|
||
This logic is enabled by default, but it can be switched off by setting `FILEMANAGER_INGESTER_TRACK_MOVES`. The filemanager | ||
API provides a way to query the database for records with a given `move_id`. | ||
API provides a way to query the database for records with a given `ingest_id`. | ||
|
||
## Design considerations | ||
|
||
Object tags on S3 are limited to 10 tags per object, where each tag can only store 258 unicode characters. This means that it | ||
is not possible a large amount of data or attributes in tags. Instead, filemanager stores a single UUID in the tag, which is | ||
linked to object records that store the attributes and data. | ||
Object tags on S3 are limited to 10 tags per object, and each tag can only store 258 unicode characters. The filemanager | ||
avoids storing a large amount of data by using a UUID as the value of the tag, which is linked to object records that | ||
store attributes and data. | ||
|
||
The object tagging process cannot be atomic, so there is a chance for concurrency errors to occur. Tagging can also | ||
fail due to database or network errors. The filemanager only tracks `move_id`s if it knows that a tag has been | ||
successfully created on S3, and successfully stored in the database. If tagging fails, or it's not enabled then the `move_id` | ||
fail due to database or network errors. The filemanager only tracks `ingest_id`s if it knows that a tag has been | ||
successfully created on S3, and successfully stored in the database. If tagging fails, or it's not enabled, then the `ingest_id` | ||
column will be null. | ||
|
||
The object tagging mechanism also doesn't differentiate between moved objects and copied objects with the same tags. | ||
If an object is copied with tags, the `ingest_id` will also be copied and the above logic will apply. | ||
|
||
## Alternative designs | ||
|
||
Alternatively, S3 object metadata could also be used to track moves using a similar mechanism. However, metadata can | ||
only be updated by deleting and recreated the object, so tagging was chosen instead. Another mechanism which could track | ||
moved objects is to compare object checksums or etags. This works but may also be limited if checksum is not present, or | ||
if the etag was computed using a different part-size. Both these approaches could be used in addition to object tagging | ||
to provide the filemanager more ways to track moves. | ||
only be updated by deleting and recreated the object. This process would be much more costly so tagging was chosen instead. | ||
Another approach is to compare object checksums or etags. However, this would also be limited if the checksum is not present, | ||
or if the etag was computed using a different part-size. Both these approaches could be used in addition to object tagging | ||
to provide more ways to track moves. |
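The tagging process described in the MOVED_OBJECTS.md changes above can be sketched roughly as follows. This is not the filemanager's implementation: `FakeS3`, `TaggingError`, and `resolve_ingest_id` are hypothetical names, the in-memory store stands in for the real S3 tagging calls, and treating an unparseable tag as a failure is an assumption the document does not state. Only the default `ingest_id` key name and the reuse-or-generate logic come from the document.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

# Default tag key per the document; the real service reads it from
# FILEMANAGER_INGESTER_TAG_NAME.
TAG_NAME = "ingest_id"


class TaggingError(Exception):
    """Hypothetical error standing in for a failed S3 tagging call."""


@dataclass
class FakeS3:
    """In-memory stand-in for S3 object tagging (not an AWS client)."""

    tags: dict = field(default_factory=dict)  # (bucket, key) -> {tag: value}
    fail_puts: bool = False

    def get_object_tagging(self, bucket: str, key: str) -> dict:
        return dict(self.tags.get((bucket, key), {}))

    def put_object_tagging(self, bucket: str, key: str, tags: dict) -> None:
        if self.fail_puts:
            raise TaggingError("simulated PutObjectTagging failure")
        self.tags[(bucket, key)] = dict(tags)

    def copy_object(self, src: tuple, dst: tuple) -> None:
        # A copy that preserves tags carries the ingest_id along with it.
        self.tags[dst] = dict(self.tags.get(src, {}))


def resolve_ingest_id(
    s3: FakeS3, bucket: str, key: str, track_moves: bool = True
) -> Optional[uuid.UUID]:
    """Return the ingest_id to store for a Created event, or None (null column)."""
    if not track_moves:
        return None
    try:
        tags = s3.get_object_tagging(bucket, key)
        if TAG_NAME in tags:
            # Tag already present: the object was moved or copied, so reuse the id.
            return uuid.UUID(tags[TAG_NAME])
        # No tag yet: generate an id and tag the object before recording it.
        new_id = uuid.uuid4()
        tags[TAG_NAME] = str(new_id)
        s3.put_object_tagging(bucket, key, tags)
        return new_id
    except (TaggingError, ValueError):
        # Assumption: any tagging or parse failure leaves the column null.
        return None


s3 = FakeS3()
first = resolve_ingest_id(s3, "bucket-a", "old/location.txt")
s3.copy_object(("bucket-a", "old/location.txt"), ("bucket-b", "new/location.txt"))
second = resolve_ingest_id(s3, "bucket-b", "new/location.txt")
```

Here `first` and `second` are the same UUID, which is also why the mechanism cannot distinguish a move from a copy that preserves tags.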