Enforce globally unique table locations #67
…e-globally-unique-table-location
AND typeCode = :table_code
AND (
  location LIKE CONCAT(:location, '%')
  OR :location LIKE CONCAT(location, '%')
Doesn't the 2nd condition cause a full-table-scan?
Yes, as written it unfortunately will.
Of course, this is all dependent on the metastore manager implementation (and in this case the backing database's implementation).
In any event if this check was always being done, I think there are some easy ways to optimize this. But since the check is optional, we may need to check every path. I'm still testing ways to improve performance of this optional check.
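For readers following the thread, the two `LIKE` conditions amount to a symmetric prefix test: the new location is inside an existing one, or an existing one is inside the new one. The second form puts the column on the pattern side, which typically prevents the database from using an index on `location`. Below is a minimal in-memory sketch of that predicate. The class and method names are hypothetical, and the trailing-slash normalization is an assumption that is not part of the SQL in the diff.

```java
public final class LocationOverlap {
    private LocationOverlap() {}

    // Assumption: treat "s3://bucket/a" and "s3://bucket/a/" as the same directory,
    // so that "s3://bucket/tab" is not considered a prefix of "s3://bucket/table".
    public static String withTrailingSlash(String location) {
        return location.endsWith("/") ? location : location + "/";
    }

    /** True if either location equals, or is nested inside, the other. */
    public static boolean overlaps(String a, String b) {
        String x = withTrailingSlash(a);
        String y = withTrailingSlash(b);
        // Mirrors: location LIKE CONCAT(:location, '%') OR :location LIKE CONCAT(location, '%')
        return x.startsWith(y) || y.startsWith(x);
    }
}
```

Without the normalization, a raw prefix comparison would also flag sibling directories that merely share a name prefix.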
I was thinking about this problem a bit. It's a tricky-fun one ;)
Just brain-dumping some thoughts:
There are a few things that play a role here - respectively w/ locations in general.
I think we can rely on / as a separator. With that in mind, a location is a pair of "bucket" (S3/GCS bucket or ADLS file-system name) plus a list of path elements. If we distinguish parent directories from table directories, the check becomes easier, i.e. with a separate metastore entity that is only used to track locations.
CREATE TABLE locations (
bucket TEXT NOT NULL, -- storage bucket, e.g. s3://bucket/
path TEXT NOT NULL, -- storage location path, e.g. my/path/to/my-table
kind TEXT NOT NULL, -- marker for "parent-directory" or "table-location"
entity_id TEXT NULL -- id of the table
);
When you want to add/check for a new location like s3://bucket/my/path/my-table, the following INSERTs could do the trick:
-- Assumes a unique constraint on (bucket, path); without one, ON CONFLICT never triggers.
INSERT INTO locations (bucket, path, kind, entity_id) VALUES ('s3://bucket', 'my', 'parent', NULL) ON CONFLICT DO NOTHING;
INSERT INTO locations (bucket, path, kind, entity_id) VALUES ('s3://bucket', 'my/path', 'parent', NULL) ON CONFLICT DO NOTHING;
INSERT INTO locations (bucket, path, kind, entity_id) VALUES ('s3://bucket', 'my/path/my-table', 'table', '1234');
If the last one fails -> location already used -> fail hard. If any of the parents did not succeed, verify that those are all kind = 'parent'.
I suspect, this needs some more thought around race conditions (two tables with conflicting locations).
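To make the proposal concrete, here is a rough in-memory sketch of the same bookkeeping. It is not the metastore implementation: `LocationRegistry`, `Kind`, and `registerTable` are hypothetical names, a `HashMap` keyed by bucket plus path stands in for a table with a unique (bucket, path) constraint, and `synchronized` stands in for whatever transaction or locking the real store would need to handle the race conditions mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class LocationRegistry {
    enum Kind { PARENT, TABLE }

    // Key is bucket + "/" + path; one entry per tracked directory.
    private final Map<String, Kind> rows = new HashMap<>();

    /** Returns true if the table location was registered, false if it conflicts. */
    public synchronized boolean registerTable(String bucket, String path) {
        List<String> parents = new ArrayList<>();
        StringBuilder key = new StringBuilder(bucket);
        String[] parts = path.split("/");
        for (int i = 0; i < parts.length - 1; i++) {
            key.append('/').append(parts[i]);
            parents.add(key.toString());
        }
        String tableKey = key.append('/').append(parts[parts.length - 1]).toString();

        // The location itself is already a table, or a parent of some table.
        if (rows.containsKey(tableKey)) {
            return false;
        }
        // Any existing ancestor row must be a plain parent directory.
        for (String parent : parents) {
            if (rows.get(parent) == Kind.TABLE) {
                return false;
            }
        }
        // Checks passed: record the parents (if missing) and the table itself.
        for (String parent : parents) {
            rows.putIfAbsent(parent, Kind.PARENT);
        }
        rows.put(tableKey, Kind.TABLE);
        return true;
    }
}
```

The sketch checks before it writes so that a rejected request leaves no stray parent rows behind; in SQL the equivalent would be running the three INSERTs in one transaction and rolling back on failure.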
queryString = IntStream.range(0, directoryList.size())
    .mapToObj(i ->
        "SELECT location " +
        "FROM ModelEntityActive " +
        "WHERE location IS NOT NULL " +
        "AND typeCode = :table_code " +
        "AND location = :directory_" + i
    )
    .collect(Collectors.joining(" UNION ALL "));

queryString += " UNION ALL " +
    "SELECT location " +
    "FROM ModelEntityActive " +
    "WHERE location IS NOT NULL " +
    "AND typeCode = :table_code " +
    "AND location LIKE :locationPrefix";
}
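The diff above replaces the reversed `LIKE` with one equality lookup per ancestor directory, plus a single prefix `LIKE` for tables nested below the new location. The following standalone sketch shows how such a directory list and query string could be produced. `OverlapQuerySketch`, `ancestorDirectories`, and `buildQuery` are hypothetical names, and the assumptions that the list includes the location itself and that every entry ends in `/` are mine, not confirmed by the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public final class OverlapQuerySketch {
    private static final String BASE =
        "SELECT location FROM ModelEntityActive "
            + "WHERE location IS NOT NULL AND typeCode = :table_code ";

    /** "s3://bucket/a/b" -> ["s3://bucket/", "s3://bucket/a/", "s3://bucket/a/b/"]. */
    public static List<String> ancestorDirectories(String location) {
        String normalized = location.endsWith("/") ? location : location + "/";
        int schemeEnd = normalized.indexOf("://");
        int start = schemeEnd < 0 ? 0 : schemeEnd + 3; // skip past the scheme separator
        List<String> result = new ArrayList<>();
        for (int i = normalized.indexOf('/', start); i >= 0; i = normalized.indexOf('/', i + 1)) {
            result.add(normalized.substring(0, i + 1));
        }
        return result;
    }

    /** One equality branch per directory, then one prefix branch for nested tables. */
    public static String buildQuery(int directoryCount) {
        String exact = IntStream.range(0, directoryCount)
            .mapToObj(i -> BASE + "AND location = :directory_" + i)
            .collect(Collectors.joining(" UNION ALL "));
        String prefix = BASE + "AND location LIKE :locationPrefix";
        return exact.isEmpty() ? prefix : exact + " UNION ALL " + prefix;
    }
}
```

Each equality branch can be served by an ordinary index on `location`, and the number of branches grows with the path depth rather than with the number of tables.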
Can we not do a location IN (:directory_list) query? I think EclipseLink must support lists as parameters.
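A sketch of what that suggestion might look like, assuming the JPA provider accepts a collection-valued parameter in an `IN` expression (standard JPQL allows `IN :param` with a collection). The class name, the `directories` parameter name, and the `bindings` helper are hypothetical; with a real `EntityManager`, each map entry would be passed to `Query.setParameter(name, value)`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class InListQuerySketch {
    // Single statement instead of one UNION ALL branch per directory.
    public static final String QUERY =
        "SELECT location FROM ModelEntityActive "
            + "WHERE location IS NOT NULL "
            + "AND typeCode = :table_code "
            + "AND (location IN :directories OR location LIKE :locationPrefix)";

    /** Collects the named parameters that would be bound on the query. */
    public static Map<String, Object> bindings(int tableCode, String location, List<String> directories) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("table_code", tableCode);
        params.put("directories", directories);       // bound as a collection
        params.put("locationPrefix", location + "%"); // tables nested below the new location
        return params;
    }
}
```

Note that `IN` with an empty collection is not portable across providers, so a caller would likely need to skip that clause when the directory list is empty.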
@eric-maynard are you planning to resume work on this PR? If not, any insight into why this work was halted would be appreciated. Thanks!
Hi @mayankvadariya, thanks for your interest in this feature. There are a number of considerations here that contributed to me putting this work on hold. I'll walk through them here, but feel free to open up a discussion or thread on Zulip if you want to get into more detail. There's a lot of context here.
Description
This PR introduces a new flag, ENFORCE_GLOBALLY_UNIQUE_TABLE_LOCATIONS, which enforces that every newly created table has a unique location that does not overlap with the location of any existing table.
This PR additionally introduces another new flag, ENFORCE_TABLE_LOCATIONS_INSIDE_NAMESPACE_LOCATIONS, which can be used to disable the requirement that a table must reside within a location that is a child of its namespace's location.
Together, these two options can be used to create tables in essentially arbitrary locations within a catalog without violating the invariant that one table cannot be stored in another table's location.
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Implemented a new test, PolarisOverlappingTableTest.
Checklist:
Please delete options that are not relevant.