HIVE-29016: fixing cache handling for REST catalog; #5882
base: master
Conversation
- Add an event listener to invalidate cached tables regardless of the source of change (direct HMS or REST).
- Add a configuration option for the event handler class (see the registration sketch below).
- Lengthen the default cache TTL.
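A note on the configuration point above: the sketch below only shows the generic HMS mechanism for registering a MetaStoreEventListener (the metastore.event.listeners property). The exact configuration key this PR adds for the event handler class, and the real package of HMSEventListener, are not shown in this excerpt, so the fully qualified class name here is an assumption.

import org.apache.hadoop.conf.Configuration;

public class ListenerRegistrationSketch {
  public static void main(String[] args) {
    // Standard HMS property for wiring in event listeners; the class name below
    // is illustrative only, since this excerpt does not show the real package.
    Configuration conf = new Configuration();
    conf.set("metastore.event.listeners", "org.apache.iceberg.rest.HMSEventListener");
  }
}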
    nsCatalog = catalog;
  }

  public CATALOG hmsUnwrap() {
    return nsCatalog;
  }

  public void invalidateTable(String dbName, String tableName) {
    super.invalidateTable(TableIdentifier.of(dbName, tableName));
I think we can call l#60 here?
This version of invalidateTable() is called by the listener and we do want the invalidation performed here. The override at l#60 does nothing on purpose; we do want all invalidations performed through the listener.
  public void invalidateTable(TableIdentifier tableIdentifier) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Avoid invalidating table: {}", tableIdentifier);
    }
We can then do super.invalidateTable() here? Also, why not do nsCatalog.invalidateTable() here?
The idea is to try and have all actual invalidation performed through the listener; this override voids the default call on purpose.
 * HMSEventListener is a Hive Metastore event listener that invalidates the cache
 * of the HMSCachingCatalog when certain events occur, such as altering or dropping a table.
 */
public class HMSEventListener extends MetaStoreEventListener {
I'm new to this API. Does this work even when we set up HMS for HA with metastore.thrift.uri.selection=RANDOM?
Good question. I would hope so; let me try to find a definite answer.
HMS only triggers MetaStoreEventListener for 'local' events so HA would need to disable the cache (for now). A potential solution would be to actively poll events from each HMS instance (need a dedicated thread that polls the NotificationEventRequest and follows the same invalidation logic).
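For illustration, here is a rough shape such a polling thread could take, built only on the existing IMetaStoreClient notification-log calls; NotificationPoller and its CacheInvalidator callback are placeholder names, not code from this PR.

import java.util.List;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.NotificationEvent;

public class NotificationPoller implements Runnable {

  /** Placeholder for whatever ends up invalidating HMSCachingCatalog entries. */
  public interface CacheInvalidator {
    void invalidateTable(String dbName, String tableName);
  }

  private final IMetaStoreClient client;
  private final CacheInvalidator invalidator;
  private long lastEventId;

  public NotificationPoller(IMetaStoreClient client, CacheInvalidator invalidator) throws Exception {
    this.client = client;
    this.invalidator = invalidator;
    // Start from the current tail of the notification log.
    this.lastEventId = client.getCurrentNotificationEventId().getEventId();
  }

  @Override
  public void run() {
    try {
      // Fetch up to 1000 events newer than the last one processed (no filter).
      List<NotificationEvent> events =
          client.getNextNotification(lastEventId, 1000, null).getEvents();
      for (NotificationEvent event : events) {
        String type = event.getEventType();
        if ("ALTER_TABLE".equals(type) || "DROP_TABLE".equals(type)) {
          invalidator.invalidateTable(event.getDbName(), event.getTableName());
        }
        lastEventId = Math.max(lastEventId, event.getEventId());
      }
    } catch (Exception e) {
      // Keep the cursor where it was; the next scheduled run retries the same range.
    }
  }
}

Each cache-holding instance would run this on a ScheduledExecutorService at a short interval, so invalidations no longer depend on the mutation arriving at the local listener.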
As I shared on the Jira, HiveCatalog might access two types of remote resources:
- (A) Pointers from table names to the metadata location, kept in the backend RDBMS
- (B) The content of metadata.json on Amazon S3
I'm wondering if we can update HiveCatalog to allow caching (B). The metadata.json file is immutable, so we can safely cache it (a sketch of the idea follows below).
I'd say the RDBMS lookup for the Iceberg pointers is fast enough, on the order of 10k RPS.
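To make the (B) idea concrete, here is a minimal sketch (not part of this PR) that memoizes parsed metadata.json content by its location, assuming Iceberg's FileIO/TableMetadataParser and a Caffeine cache; the MetadataFileCache name and cache sizing are hypothetical.

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableMetadataParser;
import org.apache.iceberg.io.FileIO;

public class MetadataFileCache {

  private final FileIO io;
  private final Cache<String, TableMetadata> cache;

  public MetadataFileCache(FileIO io, long maxEntries) {
    this.io = io;
    // Keyed by metadata location; entries never need invalidation because a
    // metadata.json file is immutable once written.
    this.cache = Caffeine.newBuilder().maximumSize(maxEntries).build();
  }

  public TableMetadata read(String metadataLocation) {
    // Computes and stores on a miss; cache hits never touch S3.
    return cache.get(metadataLocation, loc -> TableMetadataParser.read(io, loc));
  }
}

The pointer lookup (A) still goes to the RDBMS, but repeated reads of the same metadata location are served locally.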
What changes were proposed in this pull request?
Adding an event listener to the Metastore when hosting an Iceberg REST Catalog, to invalidate tables (and namespaces) independently of the source of change.
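As a rough illustration of the mechanism (the actual HMSEventListener in this PR may differ), a MetaStoreEventListener can react to alter/drop events and push the affected identifiers into the caching catalog; the Invalidator hook below is a placeholder for however the listener reaches HMSCachingCatalog.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.MetaStoreEventListener;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.events.AlterTableEvent;
import org.apache.hadoop.hive.metastore.events.DropTableEvent;

public class CacheInvalidatingListener extends MetaStoreEventListener {

  /** Hypothetical hook standing in for the PR's wiring to HMSCachingCatalog. */
  public interface Invalidator {
    void invalidateTable(String dbName, String tableName);
  }

  private static volatile Invalidator invalidator = (db, tbl) -> { };

  public static void register(Invalidator hook) {
    invalidator = hook;
  }

  public CacheInvalidatingListener(Configuration config) {
    super(config);
  }

  @Override
  public void onAlterTable(AlterTableEvent event) throws MetaException {
    // Drop the cached entry so the next load re-reads the new metadata pointer.
    invalidator.invalidateTable(event.getOldTable().getDbName(), event.getOldTable().getTableName());
  }

  @Override
  public void onDropTable(DropTableEvent event) throws MetaException {
    invalidator.invalidateTable(event.getTable().getDbName(), event.getTable().getTableName());
  }
}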
Why are the changes needed?
The cache backing the REST Iceberg catalog may serve stale tables.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests