Skip to content

HIVE-29016: fixing cache handling for REST catalog; #5882

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

henrib
Copy link
Contributor

@henrib henrib commented Jun 20, 2025

  • add an event listener to invalidate cached tables impervious to source of change (direct HMS or REST);
  • added configuration option for event class handler;
  • lengthened default cache TTL;

What changes were proposed in this pull request?

Adding an event listener to Metastore when hosting an Iceberg REST Catalog to invalidate tables (and namespaces) independently of the source of change.

Why are the changes needed?

Cache handling for REST Iceberg catalog may return stale tables.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests

- add an event listener to invalidate cached tables impervious to source of change (direct HMS or REST);
- added configuration option for event class handler;
- lengthened default cache TTL;
nsCatalog = catalog;
}

public CATALOG hmsUnwrap() {
return nsCatalog;
}

public void invalidateTable(String dbName, String tableName) {
super.invalidateTable(TableIdentifier.of(dbName, tableName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can call l#60 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version of invalidateTable() is called by the listener and we do want the invalidation performed here. The override at l#60 does nothing on purpose; we do want all invalidations performed through the listener.

public void invalidateTable(TableIdentifier tableIdentifier) {
if (LOG.isDebugEnabled()) {
LOG.debug("Avoid invalidating table: {}", tableIdentifier);
}
Copy link
Contributor

@saihemanth-cloudera saihemanth-cloudera Jun 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can then do super.invalidateTable() here? Also, why not do nsCatalog.invalidateTable() here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to try and have all actual invalidation performed through the listener; this override voids the default call on purpose.

Copy link

* IcebergEventListener is a Hive Metastore event listener that invalidates the cache
* of the HMSCachingCatalog when certain events occur, such as altering or dropping a table.
*/
public class HMSEventListener extends MetaStoreEventListener {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm new to this API. Does this work even when we set up HMS with HA with metastore.thrift.uri.selection=RANDOM?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I would hope so, let me try and seek a definite answer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HMS only triggers MetaStoreEventListener for 'local' events so HA would need to disable the cache (for now). A potential solution would be to actively poll events from each HMS instance (need a dedicated thread that polls the NotificationEventRequest and follows the same invalidation logic).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I shared it with Jira, HiveCatalog might access two types of remote resources.

  • (A) Pointers from table names to the metadata location in the backend RDBMS
  • (B) The content of metadata.json on Amazon S3

I'm wondering if we can update HiveCatalog to allow caching (B). The metadata.json is immutable. So, we can safely cache it.
I'd say RDMBS for Iceberg pointers is fast enough, such as 10k RPS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants