Support persisting TableMetadata in the metastore #433
base: main
This will potentially reduce a lot of I/O overhead! Thanks for working on it.
Thanks for taking a stab at this! I think it's worth discussing whether metadata contents could be better stored within the TableLikeEntity itself.
@JsonIgnore
public String getContent() {
  return getInternalPropertiesAsMap().get(CONTENT_KEY);
}
Could this just be stored in TableLikeEntity's internalProperties instead of introducing a new entity type?
There are a few reasons why I chose not to put this in TableLikeEntity:
- If the metadata is too large to fit in a string (this happens on MySQL), it's problematic if the persistence layer rejects writes to TableLikeEntity. By contrast, it's fairly innocuous if the persistence layer rejects a TableMetadata write.
- This properly captures the parent-child (1:N) relationship between tables and their metadata. In theory, you can have multiple metadata files cached for a given table.
- This better supports future extensions that give the metadata more structure. Sticking the JSON into the internalProperties of a TableLikeEntity is one thing; adding fields for the partition scheme, the schema, etc. is another.
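A minimal sketch of the separate-entity design argued for above, assuming an in-memory store. `TableEntity`, `TableMetadataEntity`, `InMemoryStore`, and the `maxContentLength` limit are hypothetical stand-ins, not Polaris classes; the limit only mimics a database column-size cap. Because metadata lives in its own child entity linked to its table by `parentId`, an oversized metadata write fails on its own without touching the table, and one table can own several metadata entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a table entity (not the Polaris TableLikeEntity class).
record TableEntity(long id, String name, String metadataLocation) {}

// Hypothetical child entity: one row per cached metadata file, linked to its table by parentId.
record TableMetadataEntity(long id, long parentId, String metadataLocation, String content) {}

// Hypothetical in-memory store that rejects content longer than a fixed limit,
// mimicking a column-size cap in the backing database.
class InMemoryStore {
  private final int maxContentLength;
  private final Map<Long, TableEntity> tables = new HashMap<>();
  private final List<TableMetadataEntity> metadata = new ArrayList<>();
  private long nextId = 1;

  InMemoryStore(int maxContentLength) {
    this.maxContentLength = maxContentLength;
  }

  TableEntity createTable(String name, String metadataLocation) {
    TableEntity table = new TableEntity(nextId++, name, metadataLocation);
    tables.put(table.id(), table);
    return table;
  }

  // A rejected metadata write returns false; the table entity is unaffected.
  boolean createMetadata(long tableId, String location, String content) {
    if (content.length() > maxContentLength) {
      return false;
    }
    metadata.add(new TableMetadataEntity(nextId++, tableId, location, content));
    return true;
  }

  // 1:N relationship: every metadata entry whose parent is the given table.
  List<TableMetadataEntity> listMetadata(long tableId) {
    return metadata.stream().filter(m -> m.parentId() == tableId).toList();
  }

  boolean hasTable(long tableId) {
    return tables.containsKey(tableId);
  }
}
```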
.getMetaStoreManager()
.createEntityIfNotExists(
    callContext,
    PolarisEntity.toCoreList(resolvedEntities.getRawFullPath()),
Is this the first time we have something with a parentId that isn't a Namespace or a Catalog?
CATALOG_ROLE is technically a child of a catalog, and we also have FILE.
PolarisMetaStoreManager.EntityResult metadataEntityResult =
    entityManager
        .getMetaStoreManager()
        .loadEntity(callContext, result.getCatalogId(), result.getId());
Using loadEntity means we won't benefit from the EntityCache, which means an extra round trip to the persistence store. If we embed the contents directly into the main TableLikeEntity, then the EntityCache, including its cache invalidation, will just work automatically.
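To make the round-trip concern concrete, here is a minimal cache-aside sketch. `Backend` and `CachingLoader` are hypothetical classes, not the Polaris EntityCache; `Backend.loads` counts simulated round trips. Repeat reads through the cache skip the backend, and a write invalidates the cached copy so the next read refetches; calling the backend directly on every read, as a bare loadEntity call would, pays one round trip each time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical persistence backend that counts how many times it is hit.
class Backend {
  private final Map<Long, String> rows = new HashMap<>();
  int loads = 0;

  void put(long id, String value) {
    rows.put(id, value);
  }

  String load(long id) {
    loads++; // each call models one round trip to the persistence store
    return rows.get(id);
  }
}

// Hypothetical cache-aside wrapper: serves repeat reads from memory and
// drops the cached copy whenever the entity is written.
class CachingLoader {
  private final Backend backend;
  private final Map<Long, String> cache = new HashMap<>();

  CachingLoader(Backend backend) {
    this.backend = backend;
  }

  String get(long id) {
    return cache.computeIfAbsent(id, backend::load);
  }

  void update(long id, String value) {
    backend.put(id, value);
    cache.remove(id); // invalidate so the next get() sees the new value
  }
}
```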
I think we should ideally just make the cache work for TableMetadata.
Technically, the metadata location is the name, so we could try to resolve by name. However, that wouldn't let us lazily delete the table's old metadata here, so I think that optimization is best saved for later.
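The trade-off in this reply can be sketched as follows, assuming a hypothetical `MetadataRow` type and a plain list standing in for a table's child metadata entities. A lookup by name would return only the current entry; listing the children instead lets the load path also drop entries whose location no longer matches the table's current metadata location.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

// Hypothetical cached metadata row; the metadata location doubles as its name.
record MetadataRow(String metadataLocation, String content) {}

// Hypothetical loader over a table's child metadata entries.
class LazyMetadataLoader {
  private final List<MetadataRow> children;

  LazyMetadataLoader(List<MetadataRow> children) {
    this.children = new ArrayList<>(children);
  }

  // Returns the entry for the current location and deletes stale entries along the way.
  Optional<MetadataRow> load(String currentLocation) {
    MetadataRow match = null;
    Iterator<MetadataRow> it = children.iterator();
    while (it.hasNext()) {
      MetadataRow row = it.next();
      if (row.metadataLocation().equals(currentLocation)) {
        match = row;
      } else {
        it.remove(); // lazy cleanup of metadata from older table versions
      }
    }
    return Optional.ofNullable(match);
  }

  int size() {
    return children.size();
  }
}
```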
polaris-service/src/main/java/org/apache/polaris/service/persistence/MetadataCacheManager.java
PolarisMetaStoreManager.ListEntitiesResult metadataResult =
    entityManager
        .getMetaStoreManager()
        .listEntities(
The extra listEntities round trip here could really hurt performance. Is this just because we're not making the creation of the metadata entity atomic with the underlying TableLikeEntity so multiple creations of metadata entities could happen at the same time?
Since it's scoped very tightly to the path, it shouldn't be very expensive. That said, perhaps we should wrap the loadMetadata call in a transaction to avoid the second list. WDYT?
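One way to avoid the follow-up list is to make creation atomic per (table, location) key so that concurrent loadTable calls cannot create duplicate entries. This sketch uses `ConcurrentHashMap.computeIfAbsent` as an in-memory stand-in for a database transaction or unique constraint; `MetadataKey` and `AtomicMetadataCache` are hypothetical names, not part of this PR.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical key: at most one cached metadata entry per table and metadata location.
record MetadataKey(long tableId, String metadataLocation) {}

class AtomicMetadataCache {
  private final ConcurrentMap<MetadataKey, String> entries = new ConcurrentHashMap<>();
  final AtomicInteger creations = new AtomicInteger();

  // ConcurrentHashMap.computeIfAbsent applies the mapping function at most once per
  // absent key, even under concurrency, so no second lookup is needed to find duplicates.
  String getOrCreate(MetadataKey key, Supplier<String> readMetadataFile) {
    return entries.computeIfAbsent(key, k -> {
      creations.incrementAndGet();
      return readMetadataFile.get();
    });
  }
}
```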
Description
This adds a new flag, METADATA_CACHE_MAX_BYTES, which allows the catalog to store table metadata in the metastore and vend it from there when loadTable is called. Entries are cached by metadata location. Currently, the cache is implemented as a "lazy" cache: entries are only loaded into the cache when loadTable is called. The entire metadata.json content is cached.
Features not included in this PR:
I'm planning to follow up with at least item (1), but the goal is to structure things in a way that will allow us to implement (2) and (3) in the future as well.
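The lazy flow described above might look roughly like this. `MetadataStore`, `MapMetadataStore`, `LazyTableMetadataCache`, and the `readFile` function are hypothetical; METADATA_CACHE_MAX_BYTES is the flag named in this PR, but passing it as a plain `maxBytes` constructor argument is an assumption. Entries are keyed by metadata location, filled only on a miss inside loadTable, and skipped when the content exceeds the limit or the store rejects the write, so the caller still receives the metadata.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical persistence interface; save() may throw, e.g. on a column-size limit.
interface MetadataStore {
  Optional<String> find(String metadataLocation);

  void save(String metadataLocation, String content);
}

class LazyTableMetadataCache {
  private final MetadataStore store;
  private final Function<String, String> readFile; // reads metadata.json from object storage
  private final long maxBytes; // plays the role of METADATA_CACHE_MAX_BYTES (assumption)

  LazyTableMetadataCache(MetadataStore store, Function<String, String> readFile, long maxBytes) {
    this.store = store;
    this.readFile = readFile;
    this.maxBytes = maxBytes;
  }

  String loadTable(String metadataLocation) {
    Optional<String> cached = store.find(metadataLocation);
    if (cached.isPresent()) {
      return cached.get(); // cache hit: no read from object storage
    }
    String content = readFile.apply(metadataLocation);
    if (content.getBytes(StandardCharsets.UTF_8).length <= maxBytes) {
      try {
        store.save(metadataLocation, content);
      } catch (RuntimeException e) {
        // Best effort: a rejected write only means this entry is not cached.
      }
    }
    return content;
  }
}

// Simple in-memory store for trying the sketch out.
class MapMetadataStore implements MetadataStore {
  final Map<String, String> rows = new HashMap<>();

  public Optional<String> find(String metadataLocation) {
    return Optional.ofNullable(rows.get(metadataLocation));
  }

  public void save(String metadataLocation, String content) {
    rows.put(metadataLocation, content);
  }
}
```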
How Has This Been Tested?
Existing tests vend table metadata correctly when caching is enabled.
Added a small test in BasePolarisCatalogTest to cover the basic semantics of caching.
Manual testing with EclipseLink: I observed the entities being created in Postgres and saw large metadata being cached.
With MySQL, small metadata is persisted:
However, large metadata may cause internalproperties to exceed the size limit, in which case nothing is cached. Calls still return safely.