Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist: Enable reference-cache by default #9760

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,11 @@ as necessary. Empty sections will not end in the release notes.
Furthermore, Sentry integration can also be enabled and configured. And finally, it is now
possible to configure the log level for specific loggers, not just the root logger. The old
`logLevel` field is still supported, but will be removed in a future release.
- If you are not using k8s and you are running Nessie with multiple nodes, you must either
* configure `nessie.version.store.persist.cache-invalidations.service-names`, see
[docs reference](https://projectnessie.org/nessie-latest/configuration/#version-store-advanced-settings),
or
* disable the reference cache by setting `nessie.version.store.persist.reference-cache-ttl` to `PT0S`.
Comment on lines +59 to +63
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of enabling reference cache by default if users have to do additional config changes to run Nessie now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's only if you run multiple nodes without k8s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But is reference cache critical for single-node deployments?

"Must" is a strong word. It looks like the new default is going to cause undue failures for naive users when they scale Nessie.


### Changes

Expand All @@ -69,6 +74,11 @@ as necessary. Empty sections will not end in the release notes.
* `py-io-impl=pyiceberg.io.fsspec.FsspecFileIO`
* `s3.signer=S3V4RestSigner` when S3 signing is being used
- Iceberg REST: No longer return `*FileIO` options from the Iceberg REST config endpoint
- The reference-cache is now enabled by default with a TTL of 5 minutes for both cached entries
and negative entries. Updated references are invalidated across all Nessie nodes, which works
out of the box in k8s setups. Other multi-node Nessie setups need to configure
`nessie.version.store.persist.cache-invalidations.service-names`, see
[docs reference](https://projectnessie.org/nessie-latest/configuration/#version-store-advanced-settings).

### Fixes

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -149,15 +149,14 @@ public CacheBackend produceCacheBackend(
cacheConfig.meterRegistry(meterRegistry.get());
}

Optional<Duration> referenceCacheTtl = storeConfig.referenceCacheTtl();
Duration referenceCacheTtl = storeConfig.referenceCacheTtl();
Optional<Duration> referenceCacheNegativeTtl = storeConfig.referenceCacheNegativeTtl();

if (referenceCacheTtl.isPresent()) {
Duration refTtl = referenceCacheTtl.get();
LOGGER.warn(
"Reference caching is an experimental feature but enabled with a TTL of {}", refTtl);
cacheConfig.referenceTtl(refTtl);
cacheConfig.referenceNegativeTtl(referenceCacheNegativeTtl.orElse(refTtl));
if (referenceCacheTtl
.isPositive()) { // allow disabling the reference-cache by setting the TTL to 0
LOGGER.info("Reference caching is enabled with a TTL of {}", referenceCacheTtl);
cacheConfig.referenceTtl(referenceCacheTtl);
cacheConfig.referenceNegativeTtl(referenceCacheNegativeTtl.orElse(referenceCacheTtl));
}

String info = format("Using objects cache with %d MB", effectiveCacheSizeMB);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -150,8 +150,9 @@ public interface QuarkusStoreConfig extends StoreConfig {
OptionalInt cacheCapacityFractionAdjustMB();

@WithName(CONFIG_REFERENCE_CACHE_TTL)
@WithDefault(DEFAULT_REFERENCE_CACHE_TTL)
@Override
Optional<Duration> referenceCacheTtl();
Duration referenceCacheTtl();

@WithName(CONFIG_REFERENCE_NEGATIVE_CACHE_TTL)
@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -197,6 +197,11 @@ nessie.version.store.type=IN_MEMORY
# Settings this value to 0 disables the fixed size object cache.
# Entirely disabling the cache is not recommended and will negatively affect performance.
#nessie.version.store.persist.cache-capacity-mb=0
#
# Reference cache default TTL is 5 minutes.
nessie.version.store.persist.reference-cache-ttl=PT5M
# negative cache TTL defaults to nessie.version.store.persist.reference-cache-ttl
# nessie.version.store.persist.reference-cache-negative-ttl=PT5M

## Transactional database configuration

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@ public interface StoreConfig {
long DEFAULT_PREVIOUS_HEAD_TIME_SPAN_SECONDS = 5 * 60;

String CONFIG_REFERENCE_CACHE_TTL = "reference-cache-ttl";
String DEFAULT_REFERENCE_CACHE_TTL = "PT5M";

String CONFIG_REFERENCE_NEGATIVE_CACHE_TTL = "reference-cache-negative-ttl";

Expand Down Expand Up @@ -258,35 +259,30 @@ default long referencePreviousHeadTimeSpanSeconds() {
}

/**
* Defines the duration how long references shall be kept in the cache. Defaults to not cache
* references. If reference caching is enabled, it is highly recommended to also enable negative
* reference caching.
*
* <p>It is safe to enable this for single node Nessie deployments.
* Defines the duration how long references shall be kept in the cache. Defaults is {@value
* #DEFAULT_REFERENCE_CACHE_TTL}, set to {@code PT0M} to disable. If reference caching is enabled,
* it is highly recommended to also enable negative reference caching.
*
* <p>Recommended value is currently {@code PT5M} for distributed and high values like {@code
* PT1H} for single node Nessie deployments.
*
* <p><em>This feature is experimental except for single Nessie node deployments! If in doubt,
* leave this un-configured!</em>
*/
Optional<Duration> referenceCacheTtl();
@Value.Default
default Duration referenceCacheTtl() {
return Duration.parse(DEFAULT_REFERENCE_CACHE_TTL);
}

/**
* Defines the duration how long sentinels for non-existing references shall be kept in the cache
* (negative reference caching).
*
* <p>Defaults to {@code reference-cache-ttl}. Has no effect, if {@code reference-cache-ttl} is
* not configured. Default is not enabled. If reference caching is enabled, it is highly
* recommended to also enable negative reference caching.
* not a positive value. If reference caching is enabled, it is highly recommended to also enable
* negative reference caching.
*
* <p>It is safe to enable this for single node Nessie deployments.
*
* <p>Recommended value is currently {@code PT5M} for distributed and high values like {@code
* PT1H} for single node Nessie deployments.
*
* <p><em>This feature is experimental except for single Nessie node deployments! If in doubt,
* leave this un-configured!</em>
*/
Optional<Duration> referenceCacheNegativeTtl();

Expand Down Expand Up @@ -432,6 +428,6 @@ default Adjustable fromFunction(Function<String, String> configFunction) {
Adjustable withReferenceCacheTtl(Duration referenceCacheTtl);

/** See {@link StoreConfig#referenceCacheNegativeTtl()}. */
Adjustable withReferenceCacheNegativeTtl(Duration referencecacheNegativeTtl);
Adjustable withReferenceCacheNegativeTtl(Duration referenceCacheNegativeTtl);
}
}