Skip to content

Perf: Concurrent ConfigCache #12052

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ccastanedaucf
Copy link
Contributor

@ccastanedaucf ccastanedaucf commented Jun 20, 2025

Fixes

Reduces CPU time in ConfigCache due to coarse-grained locking.

image

image

Context

Right now, ConfigCache uses a system of two backing Dictionary instances protected by a lock to ensure that any operations atomically update both collections, and that concurrent reads don't fail:

private IDictionary<int, BuildRequestConfiguration> _configurations;
private IDictionary<ConfigurationMetadata, int> _configurationIdsByMetadata;
private object _lockObject = new object();

Given that the most frequent operations here are reads, this causes a significant amount of lock contention as seen above.

In some cases the dictionaries are replaced together or modified in a way that can't be gated via a simple TryX method - so to replace with ConcurrentDictionary, we need some additional steps to ensure that an unexpected race condition doesn't cause the cache to go out of sync.

Changes Made

  • Methods which perform multiple operations on the dictionaries take a local reference so that they're guaranteed to be operating on the same instance (in the event that it gets swapped out mid-execution).
  • Both dictionaries are wrapped in a instance container so that methods which do a replace, clear, or serialize can do so atomically.
private record class Configurations : ITranslatable
{
    private ConcurrentDictionary<int, BuildRequestConfiguration> _byId;
    private ConcurrentDictionary<ConfigurationMetadata, BuildRequestConfiguration> _byMetadata;
}
  • Double adds / removes explicitly throw if an unrecoverable race is detected. Note that the current AddConfiguration() implementation already throws if an existing ID is added, so this isn't expected behavior.
    • e.g. ById.TryAdd() == true -> ById.TryRemove() == true -> ByMetadata.TryRemove() == false -> ByMetadata.TryAdd() == true

Additional Changes

  • Enumerations now go through the full KeyValuePair enumerator, as both Keys and Values on ConcurrentDictionary will create a new array and copy a snapshot, which is different than Dictionary behavior. This enumerator is threadsafe.
  • Translate just serializes the ID dictionary. Both can be derived based on BuildRequestConfiguration, so it doesn't really matter which one is picked.

@Copilot Copilot AI review requested due to automatic review settings June 20, 2025 21:40
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces the lock-protected Dictionary pair in ConfigCache with a container of two ConcurrentDictionary instances to reduce lock contention on reads and ensure atomic swaps of both lookups.

  • Introduce a Configurations record wrapping ById and ByMetadata concurrent dictionaries.
  • Update add/remove/lookup methods to use TryAdd/TryRemove and local references instead of a global lock.
  • Adjust enumeration and Translate logic for thread-safe iteration and serialization.

Comment on lines 79 to 83
if (!configurations.ById.TryAdd(config.ConfigurationId, config)
|| !configurations.ByMetadata.TryAdd(new ConfigurationMetadata(config), config))
{
int configId = GetKeyForConfiguration(config);
ErrorUtilities.VerifyThrow(!_configurations.ContainsKey(configId), "Configuration {0} already cached", config.ConfigurationId);
_configurations.Add(configId, config);
_configurationIdsByMetadata.Add(new ConfigurationMetadata(config), configId);
ErrorUtilities.ThrowInternalError("Configuration {0} already cached", config.ConfigurationId);
}
Copy link
Preview

Copilot AI Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In AddConfiguration, if ById.TryAdd succeeds but ByMetadata.TryAdd fails, the first add isn’t rolled back. This leaves the two dictionaries out of sync. Consider removing the entry from ById on failure, or perform the second add first and roll back accordingly.

Copilot uses AI. Check for mistakes.

{
configurationIdsCleared.Add(configId);
configuration.ClearCacheFile();
continue;
Copy link
Preview

Copilot AI Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ClearNonExplicitlyLoadedConfigurations, cleared IDs are never added to configurationIdsCleared, so the returned list is always empty. Call configurationIdsCleared.Add(configId) before the continue to record removed configurations.

Copilot uses AI. Check for mistakes.

Copy link
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question before I fully load this into my brain: did you consider changing the normal exclusive locks for a ReaderWriterLockSlim or similar?

@ccastanedaucf
Copy link
Contributor Author

Quick question before I fully load this into my brain: did you consider changing the normal exclusive locks for a ReaderWriterLockSlim or similar?

I only grepped one occurrence in the repo, so I figured it was either not available on framework or there was a specific reason 😝 Should be quick to try out and profile

@ccastanedaucf
Copy link
Contributor Author

ccastanedaucf commented Jun 20, 2025

I only grepped one occurrence in the repo, so I figured it was either not available on framework or there was a specific reason 😝 Should be quick to try out and profile

ReaderWriterLockSlim did not go well...

image

E.g. filtering out dictionary stuff on the indexer:
image

@@ -91,11 +89,14 @@ public void AddConfiguration(BuildRequestConfiguration config)
/// <param name="configId">The id of the configuration to remove.</param>
public void RemoveConfiguration(int configId)
{
lock (_lockObject)
Configurations configurations = _configurations;
if (configurations.ById.TryRemove(configId, out BuildRequestConfiguration config))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One potential disadvantage of a patten like this is that racing threads can potentially leave the caches in an inconsistent state. Consider this scenario:

AddConfiguration() and RemoveConfiguration() are called at the same time on different threads and the execution of the calls get interleaved like this:

configurations.ById.TryAdd(config.ConfigurationId, config)
configurations.ById.TryRemove(configId, out BuildRequestConfiguration config)
configurations.ByMetadata.TryRemove(new ConfigurationMetadata(config), out _)
configurations.ByMetadata.TryAdd(new ConfigurationMetadata(config), config)

Looks like this somewhat accounted for by ThrowInternalError(), and I'd guess that this scenario is unlikely in practice. However, it is something we should carefully consider since threading issues are a pain to investigate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup that's the same scenario I outlined above, luckily this actually can't happen right now though so it's mostly defending against a future refactor that accidentally causes a race.

RemoveConfiguration is only called by BuildRequestEngine, which gates everything through a single-threaded dataflow which is just used as a work queue. It doesn't remove from the shared global config cache either, just a local instance it uses to track unresolved configs. The global config cache that's shared via IComponentHost is add-only outside of cleanup operations which (even today) swap out the whole cache and throw on a double-add.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants