Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEATURE: Overhaul ContentCacheFlusher #5221

Merged
merged 8 commits into from
Aug 30, 2024

Conversation

dlubitz
Copy link
Contributor

@dlubitz dlubitz commented Aug 23, 2024

Cleanup ContentCacheFlusher

  • The ContentCacheFlusher is now only responsible for creating CacheTags and flushing them, based on a Flush**Request. No node traversing within the ContentCacheFlusher anymore.
  • Also I introduced a CacheFlushingStrategy to control the time of "real" flushing, but keeping the interface clean and equal for different cases (immediately or on shutdown).
  • The ContentCacheFlusher doesn't handle assets anymore. This has been moved into AssetChangeHandlerForCacheFlushing
  • New method to flush all caches of a workspace.

GraphProjectorCatchUpHookForCacheFlushing

  • Determining all parent nodes here, instead of the ContentCacheFlusher .
  • Listen to the Workspace*Discard events.
  • Handle cache flush requests for workspaces on Workspace*Discard events.
  • Flush cache also for the deleted node, and not just their parents.
  • Replace contentstream with workspaceName of events.

CacheTag

  • I introduced a new CacheTag which allows to flush all cache entries of a workspace.
  • This is needed to reset the cache of a user workspace after a discard event.

NodeCacheEntryIdentifier

  • Re-Reverted changes to the EntryIdentifier, which is now based on the workspaceName again.

AssetChangeHandlerForCacheFlushing

  • New slot for AssetService::assetUpdated signal, to handle asset changes.
  • Determining all parent nodes here, instead of the ContentCacheFlusher .

Tests
Added tests for:

  • Cache flush on partial discard of changes
  • Cache flush on descendant nodes (2 levels)

Resolves #5175 and related issues

@dlubitz dlubitz marked this pull request as ready for review August 27, 2024 13:56
Comment on lines +85 to +102
public function nodeTagForIdentifier(string $identifier, Node $contextNode): array
{
return CacheTag::forNodeAggregate(
$contextNode->contentRepositoryId,
$contextNode->workspaceName,
NodeAggregateId::fromString($identifier)
)->value;
return [
CacheTag::forNodeAggregate(
$contextNode->contentRepositoryId,
$contextNode->workspaceName,
NodeAggregateId::fromString($identifier)
)->value,
CacheTag::forNodeAggregate(
$contextNode->contentRepositoryId,
CacheTagWorkspaceName::ANY,
NodeAggregateId::fromString($identifier)
)->value,
CacheTag::forWorkspaceName(
$contextNode->contentRepositoryId,
$contextNode->workspaceName
)->value
];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to change the return type here, to be able to generate also the workspace tags for identifiers, which is breaking if someone relies on a string. But it works without any changes in the @cache.entryTags

Comment on lines 298 to 311
public function onBeforeBatchCompleted(): void
{
foreach ($this->flushNodeAggregateRequestsOnBeforeBatchCompleted as $index => $request) {
$this->contentCacheFlusher->flushNodeAggregate($request, CacheFlushingStrategy::IMMEDIATELY);
$this->flushNodeAggregateRequestsOnAfterCatchUp[$index] = $request;
}
$this->flushNodeAggregateRequestsOnBeforeBatchCompleted = [];

foreach ($this->flushWorkspaceRequestsOnBeforeBatchCompleted as $index => $request) {
$this->contentCacheFlusher->flushWorkspace($request, CacheFlushingStrategy::IMMEDIATELY);
$this->flushWorkspaceRequestsOnAfterCatchUp[$index] = $request;
}
$this->flushWorkspaceRequestsOnBeforeBatchCompleted = [];
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to discuss this "double-cache-flush" approach. See Sebastians description in the head of this file (eventual consistency) vs. performance considerations.

Comment on lines +288 to +291
// Prevent infinite loops
if (!$collectedParentNodeAggregateIds->contain($parentNodeAggregateId)) {
$collectedParentNodeAggregateIds = $collectedParentNodeAggregateIds->merge(NodeAggregateIds::create($parentNodeAggregateId));
$collectedParentNodeAggregateIds = $this->determineParentNodeAggregateIds($workspaceName, $parentNodeAggregateId, $collectedParentNodeAggregateIds);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to prevent infinite loops here, but just for a theroretical case. AFAIR we do not allow circular dependencies, BUT we have a test case which creates that case to test, that it doesn't happen. 🤷‍♂️

@@ -120,6 +155,27 @@ public function __construct(
) {
}

public function canHandle(EventInterface $event): bool
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a workaround, to reduce the subset of possible events, as we are not able to "detect" events, which provide a workspaceName. Also this brings a bit of clearity, which events this catchup hook actually expects.

Copy link
Member

@kdambekalns kdambekalns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 by 👀 – just a nitpick and a question.

Neos.Neos/Classes/Fusion/Cache/CacheFlushingStrategy.php Outdated Show resolved Hide resolved
*
* @throws NodeTypeNotFound
*/
public function registerAssetChange(AssetInterface $asset): void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the tests, asset handling still works, but… why? 😳

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kitsunet kitsunet merged commit 916a482 into neos:9.0 Aug 30, 2024
9 checks passed
Copy link
Member

@bwaidelich bwaidelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some post-merge comments as discussed

Comment on lines +201 to +202
WorkspaceWasDiscarded::class => null, // only needed for GraphProjectorCatchUpHookForCacheFlushing
WorkspaceWasPartiallyDiscarded::class => null, // only needed for GraphProjectorCatchUpHookForCacheFlushing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could clarify the comments slightly along the lines of:

// the following two events are not actually handled, but we need to include them in {@see canHandle()} in order to trigger the catch up hooks for those (i.e. {@see GraphProjectorCatchUpHookForCacheFlushing}). This can be removed with https://github.com/neos/neos-development-collection/issues/4992

@@ -70,7 +70,7 @@ Feature: ForkContentStream Without Dimensions

And the event NodePropertiesWereSet was published with payload:
| Key | Value |
| workspaceName | "user" |
| workspaceName | "user-test" |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this test affected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was wrong the whole time, but the test did't fail.

Comment on lines +80 to +98
private function determineParentNodeAggregateIds(ContentRepository $contentRepository, WorkspaceName $workspaceName, NodeAggregateId $childNodeAggregateId, NodeAggregateIds $collectedParentNodeAggregateIds): NodeAggregateIds
{
$parentNodeAggregates = $contentRepository->getContentGraph($workspaceName)->findParentNodeAggregates($childNodeAggregateId);
$parentNodeAggregateIds = NodeAggregateIds::fromArray(
array_map(static fn (NodeAggregate $parentNodeAggregate) => $parentNodeAggregate->nodeAggregateId, iterator_to_array($parentNodeAggregates))
);

foreach ($parentNodeAggregateIds as $parentNodeAggregateId) {
// Prevent infinite loops
if (!$collectedParentNodeAggregateIds->contain($parentNodeAggregateId)) {
$collectedParentNodeAggregateIds = $collectedParentNodeAggregateIds->merge(NodeAggregateIds::create($parentNodeAggregateId));
$collectedParentNodeAggregateIds = $this->determineParentNodeAggregateIds($contentRepository, $workspaceName, $parentNodeAggregateId, $collectedParentNodeAggregateIds);
}
}


return $collectedParentNodeAggregateIds;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the ancestorNodeAggregateIds to be correct.
And, if we touch this again, I would suggest to streamline it a bit. Using iteration instead of recursion would make this easier to read and faster, e.g.:

    private function determineAncestorNodeAggregateIds(ContentRepository $contentRepository, WorkspaceName $workspaceName, NodeAggregateId $childNodeAggregateId): NodeAggregateIds
    {
        $contentGraph = $contentRepository->getContentGraph($workspaceName);
        /** @var array<NodeAggregate> $stack */
        $stack = iterator_to_array($contentGraph->findParentNodeAggregates($childNodeAggregateId));
        $ancestorNodeAggregateIds = [];
        do {
            $nodeAggregate = array_shift($stack);
            $ancestorNodeAggregateIds[] = $nodeAggregate->nodeAggregateId;
            array_push($stack, ...iterator_to_array($contentGraph->findParentNodeAggregates($nodeAggregate->nodeAggregateId)));
        } while ($stack !== []);
        return NodeAggregateIds::fromArray($ancestorNodeAggregateIds);
    }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw: Maybe it makes sense to add this method to the ContentGraphInterface...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed something similar with @kitsunet and @mhsdesign last week. findAncestorNodes on ContentGraphInterface. WDYT?

public WorkspaceName $workspaceName,
public NodeAggregateId $nodeAggregateId,
public NodeTypeName $nodeTypeName,
public NodeAggregateIds $parentNodeAggregateIds,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be $ancestorNodeAggregateIds here as well

WorkspaceName $workspaceName,
NodeAggregateId $nodeAggregateId,
NodeTypeName $nodeTypeName,
NodeAggregateIds $parentNodeAggregateIds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here :)

@dlubitz dlubitz deleted the featue/overhaul-content-cache-flusher branch September 3, 2024 20:38
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overhaul ContentCacheFlusher
4 participants