Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce compiled normalizer cache #500

Draft
wants to merge 29 commits into
base: master
Choose a base branch
from

Conversation

romm
Copy link
Member

@romm romm commented Mar 17, 2024

This commit introduces a new compilation step for the normalizer, which aims to bring huge performance gains when normalizing any value. It works by adding a static analysis step to the process, which will recursively analyse how the normalizer should perform for every value it can meet. This process results in a native PHP code entry, that can then be cached for further usage.

The PHP code is an anonymous class that implements the Transformer interface, and is responsible of recursively transforming an input to the same structure, but with only arrays and scalar values. It is done by adding methods to the class, where basically each method is used to transform a specific type.

Example #1: a method that transforms a string, with a registered custom transformer.

(new MapperBuilder())
    ->registerTransformer(fn (string $value) => strtoupper($value))
    ->normalizer(Format::array())
    ->normalize('some string');

// Generated method for a string value:
private function transform_string_xyz(mixed $value): mixed
{
    $next = fn () => $value;
    if (\is_string($value)) {
        $next = fn () => $this->transformers[0]($value, $next);
    }
    return $next();
}

Example #2: a method that transforms a property of a class, which is typed as a shaped array.

final readonly class SomeClass
{
    public function __construct(
        /** @var array{foo: string, bar: int} */
        public array $someArray,
    ) {}
}

// Generated method for the shaped array property:
private function transform_shaped_array_xyz(array $value): array
{
    $result = [];
    foreach ($value as $key => $item) {
        $result[$key] = match ($key) {
            'foo' => $this->transform_string_xyz($item),
            'bar' => $this->transform_int_xyz($item),
            default => $this->transform_mixed($item),
        };
    }
    return $result;
}

The entry point to this process can be found in CacheTransformer where the following steps occur:

  1. The value to be normalized is analyzed: an instance of Type is created, which will be the root type for the whole static analysis process.
  2. The TransformerDefinitionBuilder will recursively fetch all types that can be met during the normalization. For instance, if the root type is a class, then all its properties and their types will be included in the process. For each type, transformers (registered with MapperBuilder::registerTransformer) are analyzed: if a transformer matches a type, it will be part of the compilation process.
  3. For each type, a TypeFormatter will be called. These type formatters are responsible for manipulating the compiler which is used to create the PHP code that will normalize the value. How it is done really depends on the type and can be really simple (e.g. DateTimeFormatter) or quite complex (e.g. ClassFormatter).
  4. The PHP code is then written into a file and included when needed.

romm added 10 commits April 10, 2024 14:48
…piled-normalizer

# Conflicts:
#	src/Cache/KeySanitizerCache.php
#	src/Library/Settings.php
#	src/Normalizer/ArrayNormalizer.php
#	src/Normalizer/Formatter/JsonFormatter.php
#	src/Normalizer/Transformer/RecursiveTransformer.php
#	src/Normalizer/Transformer/ValueTransformersHandler.php
#	tests/Integration/Normalizer/NormalizerTest.php
@CuyZ CuyZ deleted a comment from Mediagone Oct 14, 2024
romm added 5 commits October 18, 2024 15:11
…piled-normalizer

# Conflicts:
#	src/Normalizer/Formatter/JsonFormatter.php
@romm romm force-pushed the feat/compiled-normalizer branch from 064279c to 9b0306f Compare January 18, 2025 13:42
romm added 4 commits January 18, 2025 14:43
# Conflicts:
#	src/Library/Container.php
#	src/Normalizer/Transformer/KeyTransformersHandler.php
#	src/Normalizer/Transformer/ValueTransformersHandler.php
…piled-normalizer

# Conflicts:
#	src/Normalizer/Formatter/JsonFormatter.php
Copy link
Contributor

@Baptouuuu Baptouuuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not versed enough into the project to every details of the transformers so I skimmed a bit over it.

@@ -93,7 +123,7 @@ public function matches(Type $other): bool
}

if (! $other->subType) {
return false;
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this a bug ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was indeed. I planned to do it in another commit but forgot. Thanks for noticing! 😊

Copy link
Member Author

@romm romm Mar 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #615

@@ -59,6 +60,8 @@ public function accepts(mixed $value): bool
return true;
}

public function compiledAccept(CompliantNode $node): CompliantNode {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be implemented ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! It's one of the few things left to be done.

Comment on lines +54 to +57
return ! Polyfill::array_any(
$value,
fn (mixed $item, mixed $key) => ! $this->subType->accepts($item),
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the double negative why not use array_all ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using array_all would mean looping over all of the values, whereas array_any will stop on the first invalid value. A small performance improvement, but less code readability maybe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_all stops iterating over the array if the callback returns false, the performance should be the same.

Node::functionCall(function_exists('array_any') ? 'array_any' : Polyfill::class . '::array_any', [
$node,
Node::shortClosure(
Node::logicalOr(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the wrap in a logical or ? As I understand it so far, removing this will produce the same compiled code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

* @param array<mixed> $arguments
* @return array<Node>
*/
private function argumentNode(array $arguments): array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this method is recursive, it may be interesting to make it static to avoid capturing $this in the array_map closure for each recursion level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. What about making the closure static? Do you think one of these solutions would be best?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the method and closure should be static.

If only the closure is, it will still need to capture the object via the $self = $this; use ($self) approach. But this is more complex and still has the same problem.

public function markAsSure(): self
{
$self = clone $this;
$self->isSure = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the meaning of this flag, I'm not sure a fully grasp its extent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some documentation: d696f8b; is it enough to understand properly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect 👌

@@ -12,7 +12,7 @@

use const JSON_THROW_ON_ERROR;

final class JsonFormatterTest extends TestCase
final class JsonStreamFormatterTest extends TestCase
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why rename this test since the JsonFormatter hasn't changed its name ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, it came from a change that was not commited (or rolled back).

@@ -0,0 +1,103 @@
<?php // Generated by CuyZ\Valinor\Cache\FileSystemCache
return fn (array $transformers, CuyZ\Valinor\Normalizer\Transformer\Transformer $delegate) => new class ($transformers, $delegate) implements \CuyZ\Valinor\Normalizer\Transformer\Transformer {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why fully qualify the interface in implements but not as the argument ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 probably made sense at some point, not anymore. I changed it, thanks.

// To help updating the expectation file, uncomment the following line
// and run the test once. Then, comment it back. Remember to check that
// the updated content is correct.
// file_put_contents($expectedFile, file_get_contents($cacheFiles[0]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's relevant but it could be interesting to validate the cache file doesn't contain a syntax error.

Or are other tests on the compiled code enough ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other tests already make sure of that 😉

],
];

yield 'positive integer with non-negative-int transformer' => [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be interesting to also test non-*-int transformers with the 0 value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need moooore tests! 😎

Thanks for the suggestion.

@romm romm force-pushed the feat/compiled-normalizer branch from dd0a6c6 to 3877f1f Compare March 9, 2025 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants