Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic enter/exit visitor to replace BaseNode.traverse and friends #97

Closed
Pike opened this issue Feb 8, 2019 · 4 comments
Closed

Generic enter/exit visitor to replace BaseNode.traverse and friends #97

Pike opened this issue Feb 8, 2019 · 4 comments

Comments

@Pike
Copy link
Contributor

Pike commented Feb 8, 2019

The simple read-only visitor we landed in #96 is great for performance, but doesn't cut it for actual transformations. I'm having a local branch, that I hope is generic enough to replace traverse.

Not happy with the name of it, I'm using ContextVisitor, because it uses enter and exit, but I can't really use python context managers, because you can't pass arguments to __exit__.

I'm implementing .traverse() as part of the patch to fluent.syntax, but to make things more tangible, here's how transforms_from could look like. Compare with https://hg.mozilla.org/l10n/fluent-migration/file/797c19359d4b/fluent/migrate/helpers.py#l43 through line 123.

class IntoTranforms(FTL.ContextVisitor):
    IMPLICIT_TRANSFORMS = ("CONCAT",)
    FORBIDDEN_TRANSFORMS = ("PLURALS", "REPLACE", "REPLACE_IN_TEXT")

    def __init__(self, substitutions):
        self.substitutions = substitutions

    def generic_exit(self, node, props):
        return node.__class__(**props)

    def enter_Junk(self, node):
        anno = node.annotations[0]
        raise InvalidTransformError(
            "Transform contains parse error: {}, at {}".format(
                anno.message, anno.span.start))

    def enter_CallExpression(self, node):
        name = node.callee.id.name
        if name in self.IMPLICIT_TRANSFORMS:
            raise NotSupportedError(
                "{} may not be used with transforms_from(). It runs "
                "implicitly on all Patterns anyways.".format(name))
        if name in self.FORBIDDEN_TRANSFORMS:
            raise NotSupportedError(
                "{} may not be used with transforms_from(). It requires "
                "additional logic in Python code.".format(name))
        return True

    def exit_CallExpression(self, node, props):
        if node.callee.id.name != 'COPY':
            return self.generic_exit(node, props)
        args = (self.into_argument(arg) for arg in node.positional)
        kwargs = {
            arg.name.name: self.into_argument(arg.value)
            for arg in node.named}
        return COPY(*args, **kwargs)

    def exit_Placeable(self, node, props):
        if isinstance(props['expression'], Transform):
            return props['expression']
        return self.generic_exit(node, props)

    def exit_Pattern(self, node, props):
        # Replace the Pattern with CONCAT which is more accepting of its
        # elements. CONCAT takes PatternElements, Expressions and other
        # Patterns (e.g. returned from evaluating transforms).
        return CONCAT(*props['elements'])

    def into_argument(self, node):
        """Convert AST node into an argument to migration transforms."""
        if isinstance(node, FTL.StringLiteral):
            # Special cases for booleans which don't exist in Fluent.
            if node.value == "True":
                return True
            if node.value == "False":
                return False
            return node.value
        if isinstance(node, FTL.MessageReference):
            try:
                return self.substitutions[node.id.name]
            except KeyError:
                raise InvalidTransformError(
                    "Unknown substitution in COPY: {}".format(
                        node.id.name))
        else:
            raise InvalidTransformError(
                "Invalid argument passed to COPY: {}".format(
                    type(node).__name__))
Pike added a commit to Pike/python-fluent that referenced this issue Feb 8, 2019
@stasm
Copy link
Contributor

stasm commented Feb 11, 2019

I'm looking forward to retiring the traverse API. Thanks for spearheading this effort, @Pike!

The proposed ContextVisitor looks good functionality-wise, but I do wonder if the API could be streamlined a bit. In the current iteration, do enter_ methods control whether the visitor should descend into children, while exit_* methods perform transformations?

I really liked it how you based the read-only Visitor class on Python's ast.NodeVisitor. Not only does it leverage a good pattern but it also increases the chance that someone who's worked with Python's ast module before will instantly feel at home in fluent.syntax.

For a read-write visitor, I'm looking at Python's ast.NodeTransformer class. It looks like a good match for the use-case we have in fluent.migrate. What are the benefits of ContextVisitor over NodeTransformer? Are you open to considering the pattern implemented by the NodeTransformer instead?

@Pike
Copy link
Contributor Author

Pike commented Feb 11, 2019

Looking at their source [1], they're doing in-place modifications. That's an interesting way to get performance, as it avoids a lot of object creation and GC. One can't implement copy with that, though. And doing something like a compile or transform step wouldn't be idempotent, as they'd "destroy" the original AST. Probably can be tweaked in the transformer to fast-path transformed results.

Maybe that's OK? Maybe we'll just add an explicit .copy() if the need arises?

Clarification re enter_*, it can do more than just control descend, it would for example also track ancestor state. Like the .attr I have in my read-only-visitor patch in compare-locales.

[1] https://github.com/python/cpython/blob/2f1a317d5fdc45b9d714b067906f612f636ba08e/Lib/ast.py#L290-L311

@stasm
Copy link
Contributor

stasm commented Feb 12, 2019

Thanks for your thoughts, @Pike.

Maybe that's OK? Maybe we'll just add an explicit .copy() if the need arises?

To document our discussion on IRC, this sounds like a good path forward. It allows us to separate two concerns: transformation and copying. The old traverse API coupled them together without giving the user of the API the choice of whether to copy AST nodes or not.

For use-cases which require the original AST to remain intact, an explicit copy() gives us parity with the old traverse() API. Other use-cases might be OK with destructive visits of the NodeTransformer. The partial evaluator from #95 could take advantage of this pattern by transforming the parsed AST into a tree of callable objects.

This highlights an interesting difference between the partial evaluator (#95) and the runtime interpreter which landed in #81 (whichis similar to the one in fluent.js). The runtime interpreter must not corrupt the original AST during the process of formatting, while the partial evaluator can be made to do so (with a few caveats about caching).

@Pike
Copy link
Contributor Author

Pike commented Feb 12, 2019

I've got a local branch, that I still need to polish. But I did some perf tests, and while a clone is functionally equivalent, it really performs badly. Also, you need some trickery if you want to replicate that traverse post-processes lists by calling fun on it, which Transformer.visit doesn't expose by default.

The perf test I'm doing is the search-and-replace code we have in Pontoon, https://github.com/mozilla/pontoon/blob/78c56b63a1233db0a4754136b0b4f57506ec4341/pontoon/batch/utils.py#L53-L62, to show a somewhat realistic example. Somewhat realistic because it also hits the parser, which is probably gonna be dominant here.

Here are the perf results so far:

edit_traverse:	0.00835537862778
edit_transform:	0.00500656223297
edit_cloned:	0.0115759100914

edit_traverse is old-school, edit_transform is new-school, and almost twice as fast, and edit_cloned is new-school-mock-old-school, which is around a 50% perf hit.

With that in mind, I'm going to add a deprecation note to traverse, but not throw out the current implementation.

Pike added a commit to Pike/python-fluent that referenced this issue Feb 12, 2019
@Pike Pike closed this as completed in 23e649e Feb 13, 2019
Pike added a commit to Pike/python-fluent that referenced this issue May 20, 2020
In projectfluent#97, we deprecated `BaseNode.traverse`. It's now time to remove it.
The reason it was still around is mostly for a niche in performance,
but dedicated iterators are probably the way to go for that niche.
That's what fluent.runtime does, for example.
Pike added a commit that referenced this issue May 20, 2020
In #97, we deprecated `BaseNode.traverse`. It's now time to remove it.
The reason it was still around is mostly for a niche in performance,
but dedicated iterators are probably the way to go for that niche.
That's what fluent.runtime does, for example.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants