Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Windows, Mac] Can't use call_if_inside/call_if_not_inside matcher decorators with multiprocessing #1181

Closed
kiri11 opened this issue Jul 31, 2024 · 1 comment · Fixed by #1204

Comments

@kiri11
Copy link
Contributor

kiri11 commented Jul 31, 2024

This issue has been reported before (#848, #629, #435), but I debugged it and found the root cause, so decided to create another one with more information.

Example traceback:

Traceback (most recent call last):
  File "libcst/codemod/_cli.py", line 285, in _execute_transform
    output_tree = transformer.transform_module(input_tree)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libcst/codemod/_command.py", line 71, in transform_module
    tree = super().transform_module(tree)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libcst/codemod/_codemod.py", line 108, in transform_module
    return self.transform_module_impl(tree_with_metadata)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libcst/codemod/_visitor.py", line 32, in transform_module_impl
    return tree.visit(self)
           ^^^^^^^^^^^^^^^^
  File "libcst/_nodes/module.py", line 89, in visit
    result = super(Module, self).visit(visitor)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libcst/_nodes/base.py", line 227, in visit
    _CSTNodeSelfT, self._visit_and_replace_children(visitor)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libcst/_nodes/statement.py", line 1814, in _visit_and_replace_children
    body=visit_required(self, "body", self.body, visitor),
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libcst/_nodes/internal.py", line 81, in visit_required
    result = node.visit(visitor)
             ^^^^^^^^^^^^^^^^^^^
  File "libcst/_nodes/base.py", line 236, in visit
    leave_result = visitor.on_leave(self, with_updated_children)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libcst/matchers/_visitors.py", line 515, in on_leave
    if _should_allow_visit(
       ^^^^^^^^^^^^^^^^^^^^
  File "libcst/matchers/_visitors.py", line 425, in _should_allow_visit
    return _all_positive_matchers_true(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "libcst/matchers/_visitors.py", line 404, in _all_positive_matchers_true
    if all_matchers[matcher] is None:
       ~~~~~~~~~~~~^^^^^^^^^
KeyError: FunctionDef

Root cause

The matchers are dataclasses with __hash__ method returning their id. Their id changes in every process, so we can't compare their instances created in different processes.

Even though the same matcher was added to all_matchers, this was when transformer/visitor initialized in the main process. The check causing exception happens in the child process, which has its own copy of all matchers with another set of ids.

This might work on Linux because it uses fork() for multiprocessing by default.
But Windows and MacOS use spawn() with a fresh Python interpreter in each child process.
And it will break even on Linux in Python 3.14+ when Python changes the default way to start child processes in Linux.

The issue does not happen with leave/visit matcher decorators, because they are only used with matches, never compared directly.

Potential solutions

  1. _cli.py can re-initialize _matchers of the MatcherDecoratable transformer/visitor in each child process. Most straightforward way and easy to implement. MatcherDecoratable class can have a function just for that. But every other caller must know to do that in the multiprocessing context.

  2. Make MatcherDecoratableTransformer/MatcherDecoratableVisitor multiprocessing-safe, but how? Maybe delayed initialization of _matchers? But it could still be accessed before the spawn. Or maybe it could save the pid with which matchers were initialized and re-initialize them if pid changes? Seems complicated.

  3. Another way that I don't see? Open for discussion.

kiri11 added a commit to kiri11/LibCST that referenced this issue Sep 2, 2024
To support multiprocessing on Windows/macOS
Issue Instagram#1181
kiri11 added a commit to kiri11/LibCST that referenced this issue Sep 3, 2024
To support multiprocessing on Windows/macOS
Issue Instagram#1181
@kiri11 kiri11 changed the title [Windows, Mac, Linux in future] Can't use call_if_inside/call_if_not_inside matcher decorators with multiprocessing [Windows, Mac] Can't use call_if_inside/call_if_not_inside matcher decorators with multiprocessing Sep 3, 2024
kiri11 added a commit to kiri11/LibCST that referenced this issue Sep 3, 2024
To support multiprocessing on Windows/macOS
Issue Instagram#1181
kiri11 added a commit to kiri11/LibCST that referenced this issue Sep 4, 2024
To support multiprocessing on Windows/macOS
Issue Instagram#1181
kiri11 added a commit to kiri11/LibCST that referenced this issue Sep 4, 2024
To support multiprocessing on Windows/macOS
Issue Instagram#1181
kiri11 added a commit to kiri11/LibCST that referenced this issue Sep 4, 2024
To support multiprocessing on Windows/macOS
Issue Instagram#1181
kiri11 added a commit to kiri11/LibCST that referenced this issue Sep 4, 2024
To support multiprocessing on Windows/macOS
Issue Instagram#1181
@kiri11
Copy link
Contributor Author

kiri11 commented Sep 5, 2024

I decided to go with the second approach (delayed initialization). Matchers are populated right before we need them, so they will end up in the same process.

zsol pushed a commit that referenced this issue Sep 25, 2024
…hem late (#1204)

* Add is_property check

Skip properties to prevent exceptions

* Delayed initialization of matchers

To support multiprocessing on Windows/macOS
Issue #1181

* Add a test for matcher decorators with multiprocessing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant