Add support for lazy loading and imports of some expensive subpackages and modules to speed up Perun startup time #259

Merged: 6 commits, Oct 6, 2024

@JiriPavela JiriPavela commented Sep 30, 2024

This PR implements the mechanism proposed in #223 for the most expensive modules that were being imported during perun --version, perun import, and perun showdiff.

@JiriPavela JiriPavela changed the title from "Add support for lazy loading and imports of expensive subpackages and modules" to "Add support for lazy loading and imports of some expensive subpackages and modules to speed up Perun startup time" Oct 5, 2024
@JiriPavela JiriPavela marked this pull request as ready for review October 5, 2024 21:40
@JiriPavela JiriPavela requested a review from tfiedor October 5, 2024 21:40
Collaborator

@tfiedor tfiedor left a comment

  1. If I am reading this right, view is not fully lazy loaded, and it was quite a source of the problems (with holoviews/bokeh being quite huge). Is that WIP?
  2. Can you show some numbers? For example, running perun help, perun status, perun init and maybe perun showdiff before and after? Just a few runs and a few numbers so we can compare how much we have saved, because it did come at a cost (worse readability, greater complexity, and I fear it might be harder for IDEs to suggest completions).

Well done anyway.

@@ -94,7 +100,7 @@ def store_model_counts(analysis: list[dict[str, Any]]) -> None:
 @click.option(
     "--regression_models",
     "-r",
-    type=click.Choice(regression_models.get_supported_models()),
+    type=click.Choice(perun.utils.structs.postprocess_public.get_supported_models()),
Collaborator

Why the full name? This looks like an automatic change.

Collaborator Author

Great, thanks for the catch. Fixed.

@@ -832,6 +834,8 @@ def set_optimization(_: click.Context, param: click.Argument, value: str) -> str
     :param value: value of the parameter
     :return: the value
     """
+    from perun.collect.trace.optimizations.optimization import Optimization
Collaborator

Is this intended? Again, it looks automatic, and imports inside functions are shunned by the linter.

Collaborator Author

This is actually intended. Optimization is a global object in the optimizations module and it is too complicated to refactor right now, as many more modules and packages would have to be lazy imported. So this is a temporary solution, similar to how many other packages import their nested modules in lazy_get_cli_commands, e.g., view, collect, etc.

Collaborator

Add a FIXME there then, so we do not forget.

Collaborator Author

There are many more places in the codebase where local imports are used, dating from before we decided to adopt the lazy-loader approach, so this is something I definitely intend to cover in subsequent PRs regardless. However, for peace of mind, I added a TODO comment there.

@@ -0,0 +1,29 @@
+from __future__ import annotations
Collaborator

_public sounds a little bit weird. Maybe just call it check_structs, etc., since we already have common_structs and the naming would then be uniform?

Collaborator Author

Renamed.

@@ -0,0 +1,2 @@
+recursive-include perun *.pyi
+include perun/py.typed
Collaborator

What is this py.typed file? Is it some kind of stub? Can we add a comment explaining why it is here and why it is empty? Could it contain a comment such as # Empty file needed for lazy_loading?

Collaborator Author

This is a requirement of PEP 561 and is also described in the mypy documentation. Packages that distribute both runtime and type stub files (.pyi files) need to contain a py.typed file as well to indicate support for type hints. The MANIFEST.in file is then needed so that the sdist distribution includes the .pyi and py.typed files. As the PEP does not specify what the py.typed file should contain, and it is easy enough to find an explanation for the file online, I'd just keep it empty.

@JiriPavela
Collaborator Author

  1. If I am reading this right, view is not fully lazy loaded, and it was quite a source of the problems (with holoviews/bokeh being quite huge). Is that WIP?

Right now, view modules are being lazy loaded the naive way, i.e., by local imports in lazy_get_cli_commands. As this PR is meant to be more of a hotfix, I didn't touch code that already achieved the same goal, albeit in another way. Subsequent PRs will aim to port the entire codebase to the lazy_loader approach.
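For illustration, the local-import pattern mentioned above can be sketched roughly as follows. This is a minimal sketch and not Perun's actual lazy_get_cli_commands: the names LAZY_COMMANDS, _load_dump, and run are hypothetical, and json merely stands in for an expensive dependency.

```python
from typing import Any, Callable


def _load_dump() -> Callable[..., str]:
    # Local import: the module is imported only when this loader runs,
    # not when the enclosing module is imported at startup.
    import json  # stand-in for an expensive dependency (assumption)

    return json.dumps


# Hypothetical registry mapping command names to loaders.
LAZY_COMMANDS: dict[str, Callable[[], Callable[..., Any]]] = {
    "dump": _load_dump,
}


def run(name: str, *args: Any, **kwargs: Any) -> Any:
    # Resolve the command on demand, paying the import cost only here.
    command = LAZY_COMMANDS[name]()
    return command(*args, **kwargs)


result = run("dump", {"a": 1})
```

The downside, as noted in the review, is that linters tend to flag imports placed inside functions.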

  2. Can you show some numbers? For example, running perun help, perun status, perun init and maybe perun showdiff before and after? Just a few runs and a few numbers so we can compare how much we have saved, because it did come at a cost (worse readability, greater complexity, and I fear it might be harder for IDEs to suggest completions).

I agree about the worse readability and greater complexity. That is sadly the price of Python not having native support for lazy loading. IDEs, IntelliSense, and autocomplete should not be affected; that is what the __getattr__, __dir__, __all__ = lazy.attach_stub(... in the __init__.py files is there for. Nonetheless, some IDEs still run into trouble with it, but imports in the form of from perun import check as check usually solve it (see similar bugs with some other packages), although pylint complains about such import statements.
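As a rough illustration of the mechanism behind the __getattr__, __dir__, __all__ triple, here is a hand-written, stdlib-only sketch of a module-level __getattr__ (PEP 562) that imports a submodule on first access. It does not use lazy_loader itself; the package demo_pkg and its submodule heavy are hypothetical and are written to a temporary directory only so the sketch is runnable.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical package written to a temporary directory for the demo.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()

# __init__.py defines a module-level __getattr__ (PEP 562) that imports
# a known submodule only when it is first accessed as an attribute.
(pkg / "__init__.py").write_text(textwrap.dedent('''
    import importlib

    _SUBMODULES = {"heavy"}
    __all__ = sorted(_SUBMODULES)

    def __getattr__(name):
        if name in _SUBMODULES:
            return importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

    def __dir__():
        return __all__
'''))
(pkg / "heavy.py").write_text("VALUE = 42\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
import demo_pkg  # cheap: the submodule is not imported yet

loaded_before = "demo_pkg.heavy" in sys.modules
value = demo_pkg.heavy.VALUE  # first access triggers the real import
loaded_after = "demo_pkg.heavy" in sys.modules
```

With lazy_loader, the list of lazily exposed names comes from the adjacent .pyi stub rather than from a hand-maintained set, which is also what lets type checkers and IDEs see the names.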

As for the numbers, I measured five runs of different commands and chose the median values:

  • perun --version: 2.414s vs 0.287s (8.4x speedup)
  • perun --help: 2.404s vs 0.294s (8.2x speedup)
  • perun init: 2.416s vs 0.296s (8.2x speedup)
  • perun import: 2.504s vs 0.539s (4.6x speedup)
  • perun showdiff (nontrivial input that contains a lot of processing): 4.478s vs 2.503s (1.8x speedup)
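For anyone who wants to repeat this kind of measurement, a minimal sketch of timing a command over five runs and taking the median could look like the following. The helper median_runtime is hypothetical, and the timed command is a trivial stand-in (a Python interpreter running "pass") rather than an actual perun invocation.

```python
from __future__ import annotations

import statistics
import subprocess
import sys
import time


def median_runtime(command: list[str], runs: int = 5) -> float:
    """Run the command several times and return the median wall-clock time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, capture_output=True)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Stand-in command so the sketch runs anywhere; replace it with the
# command under test, e.g. ["perun", "--version"], when perun is installed.
baseline = median_runtime([sys.executable, "-c", "pass"])
print(f"median over 5 runs: {baseline:.3f}s")
```

The speedup factors listed above are simply the ratio of the two medians, e.g. 2.414 / 0.287 ≈ 8.4.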

@JiriPavela JiriPavela merged commit 3a24630 into Perfexionists:devel Oct 6, 2024
17 checks passed