
Add advanced source transformations to reduce type checking overhead #111

Open: wants to merge 3 commits into main


jepler (Member) commented Jan 2, 2024


The new 'munge' module performs transformations on the source code. It uses the AST (abstract syntax tree) representation of Python code to recognize some idioms such as `if STATIC_TYPING:` and transforms them into alternatives that have zero overhead in mpy-compiled files. For example, `if STATIC_TYPING:` is transformed into `if 0:`, which mpy-cross eliminates at compile time through constant propagation and dead-branch elimination.

The code assumes the input file is black-formatted. In particular, it would malfunction if an if-statement and its body are on the same line: `if STATIC_TYPING: print("boo")` would be incorrectly munged.
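To make the black-formatting caveat concrete, here is a minimal sketch, assuming (this is not taken from the PR) that the munger finds top-level `if STATIC_TYPING:` statements with Python's `ast` module and then rewrites the whole source line where each one starts. `munge_static_typing` is a hypothetical name. Rewriting whole lines keeps every other statement on its original line number, but a body that shares the header line is lost along with it.

```python
import ast


def munge_static_typing(source: str) -> str:
    """Hypothetical sketch: rewrite top-level `if STATIC_TYPING:` headers to `if 0:`.

    Works line by line so line numbers are preserved, and assumes black
    formatting, i.e. the body starts on the line after the header.
    """
    lines = source.splitlines()
    for node in ast.parse(source).body:  # top-level statements only
        if (
            isinstance(node, ast.If)
            and isinstance(node.test, ast.Name)
            and node.test.id == "STATIC_TYPING"
        ):
            # Replace the entire header line (ast line numbers are 1-based).
            lines[node.lineno - 1] = "if 0:"
    return "\n".join(lines) + "\n"


good = 'if STATIC_TYPING:\n    print("boo")\n'
print(munge_static_typing(good), end="")  # header becomes "if 0:", body kept

bad = 'if STATIC_TYPING: print("boo")\n'
try:
    # The body was on the header line, so it is replaced too.
    ast.parse(munge_static_typing(bad))
except SyntaxError as exc:
    print("munged one-liner is invalid:", exc.msg)
```

In this sketch the one-liner comes out as a bare `if 0:` with no body, which no longer parses; the real munger may fail differently, but for the same underlying reason.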

jepler (Member Author) commented Jan 2, 2024

This fails on the community bundle because one library contains syntax that CircuitPython accepts but Python 3 rejects, so the file cannot be parsed for munging. Until now, library code was not actually required to be valid Python 3.

I filed a PR with the one affected library, but it's unlikely to see timely action:
https://github.com/dastels/circuitPython_dotstar_featherwing/pull/1/files

tannewt (Member) commented Jan 17, 2024

Neat!

jepler added a commit to adafruit/CircuitPython_Community_Bundle that referenced this pull request Jun 16, 2024
The original maintainer has not responded to a needed PR over several months, blocking adafruit/circuitpython-build-tools#111
jepler (Member Author) commented Jun 17, 2024

Writing zero-overhead type checking for CircuitPython

We love type hints! They improve documentation as well as the developer experience in Python-aware IDEs. Unfortunately, they can also increase the size of "mpy" files and runtime RAM usage. However, with recent improvements in circuitpython-build-tools, almost all of this overhead can be eliminated during the bundle build process, provided that the type hints are written in the correct style.

The circuitpython-build-tools rewriting (or "munging") process now performs the following transformations at the top level of the file (not inside functions or nested inside if/try/etc. blocks):

  • `from __future__ import ...` is removed anywhere it appears
  • A `try`/`except` block whose first statement imports from `typing` (`import typing`, `import typing as ...`, or `from typing import ...`) is removed, but the body of its first `except:` (or `except ImportError:` or `except Exception:`) block is kept in its place as an `if 1:` block, so it always runs
  • An `if` that tests `sys.implementation.name` for equality or inequality with `"circuitpython"` is transformed into `if 1:` or `if 0:`
  • An `if TYPE_CHECKING:` test is transformed into `if 0:`
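The `if` conditions in the list above can be recognized by inspecting the `ast` node of the test expression. The sketch below is an illustration under assumptions, not the build tools' code: `static_truth` and `munged_header` are hypothetical names, only `sys.implementation.name` compared against the literal `"circuitpython"` and a bare `TYPE_CHECKING` are recognized, and `not` is inverted generally (the test case in this PR only shows `not` for the `sys.implementation.name` form).

```python
import ast
from typing import Optional


def _is_impl_name(node: ast.expr) -> bool:
    """True if node is the expression `sys.implementation.name`."""
    return (
        isinstance(node, ast.Attribute)
        and node.attr == "name"
        and isinstance(node.value, ast.Attribute)
        and node.value.attr == "implementation"
        and isinstance(node.value.value, ast.Name)
        and node.value.value.id == "sys"
    )


def static_truth(test: ast.expr) -> Optional[bool]:
    """Compile-time value of an `if` test, or None if it is not a known idiom."""
    # `not <idiom>` inverts the result.
    if isinstance(test, ast.UnaryOp) and isinstance(test.op, ast.Not):
        inner = static_truth(test.operand)
        return None if inner is None else not inner
    # A bare `TYPE_CHECKING` is always false on the device.
    if isinstance(test, ast.Name) and test.id == "TYPE_CHECKING":
        return False
    # sys.implementation.name == / != "circuitpython"
    if (
        isinstance(test, ast.Compare)
        and len(test.ops) == 1
        and _is_impl_name(test.left)
        and isinstance(test.comparators[0], ast.Constant)
        and test.comparators[0].value == "circuitpython"
    ):
        if isinstance(test.ops[0], ast.Eq):
            return True
        if isinstance(test.ops[0], ast.NotEq):
            return False
    return None


def munged_header(test_source: str) -> Optional[str]:
    """Map an `if` test (source text) to 'if 1:' / 'if 0:', or None to keep it."""
    value = static_truth(ast.parse(test_source, mode="eval").body)
    if value is None:
        return None
    return "if 1:" if value else "if 0:"
```

For example, `munged_header('not sys.implementation.name == "circuitpython"')` yields `"if 0:"`, matching the last row of the test case in this PR, while an unrecognized test such as `sys.platform == "linux"` yields `None` and would be left alone.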

mpy-cross, as well as the bytecode compiler running on a CircuitPython device, can avoid most of the work inside `if <constant>:` blocks; in particular, string identifiers used only within such blocks are neither stored permanently nor included in the mpy file.

OK, given those transformations, what's the proper way to write type hints?

  • If desired, use `from __future__ import annotations` at the top of your file. Any `from __future__ import` statement is eliminated.
  • Place all typing-related imports in a single `try`/`except` block whose first statement is `import typing`, `import typing as ...`, or `from typing import ...`.
  • Follow it with an `except ImportError:` block that either just says `pass` or, if you must refer to `TYPE_CHECKING` elsewhere, says `TYPE_CHECKING = const(0)` (there is no need for `from micropython import const`).
  • Use `if TYPE_CHECKING:` freely.
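As an illustration of these rules, here is a small module written in that style; `average` is a made-up example function. On CPython the `typing` import succeeds and the `except` body never runs. On CircuitPython, after munging, the `from __future__` line and the `try` disappear, the `except` body runs as `if 1:`, and the `if TYPE_CHECKING:` block becomes `if 0:`.

```python
from __future__ import annotations  # removed entirely by the munger

try:
    # The first statement imports from typing, so the munger recognizes this block.
    from typing import TYPE_CHECKING, Optional
except ImportError:
    # Runs only where `typing` is unavailable (after munging: an `if 1:` block).
    # `const` is handled by the MicroPython compiler, so it is not imported here.
    TYPE_CHECKING = const(0)  # noqa: F821

if TYPE_CHECKING:
    # Imports needed only by annotations; munged to `if 0:` and compiled away.
    from typing import Sequence


def average(values: Sequence[float]) -> Optional[float]:
    """Return the mean of values, or None for an empty sequence."""
    if not values:
        return None
    return sum(values) / len(values)
```

This layout relies on annotations such as `Sequence[float]` never being evaluated at runtime; on CPython, `from __future__ import annotations` guarantees that.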

Almost all type annotations can be modified to follow the above rules.

The only significant exception the author knows of is `typing.cast`: code that uses it needs a do-nothing fallback implementation that returns its `val` argument, which costs a few bytes of bytecode.
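A sketch of that fallback, assuming the same `try`/`except` layout as the rules above; `first_byte` is only a hypothetical caller. After munging, the `except` body runs as `if 1:`, so the `cast` stand-in is really defined and called on the device, which is where the few bytes of bytecode go.

```python
try:
    from typing import TYPE_CHECKING, cast
except ImportError:
    TYPE_CHECKING = const(0)  # noqa: F821

    def cast(_typ, val):
        """Do-nothing stand-in for typing.cast: return val unchanged."""
        return val


def first_byte(buf: object) -> int:
    # cast() only informs the type checker; at runtime it returns buf as-is.
    data = cast(bytes, buf)
    return data[0]
```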

jepler force-pushed the sophisticated-munge-typing-overhead branch from 2d263bb to 3002d23 on June 17, 2024 02:34
jepler force-pushed the sophisticated-munge-typing-overhead branch 2 times, most recently from 6416c08 to ac0e070, on June 17, 2024 02:43
dhalbert (Contributor) commented Jun 17, 2024

This looks very nice. One worry: will this change the line numbers reported when an exception occurs in code from the .mpy file?

jepler (Member Author) commented Jun 17, 2024

No, the code takes care to preserve line numbers.

jepler (Member Author) commented Jun 17, 2024

The test case (original on the left, munged output on the right; note that the lines are cut off):

    2                                      |  
    3   try:                               |  
    4       from typing import TYPE_CHECK  |  
    5   except ImportError:                |  if 1:
    6       pass                           |      pass
    7                                      |  
    8   try:                               |  
    9       from typing import TYPE_CHECK  |  
   10   except ImportError:                |  if 1:
   11       pass                           |      pass
   12                                      |  
   13                                      |  
   14   try:                               |  
   15       import typing                  |  
   16   except:                            |  if 1:
   17       pass                           |      pass
   18                                      |  
   19   try:                               |  
   20       import typing as T             |  
   21   except:                            |  if 1:
   22       pass                           |      pass
   23                                      |  
   24   __version__ = "0.0.0-auto"         |  __version__ = "1.2.3"
   25                                      |  
   26   if sys.implementation.name == "ci  |  if 1:
   27       print("is circuitpython")      |      print("is circuitpython")
   28                                      |  
   29   if sys.implementation.name != "ci  |  if 0:
   30       print("not circuitpython (1)"  |      print("not circuitpython (1)"
   31                                      |  
   32   if not sys.implementation.name ==  |  if 0:
   33       print("not circuitpython (2)"  |      print("not circuitpython (2)"
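The `try`/`except` rows above can be reproduced with a line-preserving rewrite: blank the `try:` header and its body, turn the first matching `except` header into `if 1:`, and blank any later clauses. The sketch below assumes that approach; `munge_typing_try` and its helpers are hypothetical names, not the PR's functions. Because each original line maps to exactly one output line, tracebacks from the munged file still point at the original line numbers.

```python
import ast


def _imports_typing(stmt: ast.stmt) -> bool:
    """True for `import typing`, `import typing as T`, `from typing import ...`."""
    if isinstance(stmt, ast.Import):
        return any(alias.name == "typing" for alias in stmt.names)
    if isinstance(stmt, ast.ImportFrom):
        return stmt.module == "typing"
    return False


def _catches_import_error(handler: ast.ExceptHandler) -> bool:
    """True for bare `except:`, `except ImportError:`, `except Exception:`."""
    if handler.type is None:
        return True
    return isinstance(handler.type, ast.Name) and handler.type.id in (
        "ImportError",
        "Exception",
    )


def munge_typing_try(source: str) -> str:
    """Turn a top-level typing try/except into `if 1:`, keeping line numbers."""
    lines = source.splitlines()
    for node in ast.parse(source).body:  # top-level statements only
        if not (
            isinstance(node, ast.Try)
            and _imports_typing(node.body[0])
            and node.handlers
            and _catches_import_error(node.handlers[0])
        ):
            continue
        handler = node.handlers[0]
        # Blank `try:` and its body (ast line numbers are 1-based).
        for n in range(node.lineno, handler.lineno):
            lines[n - 1] = ""
        # The except header becomes an always-true block header.
        lines[handler.lineno - 1] = "if 1:"
        # Blank any later handlers and `else:` / `finally:` clauses.
        for n in range(handler.end_lineno + 1, node.end_lineno + 1):
            lines[n - 1] = ""
    return "\n".join(lines) + "\n"
```

A `try` whose first statement does not import from `typing` (for example `import json`) is returned unchanged.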

This script can show either the munged version of a Python program or a diff against the original.

Typical output:
```diff
$ circuitpython-munge src/bundle/libraries/helpers/requests/adafruit_requests.py --diff
--- src/bundle/libraries/helpers/requests/adafruit_requests.py
+++ src/bundle/libraries/helpers/requests/adafruit_requests.munged.py
@@ -33,7 +33,7 @@

 """

-__version__ = "0.0.0+auto.0"
+__version__ = "munged-version"
 __repo__ = "https://github.com/adafruit/Adafruit_CircuitPython_Requests.git"

 import errno
@@ -41,7 +41,7 @@

 import json as json_module

-if not sys.implementation.name == "circuitpython":
+if 0:
     from ssl import SSLContext
     from types import ModuleType, TracebackType
     from typing import Any, Dict, Optional, Tuple, Type, Union
```
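The `---`/`+++` header above is an ordinary unified diff whose right-hand name adds a `.munged.py` suffix. A minimal sketch of producing the same shape with the standard library's `difflib`, assuming the original and munged texts are already in memory; `munge_diff` is a hypothetical helper, and the real script may build its output differently.

```python
import difflib


def munge_diff(path: str, original: str, munged: str) -> str:
    """Unified diff between a file's text and its munged text."""
    stem = path[:-3] if path.endswith(".py") else path
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            munged.splitlines(keepends=True),
            fromfile=path,
            tofile=stem + ".munged.py",
        )
    )
```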
jepler (Member Author) commented Jun 17, 2024

It's lengthy, but here are the diffs for the adafruit bundle: https://gist.githubusercontent.com/jepler/8e9e477e84d65a81da36aa0db2eb4864/raw/db8354aea74f313e6e97b7543a06986cc59980ac/munge_changes.patch

It's nigh impossible to review that much repetitive junk, but at least I didn't spot anything problematic.

jepler (Member Author) commented Jun 17, 2024

    Usage: circuitpython-munge [OPTIONS] INPUT [OUTPUT]

    Options:
      --diff / --no-diff
      --munged-version TEXT
      --help                 Show this message and exit.
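Judging by the test case (`"0.0.0-auto"` becoming `"1.2.3"`) and the diff output (`"munged-version"`), `--munged-version` replaces the `__version__` string. A regex-based sketch of that substitution, assuming a double-quoted, top-level assignment at the start of a line; `set_version` is a hypothetical name. Only the quoted value changes, so line numbers are unaffected.

```python
import re

# Matches a top-level `__version__ = "..."` assignment (double quotes only).
_VERSION_RE = re.compile(r'^__version__\s*=\s*"[^"]*"', re.MULTILINE)


def set_version(source: str, version: str) -> str:
    """Replace the first top-level __version__ string with `version`."""
    # A function replacement stops re from interpreting backslashes in `version`.
    return _VERSION_RE.sub(lambda _m: f'__version__ = "{version}"', source, count=1)
```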

jepler (Member Author) commented Jun 17, 2024

The core will need to learn how to "munge" files when it builds frozen modules. The circuitpython-munge script would be helpful for that.
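One way a firmware build could run the script over frozen modules, sketched under assumptions: that `circuitpython-munge INPUT OUTPUT` (the form in the usage text) writes the munged file to OUTPUT, that the CLI is on `PATH`, and that the frozen sources live in a single directory tree. `munge_commands` and `munge_tree` are hypothetical helpers, not part of the core's build system.

```python
import subprocess
from pathlib import Path
from typing import List


def munge_commands(src_dir: Path, out_dir: Path) -> List[List[str]]:
    """One `circuitpython-munge INPUT OUTPUT` command per .py file,
    mirroring the source tree under out_dir."""
    commands = []
    for src in sorted(src_dir.rglob("*.py")):
        dest = out_dir / src.relative_to(src_dir)
        commands.append(["circuitpython-munge", str(src), str(dest)])
    return commands


def munge_tree(src_dir: Path, out_dir: Path) -> None:
    """Munge every module before it is frozen (requires the CLI on PATH)."""
    for cmd in munge_commands(src_dir, out_dir):
        Path(cmd[2]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the file-mapping logic testable without the CLI installed.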

tannewt requested a review from dhalbert on June 21, 2024 17:39
dhalbert (Contributor) commented
I forgot to follow up on this. Do we need a bunch of fix-ups on libraries, especially community ones, before we merge this?

jepler (Member Author) commented Jul 25, 2024

I did my best to spot check the results, and didn't find any erroneous transformations. However, it's entirely possible there are some and I just didn't see them.

Some libraries (in both the Adafruit bundle and the Community bundle) could probably be improved to follow one of these specific formats, but that can happen later.

If this waits until I return that's fine as well.
