Background and Motivation
Our current JIT (Just-In-Time) infrastructure is fragile because it embeds C++ template code as Python strings. This design introduces redundancy and complicates maintenance, since we must keep two copies of the same C++/Python binding code:
One version stored as a Python string for JIT.
Another version stored in .cpp files for AOT (Ahead-Of-Time) compilation.
Whenever we change the interface, both copies must be updated, which is error-prone. Additionally, we cannot leverage syntax highlighting or other development tools (like linting) for the embedded JIT code.
Proposed Solution
Per discussions with @lsrcz, we realized that we are not using advanced Jinja features in our JIT code. The few string substitutions that we do require can be implemented with C macros. By switching to a macro-based approach, we can unify the JIT and AOT code bases:
Shared C++ Source
Use the same .cpp and .h files for both AOT and JIT modes.
Macro-Generated Headers
JIT Mode: Generate a header file that defines constant expressions for each JIT instance.
AOT Mode: Generate a header file that dispatches between different parameters or attention variants.
Variant and Parameter Definitions
Both attention variant definitions and additional parameters can be expressed as macros generated by Python. This approach allows us to maintain a single source of truth for all configurations while still supporting both compilation modes.
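The sketch below shows the JIT-mode idea in a minimal form. All names in it are hypothetical and do not come from the actual codebase: `HEAD_DIM`, `DTypeQ`, `DTypeKV`, `DTypeO`, `qk_dot`, and `run_jit_instance`. The generated header pins each parameter to one compile-time constant. The shared template body is written once, and the JIT build instantiates exactly one specialization of it.

```cpp
// --- Sketch of a generated JIT header (hypothetical names) ---------------
// Python would write one such header per JIT instance; every configurable
// parameter becomes a single compile-time constant.
using DTypeQ = float;
using DTypeKV = float;
using DTypeO = float;
constexpr int HEAD_DIM = 4;  // tiny value so the example is easy to check

// --- Shared source (identical in JIT and AOT builds) ---------------------
// The core routine is written once as a template over the same parameters.
template <int kHeadDim, typename DQ, typename DKV, typename DO>
constexpr DO qk_dot(const DQ* q, const DKV* k) {
  DO acc = 0;
  for (int i = 0; i < kHeadDim; ++i) {
    acc += static_cast<DO>(q[i]) * static_cast<DO>(k[i]);
  }
  return acc;
}

// --- JIT entry point ------------------------------------------------------
// Instantiates exactly one specialization, taken from the generated constants.
constexpr DTypeO run_jit_instance(const DTypeQ* q, const DTypeKV* k) {
  return qk_dot<HEAD_DIM, DTypeQ, DTypeKV, DTypeO>(q, k);
}
```

The constants are ordinary C++ declarations rather than text spliced into a Python string, so editors and the compiler can check the header like any other source file.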
This macro-based design leaves us with a single copy of the code. That simplifies maintenance and reduces errors. It also makes development and debugging easier, because the code can be syntax-highlighted, linted, and checked by the compiler like any other source file.
Implementation Details
The design relies on two kinds of generated headers: one for JIT mode (defining generated constants) and one for AOT mode (dispatching over parameters). Both are paired with the same C++ source for the core functionality.
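A comparable sketch of the AOT side follows. It uses hypothetical names (`DISPATCH_HEAD_DIM`, `process_head`, `run_aot`) and an assumed set of precompiled head dimensions (64 and 128). The generated dispatch macro turns a runtime value into a compile-time constant, and the shared template body is reused unchanged.

```cpp
#include <stdexcept>
#include <string>

// --- Shared source (same template as in a JIT build) ---------------------
template <int kHeadDim>
int process_head() {
  // Stand-in for a real kernel: just report which specialization ran.
  return kHeadDim;
}

// --- Sketch of a generated AOT header (hypothetical macro name) ----------
// Python would emit one case per head dimension that should be precompiled.
// Inside each case, HEAD_DIM is a compile-time constant visible to __VA_ARGS__.
#define DISPATCH_HEAD_DIM(head_dim, HEAD_DIM, ...)                 \
  switch (head_dim) {                                               \
    case 64: {                                                      \
      constexpr int HEAD_DIM = 64;                                  \
      __VA_ARGS__                                                   \
      break;                                                        \
    }                                                               \
    case 128: {                                                     \
      constexpr int HEAD_DIM = 128;                                 \
      __VA_ARGS__                                                   \
      break;                                                        \
    }                                                               \
    default:                                                        \
      throw std::invalid_argument("unsupported head_dim " +         \
                                  std::to_string(head_dim));        \
  }

// --- AOT entry point: runtime value in, compile-time specialization out --
int run_aot(int head_dim) {
  int result = 0;
  DISPATCH_HEAD_DIM(head_dim, HEAD_DIM, { result = process_head<HEAD_DIM>(); })
  return result;
}
```

To precompile another head dimension, the generator only needs to emit one more `case`. The shared template itself stays the same.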
In this approach:
ATTENTION_VARIANT can be overridden to provide different attention implementations.
ADDITIONAL_PARAMS_DECL can be defined or left empty depending on the mode (JIT or AOT).
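The sketch below shows one way the two macros could interact with the shared source. It assumes a hypothetical soft-capping variant (`SoftCapVariant`) and a hypothetical extra field (`logits_soft_cap`). The `#ifndef` guards supply defaults whenever the generated section leaves a macro undefined. The soft-cap formula is only an example transformation, not the project's definition.

```cpp
#include <cmath>

// ---- Section a generator might emit for one instance (hypothetical) ----
// The extra parameter and the variant below are illustrative only.
#define ADDITIONAL_PARAMS_DECL float logits_soft_cap;

struct SoftCapVariant {
  // Templated on the params type so it is only checked when instantiated.
  template <typename ParamsT>
  static float transform_logit(const ParamsT& p, float logit) {
    return p.logits_soft_cap * std::tanh(logit * p.sm_scale / p.logits_soft_cap);
  }
};
#define ATTENTION_VARIANT SoftCapVariant

// ---- Shared source: defaults apply when a macro is left undefined -------
#ifndef ADDITIONAL_PARAMS_DECL
#define ADDITIONAL_PARAMS_DECL  // expands to nothing: no extra fields
#endif

struct DefaultVariant {
  template <typename ParamsT>
  static float transform_logit(const ParamsT& p, float logit) {
    return logit * p.sm_scale;  // plain scaling
  }
};

#ifndef ATTENTION_VARIANT
#define ATTENTION_VARIANT DefaultVariant
#endif

// Common fields plus whatever ADDITIONAL_PARAMS_DECL expands to.
struct Params {
  float sm_scale;
  ADDITIONAL_PARAMS_DECL
};

// Written once; the macro picks the variant at compile time.
template <typename Variant>
float transform(const Params& p, float logit) {
  return Variant::transform_logit(p, logit);
}

float run_instance(const Params& p, float logit) {
  return transform<ATTENTION_VARIANT>(p, logit);
}
```

If the generated section defined neither macro, the same shared code would compile with `DefaultVariant` and a `Params` struct that has no extra fields.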
Conclusion
By removing the Python-string-based Jinja templates and transitioning to a C macro–based approach, we eliminate duplicate code for JIT and AOT, reduce the possibility of interface mismatch, and regain the benefits of compiler tooling (e.g., syntax highlighting, linting). We believe this unified approach will be easier to maintain, more robust, and simpler to extend in the future.
cc @lsrcz, @hyhieu for visibility and feedback.