JSIR is a next-generation JavaScript analysis tool. At its core is an MLIR-based high-level intermediate representation, which supports both dataflow analysis and lossless conversion back to source. This unique design makes it suitable for source-to-source transformation.
JSIR is used at Google for analyzing and detecting malicious JavaScript files, protecting products like Ads, Android, and Chrome. Some example use cases are:
- Signal extraction: JSIR is used for extracting syntactic and behavioral signals, which are fed into downstream security systems.
- Taint analysis: JSIR is used for detecting suspicious information flows through its dataflow analysis capability.
- Decompilation: JSIR is used for decompiling Hermes bytecode all the way to JavaScript code, relying on its ability to be fully lifted back to source code.
- Deobfuscation: JSIR is used for deobfuscating JavaScript through its source-to-source transformation capability. See our latest paper on how we combine the Gemini LLM and JSIR for deobfuscation.
Driven by the diverse use cases of malicious JavaScript analysis and detection, JSIR needs to achieve two seemingly conflicting goals:
- It needs to be high-level enough to be lifted back to the AST, in order to support source-to-source transformation and decompilation.
- It needs to be low-level enough to facilitate dataflow analysis, in order to support taint analysis, constant propagation, etc.
To achieve these goals, JSIR defines two dialects:
- JSHIR: This is a high-level IR that uses MLIR regions to accurately model control flow structures.
- JSLIR: This is a low-level IR that uses CFGs to represent branching behaviors. JSLIR adds extra operations that annotate the kind of the original control flow structure. This allows JSLIR to be fully converted back to JSHIR.
See intermediate_representation_design.md for details.
We have only tested clang on Linux:
# Install clang:
sudo apt update
sudo apt install clang
We use the Bazel build system. It is recommended to use Bazelisk to manage Bazel versions:
# Install Bazelisk through npm:
sudo apt install npm
sudo npm install -g @bazel/bazelisk
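Before starting a long build, it can help to confirm that both tools installed above are actually on `PATH`. The following is a minimal sketch; the helper name `check_tool` is hypothetical and not part of JSIR.

```shell
# Hypothetical helper (not part of JSIR): report whether a command is on PATH.
# Returns non-zero when the command is missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: missing" >&2
    return 1
  fi
}

# "|| true" keeps going after a miss, so that every tool gets reported.
check_tool clang || true
check_tool bazelisk || true
```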
Note: The build takes a lot of storage space. If you run out of space, Bazel will return a cryptic error.
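Since an out-of-space failure is hard to diagnose after the fact, one option is to check free disk space up front. The sketch below makes two assumptions that you should adjust: the 50 GiB threshold is an arbitrary placeholder rather than a measured requirement, and it checks the filesystem holding `$HOME`, on the assumption that Bazel keeps its output there (as it typically does by default on Linux).

```shell
# Hypothetical threshold in GiB; an assumption, not a documented requirement.
REQUIRED_GIB="${REQUIRED_GIB:-50}"

# Print the free space, in whole GiB, of the filesystem containing the given path.
free_gib() {
  # "df -Pk" prints POSIX-format output in 1024-byte blocks;
  # column 4 of the second line is the available space.
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Assumption: Bazel's output lives on the same filesystem as $HOME.
available="$(free_gib "${HOME:-.}")"
if [ "${available:-0}" -lt "$REQUIRED_GIB" ]; then
  echo "Warning: only ${available:-0} GiB free; the build may run out of space." >&2
else
  echo "OK: ${available} GiB free."
fi
```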
LLVM takes a long time to fetch and build. We can test if LLVM is properly included by building a part of it:
# This will fetch LLVM and build its support library:
bazelisk build @llvm-project//llvm:Support
To build JSIR:
# Build everything:
bazelisk build //...
# Or, build a single target:
bazelisk build //maldoca/js/ir:jsir_gen
# Or, build all targets in a directory:
bazelisk build //maldoca/js/ir/...
To run test cases:
# Run all tests:
bazelisk test //...
# Or, run a specific test:
bazelisk test //maldoca/js/quickjs:quickjs_test
# Or, run all tests under a directory:
bazelisk test //maldoca/js/ir/conversion/...
Convert a JavaScript source file to JSHIR:
bazelisk run //maldoca/js/ir:jsir_gen -- \
  --input_file=$(pwd)/maldoca/js/ir/conversion/tests/if_statement/input.js \
  --passes=source2ast,ast2hir
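To convert several files in one go, the same command can be wrapped in a small function. This is only a sketch: the function name `jsir_to_hir` and the `DRY_RUN` variable are hypothetical, and the glob assumes that other test directories follow the same `<name>/input.js` layout as `if_statement`, which is not confirmed here. By default the function only prints the command it would run.

```shell
# Hypothetical wrapper: print (DRY_RUN=1, the default) or run the jsir_gen
# command that converts one JavaScript file to JSHIR.
jsir_to_hir() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "bazelisk run //maldoca/js/ir:jsir_gen -- --input_file=$1 --passes=source2ast,ast2hir"
  else
    bazelisk run //maldoca/js/ir:jsir_gen -- "--input_file=$1" --passes=source2ast,ast2hir
  fi
}

# Assumed layout: one input.js per test directory.
for f in "$(pwd)"/maldoca/js/ir/conversion/tests/*/input.js; do
  [ -e "$f" ] || continue  # skip when the glob matches nothing
  jsir_to_hir "$f"
done
```

Set `DRY_RUN=0` to actually invoke `bazelisk` once the printed commands look right.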
- Adversarial JavaScript Analysis with MLIR: Talk at LLVM Developers' Meeting 2024
- CASCADE: LLM-Powered JavaScript Deobfuscator at Google: Paper about combining LLM + JSIR for JavaScript deobfuscation
This is not an official Google product.