Polyglot Piranha is a flexible multilingual structural search/replace engine that allows users to apply chains of interdependent structural search/replace rules for deeper cleanups. Polyglot Piranha builds upon tree-sitter queries for expressing the structural search/replace rules.
This repository contains the Polyglot Piranha framework and pre-built cleanup rules that can be leveraged for deleting code related to stale feature flags.
This is the higher level architecture of Polyglot Piranha.
At its heart, Polyglot Piranha is a structural find/replacement (rewrite) engine and pre-build language specific cleanup rules like - like simplifying boolean expressions, simplifying if-else
statements, deleting empty class, deleting files with no type declarations, inline local variables, and many more.
A user provides :
- A set (or, a graph) of structural find/replace rules
- Path to the code base
- Arguments to modify Piranha's behavior (like deleting associated comments).
When Piranha applies the set (or graph) of user defined rules, it triggers the pre-built language specific cleanup rules to do a deep cleanup. Below we can see an automatically generated graph for the Java pre-built cleanup rules.
Let's take an example, where we know for a fact that the expression exp.isTreated("SHOW_MENU")
always returns true
(i.e. the feature Show Menu is treated)
public String fooBar(boolean x) {
if(exp.isTreated("SHOW_MENU")|| x){
String menu = getMenu();
return menu;
}
return "";
}
To cleanup this code with Piranha, a user would have to write one rule to update the expressions like exp.isTreated("SHOW_MENU")
to true
and hook it to the pre-built boolean simplification rules. It would result in :
public String fooBar(boolean x) {
String menu = getMenu();
return menu;
}
Note how, user only specified the seed rule to update the expression to true, and Piranha simplified the disjunction (exp.isTreated("SHOW_MENU")|| x
=> true
), then removed the stale if condition and finally deleted the unreachable return statement (return "";
).
Let's say a user writes a piranha rule to delete an unused enum case (let's say LOW
). However, this enum case "co-incidentally" is the only enum case in this enum declaration.
enum Level {
LOW,
}
If the user hooks up this enum case deletion rule to the pre-built rules, it would not only delete the enum case (LOW
), but also the consequent empty enum declaration and also optionally delete the consequently empty compilation unit.
Let's take a canonical example of replacing Arrays.asList
with Collections.singletonList
, when possible.
This task involves two steps (i) Replacing the expression (ii) Adding the import statement for Collections
if absent (Assuming google java format takes care of the unused imports 😄).
However, Piranha does not contain pre-built rules to add such a custom import statements.
import java.util.ArrayList;
import java.util.Arrays;
+ import java.util.Collections;
class Character{
String name;
List<String> friends;
List<String> enemies;
Character(String name) {
this.name = name;
this.friends = new ArrayList<>();
- this.enemies = Arrays.asList(this.name);
+ this.enemies = Collections.singletonList(this.name);
}
}
For such a scenario a developer could first write a seed rule for replacing the expression and then craft a custom "cleanup" rule (that would be triggered by the seed rule) to add the import statement if absent within the same file.
Note a user can also craft a set of rules that trigger no other rule, i.e. use piranha as a simple structural find/replace tool
If you end up implementing a cleanup rule that could be useful for the community, feel free to make a PR to add it into the pre-built language specific rules
Polyglot Piranha can be used as a python library or as a command line tool.
pip install polyglot-piranha
Currently, we support one simple API (execute_piranha
), a simple python wrapper around Polyglot Piranha's CLI.
We believe this makes it easy to incorporate Piranha in "pipelining".
from polyglot_piranha import execute_piranha, PiranhaArguments
piranha_arguments = PiranhaArguments(
path_to_codebase = "...",
path_to_configurations = "...",
language= "java",
substitutions = {},
dry_run = False,
cleanup_comments = True
)
piranha_summary = execute_piranha(piranha_arguments)
The API execute_piranha
accepts a PiranhaArguments
An object of PiranhaArguments can be instantiated with the following arguments:
- (required)
path_to_codebase
(str
): Path to source code folder - (required)
path_to_configuration
(str
) : A directory containing files namedrules.toml
andedges.toml
rules.toml
: piranha rules expresses the specific AST patterns to match and replacement patterns for these matches (in-place). These rules can also specify the pre-built language specific cleanups to trigger.edges.toml
: expresses the flow between the rules
- (required)
language
(str
) : Target language (java
,py
,kt
,swift
,py
,ts
andtsx
) - (required)
substitutions
(dict
): Substitutions to instantiate the initial set of feature flag rules - (optional)
dry_run
(bool
) : Disables in-place rewriting of code - (optional)
cleanup_comments
(bool
) : Enables deletion of associated comments - (optional)
cleanup_comments_buffer
(usize
): The number of lines to consider for cleaning up the comments - (optional)
number_of_ancestors_in_parent_scope
(usize
): The number of ancestors considered whenPARENT
rules - (optional)
delete_file_if_empty
(bool
): User option that determines whether an empty file will be deleted - (optional)
delete_consecutive_new_lines
(bool
) : Replaces consecutive\n
s with a single\n
[Piranha_Output]
: a PiranhaOutputSummary
for each file touched or analyzed by Piranha. It contains useful information like, matches found (for match-only rules), rewrites performed, and content of the file after the rewrite. The content is particularly useful when dry_run
is passed as true
.
Get platform-specific binary from releases or build it from source following the below steps:
- Install Rust
git clone https://github.com/uber/piranha.git
cd piranha
cargo build --release
(cargo build --release --no-default-features
for macOS)- Binary will be generated under
target/release
Polyglot Piranha
A refactoring tool that eliminates dead code related to stale feature flags
Usage: polyglot_piranha [OPTIONS] --path-to-codebase <PATH_TO_CODEBASE> --path-to-configurations <PATH_TO_CONFIGURATIONS> -l <LANGUAGE>
Options:
-c, --path-to-codebase <PATH_TO_CODEBASE>
Path to source code folder or file
--include [<INCLUDE>...]
Paths to include (as glob patterns)
--exclude [<EXCLUDE>...]
Paths to exclude (as glob patterns)
-t, --code-snippet <CODE_SNIPPET>
Code snippet to transform [default: ]
-s <SUBSTITUTIONS>
These substitutions instantiate the initial set of rules. Usage : -s stale_flag_name=SOME_FLAG -s namespace=SOME_NS1
-f, --path-to-configurations <PATH_TO_CONFIGURATIONS>
Directory containing the configuration files - `rules.toml` and `edges.toml` (optional)
-j, --path-to-output-summary <PATH_TO_OUTPUT_SUMMARY>
Path to output summary json file
-l <LANGUAGE>
The target language [possible values: java, swift, py, kt, go, tsx, ts]
--delete-file-if-empty
User option that determines whether an empty file will be deleted
--delete-consecutive-new-lines
Replaces consecutive `\n`s with a `\n`
--global-tag-prefix <GLOBAL_TAG_PREFIX>
the prefix used for global tag names [default: GLOBAL_TAG.]
--number-of-ancestors-in-parent-scope <NUMBER_OF_ANCESTORS_IN_PARENT_SCOPE>
The number of ancestors considered when `PARENT` rules [default: 4]
--cleanup-comments-buffer <CLEANUP_COMMENTS_BUFFER>
The number of lines to consider for cleaning up the comments [default: 2]
--cleanup-comments
Enables deletion of associated comments
--dry-run
Disables in-place rewriting of code
--allow-dirty-ast
Allows syntax errors in the input source code
-h, --help
Print help
The output JSON is the serialization of- PiranhaOutputSummary
produced for each file touched or analyzed by Piranha.
It can be seen that the Python API is basically a wrapper around this command line interface.
In PolyglotPiranha, programs are graphs of match-replace rules that can be composed and chained.
Individual edits are represented as rules in Polyglot Piranha, where each rule matches and replaces a specific code snippet. A program in PolyglotPiranha should contain at least one rule with the following properties:
query
: A query to find the code pattern to refactorreplace_node
: The captured node in the query that will be replaced.replace_string
: Replacement string or pattern for the refactored code.holes
: Placeholders in your queries that will be instantiated at runtime.is_seed_rule
: Specifies whether this rule is an entry point for the rule graph.
Optionally, a rule can have filters. Piranha supports two kinds of filters:
enclosing_node
: A pattern that specifies the enclosing node of the rule.not_enclosing_node
: A pattern that should not match any parent of the main match.
The enclosing_node
and not_enclosing_node
filters can be refined using contains with specified [at_least, at_most]
bounds, as well as not_contains
.
The rule queries, and filters can be written in the following languages:
For a detailed understanding of the syntax, refer to the Tree-sitter Syntax Reference.
To create a rule in regex, prepend your query with rgx
.
For instance: rgx <your regex query>
. Piranha supports the regex syntax derived from the regex crate.
Piranha's Concrete Syntax is a custom rule language designed for matching and replacing code. Concrete Syntax operates at the parse tree level, similar to comby. The key difference is that it matches a parse tree node only if the entire parse tree can be traversed using the concrete syntax template.
To use concrete syntax, prepend the query with cs <your_query>
.
For example, to match the code snippet exp.isTreated("SHOW_MENU")
, you can use the following query cs :[object].isTreated(:[string])
[[rules]]
name = "your_rule_name"
query = """(
(method_invocation name: (_) @name
arguments: (argument_list) @args) @invk
(#eq? @name @method_name))
"""
replace_node = "invk"
replace = "foo @args"
holes = ["method_name"]
is_seed_rule = true
[[rules.filters]]
enclosing_node = "cs class MyClass { :[body] }"
Edges in Polyglot Piranha allow rules to depend on each other, establishing a hierarchy or sequence of application among rules. An edge essentially describes the direction of dependency between two or more rules. Edges are also represented in the TOML format.
Example edges in TOML:
[[edges]]
scope = "Method"
from = "your_rule_name"
to = ["other_rule_name", "another_rule_name"]
[[edges]]
scope = "Method"
from = "other_rule_name"
to = ["your_rule_name"]
We believe, the easiest way to get started with Piranha is to build upon the demos.
To setup the demo please follow the below steps:
git clone https://github.com/uber/piranha.git
cd piranha
- Create a virtual environment:
python3 -m venv .env
source .env/bin/activate
- Install Polyglot Piranha
pip install --upgrade pip
pip install .
to run demo against current source code (please install Rust, it takes less than a minute)- Or,
pip install polyglot-piranha
to run demos against the latest release.
Currently, we have demos for the following :
- run
python3 demo/stale_feature_flag_cleanup_demos.py
. It will execute the scenarios listed under demo/feature_flag_cleanup/java and demo/feature_flag_cleanup/kt. These scenarios use simple feature flag API. - In these demos the
configurations
contain :rules.toml
: expresses how to capture different feature flag APIs (isTreated
,enum constant
)piranha_arguments.toml
: expresses the flag behavior, i.e. the flag name and whether it is treated or not. Basically thesubstitutions
provided in thepiranha_arguments.toml
can be used to instantiate the rules reference.
- run
python3 demo/match_only_demos.py
- This demo also shows how the piranha summary output can be used.
rules.toml
: express how to capture two patterns - (i) invocation of the methodfooBar("...")
and invocation of the methodbarFoo("...")
(but only in non-static methods)
- run
python3 demo/find_replace_demos.py
- This demo shows how to use Piranha as a simple structural find/replace tool (that optionally hooks up to built-in cleanup rules)
- run
python3 demo/find_replace_custom_cleanup_demos.py
- This demo shows how to replace
new ArrayList<>()
withCollections.emptyList()
. Note it also adds the required import statement.
Please refer to our test cases at /test-resources/<language>/
as a reference for handling complicated scenarios
First, check if Polyglot Piranha supports Stale feature flag cleanup for the required language.
Then see if your API usage is similar to the ones shown in the demo (java-demo) or in the test resources (java-ff_system1, java-ff_system2, kt-ff_system1, kt-ff_system2).
If not :|, try to adapt these examples to your requirements. Further, you can study the tree-sitter query documentation to understand how tree-sitter queries work. It is recommended to read the section- Adding support for a new feature flag system
Then adapt the argument file as per your requirements. For instance, you may want to update the value corresponding to the @stale_flag_name
and @treated
. If your rules do not contain require other tags feel free to remove them from your arguments file. In most cases edges file is not required, unless your feature flag system API rules are inter-dependent.
More details for configuring Piranha - Adding support for a new feature flag system and Adding Cleanup Rules.
One can similarly build upon the other demos too.
To onboard a new feature flag system users will have to specify the <path-to-configurations>/rules.toml
and <path-to-configurations>/edges.toml
files (look here). The rules.toml
will contain rules that identify the usage of a feature flag system API. Defining edges.toml
is required if your feature flag system API rules are inter-dependent.
For instance, you want to delete a method declaration with specific annotations and then update its usages with some boolean value.
Please refer to the test-resources/java
for detailed examples.
The example below shows a usage of a feature flag API (experiment.isTreated(STALE_FLAG)
), in a if_statement
.
class PiranhaDemo {
void demoMethod(ExperimentAPI experiment){
// Some code
if (experiment.isTreated(STALE_FLAG)) {
// Do something
} else {
// Do something else
}
// Do other things
}
}
In the case when STALE_FLAG is treated, we would expect Piranha to refactor the code as shown below (assuming that STALE_FLAG
is treated) :
class PiranhaDemo {
void demoMethod(ExperimentAPI experiment){
// Some code
// Do something
// Do other things
}
}
This can be achieved by adding a rule in the input_rules.toml
file (as shown below) :
[[rules]]
name = "Enum Based, toggle enabled"
query = """((
(method_invocation
name : (_) @n1
arguments: ((argument_list
([
(field_access field: (_)@f)
(_) @f
])) )
) @mi
)
(#eq? @n1 "isTreated")
(#eq? @f "@stale_flag_name")
)"""
replace_node = "mi"
replace = "@treated"
groups = [ "replace_expression_with_boolean_literal"]
holes = ["treated", "stale_flag_name"]
This specifies a rule that matches against expressions like exp.isTreated(SOME_FLAG_NAME)
and replaces it with true
or false
.
The query
property of the rule contains a tree-sitter query that is matched against the source code.
The node captured by the tag-name specified in the replace_node
property is replaced with the pattern specified in the replace
property.
The replace
pattern can use the tags from the query
to construct a replacement based on the match (like regex-replace).
Each rule also contains the groups
property, that specifies the kind of change performed by this rule. Based on this group, appropriate
cleanup will be performed by Piranha. For instance, replace_expression_with_boolean_literal
will trigger deep cleanups to eliminate dead code (like eliminating consequent
of a if statement
) caused by replacing an expression with a boolean literal.
Currently, Piranha provides deep clean-ups for edits that belong the groups - replace_expression_with_boolean_literal
, delete_statement
, and delete_method
. Basically, by adding an appropriate entry to the groups, a user can hook up their rules to the pre-built cleanup rules.
Setting the is_seed_rule=False
ensures that the user defined rule is treated as a cleanup rule not as a seed rule (For more details refer to demo/find_replace_custom_cleanup
).
A user can also define exclusion filters for a rule (rules.filters
). These filters allow matching against the context of the primary match. For instance, we can write a rule that matches the expression new ArrayList<>()
and exclude all instances that occur inside static methods (For more details, refer to the demo/match_only
).
At a higher level, we can say that - Piranha first selects AST nodes matching rules.query
, excluding those that match any of the rules.filters.not_contains
(within rules.filters.enclosing_node
). It then replaces the node identified as rules.replace_node
with the formatted (using matched tags) content of rules.replace
.
The rule
contains holes
or template variables that need to be instantiated.
For instance, in the above rule @treated
and @stale_flag_name
need to be replaced with some concrete value so that the rule matches only the feature flag API usages corresponding to a specific flag, and replace it specifically with true
or false
. To specify such a behavior,
user should create a piranha_arguments.toml
file as shown below (assuming that the behavior of STALE_FLAG is treated):
language = ["java"]
substitutions = [
["stale_flag_name", "STALE_FLAG"],
["treated", "true"]
]
This file specifies that, the user wants to perform this refactoring for java
files.
The substitutions
field captures mapping between the tags and their corresponding concrete values. In this example, we specify that the tag named stale_flag_name
should be replaced with STALE_FLAG
and treated
with true
.
This section describes how to configure Piranha to support a new language. Users who do not intend to onboard a new language can skip this section.
This section will describe how to encode cleanup rules that are triggered based on the update applied to the flag API usages.
These rules should perform cleanups like simplifying boolean expressions, or if statements when the condition is constant, or deleting empty interfaces, or in-lining variables.
For instance, the below example shows a rule that simplifies a or
operation where its RHS
is true.
[[rules]]
name = "Or - right operand is True"
query = """(
(binary_expression
left : (_)*
operator:"||"
right: (true)
) @binary_expression)"""
replace_node = "binary_expression"
replace = "true"
Currently, Piranha picks up the language specific configurations from src/cleanup_rule/<language>
.
Let's consider an example where we want to define a cleanup for the scenario where
Before | After |
int foobar(){
boolean x = exp.isTreated(SOME_STALE_FLAG);
if (x || someCondition()) {
return 100;
}
return 0;
} |
int foobar(){
return 100;
} |
We would first define flag API rules as discussed in the section Adding Support for a new language. Assuming this rule replaces the occurrence of the flag API corresponding to SOME_STALE_FLAG
with true
; we would have to define more cleanup rules as follows:
R0
: Deletes the enclosing variable declaration (i.e.x
) (E.g. java-rules:delete_variable_declarations
)R1
: replace the identifier with the RHS of the deleted variable declaration, within the body of the enclosing method whereR0
was applied i.e. replacex
withtrue
within the method body offoobar
. (E.g. java-rules:replace_expression_with_boolean_literal
)R2
: simplify the boolean expressions, for example replacetrue || someCondition()
withtrue
, that encloses the node whereR1
was applied. (E.g. java-rules:true_or_something
)R3
: eliminate the enclosing if statement with a constant condition whereR2
was applied (if (true) { return 100;}
→return 100;
). E.g. java-rules:simplify_if_statement_true, remove_unnecessary_nested_block
R4
: eliminate unreachable code (return 0;
inreturn 100; return 0;
) in the enclosing block whereR3
was applied. (E.g. java-rules:delete_all_statements_after_return
)
The fact that R2
has to be applied to the enclosing node where R1
was applied, is expressed by specifying the edges.toml
file.
To define how these cleanup rules should be chained, one needs to specify edges (e.g. the java-edges file) between the groups and (or) individual rules.
The edges can be labelled as Parent
, Global
or even much finer scopes like Method
or Class
(or let's say functions
in go-lang
).
- A
Parent
edge implies that after Piranha applies the"from"
rule to update the noden1
in the AST to noden2
, Piranha tries to apply"to"
rules on any ancestor of"n2"
(e.g.R1
→R2
,R2
→R3
,R3
→R4
) - A
Method
edge implies that after Piranha applies the"from"
rule to update the noden1
in the AST to noden2
, Piranha tries to apply"to"
rules within the enclosing method's body. (e.g.R0
→R1
) - A
Class
edge implies that after Piranha applies the"from"
rule to update the noden1
in the AST to noden2
, Piranha tries to apply"to"
rules within the enclosing class body. (e.g. in-lining a private field) - A
Global
edge implies that after Piranha applies the"from"
rule to update the noden1
in the AST to noden2
, Piranha tries to apply"to"
rules in the entire code base. (e.g. in-lining a public field).
scope_config.toml
file specifies how to capture these fine-grained scopes like method
, function
, lambda
, class
.
First decide, what scopes you need to capture, for instance, in Java we capture "Method" and "Class" scopes. Once, you decide the scopes construct scope query generators similar to java-scope_config. Each scope query generator has two parts - (i) matcher
is a tree-sitter query that matches the AST for the scope, and (ii) generator
is a tree-sitter query with holes that is instantiated with the code snippets corresponding to tags when matcher
is matched.
Visualizing rules, groups and their edges through a graph is a great way to understand how Piranha Polyglot works.
We use Graphviz to generate a .dot file and a .svg image generated by visualize_rules_graph.py
.
Please follow the instructions to install Graphviz at https://graphviz.org/download/.
Moreover, the script also needs the toml
and graphviz
python packages.
The script takes as first argument a path for the generated .dot file.
The following arguments are paths for directories containing rules.toml
and edges.toml
files.
Optionally, you can provide a --title
argument to give a title to the generated graph.
To generate the .dot file and the .svg image used in this README (assuming a Python venv and a valid Graphviz installation):
pip install toml
pip install graphviz
python visualize_rules_graph.py ./java_prebuilt_rules.dot src/cleanup_rules/java --title "Java pre-built cleanup rules"
To generate an image for java-ff_system1 in our tests:
python visualize_rules_graph.py ./java-ff_system1.dot src/cleanup_rules/java test-resources/java/feature_flag_system_1/control/configurations --title "Java Test Feature Flag Cleanup System 1"
The purpose of Piranha Arguments is determining the behavior of Piranha.
language
: The programming language used by the source codesubstitutions
: Seed substitutions for the rules (if any). In case of stale feature flag cleanup, we pass the stale feature flag name and whether it is treated or not.delete_file_if_empty
: enables delete file if it consequently becomes emptydelete_consecutive_new_lines
: enables deleting consecutive empty new linecleanup_comments
: enables cleaning up the comments associated to the deleted code elements like fields, methods or classescleanup_comments_buffer
: determines how many lines above to look up for a comment.
Prerequisites:
- Install pre-commit
- Install taplo
- We name the rules in the format - _<ast_kind>. E.g.,
delete_method_declaration
orreplace_expression with_boolean_literal
- We name the dummy rules in the format -
<ast_kind>_cleanup
E.g.statement_cleanup
orboolean_literal_cleanup
. Using dummy rules (E.g. java-rules:boolean_literal_cleanup
) makes it easier and cleaner when specifying the flow between rules.
Currently we maintain
- Unit tests for the internal functionality can be found under
<models|utilities>/unit_test
. - End-to-end tests for the configurations execute Piranha on the test scenarios in
test-resources/<language>/input
and check if the output is as expected (test-resources/<language>/expected_treated
andtest-resources/<language>/expected_control
).
To add new scenarios to the existing tests for a given language, you can add them to new file in the input
directory and then create similarly named files with the expected output in expected_treated
and expected_control
directory.
Update the piranha_arguments_treated.toml
and piranha_arguments_control.toml
files too.
To add tests for a new language, please add a new <language>
folder inside test-resources/
and populate the input
, expected_treated
and expected_control
directories appropriately.