Modifications from original SPAT
- Now uses Java 18
Steps are the same as below except for the argument [PathofJre]. It is replaced by the path of lib. An example is "/usr/lib/jvm/java-18-openjdk-amd64/lib". This path can be found by running whereis java and tracing the result to the directory of the original binary (instead of a symlink). The library folder is usually a sibling directory of the directory that contains the binary.
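The lookup described above (follow the java symlink to the real binary, then take the sibling lib directory) can be sketched in Java. The class JavaLibLocator and its method are hypothetical helpers, not part of SPAT, and the layout of a bin directory next to a lib directory is the usual arrangement rather than a guarantee.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of SPAT.
final class JavaLibLocator {

    // Resolves symlinks on the given java launcher path and returns the
    // "lib" directory that sits next to the launcher's "bin" directory.
    static Path libDirFor(Path javaBinary) {
        try {
            Path realBinary = javaBinary.toRealPath(); // follows symlinks
            Path binDir = realBinary.getParent();      // e.g. <jdk>/bin
            return binDir.resolveSibling("lib");       // e.g. <jdk>/lib
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Without an argument, fall back to the launcher of the running JVM.
        // Assumption: the launcher is named "java" (on Windows it is java.exe).
        Path launcher = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("java.home"), "bin", "java");
        System.out.println(libDirFor(launcher));
    }
}
```

Passing the path reported by whereis java as the first argument prints the candidate lib directory; it is worth checking that the printed directory actually exists before using it.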
Eclipse is used to develop and build the project. Click "File > Export" and select the option "Runnable JAR file". Use the "Noargs - RuleWriter" launch configuration and keep everything else as default. Click "Finish". The resulting .jar file should be saved in the "artifacts" folder.
See run.sh.
Semantic-and-Naturalness Preserving Auto Transformation. This tool is a source-to-source transformation tool that can deal with partial code snippets (programs without dependency information). The transformed code is semantically equivalent to the original and preserves its syntactic naturalness.
We have currently verified it on Windows 10.
This project is developed in "Eclipse IDE for RCP and RAP Developers". If you want to play with the code, please use the same IDE. Starting with "src/spat/RuleSelector.java" gives a good overview of the whole project.
We have produced a runnable jar file already in "artifacts".
To use this tool, simply type the following command:
java -jar SPAT.jar [RuleId] [RootDir] [OutputDir] [PathofJre] \& [PathofotherDependentJar]
[RuleId] is the transformation rule you want to adopt.
[RootDir] is the root directory path in which you put all your code snippets to be transformed. Each ".java" file is regarded as a code snippet. Each file should contain one Java class. For method-level code snippets, users need to wrap each method with a "foo" class.
[OutputDir] is the directory path where you want to store the transformed code snippets.
[PathofJre] is the path of rt.jar (usually placed in ".../jre1.x.x_xxx/lib/").
[PathofotherDependentJar] is optional; use it to specify additional dependent libraries.
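As a minimal sketch of the wrapping described for [RootDir], a method-level snippet can be placed in a file of its own inside a class named foo. The method max and the file name in the comment are invented for illustration; any method would do.

```java
// Hypothetical snippet file, e.g. <RootDir>/snippet1.java.
// The method below is the actual code snippet; the surrounding "foo"
// class exists only so that the file contains one Java class.
class foo {
    static int max(int a, int b) {
        return a > b ? a : b;
    }
}
```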
For example,
java -jar .\artifacts\SPAT.jar 5 .\Benchmarks\9133\Original .\Benchmarks\9133\transformed\_5 "C:\Program Files\Java\jre1.8.0_221\lib\rt.jar"
This command transforms all Java files under ".\Benchmarks\9133\Original" with transformation rule 5, "ConditionalExp2SingleIF", and writes the results to ".\Benchmarks\9133\transformed\_5". The only dependency is rt.jar (the Java runtime).
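When the tool is launched from another Java program, passing each argument as a separate list element avoids shell quoting problems with paths that contain spaces. The sketch below only assembles the arguments in the order documented above; SpatCommand and build are hypothetical names, and the paths are copied from the example command.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SPAT.
final class SpatCommand {

    // Builds: java -jar <jar> <RuleId> <RootDir> <OutputDir> <PathofJre>
    static List<String> build(String jar, int ruleId, String rootDir,
                              String outputDir, String pathOfJre) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jar);
        cmd.add(Integer.toString(ruleId));
        cmd.add(rootDir);
        cmd.add(outputDir);
        cmd.add(pathOfJre);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build(
                ".\\artifacts\\SPAT.jar", 5,
                ".\\Benchmarks\\9133\\Original",
                ".\\Benchmarks\\9133\\transformed\\_5",
                "C:\\Program Files\\Java\\jre1.8.0_221\\lib\\rt.jar");
        // Print one argument per line; the rt.jar path stays a single argument.
        cmd.forEach(System.out::println);
        // To actually run the tool, something like the following could be used:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```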
Replace the local variables' identifiers with new non-repeated identifiers.
Replace the for statement with a semantically equivalent while statement.
Replace the while statement with a semantically equivalent for statement.
Switch the two code blocks in the if statement and the corresponding else statement.
Change a single if statement into a conditional expression statement.
Change a conditional expression statement into a single if statement.
Change the assignment
Change the assignment
Divide an infix expression into two expressions whose values are stored in temporary variables.
Divide an if statement with a compound condition (
Switch the places of two adjacent statements in a code block, where the former statement has no shared variable with the latter statement.
Replace the if-continue statement in a loop block with an if-else statement.
Merge the declaration statements into a single composite declaration statement.
Divide the composite declaration statement into separate declaration statements.
Switch the two expressions on both sides of the infix expression whose operator is
Switch the two expressions of the String.equals method, such as "123".equals(x) -> x.equals("123").
Divide the pre-or-post expression into two separate expressions.
Change the Switch-Case statements into If-Else statements.
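To make two of the rules above concrete, the hand-written pairs below show code before and after the for-to-while rewrite and the conditional-expression-to-if rewrite. They illustrate the intended semantic equivalence only: they are not actual SPAT output, and the class and method names are invented.

```java
// Hand-written illustration, not generated by SPAT.
final class RuleIllustration {

    // Before: a for statement.
    static int sumFor(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // After: the for statement replaced with an equivalent while statement.
    static int sumWhile(int n) {
        int total = 0;
        int i = 0;
        while (i < n) {
            total += i;
            i++;
        }
        return total;
    }

    // Before: a conditional expression statement.
    static int absConditional(int x) {
        int r;
        r = x < 0 ? -x : x;
        return r;
    }

    // After: the conditional expression changed into a single if statement.
    static int absIf(int x) {
        int r;
        if (x < 0) {
            r = -x;
        } else {
            r = x;
        }
        return r;
    }

    public static void main(String[] args) {
        // Each pair should agree on every input; a few inputs are checked here.
        for (int v = -3; v <= 3; v++) {
            System.out.println(v + ": "
                    + (sumFor(v) == sumWhile(v)) + " "
                    + (absConditional(v) == absIf(v)));
        }
    }
}
```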
The Educoder code clone dataset. In the "records.txt" file, each record is a triple (file1,file2,label). A label of -1 means that the pair is not a clone; any other label means that it is a clone.
The 9133 benchmark is selected from the BCB benchmark. We use the 9133 instances to evaluate the syntax naturalness, applicability, and speed of each transformation rule.
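Reading one such record could look like the sketch below. It assumes that each line of "records.txt" holds exactly three comma-separated fields, optionally wrapped in parentheses, and that file names contain no commas; the actual file layout may differ. CloneRecord is a hypothetical name.

```java
// Hypothetical reader for one record; the line format is an assumption.
final class CloneRecord {
    final String file1;
    final String file2;
    final int label;

    CloneRecord(String file1, String file2, int label) {
        this.file1 = file1;
        this.file2 = file2;
        this.label = label;
    }

    // A label of -1 marks a non-clone pair; any other label marks a clone.
    boolean isClone() {
        return label != -1;
    }

    static CloneRecord parse(String line) {
        String s = line.trim();
        if (s.startsWith("(") && s.endsWith(")")) {
            s = s.substring(1, s.length() - 1); // drop the surrounding parentheses
        }
        String[] parts = s.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return new CloneRecord(parts[0].trim(), parts[1].trim(),
                Integer.parseInt(parts[2].trim()));
    }

    public static void main(String[] args) {
        CloneRecord r = parse("(a.java,b.java,-1)");
        System.out.println(r.file1 + " / " + r.file2 + " clone? " + r.isClone());
    }
}
```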
This dataset is used to train the Neural Probabilistic Language Model (see below).
- The Neural Probabilistic Language Model https://github.com/chiaminchuang/A-Neural-Probabilistic-Language-Model
- Code2vec https://github.com/tech-srl/code2vec
- DeepCom and Hybrid-DeepCom https://github.com/xing-hu/EMSE-DeepCom
- The dataset of DeepCom https://github.com/xing-hu/DeepCom
- ASTNN https://github.com/zhangj111/astnn
- TBCCD https://github.com/yh1105/datasetforTBCCD
- Jobfuscate https://www.duckware.com/jobfuscate/index.html
Shiwen Yu, Ting Wang, Ji Wang, "Data Augmentation by Program Transformation." Journal of Systems and Software (JSS 2022). (under JSS open science, the preprint pdf can be checked in ".\paper")
Deze Wang, Zhouyang Jia, Shanshan Li, Yue Yu, Yun Xiong, Wei Dong, Xiangke Liao, "Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding." 44th International Conference on Software Engineering (ICSE 2022).