JVM: Revamp JVM prompts (#396)
This PR is the first in a series of PRs that revamp the JVM prompts so
that the LLM generates better fuzzing harnesses. It targets the base
problem description and the base requirements of the JVM prompts.

Signed-off-by: Arthur Chan <[email protected]>
Co-authored-by: DavidKorczynski <[email protected]>
arthurscchan and DavidKorczynski authored Jul 1, 2024
1 parent abaf243 commit 6ad3b9f
Showing 5 changed files with 31 additions and 75 deletions.
21 changes: 8 additions & 13 deletions llm_toolkit/prompt_builder.py
@@ -599,13 +599,6 @@ def _get_template(self, template_file: str) -> str:
     with open(template_file) as file:
       return file.read()

-  def _format_base(self) -> str:
-    """Formats a priming based on the prompt template."""
-    base = self._get_template(self.base_template_file)
-    base = base.replace("{PROJECT_NAME}", self.project_name)
-    base = base.replace("{PROJECT_URL}", self.project_url)
-    return base
-
   def _format_target_constructor(self, signature: str) -> str:
     """Formats a constructor based on the prompt template."""
     class_name = signature.split('].')[0][1:]
@@ -725,7 +718,8 @@ def _format_source_reference(self, signature: str) -> Tuple[str, str]:

   def _format_problem(self, signature: str) -> str:
     """Formats a problem based on the prompt template."""
-    problem = self._get_template(self.problem_template_file)
+    base = self._get_template(self.base_template_file)
+    problem = base + self._get_template(self.problem_template_file)
     problem = problem.replace('{TARGET}', self._format_target(signature))
     problem = problem.replace('{REQUIREMENTS}',
                               self._format_requirement(signature))
@@ -736,12 +730,14 @@ def _format_problem(self, signature: str) -> str:
     problem = problem.replace('{SELF_SOURCE}', self_source)
     problem = problem.replace('{CROSS_SOURCE}', cross_source)

+    problem = problem.replace("{PROJECT_NAME}", self.project_name)
+    problem = problem.replace("{PROJECT_URL}", self.project_url)
+
     return problem

-  def _prepare_prompt(self, base: str, final_problem: str):
+  def _prepare_prompt(self, prompt_str: str):
     """Constructs a prompt using the parameters and saves it."""
-    self._prompt.add_priming(base)
-    self._prompt.add_problem(final_problem)
+    self._prompt.add_priming(prompt_str)

   def _has_generic(self, arg: str) -> bool:
     """Determine if the argument type contains generic type."""
@@ -762,9 +758,8 @@ def build(self,
     Ignore target_file_type, project_example_content
     and project_context_content parameters.
     """
-    base = self._format_base()
     final_problem = self._format_problem(function_signature)
-    self._prepare_prompt(base, final_problem)
+    self._prepare_prompt(final_problem)
     return self._prompt

   def build_fixer_prompt(self, benchmark: Benchmark, raw_code: str,
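The effect of the `prompt_builder.py` change can be pictured with a minimal sketch: the base template is prepended to the problem template, and all placeholders are then filled on the combined string, so the result is a single prompt string. The function name `format_problem`, the shortened template strings, and the project name and URL below are assumptions made for illustration; they are not the repository's code or templates.

```python
from typing import Dict


def format_problem(base_template: str, problem_template: str,
                   replacements: Dict[str, str]) -> str:
    """Concatenates base and problem templates, then fills placeholders."""
    # Mirrors the new flow: base text first, problem text second.
    problem = base_template + problem_template
    # Placeholders from either template are replaced on the combined string.
    for placeholder, value in replacements.items():
        problem = problem.replace(placeholder, value)
    return problem


# Hypothetical, shortened stand-ins for the base and problem templates.
BASE = "You are a security testing engineer.\n"
PROBLEM = ("<target>\n{TARGET}\n"
           "The target method is belonging to the Java project "
           "{PROJECT_NAME} ({PROJECT_URL}).\n</target>\n")

prompt = format_problem(BASE, PROBLEM, {
    "{TARGET}": "[org.test.Test].test(int)",
    "{PROJECT_NAME}": "example-project",  # hypothetical project name
    "{PROJECT_URL}": "https://example.com/example-project",  # hypothetical URL
})
print(prompt)
```

Because the project placeholders now live in the problem templates, replacing them after concatenation covers both files with one pass, which is why the separate `_format_base` step is no longer needed.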
16 changes: 6 additions & 10 deletions prompts/template_xml/jvm_base.txt
@@ -1,11 +1,7 @@
-<system>
-You are a security testing engineer who wants to write a Java program to execute all lines in a given method by defining and initializing its parameters and necessary objects in a suitable way before fuzzing the method with the Java Jazzer framework from Code Intelligence. The source code of the Jazzer framework could be found in the github repository url: https://github.com/CodeIntelligenceTesting/jazzer.
-
-The target method is belonging to the Java project {PROJECT_NAME} ({PROJECT_URL}).
-
+You are a security testing engineer who wants to write a Java program to execute all lines in a given method by defining and initialising its parameters and necessary objects in a suitable way before fuzzing the method with the Java Jazzer framework from Code Intelligence. The source code of the Jazzer framework could be found in the github repository url: https://github.com/CodeIntelligenceTesting/jazzer.
 Carefully study the method signature and its parameters, then follow the example problems and solutions to answer the final problem. YOU MUST call the target method to fuzz in the solution.
-
-Try as many variations of these inputs as possible. Do not use random number generator classes or methods such as <code>java.lang.Random</code> class.
-
-The generated fuzzing harness should be wrapped with the <java_code> tag.
-</system>
+The <target> tag contains information of the target method to invoke.
+The <arguments> tag contains information of each of the target method arguments.
+The <exceptions> tag contains a list of exceptions thrown by the target method that you MUST catch.
+The <constructor> tag contains constructor or method call details you MUST use to create the needed object before calling the target method.
+The <requirement> tag contains additional requirements that you MUST follow for this code generation.
3 changes: 3 additions & 0 deletions prompts/template_xml/jvm_problem_constructor.txt
@@ -1,10 +1,12 @@
+<target>
 The target method is the constructor of {CONSTRUCTOR_CLASS} with the following signature.
 <constructor_signature>
 {CONSTRUCTOR_SIGNATURE}
 </constructor_signature>
 The constructor signature follows the format of <code>[Full qualified name of the class].(method_arguments)</code>.
 For example, for the constructor of class <code>Test</code> of package <code>org.test</code> which takes in a single integer would have the following signature:
 <code>[org.test.Test].<init>(int)</code>
+The target method is belonging to the Java project {PROJECT_NAME} ({PROJECT_URL}).
 Here is the list of arguments of the target constructor with descriptions.
 <arguments>
 {ARGUMENTS}
@@ -15,3 +17,4 @@ Here is the source code of the target constructor for reference.
 </code>
 Here is a list of source codes of methods that directly invoke the target consturctor for reference.
 {CROSS_SOURCE}
+</target>
3 changes: 3 additions & 0 deletions prompts/template_xml/jvm_problem_method.txt
@@ -1,9 +1,11 @@
+<target>
 <method_signature>
 {METHOD_SIGNATURE}
 </method_signature>
 The method signature follows the format of <code>[Full qualified name of the class].method_name(method_arguments)</code>.
 For example, for a method <code>test</code> in class <code>Test</code> of package <code>org.test</code> which takes in a single integer would have the following method signature:
 <code>[org.test.Test].test(int)</code>
+The target method is belonging to the Java project {PROJECT_NAME} ({PROJECT_URL}).
 Here is the list of arguments of the target method with descriptions.
 <arguments>
 {ARGUMENTS}
@@ -14,3 +16,4 @@ Here is the source code of the target method for reference.
 </code>
 Here is a list of source codes of methods that directly invoke the target method for reference.
 {CROSS_SOURCE}
+</target>
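The signature format described in the two problem templates, `[fully.qualified.Class].method_name(method_arguments)`, can be split into its parts with a few string operations. The sketch below is illustrative only: `parse_signature` is a hypothetical helper, not a function from `prompt_builder.py`, although the class-name step uses the same `split('].')[0][1:]` expression that appears in `_format_target_constructor`. It assumes argument types contain no commas of their own, so generic types such as `Map<K,V>` would need extra handling.

```python
from typing import List, Tuple


def parse_signature(signature: str) -> Tuple[str, str, List[str]]:
    """Splits '[pkg.Class].method(args)' into class, method and arg types."""
    # Same expression as in _format_target_constructor: drop the leading '['.
    class_name = signature.split('].')[0][1:]
    # Everything after '].' is 'method_name(arg1,arg2,...)'.
    rest = signature.split('].', 1)[1]
    method_name, _, raw_args = rest.partition('(')
    raw_args = raw_args.rstrip(')')
    # Assumption: no commas inside a single argument type (no generics).
    arg_types = [arg.strip() for arg in raw_args.split(',')] if raw_args else []
    return class_name, method_name, arg_types


print(parse_signature('[org.test.Test].test(int)'))
print(parse_signature('[org.test.Test].<init>(int)'))
```

Constructors use `<init>` in place of a method name, so the same helper handles both template examples.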
63 changes: 11 additions & 52 deletions prompts/template_xml/jvm_requirement.txt
@@ -1,6 +1,14 @@
 <requirements>
-<item>Carefully study the method signature and its parameters, then follow the example problems and solutions to answer the final problem. YOU MUST call the target method to fuzz in the solution.</item>
-<item>Try as many variations of these inputs as possible. Do not use random number generator classes or methods such as <code>java.lang.Random</code> class.</item>
+<item>Try as many variations of these inputs as possible.</item>
+<item>Try creating the harness as complex as possible.</item>
+<item>Try adding some nested loop to invoke the target method for multiple times.</item>
+<item>The generated fuzzing harness should be wrapped with the <java_code> tag.</item>
+<item>NEVER use any methods from the <code>java.lang.Random</code> class in the generated code.</item>
+<item>NEVER use any classes or methods in the <code>java.lang.reflect</code> package in the generated code.</item>
+<item>NEVER use the @FuzzTest annotation for specifying the fuzzing method.</item>
+<item>Please avoid using any multithreading or multi-processing approach.</item>
+<item>Please add import statements for necessary classes, except for classes in the java.lang package.</item>
+<item>You must create the object before calling the target method.</item>
 <item>Do not create new variables with the same names as existing variables.
 WRONG:
 <code>
@@ -30,56 +38,6 @@ public class Fuzz {
 }
 </code></item>
 <item>
-Please ensure the newest version found from the Jazzer's github repository is used.
-The Javadoc of the newest version of Jazzer framework could be found in https://codeintelligencetesting.github.io/jazzer-docs/jazzer-api
-</item>
-<item>Please use Fuzz as the Java class name.</item>
-<item>Please use <code>public static void fuzzerTestOneInput(FuzzedDataProvider)</code> as the signature for the static fuzzing method.</item>
-<item>Please use the name fuzzerInitialize for the static method to initialize before fuzzing the target method.</item>
-<item>Please use the name fuzzerTearDown for the static method to tear down the settings after the target method is called.</item>
-<item>Please add import statements for necessary classes, except for classes in the java.lang package.</item>
-<item>Please avoid using the @FuzzTest annotation for specifying the fuzzing method.</item>
-<item>Please avoid using any methods or classes in the <code>java.lang.reflect</code> package.</item>
-<item>Please adds try catch block to catch possible <code>RuntimeException</code>.</item>
-<item>Please avoid catching the general Exception, Throwable or Error object. Instead, they should be thrown from the fuzzerTestOneInput method.</item>
-<item>If the target method is a static method, please avoid create the object before invocation.</item>
-<item>Please create the necessary objects only if the target method is an instance method.</item>
-<item>
-When it is necessary to create an instance of a class, try use the public accessible constructors of that class first.
-If no public accessible constructors are existed, try to search for public accessible constructors from any classes that extends or implements that needed class.
-If no public accessible constructors are found from the target classes or all its subclasses, then try to search for static methods in the project that returns
-an object matching the class of the neededinstance and invoke them to get the needed isntance.
-</item>
-<item>Always try to generate random data for arguments used to create fuzzing objects or used during target method invocation.</item>
-<item>Please ensure the random fuzzing data is provided by the FuzzedDataProvider class and are initialized before the target method is invoked.</item>
-<item>Try using static methods or constructors to create other types of arguments required for necessary object creations or invoking the target method.</item>
-<item>
-If <code>java.lang.Class</code> is needed as arguments, randomly choose any objects and invoke its <code>java.lang.Object::getClass()</code> methods.
-</item>
-<item>
-If <code>java.lang.Object</code> is needed as arguments, randomly choose any objects and directly passed it as the arguments. If creation of the objects is needed,
-use static or instance methods in the projects that give correct object types. Constructors of the needed methods can also be used to create the needed objects.
-</item>
-<item>
-If any arguments of the target methods requires a return value from method calls, always stored the return value as local variable and pass in the local variable.
-Never make an inner method call during target method invocation. This includes getting random primitive variables from FuzzedDataProvider.
-</item>
-<item>Please avoid using <code>getObject()</code> from any classes.</item>
-<item>Please avoid using any multi-threading or multi-processing approach.</item>
-<item>Please avoid using more than 3 layer of inner loopings.</item>
-<item>Please manually call <code>System.gc()</code> in suitable locatations if too much resources are being created.</item>
-<item>The sample fuzzing harness should be wrapped with the <java_code> tag.</item>
-<item>Consult the following list for reference of the target method.
-<list>
-<item>The javadoc of the target method.</item>
-<item>The examples and README files in the github repository.</item>
-<item>The junit test cases from the github repository.</item>
-<item>Try looking for test directories to locate junit test cases.</item>
-<item>Try looking for doc / docs directories or mark down files for documentations and examples.</item>
-<item>Try to locate if any other methods in the projects does call the target method.</item>
-</list>
-</item>
-<item>
 Here is a list of classes and their fully qualified name. You must import all classes by their fully qualified name.
 For example, if the full qualified name of a class is <code>abc.def.ghi</code>, then you must add an import
 statement <code>import abc.def.ghi;</code> in the result code.
@@ -89,6 +47,7 @@ contains the fully qualified name of the given class.
 <list>
 <item><class_name>FuzzedDataProvider</class_name><full_class_name>com.code_intelligence.jazzer.api.FuzzedDataProvider</full_class_name></item>
 {IMPORT_MAPPINGS}
+</item>
 </list>
 </requirements>
