Refactor prompt generation and evaluation loop. #62

Open
@oliverchang

Description

Currently our prompt generation is tied to our template format here: https://github.com/google/oss-fuzz-gen/tree/main/prompts/template_xml

We should make it easier and more flexible for others to test different prompt generation strategies by allowing custom prompts to be Python modules instead, which would look something like the following:

def generate(benchmark: Benchmark) -> str:
  ...

That is, the module would be expected to define a generate function that produces the full prompt to pass to the LLM.
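
As a rough illustration, such a module might look like the sketch below. The import path and the Benchmark attribute names used here are assumptions for illustration only, not a reference to the existing template_xml logic:

# prompts/custom_generator.py -- hypothetical example.
from experiment.benchmark import Benchmark  # assumed import path

def generate(benchmark: Benchmark) -> str:
  """Returns a complete prompt for the given benchmark."""
  # Attribute names (project, function_signature) are illustrative assumptions.
  return (f'Write a fuzz target for the project {benchmark.project} that '
          f'calls the function {benchmark.function_signature}.')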

Similarly, we should also make our generation/evaluation loop more configurable, e.g. by extracting the logic here:

model.prompt_path = model.prepare_generate_prompt(

into a driver.py that can be similarly replaced:

def evaluate(model: models.LLM, benchmark: Benchmark, prompt_generator: Module):
  prompt = prompt_generator.generate(benchmark)
  targets = generate_targets(model, prompt)
  results = evaluate_targets(targets)
  ...

Tying this all together, the resulting invocation would look something like:

./run_all_experiments --driver /path/to/driver.py --prompt_generator prompts/custom_generator.py
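
One possible way to wire up these flags is to load both files dynamically with importlib. This is only a sketch of the idea; the load_module helper and the args, benchmarks, and model variables below are assumed for illustration, not existing code:

import importlib.util

def load_module(path: str, name: str):
  """Dynamically imports a Python module from an arbitrary file path."""
  spec = importlib.util.spec_from_file_location(name, path)
  module = importlib.util.module_from_spec(spec)
  spec.loader.exec_module(module)
  return module

# --prompt_generator must define generate(benchmark) -> str, and
# --driver must define evaluate(model, benchmark, prompt_generator).
prompt_generator = load_module(args.prompt_generator, 'prompt_generator')
driver = load_module(args.driver, 'driver')
for benchmark in benchmarks:
  driver.evaluate(model, benchmark, prompt_generator)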
