Refactor prompt generation and evaluation loop. #62

Open
@oliverchang

Description

Currently our prompt generation is tied to our template format here: https://github.com/google/oss-fuzz-gen/tree/main/prompts/template_xml

We should make it easier and more flexible for others to test different prompt generation strategies by allowing custom prompts to be Python modules instead, which would look something like the following:

def generate(benchmark: Benchmark) -> str:
  ...

That is, the module would be expected to define a generate function that produces the full prompt to pass to the LLM.
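
As a rough illustration, such a module might look like the sketch below. The import path and the Benchmark attribute names used here are assumptions for illustration only, not a reference to the existing template_xml logic:

# prompts/custom_generator.py -- hypothetical example.
from experiment.benchmark import Benchmark  # assumed import path

def generate(benchmark: Benchmark) -> str:
  """Returns a complete prompt for the given benchmark."""
  # Attribute names (project, function_signature) are illustrative assumptions.
  return (f'Write a fuzz target for the project {benchmark.project} that '
          f'calls the function {benchmark.function_signature}.')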

Similarly, we should also make our generation/evaluation loop more configurable, e.g. by extracting the logic here:

model.prompt_path = model.prepare_generate_prompt(

into a driver.py that can be similarly replaced:

def evaluate(model: models.LLM, benchmark: Benchmark, prompt_generator: Module):
  prompt = prompt_generator.generate(benchmark)
  targets = generate_targets(model, prompt)
  results = evaluate_targets(targets)
  ...

Tying this all together, the resulting invocation would look something like:

./run_all_experiments --driver /path/to/driver.py --prompt_generator prompts/custom_generator.py
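
One possible way to wire up these flags is to load both files dynamically with importlib. This is only a sketch of the idea; the load_module helper and the args, benchmarks, and model variables below are assumed for illustration, not existing code:

import importlib.util

def load_module(path: str, name: str):
  """Dynamically imports a Python module from an arbitrary file path."""
  spec = importlib.util.spec_from_file_location(name, path)
  module = importlib.util.module_from_spec(spec)
  spec.loader.exec_module(module)
  return module

# --prompt_generator must define generate(benchmark) -> str, and
# --driver must define evaluate(model, benchmark, prompt_generator).
prompt_generator = load_module(args.prompt_generator, 'prompt_generator')
driver = load_module(args.driver, 'driver')
for benchmark in benchmarks:
  driver.evaluate(model, benchmark, prompt_generator)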
