Evaluation
We want our code to be rigorously tested. We want unit tests, integration tests, and end-to-end tests. We want a solid CI pipeline, such that if GitHub Actions shows green, we can know that it's worth merging. In the event of something going awry, we need to be able to track what exact steps Stagehand took and why. When an prompting PR is opened, we need …
We want our code to be rigorously tested. We want unit tests, integration tests, and end-to-end tests. We want a solid CI pipeline, such that if GitHub Actions shows green, we can know that it's worth merging. In the event of something going awry, we need to be able to track what exact steps Stagehand took and why. When an prompting PR is opened, we need a number to go up to know if it's worth merging.