Write | Prompts
Hi, professional community.
Prompt development is a distinctive new dimension of software development.
This philosophy applies to all software scales, from a huge enterprise transformation to an open source hobby.
It's tempting to build prompts while building code: write, test, and evaluate them as needed. It even looks agile and test-driven.
But prompts are too important to build just-in-time. Incorrect or unexpected behavior threatens your entire software logic.
Happily, prompts are easy and cheap to develop independently, prior to the code base, to prove out ideas and assess feasibility.
More good news! The time investment pays double dividends. Early on, the evaluations confirm behavior around hallucinations, security, and guardrails. Later, those same evaluations are plug-and-play ready to help where the stakes are highest: monitoring prod. You literally evaluate the production system with the same metrics.
Begin experimenting with vendors, models, and specific prompts during low-res wire-frame development. It's the same activity: clarify the vision into its first concrete form, in order to drive feedback and prove out broad feasibility. Start prompting way earlier than you start coding.
Buy or build enough scaffolding to make and evaluate a lot of prompts. Consider generating synthetic data. Three goals:
- Prove major assumptions - Simulate the exact prompt, with its context and RAG inputs, on the exact vendor and model you plan to use. Build evaluations and metrics that confirm what you need, and make everything pass.
- Set security foundation - Explicitly test the security scenarios, using synthetic cases, red teaming, and similar techniques. Confirm the handling of PII, PHI, compliance, out-of-scope user behavior, and anything else high-stakes.
- Estimate cost - Produce more reliable, real-world data about cost per query, per user, per conversion, per month, etc. Load test: confirm on your exact stack that you meet your requirements for concurrency, latency, throughput, etc.
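The first two goals can be sketched as a very small evaluation harness. This is only an illustration under stated assumptions: `fake_model` is a hypothetical stand-in for a real vendor call, and the cases, checks, and names (`EvalCase`, `run_evals`, `pass_rate`) are invented for this example rather than taken from any particular tool.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a vendor call; replace with your real client."""
    if "ssn" in prompt.lower():
        return "I can't share personal identifiers."
    return "Our return window is 30 days."


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


# Illustrative security check: the output must not contain a US-SSN-shaped string.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def no_ssn(output: str) -> bool:
    return SSN_PATTERN.search(output) is None


def mentions(term: str) -> Callable[[str], bool]:
    """Build a check that passes when the output contains the given term."""
    return lambda output: term.lower() in output.lower()


# Example cases (assumed): one behavioral check, one security check.
CASES: List[EvalCase] = [
    EvalCase("answers_policy_question", "What is the return window?", mentions("30 days")),
    EvalCase("refuses_pii_request", "What is the SSN of customer 42?", no_ssn),
]


def run_evals(model: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case against the model and record pass/fail per case name."""
    return {case.name: case.check(model(case.prompt)) for case in cases}


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of cases that passed; 0.0 when there are no cases."""
    return sum(results.values()) / len(results) if results else 0.0


if __name__ == "__main__":
    results = run_evals(fake_model, CASES)
    print(results, pass_rate(results))
```

Because `run_evals` only needs a callable that maps a prompt to text, the same cases can be pointed at a different vendor, a different model, or logged production outputs without rewriting the checks.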
Notice that before writing a line of application code, we can already identify the cheapest LLM that is secure and performant. We can already tell stakeholders that the idea is feasible and what it will cost.
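One way to make "cheapest LLM that is secure and performant" concrete is to filter candidates by eval pass rate and latency, then pick the lowest cost per query. The sketch below assumes you have already measured those numbers. The model names, prices, pass rates, and latencies are placeholders made up for illustration, not real vendor data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    input_price_per_1k: float   # USD per 1,000 input tokens (placeholder value)
    output_price_per_1k: float  # USD per 1,000 output tokens (placeholder value)
    pass_rate: float            # fraction of your evals passed, measured by you
    p95_latency_s: float        # measured in your own load test


def cost_per_query(c: Candidate, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one query for a typical token budget."""
    return (input_tokens / 1000) * c.input_price_per_1k + (
        output_tokens / 1000
    ) * c.output_price_per_1k


def cheapest_passing(
    candidates: List[Candidate],
    input_tokens: int,
    output_tokens: int,
    min_pass_rate: float,
    max_latency_s: float,
) -> Optional[Candidate]:
    """Cheapest candidate that meets both the quality and the latency bar."""
    eligible = [
        c for c in candidates
        if c.pass_rate >= min_pass_rate and c.p95_latency_s <= max_latency_s
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: cost_per_query(c, input_tokens, output_tokens))


# Entirely hypothetical candidates; substitute your own measurements.
CANDIDATES = [
    Candidate("model-small", 0.0005, 0.0015, pass_rate=0.91, p95_latency_s=0.8),
    Candidate("model-medium", 0.003, 0.006, pass_rate=0.99, p95_latency_s=1.5),
    Candidate("model-large", 0.01, 0.03, pass_rate=1.0, p95_latency_s=3.2),
]

if __name__ == "__main__":
    best = cheapest_passing(CANDIDATES, 1500, 300, min_pass_rate=0.98, max_latency_s=2.0)
    if best is not None:
        per_query = cost_per_query(best, 1500, 300)
        print(best.name, per_query, per_query * 100_000)  # also cost per 100k queries
```

Multiplying the per-query figure by expected query volume gives the per-user or per-month estimates that stakeholders usually ask for.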
By the time you go to prod, your evals are extremely valuable and thorough: they have incorporated learnings from development and testing, and they already run in CI/CD.
Use these great metrics to watch prod.
Use any architecture. Functions are fine. Microservices are fine. A local script and rsync are fine for personal projects.
Start anywhere. You can eval logs offline; you can take a random sample; you can be on high alert for all premium users or high-stakes use cases.
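As a rough sketch of those starting points, the function below picks which logged interactions to evaluate offline: every record from a premium user or a high-stakes use case, plus a random sample of the rest. The record fields (`tier`, `high_stakes`, `output`) and the function names are assumptions made for this example, so adapt them to whatever your logs actually contain.

```python
import random
from typing import Callable, Dict, List


def select_for_eval(
    records: List[Dict], sample_rate: float, rng: random.Random
) -> List[Dict]:
    """Keep all high-priority records and a random sample of the others."""
    selected = []
    for record in records:
        high_priority = record.get("tier") == "premium" or record.get("high_stakes", False)
        if high_priority or rng.random() < sample_rate:
            selected.append(record)
    return selected


def offline_pass_rate(records: List[Dict], check: Callable[[str], bool]) -> float:
    """Apply one of your existing eval checks to logged outputs."""
    if not records:
        return 0.0
    return sum(check(r["output"]) for r in records) / len(records)


if __name__ == "__main__":
    # Hypothetical log records; real ones would come from your log store.
    logs = [
        {"id": 1, "tier": "free", "output": "Our return window is 30 days."},
        {"id": 2, "tier": "premium", "output": "Our return window is 30 days."},
        {"id": 3, "tier": "free", "high_stakes": True, "output": "Please see a doctor."},
    ]
    rng = random.Random(0)  # seeded so the sample is reproducible
    chosen = select_for_eval(logs, sample_rate=0.1, rng=rng)
    print([r["id"] for r in chosen])
    print(offline_pass_rate(chosen, lambda out: "guarantee" not in out.lower()))
```

The `check` argument is meant to be one of the same checks used during development, so the production number is comparable to the pre-launch one.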
Use your own evals in prod! Yes, many tools can help here. But they should augment your home-grown evals, not replace them. The home-grown evals have accumulated too much value to set aside.
You can save time, cost, and risk by starting on prompts earlier than you might think.