Vanna trulens performance metrics #238

samoliverschumacher · 2024-02-07T06:44:11Z

This PR adds a script to support improving the performance (accuracy, cost and latency) of a vanna app.

The Problem;

Several components and prompts contribute to performance, but it's not clear how each of them affects it.
Making an improvement currently means changing something, then manually assessing the new outputs. This approach to evaluation does not scale.
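
As a rough illustration of what a repeatable alternative to manual assessment could look like, here is a minimal sketch assuming a fixed set of test questions with expected SQL. `EvalCase`, `evaluate`, `stub_ask` and the exact-match scoring are hypothetical stand-ins written for this example; they are not part of vanna or TruLens, and the script in this PR relies on TruLens for its measurements rather than on this logic.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    question: str
    expected_sql: str


def normalise(sql: str) -> str:
    # Lowercase, drop a trailing semicolon and collapse whitespace so that
    # trivial formatting differences are not counted as errors.
    return " ".join(sql.strip().rstrip(";").lower().split())


def evaluate(ask_fn: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, float]:
    # Run every test question through the app and record accuracy and latency.
    correct = 0
    latencies = []
    for case in cases:
        start = time.perf_counter()
        generated = ask_fn(case.question)
        latencies.append(time.perf_counter() - start)
        if normalise(generated) == normalise(case.expected_sql):
            correct += 1
    return {
        "accuracy": correct / len(cases),
        "mean_latency_s": sum(latencies) / len(latencies),
    }


def stub_ask(question: str) -> str:
    # Hypothetical stand-in for the app under test; always returns the same SQL.
    return "SELECT COUNT(*) FROM customers;"


cases = [
    EvalCase("How many customers are there?", "select count(*) from customers"),
    EvalCase("List all orders", "SELECT * FROM orders"),
]
report = evaluate(stub_ask, cases)
```

Exact string matching is a deliberately crude accuracy measure; it is used here only to keep the sketch self-contained.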

Context;

vn.ask() carries out RAG in multiple steps, each of which can be optimised;

Retrieves examples of 3 different data types (SQL, DDL etc.)
- parameters: embedding model chosen, retrieval system, retrieval parameters
Connects to the LLM
- parameters: model chosen, fine-tuned vs. not
Prompts the LLM about each of these data types in different ways.
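
One way to make the parameters above explicit is to enumerate them as configurations to evaluate. This is a sketch only: `RagConfig`, `config_grid`, the field names and the example values are hypothetical and do not correspond to actual vanna or TruLens settings.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, List


@dataclass(frozen=True)
class RagConfig:
    # Hypothetical knobs, named after the parameters listed above.
    embedding_model: str
    n_results: int
    llm_model: str
    fine_tuned: bool


def config_grid(
    embedding_models: List[str],
    n_results_options: List[int],
    llm_models: List[str],
    fine_tuned_options: List[bool],
) -> Iterator[RagConfig]:
    # Yield every combination of the supplied options.
    for emb, n, llm, ft in product(
        embedding_models, n_results_options, llm_models, fine_tuned_options
    ):
        yield RagConfig(emb, n, llm, ft)


grid = list(
    config_grid(
        embedding_models=["embedding-a", "embedding-b"],  # placeholder names
        n_results_options=[3, 10],
        llm_models=["llm-small", "llm-large"],  # placeholder names
        fine_tuned_options=[False],
    )
)
```

Each `RagConfig` in `grid` would then be evaluated against the same test questions, so that differences in the results can be attributed to a specific parameter.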

Future improvements to vanna could open up even more possibilities, like;

Self-corrective systems, e.g. diagnosing a SQL error and retrying the database call.
Chain-of-thought reasoning for complex questions.
Multi-hop programs for complex SQL generation, i.e. "break a question into multiple SQL sub-queries to validate a hypothesised correct SQL".

The solution;

A script uses trulens-eval and allows configuration of what is evaluated, and how. It presents the results in a dashboard (see the doc for visuals).

Evaluating the system with TruLens requires no changes to vanna beyond adding logging to the vanna model. An alternative would be to include evaluation in the app's code itself, but this might require major refactoring to decouple the vanna components.
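
The general idea of adding logging from the outside can be sketched in plain Python as below. This is not how TruLens instruments an app; `record_calls` and `StubApp` are hypothetical names used only to show that a method can be wrapped on an instance without editing the class that defines it.

```python
import functools
import time
from typing import Any, Callable, Dict, List


def record_calls(log: List[Dict[str, Any]]) -> Callable:
    """Decorator factory: append one record per call to `log`."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            log.append(
                {
                    "function": fn.__name__,
                    "args": args,
                    "kwargs": kwargs,
                    "result": result,
                    "latency_s": time.perf_counter() - start,
                }
            )
            return result

        return wrapper

    return decorator


class StubApp:
    # Hypothetical stand-in for the app being evaluated.
    def ask(self, question: str) -> str:
        return f"SELECT 1 -- {question}"


call_log: List[Dict[str, Any]] = []
app = StubApp()
# Wrap the bound method on this instance; the class definition is untouched.
app.ask = record_calls(call_log)(app.ask)

answer = app.ask("How many customers are there?")
```

After the call, `call_log` holds the inputs, output and latency of each invocation, which is the kind of record an evaluation tool can score afterwards.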

Other evaluation frameworks exist, though not many as of yet.

Tests performed

Manual testing only, using a few example prompts (shown in the code). No unit tests.

zainhoda and others added 4 commits January 26, 2024 23:05

ollama

b4aea35

remove 25 piece training data restriction

e995493

Script to evaluate vanna using test data

7160e2f

added groundedness and agreement measures

cf30d82
