Releases: confident-ai/deepeval

LLM-Evals now support all LangChain chat models

16 Jan 11:22

penguine-ip

v0.20.57

be8e95c

- LLM-Evals (LLM-evaluated metrics) now support all of LangChain's chat models.
- LLMTestCase now has execution_time and cost, useful if you want to evaluate on these parameters.
- minimum_score has been renamed to threshold, so you can now create custom metrics with either a "minimum" or a "maximum" threshold.
- LLMEvalMetric has been renamed to GEval.
- LlamaIndex tracing integration: https://docs.llamaindex.ai/en/stable/module_guides/observability/observability.html#deepeval
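The difference between a "minimum" and a "maximum" threshold is easiest to see with cost and latency, where lower is better. Below is a minimal standalone sketch in plain Python; it does not import deepeval, and SketchTestCase, ThresholdMetric, and its higher_is_better flag are hypothetical names. The units for execution_time (seconds) and cost (dollars) are also assumptions.

```python
from dataclasses import dataclass


@dataclass
class SketchTestCase:
    """Hypothetical stand-in for a test case that records latency and cost."""
    input: str
    actual_output: str
    execution_time: float  # assumed unit: seconds
    cost: float  # assumed unit: dollars


@dataclass
class ThresholdMetric:
    """Hypothetical metric that passes when a score is on the right side of threshold."""
    name: str
    threshold: float
    higher_is_better: bool = True  # True -> minimum threshold, False -> maximum threshold

    def is_successful(self, score: float) -> bool:
        if self.higher_is_better:
            return score >= self.threshold  # score must reach the minimum
        return score <= self.threshold  # score must stay at or below the maximum


case = SketchTestCase(
    input="What is the capital of France?",
    actual_output="Paris",
    execution_time=1.8,
    cost=0.004,
)

latency_check = ThresholdMetric("latency", threshold=2.0, higher_is_better=False)
cost_check = ThresholdMetric("cost", threshold=0.002, higher_is_better=False)
relevancy_check = ThresholdMetric("relevancy", threshold=0.7)

print(latency_check.is_successful(case.execution_time))  # True
print(cost_check.is_successful(case.cost))  # False
print(relevancy_check.is_successful(0.85))  # True
```

Keeping the direction as an explicit flag lets a single threshold field serve both kinds of metric.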

All RAG metrics now offer score reasoning, and a lot more

28 Dec 11:50

penguine-ip

v0.20.43

ab16dc3

In this release:

- Faithfulness, Answer Relevancy, Contextual Relevancy, Contextual Precision, and Contextual Recall now all provide a reason for their scores.
- Azure OpenAI is now supported via a single CLI command: https://docs.confident-ai.com/docs/metrics-introduction#using-azure-openai
- New Summarization metric, implemented with the QAG framework: https://docs.confident-ai.com/docs/metrics-summarization
- Pulling datasets from Confident AI now includes an intermediate step for additional data processing before evaluation: https://docs.confident-ai.com/docs/confident-ai-evaluate-datasets#pull-your-dataset-from-confident-ai
- Decoupled imports of transformers, sentence_transformers, and pandas to reduce package size.
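The general QAG idea behind a summarization score can be sketched as: ask closed-ended questions about the original text, answer them once from the original and once from the summary, and score the share of questions where the answers agree. This is a hypothetical sketch of the scoring step only, not deepeval's Summarization metric; the qag_score function and the "yes"/"idk" answer format are assumptions, and in a real pipeline an LLM would generate both the questions and the answers.

```python
from typing import Dict, List


def qag_score(questions: List[str],
              answers_from_source: Dict[str, str],
              answers_from_summary: Dict[str, str]) -> float:
    """Fraction of closed-ended questions answered the same way by source and summary.

    The answers are supplied by hand here so the scoring step runs on its own.
    """
    if not questions:
        return 0.0
    matches = sum(
        1 for q in questions
        if answers_from_summary.get(q) == answers_from_source.get(q)
    )
    return matches / len(questions)


questions = [
    "Does the text say the company was founded in 2020?",
    "Does the text say the product is open source?",
    "Does the text mention a pricing change?",
]
source_answers = {q: "yes" for q in questions}
summary_answers = {
    questions[0]: "yes",
    questions[1]: "yes",
    questions[2]: "idk",  # the summary omits the pricing change
}

print(qag_score(questions, source_answers, summary_answers))  # 0.6666666666666666
```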

Lots of new features

14 Dec 10:50

penguine-ip

v0.20.35

c5045b1

Lots of new features in this release:

- JudgementalGPT now supports different languages, useful for our APAC and European friends.
- RAGAS metrics now support all OpenAI models, useful for those running into context length issues.
- LLMEvalMetric now returns a reason for its score.
- deepeval test run now has hooks that are called on test run completion.
- evaluate now displays retrieval_context for RAG evaluation.
- The RAGAS metric now displays a breakdown of each of its distinct metrics.
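A completion hook is a callback registered ahead of time and invoked once the whole run has finished. The standalone sketch below shows that pattern with a decorator-based registry; it is not deepeval's hook API, and on_run_complete, run_tests, and the summary dictionary are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical registry of callbacks to run once a test run finishes.
_completion_hooks: List[Callable[[Dict[str, int]], None]] = []


def on_run_complete(func: Callable[[Dict[str, int]], None]) -> Callable[[Dict[str, int]], None]:
    """Decorator that registers func as a completion hook (hypothetical name)."""
    _completion_hooks.append(func)
    return func


def run_tests(results: List[bool]) -> Dict[str, int]:
    """Tally pass/fail results, then call every registered hook with the summary."""
    passed = sum(results)
    summary = {"passed": passed, "failed": len(results) - passed}
    for hook in _completion_hooks:
        hook(summary)
    return summary


notifications: List[str] = []


@on_run_complete
def notify(summary: Dict[str, int]) -> None:
    notifications.append(f"{summary['passed']} passed, {summary['failed']} failed")


run_tests([True, True, False])
print(notifications)  # ['2 passed, 1 failed']
```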

Continuous Evaluation

22 Nov 12:45

penguine-ip

v0.20.27

75bb4c8

Pre-release

Automatically integrated with Confident AI for continuous evaluation throughout the lifetime of your LLM application, letting you:

- log evaluation results and analyze metric passes/fails
- compare and pick the optimal hyperparameters (e.g. prompt templates, chunk size, models used) based on evaluation results
- debug evaluation results via LLM traces
- manage evaluation test cases and datasets in one place
- track events to identify live LLM responses in production
- add production events to existing evaluation datasets to strengthen evals over time
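Picking hyperparameters from evaluation results comes down to ranking configurations by how well they scored. The sketch below is a standalone illustration, not how Confident AI performs the comparison; the results data, the 0.7 threshold, and the best_config helper are all hypothetical. It ranks by pass rate and breaks ties by mean score.

```python
from statistics import mean
from typing import Dict, List, Tuple

Config = Tuple[str, int]  # (prompt template, chunk size), hypothetical hyperparameters

# Hypothetical metric scores collected from one test run per configuration.
results: Dict[Config, List[float]] = {
    ("template_a", 256): [0.72, 0.80, 0.65],
    ("template_a", 512): [0.81, 0.77, 0.84],
    ("template_b", 512): [0.70, 0.90, 0.60],
}


def best_config(results: Dict[Config, List[float]], threshold: float = 0.7) -> Config:
    """Return the config with the highest pass rate, breaking ties by mean score."""
    def rank(item: Tuple[Config, List[float]]) -> Tuple[float, float]:
        _, scores = item
        pass_rate = sum(s >= threshold for s in scores) / len(scores)
        return (pass_rate, mean(scores))

    return max(results.items(), key=rank)[0]


print(best_config(results))  # ('template_a', 512)
```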

Continuous Evaluation

04 Dec 10:42

penguine-ip

v0.20.23

0a57b91

Automatically integrated with Confident AI for continuous evaluation throughout the lifetime of your LLM application.

Evaluate entire datasets

16 Nov 07:20

penguine-ip

v0.20.19

c7c0b8b

Mid-week bug-fix release with an extra feature:

- run_test now works as expected.
- New function evaluate: evaluates a list of test cases (a dataset) on the metrics you define, without having to go through the CLI. More info here: https://docs.confident-ai.com/docs/evaluation-datasets#evaluate-your-dataset-without-pytest
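Conceptually, evaluating a dataset outside the CLI means looping over every test case and scoring it against every metric. The standalone sketch below shows that shape with a simple exact-match metric; it is not deepeval's evaluate (see the linked docs for its real signature), and Case, Metric, and evaluate_cases are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    """Hypothetical minimal test case."""
    input: str
    actual_output: str
    expected_output: str


@dataclass
class Metric:
    """Hypothetical metric: a scoring function plus a pass threshold."""
    name: str
    score_fn: Callable[[Case], float]
    threshold: float


def evaluate_cases(cases: List[Case], metrics: List[Metric]) -> List[Dict[str, bool]]:
    """Score every case on every metric and return one pass/fail dict per case."""
    return [
        {m.name: m.score_fn(case) >= m.threshold for m in metrics}
        for case in cases
    ]


exact_match = Metric(
    name="exact_match",
    score_fn=lambda c: float(c.actual_output.strip() == c.expected_output.strip()),
    threshold=1.0,
)

dataset = [
    Case("2 + 2?", "4", "4"),
    Case("Capital of Japan?", "Kyoto", "Tokyo"),
]

print(evaluate_cases(dataset, [exact_match]))
# [{'exact_match': True}, {'exact_match': False}]
```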

Judgemental GPT

14 Nov 05:12

penguine-ip

v0.20.18

727fdb3

In this release, deepeval has added support for:

- JudgementalGPT, a dedicated LLM app developed by Confident AI to perform evaluations more robustly and accurately. JudgementalGPT provides a score and a reason for the score.
- Parallel testing: execute test cases in parallel to speed up evaluation by up to 100x.
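Parallel testing pays off because LLM-judged metrics spend most of their time waiting on network calls. A minimal sketch with the standard library's ThreadPoolExecutor shows the idea; slow_metric is a hypothetical stand-in that sleeps instead of calling an API, and this is not necessarily how deepeval runs tests in parallel. Real speed-ups depend on worker count and API rate limits.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def slow_metric(output: str) -> float:
    """Hypothetical metric that simulates a slow, I/O-bound LLM judge call."""
    time.sleep(0.1)  # stand-in for network latency
    return 1.0 if output else 0.0


def run_sequential(outputs: List[str]) -> List[float]:
    return [slow_metric(o) for o in outputs]


def run_parallel(outputs: List[str], workers: int = 8) -> List[float]:
    # Threads suit I/O-bound waiting; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(slow_metric, outputs))


outputs = ["answer"] * 8 + [""] * 2

start = time.perf_counter()
seq = run_sequential(outputs)
seq_time = time.perf_counter() - start

start = time.perf_counter()
par = run_parallel(outputs)
par_time = time.perf_counter() - start

print(seq == par, f"sequential {seq_time:.2f}s, parallel {par_time:.2f}s")
```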
