Describe the feature or potential improvement
Support for non-LLM evaluators for both traces and experiments, defined and run inside Langfuse rather than locally, would be extremely helpful. While LLM-as-a-judge is great, there is a real need for deterministic checks like "ExactMatch", "ListContains", or "ValidJSON", even when domain experts are the prompt engineers. For example, when categorizing conversations you would want a combination of LLM and non-LLM evaluators running both in experiments and on your production traces (see the sketch below).
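To make the idea concrete, here is a minimal sketch of what such deterministic evaluators could look like. This is plain Python, not an existing Langfuse API; the function names and the 0–1 scoring convention are my own assumptions, chosen to mirror how LLM-as-a-judge scores are reported.

```python
import json

# Hypothetical deterministic evaluators. Each returns a score in [0, 1],
# matching the scoring convention of LLM-as-a-judge evaluators.

def exact_match(output: str, expected: str) -> float:
    """1.0 if the output equals the expected string (ignoring surrounding whitespace)."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def list_contains(output: str, expected_items: list[str]) -> float:
    """Fraction of expected items that occur as substrings of the output."""
    if not expected_items:
        return 1.0
    hits = sum(1 for item in expected_items if item in output)
    return hits / len(expected_items)

def valid_json(output: str) -> float:
    """1.0 if the output parses as valid JSON, 0.0 otherwise."""
    try:
        json.loads(output)
        return 1.0
    except json.JSONDecodeError:
        return 0.0
```

Today, logic like this has to run client-side with the results pushed back as scores via the SDK; the request is for Langfuse to host and execute such evaluators server-side, on both experiment runs and production traces, alongside the existing LLM-as-a-judge evaluators.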
I would say this is one of the (few) places where braintrust.dev currently excels over Langfuse.
Being able to write them ourselves in TypeScript or Python in the UI would be the most flexible, but pre-made evaluators (just like the existing LLM-as-a-judge ones) would also be very helpful.
Some inspiration:
Additional information
No response
Replies: 1 comment

Thanks for sharing! I agree that the ability to run custom non-LLM evaluators in Langfuse would be helpful, and I appreciate you taking the time to write this up. I will keep you in the loop; this is definitely something we are considering adding to Langfuse.