Describe the feature or potential improvement
Support for non-LLM evaluators for both traces and experiments, defined and run inside Langfuse rather than locally, would be extremely helpful. While LLM-as-a-judge is great, there is a real need for deterministic checks like "ExactMatch", "ListContains", or "ValidJSON", even when domain experts are the prompt engineers. For example, when categorizing conversations you would want a combination of LLM and non-LLM evaluators running both in experiments and on your production traces (see the sketch below).
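To make the idea concrete, here is a minimal sketch of what such deterministic evaluators could look like. This is plain Python, not an existing Langfuse API; the function names and the 0–1 scoring convention are my own assumptions, chosen to mirror how LLM-as-a-judge scores are reported.

```python
import json

# Hypothetical deterministic evaluators. Each returns a score in [0, 1],
# matching the scoring convention of LLM-as-a-judge evaluators.

def exact_match(output: str, expected: str) -> float:
    """1.0 if the output equals the expected string (ignoring surrounding whitespace)."""
    return 1.0 if output.strip() == expected.strip() else 0.0

def list_contains(output: str, expected_items: list[str]) -> float:
    """Fraction of expected items that occur as substrings of the output."""
    if not expected_items:
        return 1.0
    hits = sum(1 for item in expected_items if item in output)
    return hits / len(expected_items)

def valid_json(output: str) -> float:
    """1.0 if the output parses as valid JSON, 0.0 otherwise."""
    try:
        json.loads(output)
        return 1.0
    except json.JSONDecodeError:
        return 0.0
```

Today, logic like this has to run client-side with the results pushed back as scores via the SDK; the request is for Langfuse to host and execute such evaluators server-side, on both experiment runs and production traces, alongside the existing LLM-as-a-judge evaluators.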
I would say this is one of the (few) places where braintrust.dev currently excels over Langfuse.
Being able to write them ourselves in TypeScript or Python in the UI would be the most flexible, but pre-made evaluators (just like the existing LLM-as-a-judge ones) would also be very helpful.
Some inspiration:
Additional information
No response
Replies: 1 comment

Thanks for sharing! I agree that the ability to run custom non-LLM evaluators in Langfuse would be helpful, and I appreciate you taking the time to write this up. I will keep you in the loop; this is definitely something we are considering adding to Langfuse.