Description
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
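TestsetGenerator.generate_with_langchain_docs raises a ValueError stating that the documents are too short (100 tokens or less), so no test set is created. Full traceback: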
ValueError                                Traceback (most recent call last)
Cell In[38], line 19
     12 generator = TestsetGenerator.from_langchain(
     13     llm=generator_llm,
     14     embedding_model=generator_embeddings,
     15 )
     17 query_distribution = default_query_distribution(generator_llm)
---> 19 testset = generator.generate_with_langchain_docs(
     20     documents=chunks,
     21     testset_size=10,
     22     query_distribution=query_distribution,
     23 )

File ~/Desktop/project/.venv/lib/python3.12/site-packages/ragas/testset/synthesizers/generate.py:164, in TestsetGenerator.generate_with_langchain_docs(self, documents, testset_size, transforms, transforms_llm, transforms_embedding_model, query_distribution, run_config, callbacks, with_debugging_logs, raise_exceptions)
    159 raise ValueError(
    160     """An embedding client was not provided. Provide an embedding through the transforms_embedding_model parameter. Alternatively you can provide your own transforms through the transforms parameter."""
    161 )
    163 if not transforms:
--> 164     transforms = default_transforms(
    165         documents=list(documents),
    166         llm=transforms_llm or self.llm,
    167         embedding_model=transforms_embedding_model or self.embedding_model,
    168     )
    170 # convert the documents to Ragas nodes
...
    161     "Documents appears to be too short (ie 100 tokens or less). Please provide longer documents."
    162 )
    164 return transforms

ValueError: Documents appears to be too short (ie 100 tokens or less). Please provide longer documents.
Ragas version: 0.2.15
Python version: 3.12
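Since the error message is about document length, a quick way to check how long the chunks actually are might look like the sketch below. It is only a rough diagnostic under stated assumptions: the helper names (approx_token_count, length_bins) are hypothetical, the whitespace split is a crude stand-in for whatever tokenizer Ragas uses internally, the 100-token boundary is taken from the error message, and the 500-token boundary is an arbitrary cutoff chosen for illustration.

```python
from collections import Counter


def approx_token_count(text: str) -> int:
    """Rough token estimate: whitespace-separated words.

    Assumption: this is NOT the tokenizer Ragas uses, only an approximation.
    """
    return len(text.split())


def length_bins(texts, short_max=100, medium_max=500):
    """Return the fraction of texts falling into each approximate length bin."""
    counts = Counter()
    for text in texts:
        n = approx_token_count(text)
        if n <= short_max:
            counts["short"] += 1
        elif n <= medium_max:
            counts["medium"] += 1
        else:
            counts["long"] += 1
    total = len(texts) or 1  # avoid division by zero on an empty list
    return {name: counts[name] / total for name in ("short", "medium", "long")}


# Stand-in data for illustration. With LangChain documents this would be
# something like: [doc.page_content for doc in chunks]
sample_texts = ["word " * 50, "word " * 80, "word " * 300, "word " * 90]
print(length_bins(sample_texts))
```

If most of the chunks land in the "short" bin, that would be consistent with the message in the traceback.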
Code to Reproduce
from ragas.llms.base import LangchainLLMWrapper
from ragas.embeddings.base import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.testset import TestsetGenerator
from ragas.testset.synthesizers import default_query_distribution

# Wrap the LangChain models for use with Ragas
generator_llm = LangchainLLMWrapper(langchain_llm=ChatOpenAI(model="gpt-4o-mini"))
generator_embeddings = LangchainEmbeddingsWrapper(embeddings=OpenAIEmbeddings(model="text-embedding-3-small"))

generator = TestsetGenerator.from_langchain(
    llm=generator_llm,
    embedding_model=generator_embeddings,
)

query_distribution = default_query_distribution(generator_llm)

# NOTE: `chunks` is not defined in this snippet; it is assumed to be a list of
# LangChain documents created earlier in the notebook.
testset = generator.generate_with_langchain_docs(
    documents=chunks,
    testset_size=10,
    query_distribution=query_distribution,
)
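One possible workaround, given that the message asks for longer documents, is to merge adjacent short chunks before passing them in. The following is a minimal sketch under assumptions, not a confirmed fix: merge_short_texts is a hypothetical helper, the 500-token target is an illustrative value rather than a documented Ragas threshold, and tokens are approximated by whitespace-separated words.

```python
def merge_short_texts(texts, min_tokens=500):
    """Greedily concatenate consecutive texts until each merged text
    reaches at least `min_tokens` approximate (whitespace) tokens.

    Assumption: `min_tokens=500` is illustrative, not a documented Ragas value.
    """
    merged = []
    buffer = []
    buffer_len = 0
    for text in texts:
        buffer.append(text)
        buffer_len += len(text.split())
        if buffer_len >= min_tokens:
            merged.append("\n\n".join(buffer))
            buffer, buffer_len = [], 0
    if buffer:
        leftover = "\n\n".join(buffer)
        if merged:
            # Attach a too-short tail to the previous merged text
            merged[-1] = merged[-1] + "\n\n" + leftover
        else:
            # Everything together is still below the target; keep it as one text
            merged.append(leftover)
    return merged


# Stand-in data: seven chunks of 200 approximate tokens each
example = merge_short_texts(["w " * 200] * 7)
print([len(m.split()) for m in example])
```

With LangChain documents, the input would presumably be the page_content of each chunk, and each merged string would need to be wrapped back into a document object before calling generate_with_langchain_docs. Chunk metadata is not preserved by this sketch.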
Error trace
ValueError: Documents appears to be too short (ie 100 tokens or less). Please provide longer documents.
Expected behavior
generate_with_langchain_docs completes without error and returns a test dataset (testset_size=10).