[Bug] HttpError : Invalid bucket name: 'wikipedia-simple-text-embedding-ada-002-100K', 400 #35
Comments
I have experienced the same issue. It relates to https://community.pinecone.io/t/pinecone-datasets-httperror-invalid-bucket-name-wikipedia-simple-text-embedding-ada-002-100k-400/3715/3 . The root cause is that the code builds the path as gs://catalog_base_path\dataset_id, with a backslash instead of a forward slash. The "dirty" fix is to modify this line of code, https://github.com/pinecone-io/pinecone-datasets/blob/main/pinecone_datasets/dataset.py#L95 , to dataset_path = f"{catalog_base_path}/{dataset_id}" But that won't work when the
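To illustrate the root cause described above, here is a minimal sketch that reproduces the Windows behaviour on any platform. It uses ntpath, which is the module os.path resolves to on Windows, and posixpath, which is what it resolves to on Linux and macOS. The bucket name gs://example-bucket/catalog is a made-up placeholder, not the real catalog path.

```python
import ntpath      # what os.path is on Windows
import posixpath   # what os.path is on Linux, macOS and Colab

# Hypothetical values, for illustration only.
catalog_base_path = "gs://example-bucket/catalog"
dataset_id = "wikipedia-simple-text-embedding-ada-002-100K"

# On Windows, os.path.join inserts a backslash, which is not a valid
# separator in a gs:// URL.
windows_path = ntpath.join(catalog_base_path, dataset_id)

# On POSIX systems the same call inserts a forward slash.
posix_path = posixpath.join(catalog_base_path, dataset_id)

# The f-string from the "dirty" fix gives the same result everywhere.
fixed_path = f"{catalog_base_path}/{dataset_id}"

print(windows_path)
print(posix_path)
print(fixed_path)
```

This would also explain why the same code works in Google Colab but fails on a local Windows machine.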
Thanks @martinohanlon!
@David-GERARD I don't think the issue should be closed. It is a bug which should be fixed, imo.
- src/llm_utilikit/LangChain/notebooks/langchain-embeddings-retrieval-agent.ipynb I found a dirty fix, but don't know how to use it and am currently too lazy to find out. - pinecone-io/pinecone-datasets#35
@martinohanlon your solution worked; however, another error pops up afterwards.
Code in local Jupyter Notebook (Win10):
^--- modified from: https://docs.pinecone.io/docs/using-public-datasets . The exact same code worked in a Google Colab notebook (@David-GERARD fyi).
Hey. The dirtiest solution is to patch os.path.join at the beginning of datasets.py.
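A sketch of that kind of patch, assuming the only goal is to force forward slashes while the dataset is being loaded. The context manager name forward_slash_join is hypothetical. It swaps os.path.join for posixpath.join and restores the original afterwards, so code that needs native Windows paths is affected only inside the with block. Whether every path in pinecone_datasets is built inside that window has not been verified here.

```python
import contextlib
import os
import posixpath


@contextlib.contextmanager
def forward_slash_join():
    """Temporarily make os.path.join behave like posixpath.join.

    Hypothetical helper, not part of pinecone_datasets.
    """
    original_join = os.path.join
    os.path.join = posixpath.join
    try:
        yield
    finally:
        # Always restore the platform's own join, even on error.
        os.path.join = original_join


# Usage (assumes pinecone_datasets is installed):
# with forward_slash_join():
#     dataset = pinecone_datasets.load_dataset(
#         "wikipedia-simple-text-embedding-ada-002-100K"
#     )
```

Patching a standard library function this way is global to the process, so it is best kept to a notebook workaround rather than shipped in library code.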
I implemented a "dirty fix" inspired by @martinohanlon's comment. It essentially required replacing each use of os.path.join in dataset.py with an if-else block that, when the system's platform is Windows, constructs the path with f-strings and forward slash characters. For example:
Here's my fork in case anyone trying to download a Pinecone dataset on Windows finds it useful:
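For readers who cannot see the fork, a minimal sketch of the kind of if-else block described above might look like the following. This is not the code from the fork. The function name build_dataset_path and the system parameter are hypothetical and exist only to make the branch easy to exercise on any machine.

```python
import os
import platform


def build_dataset_path(catalog_base_path, dataset_id, system=None):
    """Join a catalog base path and a dataset id.

    Hypothetical helper. `system` defaults to the current platform and
    can be overridden to try the Windows branch elsewhere.
    """
    system = system or platform.system()
    if system == "Windows":
        # Build the path by hand so the separator is always "/".
        return f"{catalog_base_path}/{dataset_id}"
    # On other platforms os.path.join already uses "/".
    return os.path.join(catalog_base_path, dataset_id)


# Placeholder bucket name, for illustration only.
path = build_dataset_path(
    "gs://example-bucket/catalog",
    "wikipedia-simple-text-embedding-ada-002-100K",
    system="Windows",
)
print(path)
```

Note that the f-string branch produces a double slash if catalog_base_path already ends with "/", so a real fix would probably strip trailing slashes first.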
Is this a new bug?
Current Behavior
Hi,
I have used code from one of the example Colab notebooks on RAG with LangChain to make a lab for students on vector databases.
A minority of the students encountered the following error when importing the wikipedia-simple-text-embedding-ada-002-100K dataset from pinecone_datasets:



Expected Behavior
This cell is supposed to run and import the dataset (it works on my laptop and for most of the students).
Steps To Reproduce
In Python 3.11, with the package versions described later, run pinecone_datasets.load_dataset('wikipedia-simple-text-embedding-ada-002-100K ')
Relevant log output
No response
Environment
Additional Context
None of our troubleshooting attempts worked, and we have not identified the common denominator that leads to this error. When using the list_datasets() method, the wikipedia-simple-text-embedding-ada-002-100K dataset appears in the list, so we were thinking it might be a server-side error.