
Can't replicate the intended behavior #10

Open
rnabirov opened this issue Apr 27, 2023 · 9 comments

Comments

@rnabirov

After installing the app and scraping the repo you referred to in the demo (https://github.com/peterw/Gumroad-Landing-Page-Generator) I can't get the chat to analyze the repo.

This is my chat interaction using the same questions as in the demo. Looks like the repo data embeddings are not used properly in inferences.
[screenshot of the chat interaction]

This is the logging output in the terminal, not sure if it's relevant.

2023-04-27 19:05:51.248 Deep Lake Dataset in my_test_dataset already exists, loading from the storage
Dataset(path='my_test_dataset', read_only=True, tensors=['embedding', 'ids', 'metadata', 'text'])

  tensor     htype    shape    dtype  compression
  -------   -------  -------  -------  ------- 
 embedding  generic   (0,)    float32   None   
    ids      text     (0,)      str     None   
 metadata    json     (0,)      str     None   
   text      text     (0,)      str     None   
2023-04-27 19:05:51.255 `label` got an empty value. This is discouraged for accessibility reasons and may be disallowed in the future by raising an exception. Please provide a non-empty label and hide it with label_visibility if needed.
@farizrahman4u

@rnabirov Can you try deleting the my_test_dataset folder, then running github.py followed by chat.py?

@rnabirov

Did it a few times, same result.

@farizrahman4u

What does the output from github.py look like?

@rnabirov

Cloning into './gumroad'...
remote: Enumerating objects: 27, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 27 (delta 6), reused 11 (delta 2), pack-reused 0
Unpacking objects: 100% (27/27), done.
Created a chunk of size 1525, which is longer than the specified 1000
Created a chunk of size 1020, which is longer than the specified 1000
Created a chunk of size 1540, which is longer than the specified 1000
/Users/rnabirov/opt/anaconda3/lib/python3.8/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.3.2) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.
  warnings.warn(
Your Deep Lake dataset has been successfully created!
The dataset is private so make sure you are logged in!
This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/rnabirov/my_test_repo3
hub://rnabirov/my_test_repo3 loaded successfully.
Evaluating ingest: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:23<00:00
Dataset(path='hub://rnabirov/my_test_repo3', tensors=['embedding', 'ids', 'metadata', 'text'])

  tensor     htype     shape      dtype  compression
  -------   -------   -------    -------  ------- 
 embedding  generic  (56, 1536)  float32   None   
    ids      text     (56, 1)      str     None   
 metadata    json     (56, 1)      str     None   
   text      text     (56, 1)      str     None   

@farizrahman4u

You are ingesting to a cloud dataset with github.py, but chat.py seems to be loading a local dataset. Can I see your .env file (after removing the API keys)?

@rnabirov

Here it is:

OPENAI_API_KEY=""
ACTIVELOOP_TOKEN=""
DEEPLAKE_USERNAME=rnabirov
DEEPLAKE_DATASET_PATH=my_test_dataset
DEEPLAKE_REPO_NAME=my_test_repo3
REPO_URL=https://github.com/peterw/Gumroad-Landing-Page-Generator
SITE_TITLE="Repo analysis chat"

@rnabirov

rnabirov commented Apr 28, 2023

Probably chat.py downloads an empty dataset? The whole my_test_dataset folder is 9,000 bytes, and the tensor_meta.json files inside it are 400 bytes at most.

Probably a dumb question, but what's the point of downloading the dataset to the local machine when it's available on Activeloop? The script already makes outside connections to OpenAI anyway, so it might as well work with the remote dataset on Activeloop.

@rnabirov

I got it working by pointing DEEPLAKE_DATASET_PATH in .env to the remote dataset that github.py created on Activeloop. Having two separate variables, DEEPLAKE_DATASET_PATH and DEEPLAKE_REPO_NAME, for the same dataset was confusing to me; I'd suggest combining them.
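For reference, this is the one .env change that made it work for me (a sketch based on the paths in this thread; the hub:// path is the one github.py printed when it finished ingesting):

```shell
# Before: DEEPLAKE_DATASET_PATH=my_test_dataset  (local, stays empty)
# After: point chat.py at the cloud dataset github.py actually wrote to
DEEPLAKE_DATASET_PATH=hub://rnabirov/my_test_repo3
```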

@mikayelh

mikayelh commented May 1, 2023

Hey @peterw, I think we can close this issue and document the approach suggested by @rnabirov. I've gotten some questions about this myself too.
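A minimal sketch of what combining the two variables could look like. This is a hypothetical helper, not code from the repo: the name `resolve_dataset_path` and the fallback logic are assumptions; the idea is simply that one variable can serve both the local and the hub:// case.

```python
def resolve_dataset_path(env: dict) -> str:
    """Hypothetical helper: resolve a single Deep Lake dataset path.

    Prefer an explicit hub:// path in DEEPLAKE_DATASET_PATH; otherwise
    build one from DEEPLAKE_USERNAME and DEEPLAKE_REPO_NAME; otherwise
    fall back to treating DEEPLAKE_DATASET_PATH as a local directory.
    """
    path = env.get("DEEPLAKE_DATASET_PATH", "")
    if path.startswith("hub://"):
        return path
    user = env.get("DEEPLAKE_USERNAME")
    repo = env.get("DEEPLAKE_REPO_NAME")
    if user and repo:
        return f"hub://{user}/{repo}"
    return path  # local dataset directory


# With the .env from this thread, github.py and chat.py would agree:
print(resolve_dataset_path({
    "DEEPLAKE_USERNAME": "rnabirov",
    "DEEPLAKE_REPO_NAME": "my_test_repo3",
    "DEEPLAKE_DATASET_PATH": "my_test_dataset",
}))  # hub://rnabirov/my_test_repo3
```

With one resolved path used by both scripts, the ingest-to-cloud / read-from-local mismatch in this thread can't happen.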
