
Can't replicate the intended behavior #10

Open
rnabirov opened this issue Apr 27, 2023 · 9 comments

Comments

@rnabirov

After installing the app and scraping the repo you referred to in the demo (https://github.com/peterw/Gumroad-Landing-Page-Generator) I can't get the chat to analyze the repo.

This is my chat interaction using the same questions as in the demo. Looks like the repo data embeddings are not used properly in inferences.
[screenshot of the chat interaction]

This is the logging output in the terminal, not sure if it's relevant.

2023-04-27 19:05:51.248 Deep Lake Dataset in my_test_dataset already exists, loading from the storage
Dataset(path='my_test_dataset', read_only=True, tensors=['embedding', 'ids', 'metadata', 'text'])

  tensor     htype    shape    dtype  compression
  -------   -------  -------  -------  ------- 
 embedding  generic   (0,)    float32   None   
    ids      text     (0,)      str     None   
 metadata    json     (0,)      str     None   
   text      text     (0,)      str     None   
2023-04-27 19:05:51.255 `label` got an empty value. This is discouraged for accessibility reasons and may be disallowed in the future by raising an exception. Please provide a non-empty label and hide it with label_visibility if needed.
@farizrahman4u

@rnabirov Can you try deleting the my_test_dataset folder, then running github.py followed by chat.py?

@rnabirov

Did it a few times, same result.

@farizrahman4u

What does the output from github.py look like?

@rnabirov

Cloning into './gumroad'...
remote: Enumerating objects: 27, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 27 (delta 6), reused 11 (delta 2), pack-reused 0
Unpacking objects: 100% (27/27), done.
Created a chunk of size 1525, which is longer than the specified 1000
Created a chunk of size 1020, which is longer than the specified 1000
Created a chunk of size 1540, which is longer than the specified 1000
/Users/rnabirov/opt/anaconda3/lib/python3.8/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.3.2) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.
  warnings.warn(
Your Deep Lake dataset has been successfully created!
The dataset is private so make sure you are logged in!
This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/rnabirov/my_test_repo3
hub://rnabirov/my_test_repo3 loaded successfully.
Evaluating ingest: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:23<00:00
Dataset(path='hub://rnabirov/my_test_repo3', tensors=['embedding', 'ids', 'metadata', 'text'])

  tensor     htype     shape      dtype  compression
  -------   -------   -------    -------  ------- 
 embedding  generic  (56, 1536)  float32   None   
    ids      text     (56, 1)      str     None   
 metadata    json     (56, 1)      str     None   
   text      text     (56, 1)      str     None   

@farizrahman4u

You are ingesting to a cloud dataset with github.py, but chat.py seems to be loading a local dataset. Can I see your .env file (after removing the API keys)?

@rnabirov

Here it is:

OPENAI_API_KEY=""
ACTIVELOOP_TOKEN=""
DEEPLAKE_USERNAME=rnabirov
DEEPLAKE_DATASET_PATH=my_test_dataset
DEEPLAKE_REPO_NAME=my_test_repo3
REPO_URL=https://github.com/peterw/Gumroad-Landing-Page-Generator
SITE_TITLE="Repo analysis chat"

@rnabirov

rnabirov commented Apr 28, 2023

Probably chat.py downloads an empty dataset? The whole my_test_dataset folder is 9,000 bytes, and the tensor_meta.json files inside it are 400 bytes at most.

Probably a dumb question, but what's the point of downloading the dataset to the local machine when it's available on Activeloop? The script already makes outside connections to OpenAI anyway, so it might as well work with the remote dataset on Activeloop.

@rnabirov

I got it working by pointing DEEPLAKE_DATASET_PATH in .env to the remote dataset that github.py created on Activeloop. Having two separate variables, DEEPLAKE_DATASET_PATH and DEEPLAKE_REPO_NAME, for the same dataset was confusing to me; I'd suggest combining them.
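For reference, this is the one .env change that made it work for me (a sketch based on the paths in this thread; the hub:// path is the one github.py printed when it finished ingesting):

```shell
# Before: DEEPLAKE_DATASET_PATH=my_test_dataset  (local, stays empty)
# After: point chat.py at the cloud dataset github.py actually wrote to
DEEPLAKE_DATASET_PATH=hub://rnabirov/my_test_repo3
```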

@mikayelh

mikayelh commented May 1, 2023

Hey @peterw, I think we can close this issue and document the approach suggested by @rnabirov. I've gotten some questions about this myself too.
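A minimal sketch of what combining the two variables could look like. This is a hypothetical helper, not code from the repo: the name `resolve_dataset_path` and the fallback logic are assumptions; the idea is simply that one variable can serve both the local and the hub:// case.

```python
def resolve_dataset_path(env: dict) -> str:
    """Hypothetical helper: resolve a single Deep Lake dataset path.

    Prefer an explicit hub:// path in DEEPLAKE_DATASET_PATH; otherwise
    build one from DEEPLAKE_USERNAME and DEEPLAKE_REPO_NAME; otherwise
    fall back to treating DEEPLAKE_DATASET_PATH as a local directory.
    """
    path = env.get("DEEPLAKE_DATASET_PATH", "")
    if path.startswith("hub://"):
        return path
    user = env.get("DEEPLAKE_USERNAME")
    repo = env.get("DEEPLAKE_REPO_NAME")
    if user and repo:
        return f"hub://{user}/{repo}"
    return path  # local dataset directory


# With the .env from this thread, github.py and chat.py would agree:
print(resolve_dataset_path({
    "DEEPLAKE_USERNAME": "rnabirov",
    "DEEPLAKE_REPO_NAME": "my_test_repo3",
    "DEEPLAKE_DATASET_PATH": "my_test_dataset",
}))  # hub://rnabirov/my_test_repo3
```

With one resolved path used by both scripts, the ingest-to-cloud / read-from-local mismatch in this thread can't happen.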
