
In many applications, output.text is too long to input #35

Open
jessicazu opened this issue Mar 18, 2023 · 3 comments

Comments

@jessicazu

jessicazu commented Mar 18, 2023

Thank you for the wonderful tool!

Problem

When I paste the whole output.text into GPT-4 at once, I get the error "The message you submitted was too long, please reload the conversation and submit something shorter."

Details

In my fairly simple React app, the generated output.text reached 6 million lines, most of it from node_modules, of course. After I excluded files unrelated to the app, such as node_modules, yarn.lock, and .git/, it dropped to about 1,500 lines, but I still got the same error.

For now, I tell GPT-4 up front that I will send the code one file at a time, and by repeatedly copying and pasting, it seems to read all of the files. This is the prompt I use:

I will now type in the code for an application, file by file. The following text is a Git repository with code. The structure of the text is sections that begin with ----, followed by a single line containing the file path and file name, followed by a variable number of lines containing the file contents. The text representing the Git repository ends when the symbols --END-- are encountered. Any further text beyond --END-- is meant to be interpreted as instructions using the aforementioned Git repository as context.
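The layout described in that prompt is easy to reproduce. Below is a minimal Python sketch, assuming the file contents are already loaded into a dict keyed by relative path; `format_repo` is a hypothetical helper written for illustration, not the loader's own code, and the loader's exact output may differ in details such as blank lines.

```python
def format_repo(files: dict[str, str]) -> str:
    """Serialize {relative_path: contents} into the layout the prompt describes:
    a '----' line, then the path, then the file contents, ending with '--END--'."""
    parts: list[str] = []
    for path, contents in files.items():
        parts.append("----")
        parts.append(path)
        # Drop trailing newlines so the next '----' starts on its own line.
        parts.append(contents.rstrip("\n"))
    parts.append("--END--")
    return "\n".join(parts) + "\n"
```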

Suggestion

It might be useful to have features like the following:

  • A way to exclude large files, binary files, and files unrelated to the app
  • A way to split the text into chunks that fit the maximum input size of each GPT model, and add a prompt saying that the input is split
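The second suggestion could work roughly as follows. This is a minimal sketch that assumes the repository text has already been split into per-file sections (each starting with ----); `split_into_chunks`, `add_part_prompts`, and the prompt wording are hypothetical. It packs whole files into each chunk where possible and only cuts a file that is larger than the limit on its own.

```python
def split_into_chunks(sections: list[str], max_chars: int) -> list[str]:
    """Pack per-file sections into chunks of at most max_chars characters.

    A section that fits in max_chars is never split; if it does not fit in the
    current chunk, a new chunk is started. A section longer than max_chars is
    cut into max_chars-sized pieces.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for section in sections:
        for start in range(0, len(section), max_chars):
            piece = section[start:start + max_chars]
            # Close the current chunk when this piece would overflow it.
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks


# Hypothetical wording; adjust to taste.
PART_PROMPT = (
    "This is part {part} of {total} of a Git repository. "
    "Reply only with 'OK' until you receive the part that ends with --END--."
)


def add_part_prompts(chunks: list[str]) -> list[str]:
    """Prefix each chunk with a note saying which part it is."""
    total = len(chunks)
    return [
        PART_PROMPT.format(part=i, total=total) + "\n\n" + chunk
        for i, chunk in enumerate(chunks, start=1)
    ]
```

Here max_chars limits only the repository text, so leave some headroom for the part prompt; the real limit has to be looked up for the model you use.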

By the way

With GPT-4, I could send about 400 lines (4-5 files) at a time.
Also, each time I sent a batch, GPT-4 correctly explained the contents of the files, and when I reached --END--, it even wrote a summary for me, which was impressive.

Next, I would like to test whether GPT can handle code maintenance and adding new features.

@About7Sharks

  1. Exclusion is already available: put a .gptignore file in the root folder of the repository you are reading. It works the same way as .gitignore.
  2. It would be nice if the splitting could be controlled with an option like --maxTokens=15000
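A --maxTokens option along those lines could be parsed with argparse. The sketch below is hypothetical (the flag is only a proposal in this thread) and converts tokens to characters with the common rule of thumb of roughly 4 characters per token for English text; an exact count would need the model's own tokenizer, such as OpenAI's tiktoken library.

```python
import argparse
from typing import Optional

# Rough rule of thumb: about 4 characters per token for English text.
# Source code often tokenizes differently, so treat this as an estimate only.
CHARS_PER_TOKEN = 4


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a --maxTokens option")
    parser.add_argument("repo_path", help="path to the repository to dump")
    # Hypothetical option proposed in this thread, not an existing flag.
    parser.add_argument("--maxTokens", dest="max_tokens", type=int, default=None,
                        help="approximate token budget per output chunk")
    return parser


def char_budget(max_tokens: Optional[int]) -> Optional[int]:
    """Convert a token budget into an approximate character budget."""
    if max_tokens is None:
        return None  # no limit requested
    return max_tokens * CHARS_PER_TOKEN
```

The resulting character budget could then be passed to whatever function splits the output.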

shanecp added a commit to shanecp/gpt-repository-loader that referenced this issue Apr 3, 2023
- Update docs to support
- Add requirements file.

This gives a solution for mpoon#35, mpoon#37. Perhaps mpoon#26
@batjko

batjko commented Apr 19, 2023

It might also be possible to apply a soft form of minification that trims more fat while preserving just enough structure and meaning for GPT-4 to still understand the code (e.g. removing all whitespace, some vowels, and many non-critical words in comments, such as "a", "to", "the").

I wouldn't be surprised if such an algorithm already existed somewhere.
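A very light version of this idea, sketched in Python under some assumptions: the input is Python source, only full-line `#` comments are touched (so a `#` inside a string literal is never changed), and `FILLER_WORDS` is an illustrative list, not a tuned one. Indentation is kept deliberately, because removing all whitespace would break indentation-sensitive languages like Python; vowel removal is left out.

```python
# Illustrative filler words to drop from comments; tune for your codebase.
FILLER_WORDS = {"a", "an", "the", "to", "of"}


def soft_minify(source: str) -> str:
    """Drop blank lines and trailing whitespace, and strip filler words from
    full-line '#' comments. Leading indentation is preserved."""
    kept_lines: list[str] = []
    for line in source.splitlines():
        line = line.rstrip()  # trailing whitespace is pure overhead
        if not line:
            continue  # drop blank lines
        body = line.lstrip()
        indent = line[: len(line) - len(body)]
        if body.startswith("#"):
            words = [w for w in body[1:].split() if w.lower() not in FILLER_WORDS]
            if not words:
                continue  # the comment contained nothing but filler
            line = indent + "# " + " ".join(words)
        kept_lines.append(line)
    return "\n".join(kept_lines)
```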

@agn-7

agn-7 commented Oct 21, 2023

It would be great to add a new feature or an option to exclude unnecessary files or folders.

For instance, something like this:

python gpt_repository_loader.py ../<source-code-path> --exclude "*.pyc,.env/,.git/"
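Such an option could be parsed with argparse by giving it a `type` function that splits on commas. A minimal sketch; `--exclude` and `comma_list` are hypothetical and not part of the loader, which (as the update below notes) reads .gptignore instead.

```python
import argparse


def comma_list(value: str) -> list[str]:
    """argparse type: split 'a,b,c' into ['a', 'b', 'c'], ignoring empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the proposed --exclude option")
    parser.add_argument("repo_path", help="path to the repository to dump")
    # Hypothetical option from this comment; not an existing flag of the loader.
    parser.add_argument("--exclude", type=comma_list, default=[],
                        help='comma-separated patterns, e.g. "*.pyc,.env/,.git/"')
    return parser
```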

[UPDATE]:

I just realized this feature already exists in this repository: the .gptignore file, to which you can append your own unwanted patterns.

Here's mine:

__pycache__/
*.pyc
*.pyi
*.log
.git/*
.gptignore
.dockerignore
LICENSE
.github/*
.tox/*
.mypy_cache/*
*.whl
*.tar
*.tar.gz
.gitignore
*.env*
*.png
*.jpeg
*.jpg
*bin/*
alembic/*
*/migrations/*
tests/*
*.ini
*.xml
*.md
*.lock
env/*
venv/*
.coverage
.pytest_cache/*
*.css
.flake8
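To check which files a list like this would exclude, here is a simplified matcher in Python. It is an assumption-laden sketch, not the loader's actual matching logic: plain patterns are compared against the whole relative path with `fnmatch.fnmatchcase` (where `*` also matches `/`, so `*.pyc` hits files at any depth and `.git/*` covers everything under .git), and a pattern ending in `/`, such as `__pycache__/`, is treated as a directory name at any depth. Real .gitignore rules such as `!` negation, leading-`/` anchoring, and `**` are not handled.

```python
import fnmatch


def parse_ignore_file(text: str) -> list[str]:
    """Return the non-empty, non-comment lines of a .gptignore-style file."""
    patterns: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """True if rel_path (relative to the repo root) matches any pattern."""
    rel_path = rel_path.replace("\\", "/")  # normalize Windows separators
    dirs = rel_path.split("/")[:-1]  # directory components, excluding the file name
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match a directory with this name at any depth.
            name = pattern.rstrip("/")
            if any(fnmatch.fnmatchcase(d, name) for d in dirs):
                return True
        elif fnmatch.fnmatchcase(rel_path, pattern):
            return True
    return False
```

With this list, for example, `tests/*` excludes tests/test_api.py at the repository root but not app/tests/test_x.py, because fnmatch anchors the pattern at the start of the path.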
