Add maxLengthTip to web and increase max-chunk-size #10981

Open
wants to merge 1 commit into
base: main


xu-song
Contributor

@xu-song xu-song commented Nov 22, 2024

Summary

  1. Add maxLengthTip: "Maximum chunk length" is ambiguous. It refers to the number of tokens, which can easily be confused with the string length (number of characters).
  2. Increase max-chunk-size: With the default gpt2-tokenizer, 1000 tokens is roughly equivalent to 400 CJK characters, which is not enough in most cases.
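
To illustrate the gap between token count and string length described above, here is a minimal Python sketch. It assumes the rough ratio quoted in this summary (about 1000 tokens per 400 CJK characters), and the helper names are hypothetical and not part of this PR. The byte count is only an upper bound: it relies on GPT-2 using byte-level BPE, where merges can only reduce the count below one token per UTF-8 byte.

```python
# Ratio taken from the PR summary: ~1000 GPT-2 tokens per ~400 CJK characters.
# This is a rough figure (an assumption), not a measured constant.
TOKENS_PER_CJK_CHAR = 1000 / 400  # 2.5


def estimated_cjk_chars(max_tokens: int) -> int:
    """Roughly how many CJK characters fit in a chunk of max_tokens tokens."""
    return int(max_tokens / TOKENS_PER_CJK_CHAR)


def utf8_byte_upper_bound(text: str) -> int:
    """Upper bound on the token count under a byte-level BPE tokenizer:
    at most one token per UTF-8 byte."""
    return len(text.encode("utf-8"))


sample = "最大分段长度"  # 6 CJK characters

print(len(sample))                    # string length in characters
print(utf8_byte_upper_bound(sample))  # UTF-8 bytes, an upper bound on tokens
print(estimated_cjk_chars(1000))      # estimated CJK characters per 1000 tokens
```

For an exact count, the text would have to be run through the actual tokenizer; the estimate only shows why a limit expressed in tokens allows far fewer CJK characters than the same number read as a string length.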

Screenshots

Screenshot 2024-11-22 at 17 04 06

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

@dosubot (bot) added labels on Nov 22, 2024: size:S (This PR changes 10-29 lines, ignoring generated files.) and 📚 documentation (Improvements or additions to documentation)
@xu-song changed the title from "Add maxLengthTip to web" to "Add maxLengthTip to web and increase max-chunk-size" on Nov 22, 2024