Build Index from regex #125
Draft
torymur wants to merge 22 commits into main from index-from-regex-97
+1,056
−2,521
Commits
606b460 Build Index from regex (torymur)
bdc120d Test Index from regex in Guide (torymur)
6c5b853 Use FxHash* as default Hash* (torymur)
f349404 Cleaner from_regex logic (torymur)
15a85aa Use bytes as Token type, more tests for Index (torymur)
f02faec Drop majority of intermediate structures (torymur)
70b4bc6 Add PyGuide, use proper types for Index (torymur)
b477598 Provide basic Guide binding, test it (torymur)
f3266ee Improve Vocabulary python binding, add tests (torymur)
7edb831 Non-optional eos_token_id (torymur)
03e5561 Stabilize vocabulary interface (torymur)
64f0d73 Add tests for Guide (torymur)
2ab0007 Python vocabulary to accept pretrained params (torymur)
063d1c2 Correct interface in pyi, reprs for all python bindings (torymur)
f65d86f Adjust benchmarks (torymur)
30e29ef Drop unused dependencies (torymur)
e04e5be Index by ref in Guide (torymur)
7b6781b Extend interface of python bindings (torymur)
1fab872 Disallow insert of eos token into Vocabulary (torymur)
15a45c0 Stabilize Index interfaces (torymur)
bf6e8a6 Use new interface in statistical (torymur)
3fef1d8 Add `remove` to vocabulary interfaces (torymur)
@dpsimpson Could you please take a look at this failing test: https://github.com/dottxt-ai/outlines-core/actions/runs/12695987037/job/35389081344
The interface has changed and I updated the test more or less accordingly, but it needs to be checked. For example, I added a third value to `Vocabulary` (in place of the eos token that was there before) just to keep the dimensions right, and I suspect that could be incorrect 😅 Overall, since it's statistical, fully understanding the intentions and properly adjusting the expected test values or the construction logic goes over my head, so I would really appreciate your help here 🙏