First version of smart search #61
This seems good. In the future, the representative vector could be extended to also use the description in cases where it is not possible to compute one from the sink, or where you would want to use the description instead.
We can probably make the number of pipelines to use configurable. I assume in most cases you would want to pick the highest-scoring one. We might also want to consider making it configurable to use any pipeline with a similarity score above X.
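A minimal sketch of what such a configuration could look like; the class, field names, and defaults are hypothetical, not anything already in the codebase:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineSelectionConfig:
    """Hypothetical knobs for choosing which pipelines receive a search query."""
    top_n: int = 1                          # pass results for the N highest-scoring pipelines
    min_similarity: Optional[float] = None  # if set, only pipelines scoring above this are used
```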
The more I think about it, making it configurable to select pipelines with similarities above a certain threshold seems best.
@ddematheu Yep, sounds good. But deciding on that threshold is difficult: even the most similar pipeline can end up with a low-sounding similarity score like 0.4. We need some way to set the threshold based on the computed similarities.
100% agree with this. Maybe keeping the top pipeline to start is fine. My thinking is that we could provide different modalities: one where we pass results for the N highest-scoring pipelines (defaulting to N=1), and a threshold-based one where we pass results for pipelines whose similarity is higher than some value (defaulting to 0.5 or whatever).
One additional modality might be to look at the distribution, where we pass results for pipelines within a maximum distance or percentage of the highest score.
Btw, these are just thoughts; happy to commit to starting with the highest-scoring pipeline and adding other modalities based on feedback.
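A rough sketch of the three modalities described above (top-N, threshold-based, and distribution-based). The function and parameter names are placeholders for illustration, assuming each candidate pipeline already has a similarity score against the query:

```python
def select_pipelines(scored, mode="top_n", n=1, threshold=0.5, max_drop=0.1):
    """Pick which pipelines a query should be routed to.

    scored:   list of (pipeline, similarity) tuples.
    mode:     "top_n"        -> the N highest-scoring pipelines (default N=1)
              "threshold"    -> every pipeline scoring at least `threshold`
              "distribution" -> every pipeline within `max_drop` of the top score
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return []
    if mode == "top_n":
        return [pipeline for pipeline, _ in ranked[:n]]
    if mode == "threshold":
        return [pipeline for pipeline, score in ranked if score >= threshold]
    if mode == "distribution":
        best = ranked[0][1]
        return [pipeline for pipeline, score in ranked if best - score <= max_drop]
    raise ValueError(f"unknown selection mode: {mode}")
```

Defaulting to `mode="top_n"` with `n=1` matches the "start with the highest" behavior, while the other modes stay opt-in.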
@ddematheu These are interesting ideas. The distribution one especially sounds similar to (though not exactly like) top-p sampling in language models. Maybe we can use that idea: consider pipelines in descending order of similarity and keep adding them until their similarities add up to at least a target value. But yes, it ultimately depends on the user feedback you get.
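For completeness, a sketch of that cumulative idea, loosely modeled on top-p sampling. Similarities are normalized so they can "add up to" a target mass; the names are again placeholders, not an existing API:

```python
def select_pipelines_top_p(scored, p=0.8):
    """Keep pipelines, in descending similarity order, until their normalized
    similarities account for at least a fraction `p` of the total.

    scored: list of (pipeline, similarity) tuples with non-negative similarities.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    total = sum(score for _, score in ranked)
    if total == 0:
        return [pipeline for pipeline, _ in ranked[:1]]  # fall back to the single best match
    selected, cumulative = [], 0.0
    for pipeline, score in ranked:
        selected.append(pipeline)
        cumulative += score / total
        if cumulative >= p:
            break
    return selected
```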