-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support safe update #308
base: main
Are you sure you want to change the base?
support safe update #308
Conversation
LOG_ERROR_AND_RETURNS(ErrorType::INVALID_ARGUMENT, | ||
fmt::format("failed to pretrain(invalid argument): base tag id " | ||
"({}) doesn't belong to index", | ||
base_tag_id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a minor fix: error message "bas tag id" to "base tag id"
* @return result indicates whether the update operation is successful. | ||
*/ | ||
virtual tl::expected<bool, Error> | ||
UpdateVector(int64_t id, const DatasetPtr& new_base, bool need_fine_tune = false) { | ||
UpdateVector(int64_t id, const DatasetPtr& new_base, bool force_update = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. We are not responsible for the consequences of this interface if the user specifies force_update = true
Codecov ReportAll modified and coverable lines are covered by tests ✅ @@ Coverage Diff @@
## main #308 +/- ##
==========================================
+ Coverage 49.75% 49.81% +0.06%
==========================================
Files 331 335 +4
Lines 32701 33124 +423
Branches 4129 4190 +61
==========================================
+ Hits 16271 16502 +231
- Misses 15568 15755 +187
- Partials 862 867 +5
Continue to review full report in Codecov by Sentry.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
"ef_search": {} | ||
}} | ||
}})", | ||
vsag::UPDATE_CHECK_SEARCH_L), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
float neighbor_dist = | ||
std::reinterpret_pointer_cast<hnswlib::HierarchicalNSW>(alg_hnsw_) | ||
->getDistanceByLabel(result.value()->GetIds()[i], new_base_vec); | ||
if (neighbor_dist < self_dist) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here need set a range limit ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can implicitly check the range by checking the neighborhood relationship, since it's hard to directly give a range that is general for every dataset. Besides, if the nearest neighbor of the updated vector is still the old vector, it remains our assumption of this interface (the updated vector stays in a nearby area around the original vector).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe too strict? Have you do a feedback test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a strict check for the non --force option to protect the index structure. Besides, I test this interface for 1.75M updates, only 400 updates failed in this check (i.e., failed rate = 0.02%).
for (int i = 0; i < result.value()->GetDim(); i++) { | ||
float neighbor_dist = | ||
std::reinterpret_pointer_cast<hnswlib::HierarchicalNSW>(alg_hnsw_) | ||
->getDistanceByLabel(result.value()->GetIds()[i], new_base_vec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#309 provides a batch interface, which can be considered for use later on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix this in other pr?
Signed-off-by: zhongxiaoyao.zxy <[email protected]>
652697a
to
9ef9613
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
closes: #225