Commit
Build streaming index of labeled data (#376)
* Add bool param for building a graph of labeled data
* Add arguments for building labeled index
* Pass arguments for labeled index
* Light renaming
* Handle labels in insert_point
* Fix missing semicolon
* Add initial label handling logic
* Use unlabeled algo for uniquely labeled point
* Ignore frozen points when checking labels
* Fix missing newline
* Move label-specific logic to threadsafe zone
* Check for frozen points when asserting num points and num labeled points
* Fix file name concatenation for label metadata
* inmem_graph_store initial impl
* Use Lbuild to append to pruned_list during filter build
* Add label counts for deleting from streaming index
* Fix typo
* Fix conditions for testing
* Add medoid search to support deleting label medoids from graph
* resolving error with bfs_medoid_search()
* trying to create 2 pruned_lists and combine them
* Clear pool between calls to search_for_point_and_prune. Fix integer math
* Update pruned_list algo for link method
* making fz_points to be medoids for labels encountered
* repositioning medoids as well because they are fz points when compacting data
* removing unrequired method
* rebasing from main
* adding tests in yml workflow for dynamic index with labels
* quick fix
* removing combining of unfiltered + filtered list for now
* trying to resolve disk search poor performance
* increasing L size while searching disk index
* minor rollback
* updating dynamic-label to not use tag file while computing GT
* altering some test search L values
* adding unfiltered search for filtered batch build index
* adding compute gt for zipf dist labels in labels workflow
* searching filtered streaming index with popular label for now
* reposition fz points as medoids for filtered dynamic build
* minor renaming vars
* separate function for insert point with labels and without labels
* clang error fix
* barebones of in mem graph store
* refactoring index to use index factory
* clang format fix
* window build fix
* converting enum to enum class (C++11 style) for scope resolution with same enum values
* cleaning up API for GraphStore
* resolving comments
* clang error fix
* adding some comments
* moving _nd back to index class
* removing function reposition medoids; it's not required, incorporated into reposition_points
* altering -L (32->5) and -R (16->32) while building filtered disk index to work well with modified connections in algo
* updating docs -> dynamic_index.md to have info on how to build and search filtered dynamic index
* updating docs
* updating _pts_to_labels when repositioning fz_points
* error fix
* clang fix
* making sure _pts_to_labels are not empty
* fixing dynamic-label build error
* code improvements
* adding logic for test_ins_del_consolidate to support filtered index
* resolving PR comments
* error fix
* error fix for dynamic
* now test insert delete consolidate supports building filters
* lowering recall in case of test insert delete consolidate
* resolving PR comments
* removing _num_frozen_point from graph store
* minor fix
* moving _start back to main + minor update in graph store api to support that
* adding a lock before detect_common_filter + minor naming improvement
* adding requested changes from Gopal
* removing reservations
* resolving namespace resolution for defaults after build failure
* minor update
* minor update
* speeding up location update logic while repositioning
* updated with reserving mem for graph neighbours upfront
* build error fix
* minor update in assert
* initial commit
* updating python bindings to use new ctor
* python binding error fix
* error fix
* reverting some changes -> experiment
* removing redundant code from native index
* python build error fix
* trying to resolve python build error
* attempt at python build fix
* adding IndexSearchParams
* setting search threads to non zero
* minor check removed
* experiment 3 -> making distance fully owned by data_store
* exp 3 clang fix
* exp 4
* making distance as unique_ptr
* trying to fix build
* finally fixing problem
* some minor fix
* adding dll export to index_factory static function
* adding dll export for static fn in index_factory
* code cleanup
* resolving errors after merge
* resolving build errors
* fixing build error for stitched index
* resolving build errors
* removing max_observed_degree set()
* removing comments + typo fix
* replacing add_neighbour with set_neighbours where we can
* error fix
* minor fix
* fixing error introduced while rebasing
* fixing error for dynamic filtered index
* resolving dynamic build deadlock error
* resolving error with test_insert_del_consolidate for dynamic filter build
* minor code cleanup
* refactoring fz_pts and filter_index to be property of IndexConfig and hence Index
* removing write_params from build()
* removing write_params from build and taking it upfront in Index Ctor
* minor fix
* renaming build_params to filter params
* fixing errors on auto merge
* auto decide universal_label experiment
* resolving bug with universal label
* resolving dynamic labels error, if there are unused fz points
* exposing set_universal_label() through abstract index
* minor update: sanity check
* minor update to search
* including tag file while computing GT
* generating compacted label file and using it in generate GT
* minor fix
* resolving New PR comments (minor typo fixes)
* renaming _pts_to_labels to _tag_to_labels + adding a warning for consolidate deletes and quality of index
* minor name change + code cleanup
* clang format fix
* adding locks for filter data_structures
* avoiding deadlock
* universal label definition update
* reverting locks on _location_to_labels as it's causing problems with large dataset
* adding locks for _label_to_medoid_id
* Update dynamic_index.md
* Update dynamic-labels.yml
* renaming some variables

---------

Co-authored-by: David Kaczynski <[email protected]>
Co-authored-by: yashpatel007 <[email protected]>
Co-authored-by: Yash Patel <[email protected]>
Co-authored-by: Harsha Vardhan Simhadri <[email protected]>