Merge branch 'main' into yiyolin/internal_main #468

Closed
wants to merge 23 commits into from

Conversation

NeelamMahapatro
Contributor

  • Does this PR have a descriptive title that could go in our release notes?
  • Does this PR add any new dependencies?
  • Does this PR modify any existing APIs?
    • Is the change to the API backwards compatible?
  • Should this result in any changes to our documentation, either updating existing docs or adding new ones?

Reference Issues/PRs

What does this implement/fix? Briefly explain your changes.

Any other comments?

jonmclean and others added 23 commits August 4, 2023 09:21
* Added PDoc workflow

* Added documentation to the push-test workflow

* Added diskannpy to the env for pdoc to use

* Initial commit of doc publish workflow

* Tried heredoc to get python version

* Tried another way of getting the version

* Tried another way of getting the version

* Moved to docs/python path

* Removing the test harness

* Add dependencies per wheel

* Moved dependency tree to the 'push' file so it runs on push

* Added label name to the dependency file

* Trying matrix.os to get the os and version

* Moved doc generation from push-test to python-release.  Will add 'dev' doc generation to push-test

* Publish latest/version docs only on release.  Publish docs for every dev build on main.

* Install the local-file version of the library

* Disable branch check so I can test the install

* Use python build to build a wheel for use in documentation

* Tried changing to python instead of python3

* Added checkout depth in order to get boost

* Use the python build action to create wheel for documentation

* Revert "Use the python build action to create wheel for documentation"

This reverts commit d900c1d.

* Added linux environment setup

* Made only publish dev when on main and added comments

---------

Co-authored-by: Jonathan McLean <[email protected]>
* moved ssd index constants to defaults.h
* Have a working dockerfile to run perf tests and report the times they take. We can also capture stdout/stderr with it for further information, especially for tools that report internal latencies.

* Slight changes to the perf test script, a perf.yml for the github action
* make sector node an inline function

* convert offset_node macro to inline method

* rename member vars to start with underscore in pq_flash_index.h

* added support in create_disk_index

* add read sector util

* load_cache_list now uses read_blocks util

* allow nullptr for read_nodes

* BFS cache generation uses util

* add num_sectors info to cache_beam_Search

* add CI test for 1020,1024,1536D float and 4096D int8 rand vector on disk
* initial commit

* updating python bindings to use new ctor

* python binding error fix

* error fix

* reverting some changes -> experiment

* removing redundant code from native index

* python build error fix

* trying to resolve python build error

* attempt at python build fix

* adding IndexSearchParams

* setting search threads to non zero

* minor check removed

* experiment 3 -> making distance fully owned by data_store

* exp 3 clang fix

* exp 4

* making distance as unique_ptr

* trying to fix build

* finally fixing problem

* some minor fix

* adding dll export to index_factory static function

* adding dll export for static fn in index_factory

* code cleanup

* resolving gopal's comments

* resolving build failures
* move read_nodes to public, add get_pq_vector and get_num_points

* clang-format

* Match new private var naming convention

* more private (_) fixes

* VID->vid

* VID->vid cpp
* fix OLS build

* Add a build to CI with feature flags enabled
* inmem_graph_store initial impl

* barebones of in mem graph store

* refactoring index to use index factory

* clang format fix

* making enum to enum class (c++ 11 style) for scope resolution with same enum values

* cleaning up API for GraphStore

* moving _nd back to index class

* resolving PR comments

* error fix

* error fix for dynamic

* resolving PR comments

* removing _num_frozen_point from graph store

* minor fix

* moving _start back to main + minor update in graph store api to support that

* adding requested changes from Gopal

* removing reservations

* resolving namespace resolution for defaults after build failure

* minor update

* minor update

* speeding up location update logic while repositioning

* updated with reserving mem for graph neighbours upfront

* build error fix

* minor update in assert

* initial commit

* updating python bindings to use new ctor

* python binding error fix

* error fix

* reverting some changes -> experiment

* removing redundant code from native index

* python build error fix

* trying to resolve python build error

* attempt at python build fix

* adding IndexSearchParams

* setting search threads to non zero

* minor check removed

* experiment 3 -> making distance fully owned by data_store

* exp 3 clang fix

* exp 4

* making distance as unique_ptr

* trying to fix build

* finally fixing problem

* some minor fix

* adding dll export to index_factory static function

* adding dll export for static fn in index_factory

* code cleanup

* resolving errors after merge

* resolving build errors

* fixing build error for stitched index

* resolving build errors

* removing max_observed_degree set()

* removing comments + typo fix

* replacing add_neighbour with set_neighbours where we can

* error fix
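
One item in the list above converts a plain enum to a C++11 `enum class` so that two enums can share the same enumerator names and values. A minimal sketch of why that helps follows; `GraphKind`, `DataKind` and `to_int` are hypothetical names chosen for illustration and are not taken from the DiskANN source.

```cpp
#include <cstdint>

// Hypothetical enums, for illustration only. As plain `enum`s these two
// declarations would fail to compile, because `Memory` and `Disk` would be
// declared twice in the same enclosing scope. With `enum class`, each
// enumerator lives inside its own enum's scope.
enum class GraphKind : std::uint8_t { Memory = 0, Disk = 1 };
enum class DataKind : std::uint8_t { Memory = 0, Disk = 1 };

// Scoped enums do not convert implicitly to integers, so any conversion
// has to be spelled out with a cast.
constexpr int to_int(GraphKind k) { return static_cast<int>(k); }
constexpr int to_int(DataKind k) { return static_cast<int>(k); }
```

Call sites then have to qualify the name, for example `GraphKind::Memory`, which also prevents accidentally passing a `DataKind` where a `GraphKind` is expected.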
* Undo mistake, let frontier read in PQ flash index be asynchronous

* address changes requested
#439)

* Reduce CI tests for multi-sector disk layout from 10K to 5K points so they run faster

* turn off 1024D
* removing write_params from build and taking it upfront in Index Ctor

* renaming build_params to filter params
* made changes to clean up filter number conversion, and fixed bug with universal filter search

* minor typecast fix

---------

Co-authored-by: rakri <[email protected]>
)

* Fixes #432, a bug when using OpenMP with gcc: omp_get_num_threads() reports only the number of threads collaborating on the current code region, not the number available overall. I made this error when I transitioned us from omp_get_num_procs() about 5 or 6 months ago, and only with bug #432 did I really get to see how problematic my naive expectations were.

* Removed cosine distance metric from disk index until we can properly fix it in pqflashindex. Documented what distance metrics can be used with what vector dtypes in tables in the documentation.
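
The OpenMP behaviour behind the #432 fix above can be sketched as follows. This is an illustration of the two standard OpenMP calls, not the code changed in this PR; the wrapper names `threads_in_current_team` and `processors_available` are hypothetical, and the `#ifdef _OPENMP` fallbacks are an assumption added so the sketch also compiles without `-fopenmp`.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of threads in the team executing the current region.
// Called from serial code (outside any `#pragma omp parallel`),
// this is 1, no matter how many cores the machine has.
int threads_in_current_team() {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;  // fallback when compiled without OpenMP support
#endif
}

// Number of processors available to the program. This does not
// depend on whether the caller is inside a parallel region.
int processors_available() {
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;  // fallback when compiled without OpenMP support
#endif
}
```

Code that sizes a thread pool or a default thread count from serial code would therefore get 1 from `threads_in_current_team()`, which matches the symptom the commit message describes, while `processors_available()` reflects the machine.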
* Add bool param for building a graph of labeled data

* Add arguments for building labeled index

* Pass arguments for labeled index

* Light renaming

* Handle labels in insert_point

* Fix missing semicolon

* Add initial label handling logic

* Use unlabeled algo for uniquely labeled point

* Ignore frozen points when checking labels

* Fix missing newline

* Move label-specific logic to threadsafe zone

* Check for frozen points when assert num points and num labeled points

* Fix file name concatenation for label metadata

* inmem_graph_store initial impl

* Use Lbuild to append to pruned_list during filter build

* Add label counts for deleting from streaming index

* Fix typo

* Fix conditions for testing

* Add medoid search to support deleting label medoids from graph

* resolving error with bfs_medoid_search()

* trying to create 2 pruned_lists and combine them

* Clear pool between calls to search_for_point_and_prune. Fix integer math

* Update pruned_list algo for link method

* making fz_points to be medoids for labels encountered

* repositioning medoids as well because they are fz points when compacting data

* removing unrequired method

* rebasing from main

* adding tests in yml workflow for dynamic index with labels

* quick fix

* removing combining of unfiltered + filtered list for now

* trying to resolve disk search poor performance

* increasing L size while searching disk index

* minor rollback

* updating dynamic-label to not use tag file while computing GT

* altering some test search L values

* adding unfiltered search for filtered batch build index

* adding compute gt for zipf dist labels in labels workflow

* searching filtered streaming index with popular label for now

* reposition fz points as medoids for filtered dynamic build

* minor renaming vars

* separate function for insert point with labels and without labels

* clang error fix

* barebones of in mem graph store

* refactoring index to use index factory

* clang format fix

* window build fix

* making enum to enum class (c++ 11 style) for scope resolution with same enum values

* cleaning up API for GraphStore

* resolving comments

* clang error fix

* adding some comments

* moving _nd back to index class

* removing function reposition medoids, it's not required, incorporated into reposition_points

* altering -L (32->5) and -R (16->32) while building filtered disk index to work well with modified connections in algo

* updating docs -> dynamic_index.md to have info on how to build and search filtered dynamic index

* updating docs

* updating _pts_to_labels when repositioning fz_points

* error fix

* clang fix

* making sure _pts_to_labels are not empty

* fixing dynamic-label build error

* code improvements

* adding logic for test_ins_del_consolidate to support filtered index

* resolving PR comments

* error fix

* error fix for dynamic

* now test insert delete consolidate support building filters

* lowering recall in case of test insert delete consolidate

* resolving PR comments

* removing _num_frozen_point from graph store

* minor fix

* moving _start back to main + minor update in graph store api to support that

* adding a lock before detect_common_filter + minor naming improvement

* adding requested changes from Gopal

* removing reservations

* resolving namespace resolution for defaults after build failure

* minor update

* minor update

* speeding up location update logic while repositioning

* updated with reserving mem for graph neighbours upfront

* build error fix

* minor update in assert

* initial commit

* updating python bindings to use new ctor

* python binding error fix

* error fix

* reverting some changes -> experiment

* removing redundant code from native index

* python build error fix

* trying to resolve python build error

* attempt at python build fix

* adding IndexSearchParams

* setting search threads to non zero

* minor check removed

* experiment 3 -> making distance fully owned by data_store

* exp 3 clang fix

* exp 4

* making distance as unique_ptr

* trying to fix build

* finally fixing problem

* some minor fix

* adding dll export to index_factory static function

* adding dll export for static fn in index_factory

* code cleanup

* resolving errors after merge

* resolving build errors

* fixing build error for stitched index

* resolving build errors

* removing max_observed_degree set()

* removing comments + typo fix

* replacing add_neighbour with set_neighbours where we can

* error fix

* minor fix

* fixing error introduced while rebasing

* fixing error for dynamic filtered index

* resolving dynamic build deadlock error

* resolving error with test_insert_del_consolidate for dynamic filter build

* minor code cleanup

* refactoring fz_pts and filter_index to be property of IndexConfig and hence Index

* removing write_params from build()

* removing write_params from build and taking it upfront in Index Ctor

* minor fix

* renaming build_params to filter params

* fixing errors on auto merge

* auto decide universal_label experiment

* resolving bug with universal label

* resolving dynamic labels error, if there are unused fz points

* exposing set_universal_label() through abstract index

* minor update: sanity check

* minor update to search

* including tag file while computing GT

* generating compacted label file and using it in generate GT

* minor fix

* resolving New PR comments (minor typo fixes)

* renaming _pts_to_labels to _tag_to_labels + adding a warning for consolidate deletes and quality of index

* minor name change + code cleanup

* clang format fix

* adding locks for filter data_structures

* avoiding deadlock

* universal label definition update

* reverting locks on _location_to_labels as it's causing problems with large dataset

* adding locks for _label_to_medoid_id

* Update dynamic_index.md

* Update dynamic-labels.yml

* renaming some variables

---------

Co-authored-by: David Kaczynski <[email protected]>
Co-authored-by: yashpatel007 <[email protected]>
Co-authored-by: Yash Patel <[email protected]>
Co-authored-by: Harsha Vardhan Simhadri <[email protected]>
* add check for .enc extension to support encryption

* check rotation_matrix file in file blobs
Labels
None yet
Projects
None yet
10 participants