Releases: alexklibisz/elastiknn

0.1.0-PRE10

05 Apr 02:33
a15c336
  • Introduced a cache for exact similarity queries that keeps deserialized vectors in memory instead of repeatedly
    reading and deserializing them. By default, cache entries expire after 90 seconds.
  • Fixed a mapping issue that was causing warnings to be printed at runtime. Specifically, the term fields corresponding
    to a vector should be given the same name as the field where the vector is stored. A bit confusing, but it works.
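
As a rough illustration of the caching idea in the first item above, here is a minimal Python sketch of a cache whose entries expire after a fixed time. It is not the plugin's implementation: the class name `ExpiringCache`, the `load` callback, and the choice to expire entries a fixed time after they were written are all assumptions made for the example. Only the 90-second default comes from the release note.

```python
import time
from typing import Callable, Dict, Hashable, List, Tuple


class ExpiringCache:
    """Hypothetical cache: an entry is reused until ttl_seconds after it was stored."""

    def __init__(self, ttl_seconds: float = 90.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries: Dict[Hashable, Tuple[float, List[float]]] = {}

    def get(self, key: Hashable,
            load: Callable[[Hashable], List[float]]) -> List[float]:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]  # still fresh: skip the read and deserialization
        value = load(key)  # stands in for the expensive read + deserialize step
        self._entries[key] = (now, value)
        return value
```

A caller would pass a function that reads and deserializes the stored vector, for example `cache.get(doc_id, read_vector)`, where `read_vector` is a placeholder for that step.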

0.1.0-PRE9

04 Apr 15:31
2e0eb08
  • Remove the usage of Protobufs at the API level and implement a more idiomatic Elasticsearch API instead. The API now
    uses custom case classes in Scala and data classes in Python, which is more tedious but worth it for a more intuitive API.
  • Remove the pipelines in favor of processing/indexing vectors in the custom mapping. The model parameters are defined in
    the mapping and applied to any document field with type elastiknn_sparse_bool_vector or elastiknn_dense_float_vector.
    This eliminates the need for a pipeline/processor and the need to maintain custom mappings for the indexed vectors.
  • Implement all queries using custom Lucene queries. This is tightly coupled to the custom mappings, since the mappings
    determine how vector hashes are stored and can be queried. For now I've been able to use very simple Lucene Term and
    Boolean queries.
  • Add a "sparse indexed" mapping for Jaccard and Hamming similarities. This stores the indices of sparse boolean vectors
    as Lucene terms, allowing you to run a term query to get the intersection of the query vector against all stored vectors.
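
The "sparse indexed" idea in the last item can be sketched in plain Python without Lucene. The sketch below is an assumption-laden stand-in: a dictionary plays the role of the term index, and the class `SparseIndexedSketch` and its methods are hypothetical names, not part of the plugin. It shows how storing each true index as a term lets a term lookup yield the intersection size, from which Jaccard similarity follows as |A ∩ B| / (|A| + |B| - |A ∩ B|).

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set


class SparseIndexedSketch:
    """Hypothetical in-memory stand-in for a term index over sparse boolean vectors."""

    def __init__(self) -> None:
        # term (a true index of the vector) -> ids of documents containing that term
        self.postings: Dict[int, Set[str]] = defaultdict(set)
        # document id -> number of true indices in its vector
        self.sizes: Dict[str, int] = {}

    def index(self, doc_id: str, true_indices: Iterable[int]) -> None:
        indices = set(true_indices)
        self.sizes[doc_id] = len(indices)
        for i in indices:
            self.postings[i].add(doc_id)

    def jaccard(self, query_indices: Iterable[int]) -> Dict[str, float]:
        query = set(query_indices)
        # Each matching term contributes one to the intersection count of a document.
        intersections: Counter = Counter()
        for i in query:
            for doc_id in self.postings.get(i, ()):
                intersections[doc_id] += 1
        # Documents sharing no term with the query are simply absent from the result.
        return {
            doc_id: n / (len(query) + self.sizes[doc_id] - n)
            for doc_id, n in intersections.items()
        }
```

Documents that share no index with the query never appear in the result, which mirrors how a term query only visits documents containing at least one matching term.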

0.1.0-PRE8

29 Feb 13:54
679b199
  • Removed the num_tables argument from JaccardLshOptions as it's redundant to num_bands.
  • Profiled and refactored the JaccardLshModel using the ann-benchmarks Kosarak Jaccard dataset.
  • Added an example program that grid-searches JaccardLshOptions for best performance and plots the Pareto front.
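
For readers unfamiliar with the Pareto front mentioned in the last item, the following Python sketch shows one way to extract it from grid-search results. It is not the example program from the repository. The `Result` type, the choice of recall and queries per second as the two objectives, and the `pareto_front` function are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Result(NamedTuple):
    """Hypothetical record of one grid-search run."""
    params: Dict[str, int]
    recall: float
    queries_per_second: float


def pareto_front(results: List[Result]) -> List[Result]:
    """Keep the runs that no other run matches or beats on both objectives."""
    front: List[Result] = []
    best_qps = float("-inf")
    # Visit runs from highest recall to lowest. A run belongs on the front only
    # if it is faster than every run with equal or higher recall seen so far.
    ordered = sorted(results, key=lambda r: (-r.recall, -r.queries_per_second))
    for r in ordered:
        if r.queries_per_second > best_qps:
            front.append(r)
            best_qps = r.queries_per_second
    return front
```

The returned runs are ordered from highest recall to lowest, so plotting them in order traces the trade-off curve between accuracy and throughput.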

0.1.0-PRE7

15 Feb 19:32
9499527
  • Got rid of base64 encoding/decoding in ElastiKnnVectorFieldMapper. This improves ann-benchmarks performance by about 20%.

0.1.0-PRE6

15 Feb 16:41
fed9110
  • Improved exact Jaccard performance by implementing a critical path in Java so that it uses primitive int[] arrays instead of boxed integers in Scala.

0.1.0-PRE5

14 Feb 05:09
3c2d25c
  • Fixed performance regression.

0.1.0-PRE4

13 Feb 05:55
de009de
  • Client and core library interface improvements.
  • Added use_cache parameter to KNearestNeighborsQuery which signals that the vectors should only be read once from Lucene and then cached in memory.

0.1.0-PRE3

08 Feb 20:13
242d54c
  • Releasing versioned Python client library to PyPI.

0.1.0-PRE2

08 Feb 16:35
f9fb652
  • Releasing versioned elastiknn plugin zip file.