Type | Description |
---|---|
boolean | boolean values |
number | any type of numbers. int, float .. |
string | any type of texts |
compressed_string | compressed text, queryable only for equality |
datetime | datetime.datetime objects |
blob | files as bytes, queryable for equality and sorted on size |
json | any type of json dump-able objects |
normalized_embedding | float32 normalized numpy array, queryable for equality, sorting or scoring with similarity to a query embedding |
other | any objects. stored as pickled blobs internally, queryable for equality and sorted on size |
- A schema has to be specified at first initialisation of the index and cannot be modified later on
- data is accessed or updated as a dict of the format {id: record} where record is a dict of format following schema and id is a string
- keys of record /schema can be anything that can be keys in a python dict. eg:
schema_1 = {0: "string", "a": "number"}
- An in-memory index is created by default, i.e: no
db_path
is specified, and cannot be accessed from other processes and threads - If
db_path
is specified, disk-based index is initiated which is accessible from all processes, threads and is persistent None
is allowed as a value for all keys and is the default value for all keys
- Initialize
- Insert/Update
- Get
- Drop or clear Index
- Search
- Distinct
- Group by
- Pop
- Delete
- Count
- Optimize for Search
- Math
- Trigger
- Vaccum
- Export
params
name
: name of the index,no default
, has to be specified, cannot begin with__
schema
: schema of the index,defaults to None
, has to be specified if index does not exist, cannot be modified later ondb_path
: path to the index file.defaults to None
, in-memory index is createdram_cache_mb
: size of the ram cache in MB.defaults to 64
compression_level
: compression level for strings, blobs etcdefaults to -1
, None for no compression
example use
from liteindex import DefinedIndex, DefinedTypes
schema = {
"name": DefinedTypes.string,
"age": DefinedTypes.number,
"password": DefinedTypes.string,
"verified": DefinedTypes.boolean,
"birthday": DefinedTypes.datetime,
"profile_picture": DefinedTypes.blob,
"nicknames": DefinedTypes.json,
"user_embedding": DefinedTypes.normalized_embedding,
"user_bio": DefinedTypes.compressed_string,
}
index = DefinedIndex(
name="user_details",
schema=schema,
db_path="./test.liteindex"
)
- Insert or update single or multiple, partial or full records at once
- Is atomic operation
- input format:
{id: record, id1: record, ....}
params
data
: dict of format{id1: record1, id2: partial_record2, ....}
return
: None
example use
index.update(
{
"john_doe": {
"name": "John Doe",
"age": 25,
"password": "password",
"verified": True,
"birthday": datetime.datetime(1995, 1, 1),
"profile_picture": b"......",
"nicknames": ["John", "Doe"],
"user_embedding": np.array([1, 2, 3]),
"user_bio": "This is a long string that will be compressed and stored"
},
"jane_doe": {
"name": "Jane Doe",
"age": 28,
"verified": True,
}
}
)
- accepts a single id or a list of ids
- always returns a dict of format
{id: record, id1: record, ....}
params
ids
: key or list of keys to get,no default
select_keys
: list of keys to include in the returned record.defaults to None
- selects all keysreturn
: dict of format{id: record, id1: record, ....}
example use
index.get("john_doe")
# {"john_doe": record_for_john_doe}
index.get(["john_doe", "jane_doe"])
# {"john_doe": record_for_john_doe, "jane_doe": record_for_jane_doe}
- can delete by single id or list of ids or by query
params
ids
: key or list of keys to delete,no default
query
: query dictionary to delete records matching the query.defaults to None
- deletes no recordreturn
: None
# delete a single record or multiple records or by query
# returns dict of format {key: record} of deleted records
index.delete("john_doe")
index.delete(["john_doe", "jane_doe"])
index.delete(query={"name": "John Doe"})
- no params
# clear the index
index.clear()
# drop/ delete the index completely
index.drop()
-
query
: Query dictionary. Defaults to{}
which will return all records. Full list of queries supported -
sort_by
: A key from schema.Defaults to None
which will return records in insertion order. ifsort_by
is a key of type normalized_embedding, a np array has to be provided insort_by_embedding
to sort by similarity to this array, scores will be returned in__meta
key of the record -
reversed_sort
: Defaults toFalse
. IfTrue
, will return records in reverse order. -
n
: Defaults toNone
which will return all records. -
page_no
: Defaults to1
which will return the first page of n records. -
offset
: Defaults to0
which will return the first page of n records. page_no or offset can be used, not both. -
select_keys
: A list of keys from schema. Defaults toNone
which will return all keys. -
update
: Optional dictionary of format{key: record}
. If provided, will update the records in the index that match the query and return the updated records. -
return_metadata
: -
metadata_key_name
: defaults to__meta
under this key will be a dict with {"integer_id": unique_integer_id, "updated_at": last_update_at time from epoch, "score": if doing embedding sort} -
sort_by_embedding
: ifsort_by
is a key of type normalized_embedding, a np array has to be provided here to sort by similarity to this array -
return
: dict of format{id: record, id1: record, ....}
Full list of queries supported
index.search()
# Returns {id_1: record_1, id_2: record_2, ...}
# Each record is a dict of format following schema, as inserted.
# By default ordered by insertion order, and returns all records.
index.search(
query={"name": "John Doe"},
sort_by="age",
reversed_sort=True,
n=10,
page_no=1,
select_keys=["name", "age"],
update={"verified": True}
)
index.search(
query={"name": "John Doe"},
sort_by="user_embedding",
sort_by_embedding=np.array([1, 2, 3]), # should be normalized and same size as user_embedding
reversed_sort=True,
n=10,
page_no=1,
select_keys=["name", "age"],
update={"verified": True}
)
params
query
: Query dictionary.Defaults to {}
which will return count of all records.return
: int or None
example use
index.count()
index.count({"name": "Joe Biden"})
params
key
: key from schema to get distinct values for,no default
query
: query dictionary to get distinct values for records matching the query.defaults to {}
- gets distinct values for all recordsreturn
: set of distinct values
example use
index.distinct("name")
index.distinct("name", query={"gender": "female"})
params
keys
: single key or list of keys to group by,no default
query
: query dictionary to group records matching the query.defaults to {}
- groups all records
index.group()
- Optimizes the index for search on a key *** params ***
key
: key from schema to optimize for search,no default
is_unique
:defaults to False
, if True, will not allow duplicate values for the key
index.optimize_for_query(key="name", is_unique=True)
*** params ***
return
: {key: {"is_unique": bool}}
list_optimized_keys()