Add user documentation for Kvrocks Search
PragmaTwice committed Nov 10, 2024
1 parent b05facb commit a62a786
Showing 2 changed files with 270 additions and 0 deletions.
269 changes: 269 additions & 0 deletions docs/kvrocks-search.md
@@ -0,0 +1,269 @@
# Search

**Apache Kvrocks™** Search, also known as **Kvrocks Search** (or KQIR, as a technical term), is an internal component of Apache Kvrocks™. It functions as a query engine that supports (secondary) indexing on structured data and complex queries by effectively utilizing various indexes.

In addition to being compatible with many commands and the query syntax of [RediSearch](https://redis.io/docs/latest/develop/interact/search-and-query/) (e.g. [FT.CREATE](#ftcreate) and [FT.SEARCH](#ftsearch)), Kvrocks Search also offers support for SQL syntax to accommodate various scenarios (via [FT.SEARCHSQL](#ftsearchsql-extension) and other related commands).

Kvrocks Search is currently in the experimental stage and only available on the `unstable` branch. We do not provide compatibility guarantees at this time. If you encounter any problems, please submit them to [GitHub issues](https://github.com/apache/kvrocks/issues).

For its implementation details, please refer to [this blog post](../blog/kqir-query-engine).

## Supported Commands

Kvrocks currently supports some of the main RediSearch commands, mostly those used for creating indexes, managing indexes (listing, showing details, deleting), and querying.

### FT.SEARCH

```
FT.SEARCH index query
[RETURN count identifier [ identifier ...]]
[SORTBY sortby [ ASC | DESC]]
[LIMIT offset num]
[PARAMS nargs name value [ name value ...]]
```

`FT.SEARCH` performs a `query` (in RediSearch query syntax) on a given `index` (created by `FT.CREATE`).

Additional parameters:
- `RETURN` controls which fields are included in the output;
- `SORTBY` controls the order of rows in the output (same as `ORDER BY` in SQL);
- `LIMIT` skips the first `offset` results and returns at most `num` rows;
- `PARAMS` supplies values for the parameters of a parameterized query.

See [RediSearch query syntax](#redisearch-query-syntax) for the syntax supported in `query`.
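For illustration, the helper below assembles the argument list of an `FT.SEARCH` call from its optional clauses. `build_ft_search` is a hypothetical name, not part of Kvrocks or any client library, and it assumes that `nargs` in `PARAMS` counts both names and values (twice the number of pairs), as in RediSearch.

```python
from typing import Optional, Sequence


def build_ft_search(
    index: str,
    query: str,
    return_fields: Optional[Sequence[str]] = None,
    sortby: Optional[tuple[str, str]] = None,
    limit: Optional[tuple[int, int]] = None,
    params: Optional[dict[str, str]] = None,
) -> list[str]:
    """Build the argument list of an FT.SEARCH call (hypothetical helper)."""
    args = ["FT.SEARCH", index, query]
    if return_fields:
        # RETURN count identifier [identifier ...]
        args += ["RETURN", str(len(return_fields)), *return_fields]
    if sortby:
        field, order = sortby  # order is "ASC" or "DESC"
        args += ["SORTBY", field, order]
    if limit:
        offset, num = limit  # skip `offset` rows, return at most `num`
        args += ["LIMIT", str(offset), str(num)]
    if params:
        # Assumption: nargs counts names and values (as in RediSearch),
        # i.e. twice the number of name/value pairs.
        args += ["PARAMS", str(2 * len(params))]
        for name, value in params.items():
            args += [name, value]
    return args


args = build_ft_search(
    "idx",
    "@a:[-inf $num]",
    return_fields=["a", "b"],
    sortby=("a", "DESC"),
    limit=(0, 10),
    params={"num": "233"},
)
print(args)
# ['FT.SEARCH', 'idx', '@a:[-inf $num]', 'RETURN', '2', 'a', 'b', 'SORTBY', 'a', 'DESC',
#  'LIMIT', '0', '10', 'PARAMS', '2', 'num', '233']
```

The resulting list could then be sent through a generic command interface, such as redis-py's `execute_command(*args)`.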

### FT.EXPLAIN

```
FT.EXPLAIN index query
[RETURN count identifier [ identifier ...]]
[SORTBY sortby [ ASC | DESC]]
[LIMIT offset num]
[PARAMS nargs name value [ name value ...]]
```

`FT.EXPLAIN` returns the plan that Kvrocks will use to execute the `query` (a.k.a. the query plan).

### FT.CREATE

```
FT.CREATE index
[ON HASH | JSON]
[PREFIX count prefix [prefix ...]]
SCHEMA field_name TAG | NUMERIC | VECTOR [FIELD PROPERTIES ...] [NOINDEX]
[ field_name TAG | NUMERIC | VECTOR [FIELD PROPERTIES ...] [NOINDEX]
...]
```

`FT.CREATE` creates a new `index` with the given schema.

Additional parameters:
- `ON HASH | JSON`: the data type of keys to be indexed;
- `PREFIX`: the prefix of keys to be indexed.

Schema details:
- `field_name`: the name of a field; an index is composed of one or more fields;
- `TAG | NUMERIC | VECTOR`: the field type; currently only these 3 types are supported;
- `FIELD PROPERTIES`: additional properties of the field, which depend on the field type;
- `NOINDEX`: do not build an index on this field (the field can still be used to filter data in queries).
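A minimal sketch of how the parts of `FT.CREATE` fit together, written as a hypothetical `build_ft_create` helper that returns the command arguments. Field properties and `NOINDEX` are passed through as raw tokens and are not validated; only the field type is checked against the three supported types.

```python
from typing import Sequence


def build_ft_create(
    index: str,
    on: str,
    prefixes: Sequence[str],
    schema: Sequence[tuple[str, str, Sequence[str]]],
) -> list[str]:
    """Build the argument list of an FT.CREATE call (hypothetical helper).

    Each schema entry is (field_name, field_type, extra_tokens), where
    extra_tokens holds the field properties and/or NOINDEX as raw strings.
    """
    args = ["FT.CREATE", index, "ON", on]  # on is "HASH" or "JSON"
    if prefixes:
        # PREFIX count prefix [prefix ...]
        args += ["PREFIX", str(len(prefixes)), *prefixes]
    args.append("SCHEMA")
    for name, field_type, extra in schema:
        if field_type not in ("TAG", "NUMERIC", "VECTOR"):
            raise ValueError(f"unsupported field type: {field_type}")
        args += [name, field_type, *extra]
    return args


args = build_ft_create(
    "idx",
    "HASH",
    ["user:"],
    [
        ("age", "NUMERIC", []),
        ("tags", "TAG", ["SEPARATOR", "|"]),
        ("score", "NUMERIC", ["NOINDEX"]),
    ],
)
print(" ".join(args))
# FT.CREATE idx ON HASH PREFIX 1 user: SCHEMA age NUMERIC tags TAG SEPARATOR | score NUMERIC NOINDEX
```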

### FT.DROPINDEX

```
FT.DROPINDEX index
```

`FT.DROPINDEX` drops the given `index`, deleting all of its indexing data and index information.

### FT._LIST

```
FT._LIST
```

`FT._LIST` lists the names of all indexes in the current namespace.

### FT.INFO

```
FT.INFO index
```

`FT.INFO` returns detailed information about the given `index`.

The output has the following format:

```
1) index_name
2) ...
3) index_definition
4) 1) key_type
   2) ...
   3) prefixes
   4) 1) ...
      2) ...
5) fields
6) 1) 1) identifier
      2) ...
      3) type
      4) "tag"
      5) options
      6) ...
   2) 1) identifier
      2) ...
      3) type
      4) "numeric"
      5) options
      6) ...
   3) ...
```

Note that the output format may change, as Kvrocks Search is currently experimental.
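Since the reply is made of nested lists of alternating keys and values, a client can turn it into dictionaries. The sketch below assumes the reply has already been decoded to Python strings and lists (e.g. with `decode_responses=True` in redis-py); the helper names are hypothetical, and the sample `reply` is made up to mirror the format above, not captured from a real Kvrocks server.

```python
from typing import Any


def pairs_to_dict(reply: list[Any]) -> dict[str, Any]:
    """Turn a flat [key1, value1, key2, value2, ...] list into a dict."""
    return {reply[i]: reply[i + 1] for i in range(0, len(reply), 2)}


def parse_ft_info(reply: list[Any]) -> dict[str, Any]:
    """Convert the nested FT.INFO reply into nested dicts (hypothetical helper)."""
    info = pairs_to_dict(reply)
    if "index_definition" in info:
        info["index_definition"] = pairs_to_dict(info["index_definition"])
    if "fields" in info:
        # Every element of `fields` is itself a flat key/value list.
        info["fields"] = [pairs_to_dict(field) for field in info["fields"]]
    return info


# Made-up sample shaped like the format above; the values are illustrative only.
reply = [
    "index_name", "idx",
    "index_definition", ["key_type", "hash", "prefixes", ["user:"]],
    "fields", [
        ["identifier", "tags", "type", "tag", "options", []],
        ["identifier", "age", "type", "numeric", "options", []],
    ],
]
info = parse_ft_info(reply)
print(info["index_definition"]["prefixes"])  # ['user:']
print([field["type"] for field in info["fields"]])  # ['tag', 'numeric']
```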

### FT.SEARCHSQL (extension)

```
FT.SEARCHSQL sql
[PARAMS nargs name value [ name value ...]]
```

`FT.SEARCHSQL` performs a `sql` query (see [SQL syntax](#sql-syntax)) on an index created by `FT.CREATE`.

Additional parameters:
- `PARAMS` to supply additional information to the parameterized query.

### FT.EXPLAINSQL (extension)

```
FT.EXPLAINSQL sql
[PARAMS nargs name value [ name value ...]]
[SIMPLE | DOT]
```

`FT.EXPLAINSQL` returns the plan that Kvrocks will use to execute the `sql` query (a.k.a. the query plan).

Additional parameters:
- `PARAMS`: same as in `FT.SEARCHSQL`;
- `SIMPLE`: print a simple representation of the query plan;
- `DOT`: print the query plan in Graphviz [DOT](https://en.wikipedia.org/wiki/DOT_(graph_description_language)) format (which can be used to generate a graphical representation of a directed graph).

## SQL syntax

Kvrocks currently supports an extended subset of the MySQL query syntax, specifically the `SELECT` statement:

```
SELECT
  * | field [, field ...]
FROM index_name
WHERE query_expr
ORDER BY
  field_name [ASC | DESC] | vec_field <-> vec < range
LIMIT [offset] count
```

where the query expression `query_expr` can be:

```
true | false |
(query_expr) |
query_expr AND query_expr |
query_expr OR query_expr |
NOT query_expr |
tag_field HASTAG tag |
num_atom NUM_OP num_atom |
vec_field <-> vec < range
```

where the numeric operation `NUM_OP` can be:

```
= | != | < | <= | > | >=
```

and the `num_atom` can be:

```
num_field | num_literal
```

Literals inside the query can also be parameters of the form `@param_name`,
e.g. `a < 233` can be written as `a < @num` with `PARAMS 2 num 233` supplied to `FT.SEARCHSQL` (`nargs` counts both names and values, as in RediSearch).
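To make the semantics of `query_expr` concrete, the sketch below evaluates an expression against in-memory rows. The tuple encoding of the expression tree and the names `matches` and `eval_atom` are hypothetical; tag fields are modeled as Python sets, and nothing here parses SQL text or uses an index, so it shows what a condition means rather than how Kvrocks executes it.

```python
import operator
from typing import Any

# Hypothetical tuple encoding of query_expr (illustration only):
#   True / False
#   ("and", e1, e2) | ("or", e1, e2) | ("not", e)
#   ("hastag", tag_field, tag)
#   ("cmp", op, lhs, rhs), where lhs/rhs are num_atoms:
#       ("field", name) | ("lit", number) | ("param", name)
NUM_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def eval_atom(atom: tuple, row: dict[str, Any], params: dict[str, str]) -> float:
    kind, value = atom
    if kind == "field":
        return row[value]
    if kind == "param":
        return float(params[value])  # @name takes its value from PARAMS
    return value  # numeric literal


def matches(expr: Any, row: dict[str, Any], params: dict[str, str]) -> bool:
    """Return whether `row` satisfies the query expression `expr`."""
    if isinstance(expr, bool):
        return expr
    kind = expr[0]
    if kind == "and":
        return matches(expr[1], row, params) and matches(expr[2], row, params)
    if kind == "or":
        return matches(expr[1], row, params) or matches(expr[2], row, params)
    if kind == "not":
        return not matches(expr[1], row, params)
    if kind == "hastag":
        _, field, tag = expr
        return tag in row[field]  # tag fields are modeled as sets of tags
    if kind == "cmp":
        _, op, lhs, rhs = expr
        return NUM_OPS[op](eval_atom(lhs, row, params), eval_atom(rhs, row, params))
    raise ValueError(f"unknown expression: {expr!r}")


rows = [{"a": 1.0, "b": {"x"}}, {"a": 5.0, "b": {"y", "z"}}]
# Condition: a < @num AND NOT b HASTAG x, with num = 10 supplied via PARAMS
expr = ("and", ("cmp", "<", ("field", "a"), ("param", "num")), ("not", ("hastag", "b", "x")))
print([row["a"] for row in rows if matches(expr, row, {"num": "10"})])  # [5.0]
```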

## RediSearch query syntax

Currently Kvrocks also supports a subset of [the RediSearch query syntax](https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/query_syntax/).

RediSearch controls the evolution of the query syntax through [dialect versioning](https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/dialects/).
Kvrocks currently supports `DIALECT 2`.
Higher dialect versions (currently 3 and 4) may be supported in the future, but there are no plans to support `DIALECT 1`.

The following query clauses are currently supported in Kvrocks; they can be composed via `clause | clause` (OR), `clause clause` (AND) and `-clause` (NOT):
- `*`, i.e. `true` in SQL;
- `@num_field:[NUM_BOUND NUM_BOUND]`, e.g. `@a:[1 (3]` means `a >= 1 and a < 3`;
- `@tag_field:{tag [|tag ...]}`, e.g. `@b:{x | y}` means `b hastag x or b hastag y`;
- `@vec_field:[VECTOR_RANGE range $vec]` for vector range query.

where `NUM_BOUND` can be:
```
num
| (num
| INF
| +INF
| -INF
```

KNN queries without prefiltering are also supported:
```
* => [KNN n @vec_field $vec]
```
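An HNSW index answers KNN queries approximately. As a reference for what `* => [KNN n @vec_field $vec]` asks for, the sketch below computes the exact answer by brute force, assuming the L2 (Euclidean) metric; `knn` and the sample keys are hypothetical.

```python
import heapq
import math
from typing import Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(vectors: dict[str, Sequence[float]], query: Sequence[float], n: int) -> list[str]:
    """Exact KNN by brute force: the n keys whose vectors are closest to `query`,
    nearest first. An HNSW index approximates this result without a full scan."""
    return heapq.nsmallest(n, vectors, key=lambda key: l2(vectors[key], query))


vectors = {"doc:1": [0.0, 0.0], "doc:2": [1.0, 1.0], "doc:3": [5.0, 5.0]}
print(knn(vectors, [0.9, 0.8], 2))  # ['doc:2', 'doc:1']
```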

Literals inside the query can also be parameters of the form `$param_name`,
e.g. `@a:[-inf 233]` can be written as `@a:[-inf $num]` with `PARAMS 2 num 233` supplied to `FT.SEARCH`.

## Field types

In RediSearch, an index consists of multiple fields, which can be of different types.
Currently, Kvrocks supports three field types:
- `TAG`: a tag field can hold a set of string tags, to filter rows by specific tags in queries;
- `NUMERIC`: a numeric field can hold a floating point number;
- `VECTOR`: a vector field can hold a vector, for performing vector search.

### Tag

Field properties:
```
SCHEMA field_name TAG
[SEPARATOR sep]
[CASESENSITIVE]
```

By default, `SEPARATOR` is `,` and `CASESENSITIVE` is not set, i.e. tags are matched case-insensitively.

The only query operation on a tag field is checking whether a row is labeled with a given tag, i.e. `tag_field HASTAG tag` in SQL or `@tag_field:{tag}` in RediSearch syntax.
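A rough model of how a stored tag string could be matched, assuming tags are split on `SEPARATOR`, surrounding whitespace is trimmed, empty tags are dropped, and case is folded unless `CASESENSITIVE` is set. The trimming and empty-tag handling are assumptions, not documented behavior, and `split_tags`/`has_tag` are hypothetical names.

```python
def split_tags(value: str, separator: str = ",", case_sensitive: bool = False) -> set[str]:
    """Split a stored field value into a set of tags (hypothetical model).

    Assumes surrounding whitespace is trimmed and empty tags are dropped.
    """
    tags = {part.strip() for part in value.split(separator) if part.strip()}
    return tags if case_sensitive else {tag.lower() for tag in tags}


def has_tag(value: str, tag: str, separator: str = ",", case_sensitive: bool = False) -> bool:
    """Model of `tag_field HASTAG tag` for a single stored value."""
    wanted = tag if case_sensitive else tag.lower()
    return wanted in split_tags(value, separator, case_sensitive)


print(has_tag("Red, Green,blue", "green"))  # True (case-insensitive by default)
print(has_tag("Red, Green,blue", "green", case_sensitive=True))  # False
print(has_tag("a|b", "b", separator="|"))  # True
```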

### Numeric

Numeric fields have no field properties:
```
SCHEMA field_name NUMERIC
```

As shown in the query syntax, numeric fields can be used in numeric comparison to filter data.

### Vector

Field properties:
```
SCHEMA field_name VECTOR HNSW nargs
TYPE FLOAT64
DIM dim
DISTANCE_METRIC L2 | IP | COSINE
[M m]
[EF_CONSTRUCTION ef_construction]
[EF_RUNTIME ef_runtime]
[EPSILON epsilon]
```

Currently `HNSW` is the only supported indexing algorithm for vector fields,
and `FLOAT64` is the only supported `TYPE` for HNSW vector fields.
More types, such as `FLOAT32` and `FLOAT16`, may be supported in the future.
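For reference, the three `DISTANCE_METRIC` values usually correspond to the distances below. This follows common conventions (such as RediSearch's) and is an assumption: the exact formulas Kvrocks uses, e.g. squared versus plain L2, may differ. Python floats are IEEE 754 doubles, matching `FLOAT64`.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L2: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ip_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """IP: 1 - inner product, so a larger inner product means a smaller distance."""
    return 1.0 - dot(a, b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """COSINE: 1 - cosine similarity; only direction matters, not length."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [1.0, 0.0], [2.0, 0.0]
print(l2_distance(a, b), ip_distance(a, b), cosine_distance(a, b))  # 1.0 -1.0 0.0
```

With COSINE, `[1, 0]` and `[2, 0]` are at distance 0 because they point the same way, while L2 still separates them.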
1 change: 1 addition & 0 deletions sidebars.js
@@ -4,6 +4,7 @@ const sidebars = {
'getting-started',
'namespace',
'cluster',
'kvrocks-search',
'replication',
{
"type": "category",