chore: Refine docs to use the same title convention (#30)
Signed-off-by: Ce Gao <[email protected]>
gaocegege authored Jan 19, 2024
1 parent a450d8c commit 9631855
Showing 4 changed files with 39 additions and 37 deletions.
6 changes: 3 additions & 3 deletions .vitepress/config.mts
@@ -73,7 +73,7 @@ export default defineConfig({
text: 'Use Cases',
collapsed: false,
items: [
{ text: 'Hybrid Search', link: '/use-cases/hybrid-search' },
{ text: 'Hybrid search', link: '/use-cases/hybrid-search' },
],
},
{
@@ -88,9 +88,9 @@ export default defineConfig({
collapsed: false,
items: [
{ text: 'Configuration', link: '/admin/configuration' },
{ text: 'Upgrading from older versions', link: '/admin/upgrading' },
{ text: 'Upgrading', link: '/admin/upgrading' },
{ text: 'Logical replication', link: '/admin/logical_replication' },
{ text: 'Foreign Data Wrapper (FDW)', link: '/admin/fdw' },
{ text: 'Foreign data wrapper (FDW)', link: '/admin/fdw' },
{ text: 'Kubernetes', link: '/admin/kubernetes' },
],
},
32 changes: 16 additions & 16 deletions src/admin/fdw.md
@@ -1,12 +1,12 @@
# Setting up Foreign Data Wrapper (FDW) reduce the pressure on the primary instance
# Foreign data wrapper (FDW)

Vector retrieval is a query that consumes CPU and IO, even if there already have an index. If this query is run on the primary instance, it can negatively impact its performance. To alleviate this issue, it is recommended to execute the query on a specific PostgreSQL cluster, which will reduce the pressure on the primary instance.
[Foreign data wrapper(FDW)](https://wiki.postgresql.org/wiki/Foreign_data_wrappers) is an extension in PostgreSQL that allows you to access and query data stored in remote databases as if they were local tables. FDW provides a way to integrate and interact with different data sources.

[Foreign data wrapper(FDW)](https://wiki.postgresql.org/wiki/Foreign_data_wrappers) is a module that you can use to access and interact with an external data (foreign data) source. They allow you to query foreign objects from remote servers as if they were local objects. Postgres now has a lot of foreign data wrappers available and they work with plenty of different source types: NoSQL databases, platforms like Twitter and Facebook, geospatial data formats, etc.
With the FDW extension, you can define foreign tables in your local PostgreSQL database that mirror the structure of tables in remote databases. These foreign tables act as proxies or wrappers for the remote data, enabling you to perform SELECT, INSERT, UPDATE, and DELETE operations on them, just like regular tables in PostgreSQL.

In this tutorial, we will use the [`postgres_fdw`](https://www.postgresql.org/docs/current/postgres-fdw.html) module, which includes the foreign-data wrapper. This wrapper can be used to access data stored in external PostgreSQL servers. This article will explain how to use `postgres_fdw` to access index data in a foreign PostgreSQL cluster that already has the [`pgvecto.rs`](https://github.com/tensorchord/pgvecto.rs) extension installed.

## Deploying PostgreSQL Clusters
## Deploy local and foreign PostgreSQL clusters

In this tutorial, we will use docker compose to deploy two PostgreSQL clusters.

@@ -54,18 +54,18 @@ DROP EXTENSION IF EXISTS "vectors";
CREATE EXTENSION "vectors";
```

## Foreign Database Operations
## Foreign database operations

First, we need to create a table `test` and build an index on it in the foreign db. The `test` table has two columns: `id` and `embedding`. The `embedding` column is a vector column, and its type is `vector(10)`. The `id` column is the primary key of the `test` table.

### Create Table In Foreign DB
### Create a table
```sql
DROP TABLE IF EXISTS test;
CREATE TABLE test (id integer PRIMARY KEY, embedding vector(10) NOT NULL);
INSERT INTO test SELECT i, ARRAY[random(),random(),random(),random(),random(),random(),random(),random(),random(),random()]::real[] FROM generate_series(1, 100) i;
```

### Create User In Foreign DB
### Create a user

Create a user named `fdw_user` in the foreign db, and grant `SELECT`, `INSERT`, `UPDATE`, and `DELETE` privileges on table `test` to `fdw_user`.

@@ -74,35 +74,35 @@ CREATE USER fdw_user WITH ENCRYPTED PASSWORD 'secret';
GRANT SELECT,INSERT,UPDATE,DELETE ON TABLE test TO fdw_user;
```

### Create Index In Foreign DB
### Create the index

We create an index on the `embedding` column of the `test` table. The index type is flat, which is a brute-force algorithm. We choose `vector_l2_ops` (squared Euclidean distance) to measure the distance between vectors. Other index types and distance functions can be found [here](https://docs.pgvecto.rs/usage/indexing.html).

```sql
CREATE INDEX ON test USING vectors (embedding vector_l2_ops) WITH (options = "[indexing.flat]");
```
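
To make "flat" and "squared Euclidean distance" concrete, here is a minimal Python sketch of what a brute-force search computes. It is an illustration only, not how `pgvecto.rs` implements the index; the function names `squared_l2` and `flat_search` and the sample rows are assumptions made for this example.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(rows, query, k):
    """Brute-force search: score every row, then keep the k closest ids."""
    scored = sorted(rows, key=lambda row: squared_l2(row[1], query))
    return [row_id for row_id, _ in scored[:k]]


# Hypothetical (id, embedding) rows standing in for the `test` table.
rows = [(1, [0.0, 0.0]), (2, [1.0, 1.0]), (3, [0.2, 0.1])]
print(flat_search(rows, [0.0, 0.0], k=2))
```

Because every row is scored, the result is exact, but the cost grows linearly with the number of rows.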

## Local Database Operations
## Local database operations

In the local database, we need to create a table `local` and a foreign server `foreign_server`. The `local` table has two columns: `id` and `name`. The `id` column is the primary key of the `local` table. The `foreign_server` is a foreign server, which is used to access the foreign db.

### Create Local Table
### Create local table

```sql
DROP TABLE IF EXISTS local;
CREATE TABLE local (id integer PRIMARY KEY, name VARCHAR(50) NOT NULL);
INSERT INTO local (id, name) VALUES (1, 'terry'), (2, 'jason'), (3, 'curry');
```

### Create User In Local DB
### create local user

As a superuser, execute the following statement in the local PostgreSQL database to create a regular user named `local_user`.

```sql
CREATE USER local_user WITH ENCRYPTED PASSWORD 'secret';
```

### Create Foreign Server
### Create the foreign server

To create an external server using the `CREATE SERVER` statement, you need to specify the host, port, and database name of the remote database.

@@ -112,7 +112,7 @@ CREATE SERVER foreign_server
OPTIONS (host '<foreign_db_ip>', port '5432', dbname 'postgres');
```

### Create User Mapping
### Create the user mapping

Use the `CREATE USER MAPPING` statement to create a mapping between remote users and local users, requiring the username and password of the remote user.

@@ -121,7 +121,7 @@ CREATE USER MAPPING FOR local_user
SERVER foreign_server
OPTIONS (user 'fdw_user', password 'secret');
```
### Create Foreign Table
### Creat the foreign table

Use the `CREATE FOREIGN TABLE` statement to create a remote table. It is important to note that the types of each column should match those of the actual remote table, and it's best to keep the column names consistent. Otherwise, you will need to use the column_name parameter to specify the column name in the remote table for each individual column.
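
As a rough illustration of the `column_name` option, the Python sketch below renders a `CREATE FOREIGN TABLE` statement in which a local column `vec` maps to the remote column `embedding`. The helper `foreign_table_ddl` and the local column name `vec` are hypothetical and exist only for this example; `schema_name`, `table_name`, and `column_name` are `postgres_fdw` options. The helper does no quoting or escaping, so it is only suitable for trusted, hard-coded names.

```python
def foreign_table_ddl(name, server, remote_table, columns, remote_schema="public"):
    """Render a CREATE FOREIGN TABLE statement for postgres_fdw.

    columns is a list of (local_name, sql_type, remote_name) tuples.
    """
    defs = []
    for local_name, sql_type, remote_name in columns:
        col = f"{local_name} {sql_type}"
        if remote_name != local_name:
            # Only needed when the local and remote column names differ.
            col += f" OPTIONS (column_name '{remote_name}')"
        defs.append(col)
    body = ",\n    ".join(defs)
    return (
        f"CREATE FOREIGN TABLE {name} (\n    {body}\n) "
        f"SERVER {server} "
        f"OPTIONS (schema_name '{remote_schema}', table_name '{remote_table}');"
    )


ddl = foreign_table_ddl(
    "foreign_test", "foreign_server", "test",
    [("id", "integer", "id"), ("vec", "vector(10)", "embedding")],
)
print(ddl)
```

When the local and remote column names match, no per-column option is emitted, which is the simpler setup the paragraph above recommends.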

@@ -144,7 +144,7 @@ GRANT USAGE ON FOREIGN SERVER foreign_server TO local_user;
GRANT SELECT,INSERT,UPDATE,DELETE ON ALL TABLES IN SCHEMA public TO local_user;
```

## Join Query In Local DB
## Query

Now we can use a join query to access the foreign table in the local db.
```shell
@@ -169,4 +169,4 @@ EXPLAIN SELECT l.id, l.name FROM local l LEFT JOIN foreign_test f on l.id = f.id
-> Foreign Scan on foreign_test f (cost=100.00..150.95 rows=1365 width=36)
-> Hash (cost=15.40..15.40 rows=540 width=122)
-> Seq Scan on local l (cost=0.00..15.40 rows=540 width=122)
```
```
30 changes: 15 additions & 15 deletions src/admin/logical_replication.md
@@ -1,12 +1,12 @@
# Setting up logical replication
# Logical replication

Logical replication is a method of replicating data objects and their changes, based upon their replication identity (usually a primary key). It is often used in migration, Change Data Capture(CDC), and fine-grained database integration and access control.
Logical replication is a feature in PostgreSQL that enables the replication of individual database objects or a subset of data from one PostgreSQL database to another.

In this article, we will introduce how to set logical replication for pgvecto.rs index. Even if the source index instance goes down, you can still query the target database using the index.
With logical replication, you can selectively replicate specific tables, databases, or even specific rows based on predefined replication rules. This provides more flexibility compared to physical replication, where the entire database cluster is replicated. It allows you to design custom replication topologies and replicate only the data that is necessary for your use case.

## Deploying PostgreSQL Clusters
We will show you how to use logical replication to replicate data from one PostgreSQL database to another.

In this tutorial, we will use docker compose to deploy two PostgreSQL clusters.
## Deploy source and target PostgreSQL clusters

``` shell
$ echo 'version: "3.7"
@@ -55,22 +55,22 @@ DROP EXTENSION IF EXISTS "vectors";
CREATE EXTENSION "vectors";
```

## Set Logical Replication
## Configure logical replication

Now, we can set up logical replication between the source database and the target database.

### Prepare Data
### Prepare test data

We need to create a table named `test` with a column named `embedding` of type `vector(10)` in both the source and target databases. Then we create an index on the `embedding` column of the `test` table in both databases. Finally, we insert data into the source database.

#### Create test table
#### Create the table

```sql
DROP TABLE IF EXISTS test;
CREATE TABLE test (id integer PRIMARY KEY, embedding vector(10) NOT NULL);
```

#### Create index
#### Create the index

We create an index on the `embedding` column of the `test` table in both the source and target databases. The index type is flat, which is a brute-force algorithm. We choose `vector_l2_ops` (squared Euclidean distance) to measure the distance between vectors. Other index types and distance functions can be found [here](https://docs.pgvecto.rs/usage/indexing.html).

@@ -114,11 +114,11 @@ SELECT id FROM test ORDER BY embedding <-> '[0.40671515, 0.24202824, 0.37059402,
(0 rows)
```

### Config Logical Replication
### Set up logical replication

To simplify the process of setting up logical replication between two PostgreSQL databases, we will use [pg-easy-replicate](https://github.com/shayonj/pg_easy_replicate).

#### Config Check
We use [pg-easy-replicate](https://github.com/shayonj/pg_easy_replicate) to set up logical replication between source database and target database.

```shell
# get the network name
@@ -141,17 +141,17 @@ Every sync will need to be bootstrapped before you can set up the sync between t
$ docker run -e SOURCE_DB_URL="postgres://postgres:password@<source_db_ip>:5432/postgres" -e TARGET_DB_URL="postgres://postgres:password@<target_db_ip>:5432/postgres" -it --rm --network=pg_regress_localnet shayonj/pg_easy_replicate:latest pg_easy_replicate bootstrap --group-name database-cluster-1 --copy-schema
```
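
The `SOURCE_DB_URL` and `TARGET_DB_URL` values follow the `postgres://user:password@host:port/dbname` pattern. A small Python sketch for assembling them is shown below. The helper `db_url` and the sample address are assumptions for illustration. Percent-encoding the credentials is standard URL practice, but whether a given tool decodes them is worth verifying before relying on it.

```python
from urllib.parse import quote


def db_url(host, user="postgres", password="password", port=5432, dbname="postgres"):
    """Build a postgres:// URL, percent-encoding the user name and password."""
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"postgres://{credentials}@{host}:{port}/{dbname}"


# Hypothetical address; substitute your own source or target database IP.
print(db_url("172.18.0.2"))
# Characters such as "@" or "/" in a password would otherwise break the URL.
print(db_url("172.18.0.2", password="p@ss/word"))
```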

### Start Sync
### Start sync

Once the bootstrap is complete, you can start the sync. Starting the sync sets up the publication, subscription and performs other minor housekeeping things.

```shell
$ docker run -e SOURCE_DB_URL="postgres://postgres:[email protected]:5432/postgres" -e TARGET_DB_URL="postgres://postgres:[email protected]:5432/postgres" -it --rm --network=pg_regress_localnet shayonj/pg_easy_replicate:latest pg_easy_replicate start_sync --group-name database-cluster-1
```

## Test Logical Replication
## Test logical replication

### Query In Target Database
### Query

Now we can query the target database to get the nearest neighbor of a vector in the `embedding` column of the `test` table. The result is the same as the source database.

@@ -171,7 +171,7 @@ postgres=# SELECT id FROM test ORDER BY embedding <-> '[0.40671515, 0.24202824,
(10 rows)
```

#### Update Index
#### Update index

If you insert data into the source database, the data will be synchronized to the target database. Insert data into the source database:

8 changes: 5 additions & 3 deletions src/integration/langchain.md
@@ -6,7 +6,7 @@ LangChain is a framework for developing applications powered by language models.

`pgvecto.rs` provides a LangChain integration that allows you to retrieve the most similar vectors in LangChain.

## Pre-requisites
## Install dependencies

Some dependencies are required to use the LangChain integration:

@@ -37,7 +37,7 @@ DROP EXTENSION IF EXISTS vectors;
CREATE EXTENSION vectors;
```

## Create a vector store from scratch
## Create the database and load documents

We will show how to use `pgvecto.rs` in LangChain to retrieve the most similar vectors.

@@ -99,6 +99,8 @@ db = PGVecto_rs.from_documents(
)
```

## Query

Finally, we can retrieve the most similar vectors in LangChain.

```python
@@ -148,7 +150,7 @@ For new users, we recommend using the [Docker image](https://hub.docker.com/r/te
...
```

## Add vectors to an existing store
## Initialize from an existing database

Above, we created a vector store from scratch. However, we often want to work with an existing vector store. To do that, we can initialize it directly.

