Add a note on using write disposition in Weaviate #623

Merged · 2 commits · Sep 12, 2023
11 changes: 9 additions & 2 deletions docs/website/docs/dlt-ecosystem/destinations/weaviate.md
@@ -1,6 +1,6 @@
---
title: Weaviate
-description: Weaviate is an open source vector database that can be used as a destination in the DLT.
+description: Weaviate is an open source vector database that can be used as a destination in dlt.
keywords: [weaviate, vector database, destination, dlt]
---

@@ -145,7 +145,7 @@ info = pipeline.run(

### Merge

-The [merge](../../general-usage/incremental-loading.md) disposition merges the data from the resource with the data in the destination.
+The [merge](../../general-usage/incremental-loading.md) write disposition merges the data from the resource with the data in the destination.
For the `merge` disposition, you need to specify a `primary_key` for the resource:

```python
@@ -161,6 +161,13 @@ info = pipeline.run(

Internally dlt will use `primary_key` (`document_id` in the example above) to generate a unique identifier ([UUID](https://weaviate.io/developers/weaviate/manage-data/create#id)) for each object in Weaviate. If the object with the same UUID already exists in Weaviate, it will be updated with the new data. Otherwise, a new object will be created.
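The key property here is that the UUID is derived deterministically from the primary key, so reloading a record with the same key maps to the same Weaviate object. Below is a minimal sketch of that idea using Python's standard `uuid.uuid5`; the namespace and key format are assumptions for illustration, not dlt's actual scheme:

```python
import uuid

# Illustration only: derive a deterministic UUID from a primary key value,
# similar in spirit to how dlt maps `primary_key` values to Weaviate UUIDs.
# The namespace constant here is an arbitrary choice for the example.
NAMESPACE = uuid.NAMESPACE_DNS

def object_uuid(document_id: str) -> uuid.UUID:
    # uuid5 is deterministic: the same (namespace, name) pair always
    # produces the same UUID.
    return uuid.uuid5(NAMESPACE, document_id)

# The same primary key always yields the same UUID, so reloading the same
# record updates the existing object instead of creating a duplicate.
assert object_uuid("doc-1") == object_uuid("doc-1")
assert object_uuid("doc-1") != object_uuid("doc-2")
```

Because the mapping is deterministic, merge loads are idempotent with respect to the primary key: re-running the pipeline on unchanged data rewrites the same objects rather than growing the collection.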


:::caution

If you use the `merge` write disposition, you must set it from the first run of your pipeline; otherwise, the data will be duplicated in the database on subsequent loads.

:::

### Append

This is the default write disposition. It appends the data to the existing data in the destination, ignoring the `primary_key` field.