diff --git a/docs/website/docs/examples/qdrant_zendesk/code/qdrant-snippets.py b/docs/website/docs/examples/qdrant_zendesk/code/qdrant-snippets.py
index b2f606fbf4..378b4ff14e 100644
--- a/docs/website/docs/examples/qdrant_zendesk/code/qdrant-snippets.py
+++ b/docs/website/docs/examples/qdrant_zendesk/code/qdrant-snippets.py
@@ -62,4 +62,7 @@ def main():
     print(response)
 
+    # @@@DLT_REMOVE
+    assert len(response) == 3
+
 # @@@DLT_SNIPPET_END example
\ No newline at end of file
diff --git a/docs/website/docs/examples/qdrant_zendesk/index.md b/docs/website/docs/examples/qdrant_zendesk/index.md
index 5faf5cb666..63652515dc 100644
--- a/docs/website/docs/examples/qdrant_zendesk/index.md
+++ b/docs/website/docs/examples/qdrant_zendesk/index.md
@@ -23,14 +23,21 @@ In order to begin the import, we must setup first. This means making sure we hav
 ```commandline
 pip install dlt[qdrant]
 ```
-Next, configure the destination credentials for Qdrant in .dlt/secrets.toml as follows:
+
+Next, install the dlt Zendesk verified source:
+
+```commandline
+dlt init zendesk qdrant
+```
+
+Now, configure the destination credentials for Qdrant in .dlt/secrets.toml as follows:
 
 ```commandline
 [destination.qdrant.credentials]
 location = "https://your-qdrant-url"
 api_key = "your-qdrant-api-key"
 ```
-Lastly, within the same secrets.toml file, we must also declare credentials for the source, Zendesk:
+And lastly, within the same secrets.toml file, we must also declare credentials for the source, Zendesk:
 
 ```commandline
 [sources.zendesk.zendesk_support.credentials]
@@ -56,7 +63,7 @@ def main():
     pipeline = dlt.pipeline(
         pipeline_name="qdrant_zendesk_pipeline",
         destination="qdrant",
-        dataset_name="zendesk_data_tickets",
+        dataset_name="zendesk_data",
     )
 
     # 2. Initialize Zendesk source to get the ticket data
@@ -82,7 +89,7 @@ if __name__ == "__main__":
 
 Overview of the code above:
 
-1. We create a pipeline with the name `qdrant_zendesk_pipeline` and the destination Qdrant. The name of the dataset here will be the same as the "collection" name on Qdrant!
+1. We create a pipeline with the name `qdrant_zendesk_pipeline` and the destination Qdrant. The name of the dataset here will be the same as the "collection" name on Qdrant.
 2. Then, we initialize the Zendesk verified source. We only need to load the tickets data, so we get the tickets resource from the source by getting the tickets attribute.
 3. pipeline.run() runs the pipeline and returns information about the load process.
 4. Qdrant, being a vector database, specializes in conducting similarity searches within its **embedded data**. To make that possible, we use the special Qdrant adapter to **embed** (or vectorize) our data before loading it.
@@ -113,7 +120,9 @@ print(qdrant_client.get_collections())
 
 You should be able to see your own data there. For the purposes of this article, it would be the same name as the dataset you declared in the pipeline above, i.e. `zendesk_data_tickets`.
 
-Next, we query Qdrant to conduct a similarity search using their "query" function. For example, we would like to see tickets that are similar to the ("subject", "description") pair: ("cancel", "cancel subscription"). It can be queried as follows:
+Next, we query Qdrant to conduct a similarity search using their "query" function. For example, we would like to see tickets that are similar to the ("subject", "description") pair: ("cancel", "cancel subscription"). It can be queried using the collection/dataset name and a prompt to search for.
+
+In this example, let's also add a limit to the number of results that we will see. Here, it has been set to 3. Additionally, the collection name here is not the same as the one we declared. That is because dlt has modeled the data into several different tables (which you would have seen when you listed the collections). The tables can be identified by the suffixes added to the dataset name that you declared. The tickets data has been put into the table `zendesk_data_tickets`.
 
 ```py
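The added documentation above says the Qdrant collection for a table is the declared dataset name plus a table-name suffix (so dataset `zendesk_data` plus table `tickets` yields collection `zendesk_data_tickets`). A minimal sketch of that naming scheme, where the `collection_name` helper is hypothetical and for illustration only (it is not part of the dlt API):

```python
# Hypothetical helper (not part of dlt) illustrating the naming scheme the
# docs describe: the collection for a table is the dataset name followed by
# an underscore and the table name.
def collection_name(dataset_name: str, table_name: str) -> str:
    return f"{dataset_name}_{table_name}"

# With dataset_name="zendesk_data", the tickets table lands in:
print(collection_name("zendesk_data", "tickets"))  # zendesk_data_tickets
```

This is why the query in the docs targets `zendesk_data_tickets` even though the pipeline declares `dataset_name="zendesk_data"`.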