create extracting-data-from-master-data-with-search-and-scroll.md #1470

Merged
merged 7 commits into from
Oct 17, 2024
@@ -0,0 +1,56 @@
---
title: "Extracting data from Master Data with search and scroll"
slug: "extracting-data-from-master-data-with-search-and-scroll"
hidden: false
createdAt: "2024-09-27T10:00:00.000Z"
updatedAt: "2024-09-27T10:00:00.000Z"
---

In this guide, you will learn how to extract data from Master Data using the search and scroll endpoints, including when to use each route, how to optimize queries, and best practices.

>ℹ In Master Data v1, you can export data directly from the interface. See [Exporting data from Master Data v1](https://help.vtex.com/en/tutorial/exporting-data--tutorials_1125) for more information.

## Search

The search route is ideal when you need to find a specific set of documents within your store. It is particularly useful for paginated queries, where you want to retrieve up to 10000 documents in small chunks over multiple requests. Each page is limited to 100 documents.

>ℹ When paginating, we recommend using the `_sort` parameter. The API does not guarantee order on its own, so without a defined `_sort`, documents may appear on more than one page or may not appear on the expected page.

See the Search endpoint reference depending on the Master Data version you are using:

* [Master Data API v1 - Search](https://developers.vtex.com/docs/api-reference/masterdata-api#get-/api/dataentities/-acronym-/search)
* [Master Data API v2 - Search](https://developers.vtex.com/docs/api-reference/master-data-api-v2#get-/api/dataentities/-dataEntityName-/search)
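
As a rough illustration of the limits above, the following Python sketch splits a search into page windows of at most 100 documents and refuses totals above 10000. The function name `page_windows` and the sort value `createdIn ASC` are assumptions made for this example; how each window is sent to the API is defined in the Search endpoint reference for your Master Data version.

```python
from typing import Dict, Iterator, Tuple

MAX_PAGE_SIZE = 100       # per-page limit stated in this guide
MAX_SEARCH_DOCS = 10000   # search is meant for up to 10000 documents

# Assumption: a fixed sort so that pages stay stable between requests.
# The field and direction are illustrative only.
SORTED_PARAMS: Dict[str, str] = {"_sort": "createdIn ASC"}


def page_windows(total: int, page_size: int = MAX_PAGE_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (offset, size) pairs that cover `total` documents page by page."""
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError("page_size must be between 1 and 100")
    if total > MAX_SEARCH_DOCS:
        raise ValueError("more than 10000 documents: use scroll instead")
    for offset in range(0, total, page_size):
        # The last page may be smaller than page_size.
        yield offset, min(page_size, total - offset)
```

Each window would be requested with the same query and the same `_sort` value, so that consecutive pages do not overlap.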

### Best practices

When using the search endpoint, these best practices will help enhance your data retrieval process:

* **Apply filters to narrow your search:** Reducing the number of documents returned speeds up the query and makes your requests more efficient.

* **Use exact values for queries instead of wildcards (`*`):** Requests that rely heavily on wildcards may be temporarily blocked.

* **Avoid large datasets:** If you are querying many documents, break your query into smaller intervals.

## Scroll

The scroll route is designed for extensive data retrieval, especially when integrating Master Data with external systems. It is the best choice if you need to query the entire database or when dealing with over 10000 documents.

See the Scroll endpoint reference depending on the Master Data version you are using:

* [Master Data API v1 - Scroll](https://developers.vtex.com/docs/api-reference/masterdata-api#get-/api/dataentities/-acronym-/scroll)
* [Master Data API v2 - Scroll](https://developers.vtex.com/docs/api-reference/master-data-api-v2#get-/api/dataentities/-dataEntityName-/scroll)

Your first scroll request returns a token in the `X-VTEX-MD-TOKEN` response header. Pass this value in the `_token` query parameter of each subsequent request until you receive an empty list, which indicates that all documents have been retrieved.
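
The token flow described above can be sketched in Python as follows. This is a minimal sketch, not an official client: `scroll_all` and `fetch_page` are hypothetical names, and `fetch_page` stands for whatever function performs the HTTP request, sends the token as `_token` when one is given, and returns the documents together with the value of the `X-VTEX-MD-TOKEN` header (or `None` if the header is absent).

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: takes the current token (None on the first call)
# and returns (documents, token read from the X-VTEX-MD-TOKEN header).
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def scroll_all(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every document of one scroll, stopping at the first empty list."""
    token: Optional[str] = None
    while True:
        documents, new_token = fetch_page(token)
        if not documents:
            return  # an empty list means all documents were retrieved
        yield from documents
        # Keep the previous token if a later response does not repeat it.
        token = new_token or token
```

Keeping the HTTP call behind `fetch_page` makes the loop easy to test with a fake function and leaves authentication, URLs, and retries to the caller.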

### Scroll best practices

To ensure efficient and reliable data retrieval, follow these strategies when using the scroll endpoint:

* **Implement filters to divide the request into smaller batches,** reducing the likelihood of timeouts. For example, you might filter by creation date and process data month by month. Smaller batches are also easier to reprocess if a timeout occurs, making your operation more resilient.
* **Run up to 10 scrolls simultaneously per account.** Limiting the number of parallel scrolls helps prevent errors and timeouts. By using filters to create smaller batches and parallelizing these batches in a controlled manner, you can speed up data retrieval while reducing the risk of overloading the account.
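
The two practices above can be combined as in the sketch below, which splits a date range into monthly windows and processes them with at most 10 workers. The names `month_windows`, `run_batches`, and `scroll_window` are hypothetical; `scroll_window` represents a caller-supplied function that runs one filtered scroll for a given window, and the filter syntax it uses should be taken from the Scroll endpoint reference.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from typing import Callable, Iterable, Iterator, List, Tuple

MAX_PARALLEL_SCROLLS = 10  # per-account limit stated in this guide

Window = Tuple[date, date]  # (inclusive start, exclusive end)


def month_windows(start: date, end: date) -> Iterator[Window]:
    """Yield month-sized windows covering the half-open range [start, end)."""
    current = date(start.year, start.month, 1)
    while current < end:
        # First day of the following month, rolling over in December.
        following = date(current.year + (current.month == 12), current.month % 12 + 1, 1)
        yield max(current, start), min(following, end)
        current = following


def run_batches(
    windows: Iterable[Window],
    scroll_window: Callable[[Window], object],
    max_workers: int = MAX_PARALLEL_SCROLLS,
) -> List[object]:
    """Run one scroll per window, never more than 10 at the same time."""
    if not 1 <= max_workers <= MAX_PARALLEL_SCROLLS:
        raise ValueError("max_workers must be between 1 and 10")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the windows.
        return list(pool.map(scroll_window, windows))
```

Because each window is independent, a window that times out can be retried on its own without repeating the others.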

>⚠️ **Scroll behavior and limitations**
>
> * **Each scroll operation allows only one query for the duration of the token.** This means that you cannot change a scroll’s query by changing parameters after the first request: you can navigate pages of the original first request until the token expires, or initiate other scrolls (up to 10 simultaneously).
> * If Master Data stops receiving requests that use a scroll's `X-VTEX-MD-TOKEN` token, the token expires after **20 minutes**. After that, you can start new scroll requests, still limited to 10 simultaneous scrolls.
> * The maximum number of documents per scroll request is **1000**.