-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add missing parts for rest client docs #1397
Conversation
✅ Deploy Preview for dlt-hub-docs ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my comments
@@ -80,7 +80,7 @@ For example, if the API response looks like this: | |||
} | |||
``` | |||
|
|||
The `data_selector` should be set to `"posts"` to extract the list of posts from the response. | |||
The `data_selector` should be set to `"posts"` or `"$.posts"` to extract the list of posts from the response. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the rationale for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some people are used to use JSONPath starting with $.
so this is just to give relation to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, but I think if they know JSONPath already they would know that extended syntax anyway. I would try to optimize here for those unfamiliar with JSONPath. We also link JSONPath docs twice for those who need advanced JSONPath.
@@ -96,7 +96,7 @@ For a nested structure like this: | |||
} | |||
``` | |||
|
|||
The `data_selector` needs to be set to `"results.posts"`. Read more about [JSONPath syntax](https://github.com/h2non/jsonpath-ng?tab=readme-ov-file#jsonpath-syntax) to learn how to write selectors. | |||
The `data_selector` needs to be set to `"results.posts"` or `"$.results.posts"`. Read more about [JSONPath syntax](https://github.com/h2non/jsonpath-ng?tab=readme-ov-file#jsonpath-syntax) to learn how to write selectors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for this. Why would we need to have an alternative declaration here?
## Common resource defaults | ||
|
||
In `RESTAPIConfig` you can provide via `resource_defaults` which will then be applied to all requests | ||
|
||
```py | ||
my_params = { | ||
"from_year": 2018, | ||
"end_year": 2024, | ||
} | ||
|
||
source_config: RESTAPIConfig = { | ||
"client": {...}, | ||
"resource_defaults": { | ||
"endpoint": { | ||
"params": my_params, | ||
} | ||
} | ||
} | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part does not belong to this document. This is documentation for RESTClient and not rest_api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I will remove it
### Incremental loading | ||
|
||
It is often needed to load only the new data based on some incremental property be it timestamp, date and time, integer identifier or a cursor value. | ||
Fortunately our `RESTClient` allows you to elegantly express this behavior. | ||
|
||
Let's use our slightly modified example response json and we want to load new posts as they appear without complete reload of data. | ||
|
||
```json | ||
{ | ||
"data": [ | ||
{ "id": 1, "title": "Post 1", "created_at": "2010-08-21T17:11:27-0400" }, | ||
{ "id": 2, "title": "Post 2", "created_at": "2010-09-21T17:11:27-0400" }, | ||
{ "id": 3, "title": "Post 3", "created_at": "2010-10-21T17:11:27-0400" } | ||
] | ||
} | ||
``` | ||
|
||
To achive our objective we need to use `endpoint.params` by adding the incremental type. | ||
In the following examples we use `id` - primary key and `created_at` - creation datetime. | ||
|
||
**Incremental loading by id** | ||
|
||
```py | ||
source_config: RESTAPIConfig = { | ||
"resources": [ | ||
{ | ||
"name": "get_posts_list", | ||
"table_name": "posts", | ||
"endpoint": { | ||
"data_selector": "$.data", | ||
"path": "/posts", | ||
"params": { | ||
"post_id": { | ||
"type": "incremental", | ||
"cursor_path": "id", | ||
"initial_value": 1, | ||
} | ||
}, | ||
}, | ||
} | ||
] | ||
} | ||
``` | ||
|
||
**Incremental loading by creation date** | ||
|
||
```py | ||
source_config: RESTAPIConfig = { | ||
"resources": [ | ||
{ | ||
"name": "get_posts_list", | ||
"table_name": "posts", | ||
"endpoint": { | ||
"data_selector": "$.data", | ||
"path": "/posts", | ||
"params": { | ||
"creation_date": { | ||
"type": "incremental", | ||
"cursor_path": "created_at", | ||
"initial_value": "2010-08-21T17:11:27-0400", | ||
} | ||
}, | ||
}, | ||
} | ||
] | ||
} | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the right place for this section: rest-client.md is only for documenting RESTClient class & relevant functionality and not rest_api source. Incremental loading is covered in rest_api here: https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api#incremental-loading
**Custom combined auth:** | ||
Sometimes you need to pass authentication parameters via headers as well as query params | ||
|
||
```py | ||
from dlt.sources.helpers.rest_client.auth import AuthConfigBase | ||
|
||
class CombinedAuth(AuthConfigBase): | ||
def __init__(self, client_id: str, client_secret: str): | ||
self.client_id = client_id | ||
self.client_secret = client_secret | ||
|
||
def __call__(self, request): | ||
# Modify the request object to include the necessary authentication headers and request params | ||
request.headers["Authorization"] = f"Bearer {self.client_secret}" | ||
request.prepare_url(request.url, {"client_id": self.client_id}) | ||
return request | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced we need this example here: the difference that I see here from the previous example is that it shows that request
is a PreparedRequest instance (because we can call prepare_url
on it). I think showing that request
is a PreparedRequest has a value, so instead of the example I would add explicit types in the previous example (like you did with token: str) and add a text description elaborating on what __call__
actually is receiving and linking Requests docs from PreparedRequest so the reader can quickly see what's possible.
This PR improves and add more details on certain features of our rest client like
resource_defaults
,data_selector
path examples,TODO