-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
examples for docs #616
examples for docs #616
Conversation
* add custom snippets element
✅ Deploy Preview for dlt-hub-docs ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
d9dce90
to
b088cec
Compare
bebd527
to
249ebe7
Compare
78fadeb
to
2323a3b
Compare
20614f2
to
15c5d0e
Compare
15c5d0e
to
180de6e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on top of the comments:
should we have docosaurus directly in /docs? o docs/docs? not in docs/website/docs?
we can just delete archives or skip it from being crawled by docosaursu
Makefile
Outdated
@@ -52,7 +52,7 @@ lint: | |||
# $(MAKE) lint-security | |||
|
|||
test-and-lint-snippets: | |||
poetry run mypy --config-file mypy.ini docs/website docs/examples | |||
poetry run mypy --config-file mypy.ini --namespace-packages --explicit-package-bases docs/website docs/examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a script that creates missing init files, check make lint
. or you do not want that to happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed this again
<div>{props.intro}</div> | ||
|
||
## Setup: Running this example on your machine | ||
<CodeBlock language="sh"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason to use this and not
```sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the variable will not work in the markdown block unfortunately..
docs/website/docs/examples/incremental_loading/code/run-snippets.py
Outdated
Show resolved
Hide resolved
Yes, I think that would be nice. But we should do this when there are no open documentation branches. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we are good with it so let's write two examples and merge it. I'll take a look at pokemon and maybe add google sheets example?
docs/website/docs/examples/incremental_loading/code/run-snippets.py
Outdated
Show resolved
Hide resolved
dlt/common/schema/schema.py
Outdated
@@ -392,7 +392,7 @@ def _coerce_non_null_value(self, table_columns: TTableSchemaColumns, table_name: | |||
raise CannotCoerceColumnException(table_name, col_name, py_type, table_columns[col_name]["data_type"], v) | |||
# otherwise we must create variant extension to the table | |||
# pass final=True so no more auto-variants can be created recursively | |||
# TODO: generate callback so DLT user can decide what to do | |||
# TODO: generate callback sodltuser can decide what to do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somehow surrounding spaces got eaten. and that happened in many places. I hope you can revert the commit and do that again?
bef2ec9
to
78577ea
Compare
86b209f
to
eb7b780
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
instead of adding zendesk credentials, use the same method we have in verified sources (via google secrets)
also because we rely on credentials to run tests we prevent external contributors from using our CI. could we have a pytest skip that will skip the test if it detects we run on github CI and from the fork (easy - all in env variables) -
python file names are always "run.py". those module names are used to create sections for config and secrets. let's use a meaningful name ie. zendesk, pokemon
-
title the example so it is clear what it does and what is the main highlight:
- "Get Pokemon details in parallel using transformers"
- "Load zendesk tickets incrementally"
Structure is OK, what I'd change
TLDR -> Summary - One sentence what example does
Highlights
3-5 bullet points with the most interesting stuff or building blocks used. ideal bullet point
- one sentence explains what we do in the example + links the docs page (if applicable) + shows relevant code snippet (if the example is long)
IMO we do not need to include the whole snippet (maybe if it is short)
example:
- we configure incremental loading for the resource that takes start and end date passed from the source (link to incremental docs) and allows it to use Airflow schedule (link to the docs)
code snippet
# go to example directory | ||
cd ./dlt/docs/examples/${props.slug} | ||
# install dlt | ||
pip3 install dlt |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
@@ -0,0 +1,3 @@ | |||
# @@@DLT_SNIPPET_START example | |||
some_key="some_value" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this?
@@ -0,0 +1,130 @@ | |||
|
|||
|
|||
def incremental_snippet() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sh-rp running this as a test requires access to credentials. this will prevent people from contributing to the docs from forks.
we need to be able to skip this test if it is run from a fork on github-ci. alternatively we can disable this test and just lint it.
verified sources detect forks so you can start by looking and github workflows there
# @@@DLT_SNIPPET_END example | ||
|
||
# test assertions | ||
row_counts = pipeline.last_trace.last_normalize_info.row_counts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be easier if we define examples directly in examples and execute it there with exec? then the code does not need to be copied. you have access to all locals anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer it this way, because we can add tests inbetween etc. But I can change it if you like.
POKEMON_URL = "https://pokeapi.co/api/v2/pokemon" | ||
|
||
# retrieve pokemon list | ||
@dlt.resource(write_disposition="replace") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make it selected=False (so we do not store this in database)
) | ||
|
||
# the pokemon_list resource does not need to be loaded | ||
load_info = pipeline.run([pokemon(), species()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok now it is clear why you have double evaluation. please create a dlt.source that returns 3 resources
- the list resource
- pokemons
- species
use pipes to connect resources to transformers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I have done that, but you should check that it is done the right way. The tests pass.
|
||
## Using transformers with the pokemon api | ||
|
||
For this example we will be loading data from the [Poke Api](https://pokeapi.co/). We will load a list of pokemon from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look what @AstrakhantsevaAA did. 3-5 bullet points with highlights
- we create two transformers that are connected to a single list resource with |
- we load in parallel using requests
- we load in parallel using async
- we configure parallelism in config.toml
- we deselect pokemon list by default to not load this in database
provide relevant docs links
``` | ||
<!--@@@DLT_SNIPPET_END ./code/run-snippets.py::example--> | ||
|
||
### Example pokemon list data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd not include that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like this, but I have removed it for now.
@@ -0,0 +1,26 @@ | |||
# here is a file with the secrets for all the example pipelines in `examples` folder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rudolfix should this be here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! @sh-rp @AstrakhantsevaAA thanks for your hard work!
Description
This preview provides a scaffold for our examples. Notes: