examples for docs #616

Merged
merged 37 commits into from
Oct 4, 2023

Conversation

Collaborator

@sh-rp sh-rp commented Sep 10, 2023

Description

This preview provides a scaffold for our examples. Notes:

  • This is based on the remove-snipsync PR: remove snipsync (#613) #615
  • The full code for each example lives in that example's /code directory in the website/docs folder.
  • update_snippets now syncs the examples from the website into /docs/examples. It copies over every file and takes the "example" snippet from each file.
  • There is a single component that all examples share: the header, which holds introductory info and other content common to all examples. We can take this approach in other parts of the docs too to reduce duplication.
  • The code example itself is just for illustration purposes for now.
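
As a rough illustration of the snippet sync described above, the sketch below pulls the lines between `# @@@DLT_SNIPPET_START example` and `# @@@DLT_SNIPPET_END example` out of a file's text. Only the marker format is taken from the files in this PR; the function name `extract_snippet` and the sample text are hypothetical, and the real update_snippets script may work differently.

```python
from typing import List

START_MARKER = "@@@DLT_SNIPPET_START"
END_MARKER = "@@@DLT_SNIPPET_END"


def extract_snippet(source: str, name: str) -> str:
    """Return the lines between the START and END markers for `name`.

    Hypothetical helper, not the actual update_snippets implementation.
    """
    collected: List[str] = []
    inside = False
    for line in source.splitlines():
        parts = line.split()
        # a marker line looks like: "# @@@DLT_SNIPPET_START example"
        if START_MARKER in parts and parts[-1] == name:
            inside = True
            continue
        if END_MARKER in parts and parts[-1] == name:
            return "\n".join(collected)
        if inside:
            collected.append(line)
    raise ValueError(f"snippet {name!r} not found or not closed")


# made-up file content: setup above the markers, test assertions below them
sample = """import dlt
# @@@DLT_SNIPPET_START example
some_key = "some_value"
print(some_key)
# @@@DLT_SNIPPET_END example
assert some_key == "some_value"
"""

snippet = extract_snippet(sample, "example")
print(snippet)
```

Keeping the markers inside the test file means the lines outside them (setup and assertions) never reach the published docs.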

* add custom snippets element

netlify bot commented Sep 10, 2023

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 0fcada4
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/651d280853ce9d0007066aea
😎 Deploy Preview https://deploy-preview-616--dlt-hub-docs.netlify.app

@sh-rp sh-rp force-pushed the d#/docs_examples branch 7 times, most recently from d9dce90 to b088cec Compare September 11, 2023 10:25
@sh-rp sh-rp changed the title from "docs examples" to "examples for docs" Sep 11, 2023
@sh-rp sh-rp force-pushed the d#/docs_examples branch 4 times, most recently from bebd527 to 249ebe7 Compare September 11, 2023 13:52
@sh-rp sh-rp force-pushed the d#/docs_examples branch 2 times, most recently from 78fadeb to 2323a3b Compare September 11, 2023 14:03
@AstrakhantsevaAA AstrakhantsevaAA changed the base branch from devel to rfix/remove-snipsync September 11, 2023 15:02
@sh-rp sh-rp force-pushed the rfix/remove-snipsync branch from 20614f2 to 15c5d0e Compare September 11, 2023 15:16
@sh-rp sh-rp force-pushed the rfix/remove-snipsync branch from 15c5d0e to 180de6e Compare September 11, 2023 15:27
Collaborator

@rudolfix rudolfix left a comment

On top of the comments:
Should we have Docusaurus directly in /docs, or in docs/docs, rather than in docs/website/docs?
We can just delete the archives or exclude them from being crawled by Docusaurus.

Makefile Outdated
@@ -52,7 +52,7 @@ lint:
# $(MAKE) lint-security

test-and-lint-snippets:
- poetry run mypy --config-file mypy.ini docs/website docs/examples
+ poetry run mypy --config-file mypy.ini --namespace-packages --explicit-package-bases docs/website docs/examples
Collaborator

There's a script that creates missing init files; check make lint. Or do you not want that to happen?

Collaborator Author

I have removed this again

docs/website/docs/examples/_examples-header.md Outdated
<div>{props.intro}</div>

## Setup: Running this example on your machine
<CodeBlock language="sh">
Collaborator

Any reason to use this and not a plain markdown fence (```sh)?

Collaborator Author

Yes, unfortunately the variable will not work in a markdown code block.

Collaborator Author

sh-rp commented Sep 14, 2023

> On top of the comments:
> Should we have Docusaurus directly in /docs, or in docs/docs, rather than in docs/website/docs?
> We can just delete the archives or exclude them from being crawled by Docusaurus.

Yes, I think that would be nice. But we should do this when there are no open documentation branches.

@sh-rp sh-rp marked this pull request as ready for review September 14, 2023 08:29
Collaborator

@rudolfix rudolfix left a comment

Looks like we are good with it so let's write two examples and merge it. I'll take a look at pokemon and maybe add google sheets example?

@@ -392,7 +392,7 @@ def _coerce_non_null_value(self, table_columns: TTableSchemaColumns, table_name:
raise CannotCoerceColumnException(table_name, col_name, py_type, table_columns[col_name]["data_type"], v)
# otherwise we must create variant extension to the table
# pass final=True so no more auto-variants can be created recursively
- # TODO: generate callback so DLT user can decide what to do
+ # TODO: generate callback sodltuser can decide what to do
Collaborator

Somehow the surrounding spaces got eaten, and that happened in many places. Could you revert the commit and redo it?

Collaborator

@rudolfix rudolfix left a comment

  1. Instead of adding Zendesk credentials, use the same method we have in verified sources (via Google secrets).
    Also, because we rely on credentials to run tests, we prevent external contributors from using our CI. Could we have a pytest skip that skips the test if it detects that we are running on GitHub CI and from a fork? (Easy: it is all in env variables.)

  2. Python file names are always "run.py". Those module names are used to create sections for config and secrets. Let's use meaningful names, e.g. zendesk, pokemon.

  3. Title the example so it is clear what it does and what the main highlight is:

  • "Get Pokemon details in parallel using transformers"
  • "Load zendesk tickets incrementally"

The structure is OK. What I'd change:
TLDR -> Summary: one sentence on what the example does.
Highlights: 3-5 bullet points with the most interesting stuff or building blocks used. The ideal bullet point:

  • one sentence that explains what we do in the example + a link to the docs page (if applicable) + the relevant code snippet (if the example is long)

IMO we do not need to include the whole snippet (maybe only if it is short).

Example:

  • we configure incremental loading for the resource that takes the start and end dates passed from the source (link to incremental docs) and allows it to use the Airflow schedule (link to the docs)
    code snippet

# go to example directory
cd ./dlt/docs/examples/${props.slug}
# install dlt
pip3 install dlt

This comment was marked as resolved.

@@ -0,0 +1,3 @@
# @@@DLT_SNIPPET_START example
some_key="some_value"
Collaborator

why do we need this?

@@ -0,0 +1,130 @@


def incremental_snippet() -> None:
Collaborator

@sh-rp Running this as a test requires access to credentials. This will prevent people from contributing to the docs from forks.

We need to be able to skip this test if it is run from a fork on GitHub CI. Alternatively, we can disable this test and just lint it.

Verified sources detect forks, so you can start by looking at the GitHub workflows there.
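
A minimal sketch of such a fork check, assuming the workflow exports the fork status as an environment variable. `GITHUB_ACTIONS` is set to `true` by GitHub Actions itself; `IS_FORK_PR` and the function name are hypothetical and would have to be wired up in the workflow file. This is not how verified sources do it, which this thread does not show.

```python
import os
from typing import Mapping


def should_skip_credentialed_tests(env: Mapping[str, str] = os.environ) -> bool:
    """True when running on GitHub Actions for a pull request from a fork.

    GITHUB_ACTIONS is set to "true" by GitHub Actions.
    IS_FORK_PR is a hypothetical variable that the workflow would export,
    for example from `github.event.pull_request.head.repo.fork`.
    """
    on_github_ci = env.get("GITHUB_ACTIONS") == "true"
    from_fork = env.get("IS_FORK_PR", "false").lower() == "true"
    return on_github_ci and from_fork


# local run: nothing is skipped
print(should_skip_credentialed_tests({}))
# CI run for a branch in the main repository: nothing is skipped
print(should_skip_credentialed_tests({"GITHUB_ACTIONS": "true", "IS_FORK_PR": "false"}))
# CI run for a pull request from a fork: credentialed tests are skipped
print(should_skip_credentialed_tests({"GITHUB_ACTIONS": "true", "IS_FORK_PR": "true"}))
```

With pytest, the result could feed a `pytest.mark.skipif(should_skip_credentialed_tests(), reason="no credentials on forks")` marker on the tests that need secrets, so the remaining tests and the linting still run for external contributors.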

docs/examples/transformers/.dlt/config.toml Outdated
docs/website/docs/examples/incremental_loading/index.md Outdated
# @@@DLT_SNIPPET_END example

# test assertions
row_counts = pipeline.last_trace.last_normalize_info.row_counts
Collaborator

Wouldn't it be easier if we defined the examples directly in examples and executed them there with exec? Then the code would not need to be copied. You have access to all locals anyway.

Collaborator Author

I prefer it this way, because we can add tests in between, etc. But I can change it if you like.
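
For comparison, the alternative raised above (running an example file in place and then asserting on its variables) could look roughly like this. The sketch uses the standard library's `runpy.run_path`, which executes a file and returns its globals, instead of a bare `exec`; the file name `pokemon.py` and its contents are made up.

```python
import runpy
import tempfile
from pathlib import Path

# stand-in for an example file that lives in the examples folder
EXAMPLE_CODE = '''
rows = [{"id": 1, "name": "bulbasaur"}, {"id": 2, "name": "ivysaur"}]
row_counts = {"pokemon": len(rows)}
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = Path(tmp_dir) / "pokemon.py"
    example_path.write_text(EXAMPLE_CODE)
    # run the example in place; the returned dict holds the module globals
    example_globals = runpy.run_path(str(example_path))

# test assertions live outside the example file
row_counts = example_globals["row_counts"]
assert row_counts["pokemon"] == 2
print(row_counts)
```

The trade-off matches the discussion: nothing has to be copied, but assertions can only run after the whole file has finished, whereas the marker-based approach allows checks between steps.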

POKEMON_URL = "https://pokeapi.co/api/v2/pokemon"

# retrieve pokemon list
@dlt.resource(write_disposition="replace")
Collaborator

Make it selected=False (so we do not store this in the database).

)

# the pokemon_list resource does not need to be loaded
load_info = pipeline.run([pokemon(), species()])
Collaborator

OK, now it is clear why you have double evaluation. Please create a dlt.source that returns 3 resources:

  • the list resource
  • pokemons
  • species

Use pipes to connect the resources to the transformers.

Collaborator Author

Ok, I have done that, but you should check that it is done the right way. The tests pass.


## Using transformers with the pokemon api

For this example we will be loading data from the [Poke Api](https://pokeapi.co/). We will load a list of pokemon from the
Collaborator

Look at what @AstrakhantsevaAA did: 3-5 bullet points with highlights.

  • we create two transformers that are connected to a single list resource with |
  • we load in parallel using requests
  • we load in parallel using async
  • we configure parallelism in config.toml
  • we deselect the pokemon list by default so it is not loaded into the database

Provide relevant docs links.
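
To make the first two highlights concrete without depending on dlt or the network, here is a plain-Python sketch of the shape: one list generator feeds two transformer functions, and each transformer does its per-item work on a thread pool. This is a standard-library analogy, not dlt's API; `fake_fetch`, `transform`, and the data are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, List


def pokemon_list() -> Iterator[List[Dict[str, str]]]:
    """Stand-in for the list resource: yields one page of pokemon names."""
    yield [{"name": "bulbasaur"}, {"name": "ivysaur"}, {"name": "venusaur"}]


def fake_fetch(kind: str, name: str) -> Dict[str, str]:
    """Invented replacement for an HTTP call to the API."""
    return {"kind": kind, "name": name}


def transform(kind: str, pages: Iterator[List[Dict[str, str]]]) -> Iterator[Dict[str, str]]:
    """Fetch details for every item of every page on a small thread pool."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        for page in pages:
            # map() preserves input order even though calls run concurrently
            yield from pool.map(lambda item: fake_fetch(kind, item["name"]), page)


# each transformer consumes its own evaluation of the same list
pokemon = list(transform("pokemon", pokemon_list()))
species = list(transform("species", pokemon_list()))
print(pokemon)
print(species)
```

The list itself is only an input to the two transformers and is never stored, which is the role the deselected list resource plays in the example.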

```
<!--@@@DLT_SNIPPET_END ./code/run-snippets.py::example-->

### Example pokemon list data
Collaborator

I'd not include that

Collaborator Author

I like this, but I have removed it for now.

@@ -0,0 +1,26 @@
# here is a file with the secrets for all the example pipelines in `examples` folder
Collaborator Author

@rudolfix should this be here?

Collaborator

@rudolfix rudolfix left a comment

LGTM! @sh-rp @AstrakhantsevaAA thanks for your hard work!

@rudolfix rudolfix merged commit 58f8ad1 into devel Oct 4, 2023
30 of 31 checks passed
@rudolfix rudolfix deleted the d#/docs_examples branch October 4, 2023 09:05
3 participants