From 8b4b47b4d1110b9fda18ee2aa6768e106e571d09 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Wed, 13 Mar 2024 05:03:22 +0000 Subject: [PATCH 01/15] Updated --- .../deploy-a-pipeline/deploy-with-dagster.md | 60 +++++++++++++++++++ docs/website/sidebars.js | 1 + 2 files changed, 61 insertions(+) create mode 100644 docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md new file mode 100644 index 0000000000..8c34f7ce9e --- /dev/null +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -0,0 +1,60 @@ +--- +title: Deploy with Dagster +description: How to deploy a pipeline with Dagster +keywords: [how to, deploy a pipeline, Dagster] +--- + +# Deploy with Dagster + + +## Introduction to Dagster + +Dagster is an orchestrator that's designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports. It makes it easier for data engineers to create, test, deploy, and oversee data-related assets. Dagster ensures these processes are reliable and focuses on using software-defined assets (SDAs) to simplify complex data management, enhance the ability to reuse code and provide a better understanding of data. + +To learn more, please read Dagster’s [documentation.](https://docs.dagster.io/getting-started?_gl=1*19ikq9*_ga*NTMwNTUxNDAzLjE3MDg5Mjc4OTk.*_ga_84VRQZG7TV*MTcwOTkwNDY3MS4zLjEuMTcwOTkwNTYzNi41Ny4wLjA.*_gcl_au*OTM3OTU1ODMwLjE3MDg5Mjc5MDA.) + +## Dagster Cloud Features + +Dagster Cloud further enhances these features by providing an enterprise-level orchestration service with serverless or hybrid deployment options. It incorporates native branching and built-in CI/CD to prioritize the developer experience. 
It enables scalable, cost-effective operations without the hassle of infrastructure management. + +## Dagster deployment options: **Serverless versus Hybrid**: + +The *serverless* option fully hosts the orchestration engine, while the *hybrid* model offers flexibility to use your computing resources, with Dagster managing the control plane, reducing operational overhead, and ensuring security. + +For more, please visit [Dagster cloud.](https://dagster.io/cloud) + +## Using Dagster for Free + +Dagster offers a 30-day free trial during which you can explore its features, such as pipeline orchestration, data quality checks, and embedded ELTs. You can try Dagster using its open source or by signing up for the trial. + +## Building Data Pipelines with `dlt` + +`dlt` is an open-source Python library that allows you to declaratively load data sources into well-structured tables or datasets through automatic schema inference and evolution. It simplifies building data pipelines by providing functionality to support the entire extract and load process. + +How does `dlt` integrate with Dagster for pipeline orchestration? + +`dlt` integrates with Dagster for pipeline orchestration, providing a streamlined process for building, enhancing, and managing data pipelines. This enables developers to leverage `dlt`'s capabilities for handling data extraction and load and Dagster's orchestration features to efficiently manage and monitor data pipelines. + +Here’s a brief summary of how to orchestrate `dlt` pipeline on Dagster: + +1. Create a `dlt` pipeline. For instructions on creating a pipeline, please refer to the [documentation](https://dlthub.com/docs/walkthroughs/create-a-pipeline). +2. Set up a Dagster project, configure resources, and define the asset. For more information, please refer to [Dagster’s documentation.](https://docs.dagster.io/getting-started/quickstart) +3. Next, define Dagster definitions, start the web server, and materialize the asset. +4. 
View the populated metadata and data in the configured destination. + +To do a hands-on project about “Orchestrating unstructured data pipelines with dagster and `dlt`," please read the following [article](https://dagster.io/blog/dagster-dlt). Here, the author has given a detailed overview and steps to ingest GitHub issue data from a repository and store the data in BigQuery. To build your pipelines, you can employ a similar approach. + +## Additional Resources + +- A general configurable `dlt` resource orchestrated on Dagster: [dlt resource](https://github.com/dagster-io/dagster-open-platform/blob/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/resources/dlt_resource.py#L29). +- `dlt` pipelines configured for Dagster: [dlt pipelines](https://github.com/dagster-io/dagster-open-platform/tree/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/assets/dlt_pipelines). + +:::note +These are external repositories and are subject to change. +::: + +## Conclusion + +In conclusion, integrating `dlt` within the data pipeline ecosystem significantly enhances the efficiency and manageability of data operations. The synergy between `dlt` and Dagster simplifies the development of data pipelines and ensures that data assets are more maintainable and scalable over time. `dlt` offers plenty of [verified sources](https://dlthub.com/docs/dlt-ecosystem/verified-sources/) that can be orchestrated on Dagster in a simplified way and can be easily managed, customized and maintained. + +We encourage data engineers and developers to explore the capabilities of `dlt` within the Dagster platform. Leveraging `dlt` on Dagster streamlines the pipeline development process and unlocks the potential for greater insights and value from your data assets. 
\ No newline at end of file diff --git a/docs/website/sidebars.js b/docs/website/sidebars.js index 821a1affad..bf45076047 100644 --- a/docs/website/sidebars.js +++ b/docs/website/sidebars.js @@ -214,6 +214,7 @@ const sidebars = { 'reference/explainers/airflow-gcp-cloud-composer', 'walkthroughs/deploy-a-pipeline/deploy-with-google-cloud-functions', 'walkthroughs/deploy-a-pipeline/deploy-gcp-cloud-function-as-webhook', + 'walkthroughs/deploy-a-pipeline/deploy-with-dagster', ] }, { From 2a3ea847244081cfd8f11495d948787664b2ec14 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Wed, 13 Mar 2024 05:07:48 +0000 Subject: [PATCH 02/15] Update --- .../walkthroughs/deploy-a-pipeline/deploy-with-dagster.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index 8c34f7ce9e..ba0304dac3 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -13,17 +13,17 @@ Dagster is an orchestrator that's designed for developing and maintaining data a To learn more, please read Dagster’s [documentation.](https://docs.dagster.io/getting-started?_gl=1*19ikq9*_ga*NTMwNTUxNDAzLjE3MDg5Mjc4OTk.*_ga_84VRQZG7TV*MTcwOTkwNDY3MS4zLjEuMTcwOTkwNTYzNi41Ny4wLjA.*_gcl_au*OTM3OTU1ODMwLjE3MDg5Mjc5MDA.) -## Dagster Cloud Features +### Dagster Cloud Features Dagster Cloud further enhances these features by providing an enterprise-level orchestration service with serverless or hybrid deployment options. It incorporates native branching and built-in CI/CD to prioritize the developer experience. It enables scalable, cost-effective operations without the hassle of infrastructure management. 
-## Dagster deployment options: **Serverless versus Hybrid**: +### Dagster deployment options: **Serverless versus Hybrid**: The *serverless* option fully hosts the orchestration engine, while the *hybrid* model offers flexibility to use your computing resources, with Dagster managing the control plane, reducing operational overhead, and ensuring security. For more, please visit [Dagster cloud.](https://dagster.io/cloud) -## Using Dagster for Free +### Using Dagster for Free Dagster offers a 30-day free trial during which you can explore its features, such as pipeline orchestration, data quality checks, and embedded ELTs. You can try Dagster using its open source or by signing up for the trial. @@ -44,7 +44,7 @@ Here’s a brief summary of how to orchestrate `dlt` pipeline on Dagster: To do a hands-on project about “Orchestrating unstructured data pipelines with dagster and `dlt`," please read the following [article](https://dagster.io/blog/dagster-dlt). Here, the author has given a detailed overview and steps to ingest GitHub issue data from a repository and store the data in BigQuery. To build your pipelines, you can employ a similar approach. -## Additional Resources +### Additional Resources - A general configurable `dlt` resource orchestrated on Dagster: [dlt resource](https://github.com/dagster-io/dagster-open-platform/blob/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/resources/dlt_resource.py#L29). - `dlt` pipelines configured for Dagster: [dlt pipelines](https://github.com/dagster-io/dagster-open-platform/tree/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/assets/dlt_pipelines). 
From 501ae52790486162654978df38869dfcd1259e47 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Mon, 18 Mar 2024 06:13:41 +0000 Subject: [PATCH 03/15] Updated --- .../deploy-a-pipeline/deploy-with-dagster.md | 35 +++++++++++-------- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index ba0304dac3..f5ceefbc01 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -9,19 +9,19 @@ keywords: [how to, deploy a pipeline, Dagster] ## Introduction to Dagster -Dagster is an orchestrator that's designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports. It makes it easier for data engineers to create, test, deploy, and oversee data-related assets. Dagster ensures these processes are reliable and focuses on using software-defined assets (SDAs) to simplify complex data management, enhance the ability to reuse code and provide a better understanding of data. +Dagster is an orchestrator that's designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports. Dagster ensures these processes are reliable and focuses on using software-defined assets (SDAs) to simplify complex data management, enhance the ability to reuse code and provide a better understanding of data. -To learn more, please read Dagster’s [documentation.](https://docs.dagster.io/getting-started?_gl=1*19ikq9*_ga*NTMwNTUxNDAzLjE3MDg5Mjc4OTk.*_ga_84VRQZG7TV*MTcwOTkwNDY3MS4zLjEuMTcwOTkwNTYzNi41Ny4wLjA.*_gcl_au*OTM3OTU1ODMwLjE3MDg5Mjc5MDA.) 
+To read more, please refer to Dagster’s [documentation.](https://docs.dagster.io/getting-started?_gl=1*19ikq9*_ga*NTMwNTUxNDAzLjE3MDg5Mjc4OTk.*_ga_84VRQZG7TV*MTcwOTkwNDY3MS4zLjEuMTcwOTkwNTYzNi41Ny4wLjA.*_gcl_au*OTM3OTU1ODMwLjE3MDg5Mjc5MDA.) ### Dagster Cloud Features Dagster Cloud further enhances these features by providing an enterprise-level orchestration service with serverless or hybrid deployment options. It incorporates native branching and built-in CI/CD to prioritize the developer experience. It enables scalable, cost-effective operations without the hassle of infrastructure management. -### Dagster deployment options: **Serverless versus Hybrid**: +### Dagster deployment options: **Serverless** versus **Hybrid**: -The *serverless* option fully hosts the orchestration engine, while the *hybrid* model offers flexibility to use your computing resources, with Dagster managing the control plane, reducing operational overhead, and ensuring security. +The *serverless* option fully hosts the orchestration engine, while the *hybrid* model offers flexibility to use your computing resources, with Dagster managing the control plane. Reducing operational overhead and ensuring security. -For more, please visit [Dagster cloud.](https://dagster.io/cloud) +For more info, please [refer.](https://dagster.io/cloud) ### Using Dagster for Free @@ -37,24 +37,29 @@ How does `dlt` integrate with Dagster for pipeline orchestration? Here’s a brief summary of how to orchestrate `dlt` pipeline on Dagster: -1. Create a `dlt` pipeline. For instructions on creating a pipeline, please refer to the [documentation](https://dlthub.com/docs/walkthroughs/create-a-pipeline). -2. Set up a Dagster project, configure resources, and define the asset. For more information, please refer to [Dagster’s documentation.](https://docs.dagster.io/getting-started/quickstart) -3. Next, define Dagster definitions, start the web server, and materialize the asset. -4. 
View the populated metadata and data in the configured destination. +1. Create a `dlt` pipeline. For detailed instructions on creating a pipeline, please refer to the +[documentation](https://dlthub.com/docs/walkthroughs/create-a-pipeline). -To do a hands-on project about “Orchestrating unstructured data pipelines with dagster and `dlt`," please read the following [article](https://dagster.io/blog/dagster-dlt). Here, the author has given a detailed overview and steps to ingest GitHub issue data from a repository and store the data in BigQuery. To build your pipelines, you can employ a similar approach. +1. Set up a Dagster project, configure resources, and define the asset. For more information, please refer to [Dagster’s documentation.](https://docs.dagster.io/getting-started/quickstart) + +1. Next, define Dagster definitions, start the web server, and materialize the asset. +1. View the populated metadata and data in the configured destination. + +:::info +For a hands-on project on “Orchestrating unstructured data pipelines with dagster and dlt", read the [article]((https://dagster.io/blog/dagster-dlt)) provided. The author offers a detailed overview and steps for ingesting GitHub issue data from a repository and storing it in BigQuery. You can use a similar approach to build your pipelines. +::: ### Additional Resources - A general configurable `dlt` resource orchestrated on Dagster: [dlt resource](https://github.com/dagster-io/dagster-open-platform/blob/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/resources/dlt_resource.py#L29). - `dlt` pipelines configured for Dagster: [dlt pipelines](https://github.com/dagster-io/dagster-open-platform/tree/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/assets/dlt_pipelines). -:::note -These are external repositories and are subject to change. -::: + :::note + These are external repositories and are subject to change. 
+ ::: ## Conclusion -In conclusion, integrating `dlt` within the data pipeline ecosystem significantly enhances the efficiency and manageability of data operations. The synergy between `dlt` and Dagster simplifies the development of data pipelines and ensures that data assets are more maintainable and scalable over time. `dlt` offers plenty of [verified sources](https://dlthub.com/docs/dlt-ecosystem/verified-sources/) that can be orchestrated on Dagster in a simplified way and can be easily managed, customized and maintained. +In conclusion, integrating `dlt` into the data pipeline ecosystem markedly improves data operations' efficiency and manageability. The combination of `dlt` and Dagster eases the development of data pipelines, making data assets more maintainable and scalable over time. With a wealth of [verified sources](/docs/website/docs/dlt-ecosystem/verified-sources/) available, `dlt` enables streamlined orchestration on Dagster, offering easy management, customization, and maintenance. -We encourage data engineers and developers to explore the capabilities of `dlt` within the Dagster platform. Leveraging `dlt` on Dagster streamlines the pipeline development process and unlocks the potential for greater insights and value from your data assets. \ No newline at end of file +We encourage data engineers and developers to explore the capabilities of `dlt` within the Dagster platform. By levraging `dlt` on Dagster, you can simplify the pipeline development process and gain greater insights and value from your data assets. 
\ No newline at end of file From ea7470544c6efad4011b398ac12c606a171291fa Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Mon, 18 Mar 2024 06:19:09 +0000 Subject: [PATCH 04/15] Updated --- .../docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index f5ceefbc01..06e570f447 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -60,6 +60,6 @@ For a hands-on project on “Orchestrating unstructured data pipelines with dags ## Conclusion -In conclusion, integrating `dlt` into the data pipeline ecosystem markedly improves data operations' efficiency and manageability. The combination of `dlt` and Dagster eases the development of data pipelines, making data assets more maintainable and scalable over time. With a wealth of [verified sources](/docs/website/docs/dlt-ecosystem/verified-sources/) available, `dlt` enables streamlined orchestration on Dagster, offering easy management, customization, and maintenance. +In conclusion, integrating `dlt` into the data pipeline ecosystem markedly improves data operations' efficiency and manageability. The combination of `dlt` and Dagster eases the development of data pipelines, making data assets more maintainable and scalable over time. With a wealth of [verified sources](https://dlthub.com/docs/dlt-ecosystem/verified-sources/) available, `dlt` enables streamlined orchestration on Dagster, offering easy management, customization, and maintenance. We encourage data engineers and developers to explore the capabilities of `dlt` within the Dagster platform. 
By leveraging `dlt` on Dagster, you can simplify the pipeline development process and gain greater insights and value from your data assets. \ No newline at end of file From b4b65cb878ef735389ad7fc34aa731e400e83b14 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Mon, 18 Mar 2024 06:25:26 +0000 Subject: [PATCH 05/15] Updated --- .../docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index 06e570f447..c621b940e8 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -46,7 +46,7 @@ Here’s a brief summary of how to orchestrate `dlt` pipeline on Dagster: 1. View the populated metadata and data in the configured destination. :::info -For a hands-on project on “Orchestrating unstructured data pipelines with dagster and dlt", read the [article]((https://dagster.io/blog/dagster-dlt)) provided. The author offers a detailed overview and steps for ingesting GitHub issue data from a repository and storing it in BigQuery. You can use a similar approach to build your pipelines. +For a hands-on project on “Orchestrating unstructured data pipelines with dagster and dlt", read the [article](https://dagster.io/blog/dagster-dlt) provided. The author offers a detailed overview and steps for ingesting GitHub issue data from a repository and storing it in BigQuery. You can use a similar approach to build your pipelines.
::: ### Additional Resources From 12396d2527428034487ee1ddde1c217da56fb3ec Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Mon, 18 Mar 2024 06:29:34 +0000 Subject: [PATCH 06/15] Updated --- .../walkthroughs/deploy-a-pipeline/deploy-with-dagster.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index c621b940e8..6c56c50ee5 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -54,9 +54,9 @@ For a hands-on project on “Orchestrating unstructured data pipelines with dags - A general configurable `dlt` resource orchestrated on Dagster: [dlt resource](https://github.com/dagster-io/dagster-open-platform/blob/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/resources/dlt_resource.py#L29). - `dlt` pipelines configured for Dagster: [dlt pipelines](https://github.com/dagster-io/dagster-open-platform/tree/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/assets/dlt_pipelines). - :::note - These are external repositories and are subject to change. - ::: +:::note +These are external repositories and are subject to change. 
+::: ## Conclusion From 5d3b16f29484f9a2bfa3cba99fe1c46d479c66ec Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Mon, 18 Mar 2024 06:36:21 +0000 Subject: [PATCH 07/15] Updated --- .../walkthroughs/deploy-a-pipeline/deploy-with-dagster.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index 6c56c50ee5..5f783fda4a 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -57,9 +57,3 @@ For a hands-on project on “Orchestrating unstructured data pipelines with dags :::note These are external repositories and are subject to change. ::: - -## Conclusion - -In conclusion, integrating `dlt` into the data pipeline ecosystem markedly improves data operations' efficiency and manageability. The combination of `dlt` and Dagster eases the development of data pipelines, making data assets more maintainable and scalable over time. With a wealth of [verified sources](https://dlthub.com/docs/dlt-ecosystem/verified-sources/) available, `dlt` enables streamlined orchestration on Dagster, offering easy management, customization, and maintenance. - -We encourage data engineers and developers to explore the capabilities of `dlt` within the Dagster platform. By levraging `dlt` on Dagster, you can simplify the pipeline development process and gain greater insights and value from your data assets. 
\ No newline at end of file From e56528ba667eff1a1226f5d3f04ce7c2cf8ad8c3 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Thu, 21 Mar 2024 01:36:51 +0000 Subject: [PATCH 08/15] Updated --- .../deploy-a-pipeline/deploy-with-dagster.md | 61 +++++++++++++++++-- 1 file changed, 55 insertions(+), 6 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index 5f783fda4a..6dd8e9ec92 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -15,7 +15,7 @@ To read more, please refer to Dagster’s [documentation.](https://docs.dagster. ### Dagster Cloud Features -Dagster Cloud further enhances these features by providing an enterprise-level orchestration service with serverless or hybrid deployment options. It incorporates native branching and built-in CI/CD to prioritize the developer experience. It enables scalable, cost-effective operations without the hassle of infrastructure management. +Dagster Cloud offers enterprise-level orchestration service with serverless or hybrid deployment options. It incorporates native branching and built-in CI/CD to prioritize the developer experience. It enables scalable, cost-effective operations without the hassle of infrastructure management. ### Dagster deployment options: **Serverless** versus **Hybrid**: @@ -35,25 +35,74 @@ How does `dlt` integrate with Dagster for pipeline orchestration? `dlt` integrates with Dagster for pipeline orchestration, providing a streamlined process for building, enhancing, and managing data pipelines. This enables developers to leverage `dlt`'s capabilities for handling data extraction and load and Dagster's orchestration features to efficiently manage and monitor data pipelines. 
-Here’s a brief summary of how to orchestrate `dlt` pipeline on Dagster: +Here's a concise guide to orchestrating a `dlt` pipeline with Dagster, using the project "Ingesting GitHub issue data from a repository and storing it in BigQuery" as an example, detailed in the article [“Orchestrating unstructured data pipelines with dagster and dlt."](https://dagster.io/blog/dagster-dlt) 1. Create a `dlt` pipeline. For detailed instructions on creating a pipeline, please refer to the [documentation](https://dlthub.com/docs/walkthroughs/create-a-pipeline). -1. Set up a Dagster project, configure resources, and define the asset. For more information, please refer to [Dagster’s documentation.](https://docs.dagster.io/getting-started/quickstart) +1. Set up a Dagster project, configure resources, and define the asset as follows: + 1. To create a Dagster project: + ```sh + mkdir dagster_github_issues + cd dagster_github_issues + dagster project scaffold --name github-issues + ``` + 1. Define `dlt` as a Dagster resource: + ```py + from dagster import ConfigurableResource + from dagster import ConfigurableResource + import dlt + + class DltResource(ConfigurableResource): + pipeline_name: str + dataset_name: str + destination: str + + def create_pipeline(self, resource_data, table_name): + + # configure the pipeline with your destination details + pipeline = dlt.pipeline( + pipeline_name=self.pipeline_name, + destination=self.destination, + dataset_name=self.dataset_name + ) + + # run the pipeline with your parameters + load_info = pipeline.run(resource_data, table_name=table_name) + + return load_info + ``` + 1. Define the Asset: + ```py + @asset + def issues_pipeline(pipeline: DltResource): + + logger = get_dagster_logger() + results = pipeline.create_pipeline(github_issues_resource, table_name='github_issues') + logger.info(results) + ``` + >For more information, please refer to [Dagster’s documentation.](https://docs.dagster.io/getting-started/quickstart) 1. 
Next, define Dagster definitions, start the web server, and materialize the asset. + 1. Start the webserver: + ```sh + dagster dev + ``` 1. View the populated metadata and data in the configured destination. :::info -For a hands-on project on “Orchestrating unstructured data pipelines with dagster and dlt", read the [article](https://dagster.io/blog/dagster-dlt) provided. The author offers a detailed overview and steps for ingesting GitHub issue data from a repository and storing it in BigQuery. You can use a similar approach to build your pipelines. +For a hands-on project on “Orchestrating unstructured data pipelines with dagster and `dlt`", read the [article](https://dagster.io/blog/dagster-dlt) provided. The author offers a detailed overview and steps for ingesting GitHub issue data from a repository and storing it in BigQuery. You can use a similar approach to build your pipelines. ::: ### Additional Resources + - A general configurable `dlt` resource orchestrated on Dagster: [dlt resource](https://github.com/dagster-io/dagster-open-platform/blob/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/resources/dlt_resource.py#L29). -- `dlt` pipelines configured for Dagster: [dlt pipelines](https://github.com/dagster-io/dagster-open-platform/tree/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/assets/dlt_pipelines). +- Configure `dlt` pipelines for Dagster: [dlt pipelines](https://github.com/dagster-io/dagster-open-platform/tree/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/assets/dlt_pipelines). +- Configure MongoDB source as an Asset factory: + >Dagster provides the feature of [@multi_asset](https://github.com/dlt-hub/dlt-dagster-demo/blob/21a8d18b6f0424f40f2eed5030989306af8b8edb/mongodb_dlt/mongodb_dlt/assets/__init__.py#L18) declaration that will allow us to convert each collection under a database into a separate asset. 
This will make our pipeline easy to debug in case of failure and the collections independent of each other. + :::note These are external repositories and are subject to change. -::: +::: \ No newline at end of file From 1ef9a1ec3824331c93da75d6198565955060c75c Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Fri, 22 Mar 2024 06:10:11 +0000 Subject: [PATCH 09/15] Updated --- .../deploy-a-pipeline/deploy-with-dagster.md | 164 ++++++++++-------- 1 file changed, 94 insertions(+), 70 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index 6dd8e9ec92..2e3f096f21 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -6,103 +6,127 @@ keywords: [how to, deploy a pipeline, Dagster] # Deploy with Dagster - ## Introduction to Dagster -Dagster is an orchestrator that's designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports. Dagster ensures these processes are reliable and focuses on using software-defined assets (SDAs) to simplify complex data management, enhance the ability to reuse code and provide a better understanding of data. +Dagster is an orchestrator that's designed for developing and maintaining data assets, such as +tables, data sets, machine learning models, and reports. Dagster ensures these processes are +reliable and focuses on using software-defined assets (SDAs) to simplify complex data management, +enhance the ability to reuse code and provide a better understanding of data. -To read more, please refer to Dagster’s [documentation.](https://docs.dagster.io/getting-started?_gl=1*19ikq9*_ga*NTMwNTUxNDAzLjE3MDg5Mjc4OTk.*_ga_84VRQZG7TV*MTcwOTkwNDY3MS4zLjEuMTcwOTkwNTYzNi41Ny4wLjA.*_gcl_au*OTM3OTU1ODMwLjE3MDg5Mjc5MDA.) 
+To read more, please refer to Dagster’s +[documentation.](https://docs.dagster.io/getting-started?_gl=1*19ikq9*_ga*NTMwNTUxNDAzLjE3MDg5Mjc4OTk.*_ga_84VRQZG7TV*MTcwOTkwNDY3MS4zLjEuMTcwOTkwNTYzNi41Ny4wLjA.*_gcl_au*OTM3OTU1ODMwLjE3MDg5Mjc5MDA.) ### Dagster Cloud Features -Dagster Cloud offers enterprise-level orchestration service with serverless or hybrid deployment options. It incorporates native branching and built-in CI/CD to prioritize the developer experience. It enables scalable, cost-effective operations without the hassle of infrastructure management. +Dagster Cloud offers enterprise-level orchestration service with serverless or hybrid deployment +options. It incorporates native branching and built-in CI/CD to prioritize the developer experience. +It enables scalable, cost-effective operations without the hassle of infrastructure management. ### Dagster deployment options: **Serverless** versus **Hybrid**: -The *serverless* option fully hosts the orchestration engine, while the *hybrid* model offers flexibility to use your computing resources, with Dagster managing the control plane. Reducing operational overhead and ensuring security. +The *serverless* option fully hosts the orchestration engine, while the *hybrid* model offers +flexibility to use your computing resources, with Dagster managing the control plane. Reducing +operational overhead and ensuring security. For more info, please [refer.](https://dagster.io/cloud) ### Using Dagster for Free -Dagster offers a 30-day free trial during which you can explore its features, such as pipeline orchestration, data quality checks, and embedded ELTs. You can try Dagster using its open source or by signing up for the trial. +Dagster offers a 30-day free trial during which you can explore its features, such as pipeline +orchestration, data quality checks, and embedded ELTs. You can try Dagster using its open source or +by signing up for the trial. 
## Building Data Pipelines with `dlt` -`dlt` is an open-source Python library that allows you to declaratively load data sources into well-structured tables or datasets through automatic schema inference and evolution. It simplifies building data pipelines by providing functionality to support the entire extract and load process. +`dlt` is an open-source Python library that allows you to declaratively load data sources into +well-structured tables or datasets through automatic schema inference and evolution. It simplifies +building data pipelines by providing functionality to support the entire extract and load process. How does `dlt` integrate with Dagster for pipeline orchestration? -`dlt` integrates with Dagster for pipeline orchestration, providing a streamlined process for building, enhancing, and managing data pipelines. This enables developers to leverage `dlt`'s capabilities for handling data extraction and load and Dagster's orchestration features to efficiently manage and monitor data pipelines. - -Here's a concise guide to orchestrating a `dlt` pipeline with Dagster, using the project "Ingesting GitHub issue data from a repository and storing it in BigQuery" as an example, detailed in the article [“Orchestrating unstructured data pipelines with dagster and dlt."](https://dagster.io/blog/dagster-dlt) - -1. Create a `dlt` pipeline. For detailed instructions on creating a pipeline, please refer to the -[documentation](https://dlthub.com/docs/walkthroughs/create-a-pipeline). - -1. Set up a Dagster project, configure resources, and define the asset as follows: - 1. To create a Dagster project: - ```sh - mkdir dagster_github_issues - cd dagster_github_issues - dagster project scaffold --name github-issues - ``` - 1. 
Define `dlt` as a Dagster resource: - ```py - from dagster import ConfigurableResource - from dagster import ConfigurableResource - import dlt - - class DltResource(ConfigurableResource): - pipeline_name: str - dataset_name: str - destination: str - - def create_pipeline(self, resource_data, table_name): - - # configure the pipeline with your destination details - pipeline = dlt.pipeline( - pipeline_name=self.pipeline_name, - destination=self.destination, - dataset_name=self.dataset_name - ) - - # run the pipeline with your parameters - load_info = pipeline.run(resource_data, table_name=table_name) - - return load_info - ``` - 1. Define the Asset: - ```py - @asset - def issues_pipeline(pipeline: DltResource): - - logger = get_dagster_logger() - results = pipeline.create_pipeline(github_issues_resource, table_name='github_issues') - logger.info(results) - ``` - >For more information, please refer to [Dagster’s documentation.](https://docs.dagster.io/getting-started/quickstart) +`dlt` integrates with Dagster for pipeline orchestration, providing a streamlined process for +building, enhancing, and managing data pipelines. This enables developers to leverage `dlt`'s +capabilities for handling data extraction and load and Dagster's orchestration features to +efficiently manage and monitor data pipelines. + +Here's a concise guide to orchestrating a `dlt` pipeline with Dagster, using the project "Ingesting +GitHub issue data from a repository and storing it in BigQuery" as an example, detailed in the +article +[“Orchestrating unstructured data pipelines with dagster and dlt."](https://dagster.io/blog/dagster-dlt) + +1. Create a `dlt` pipeline. For detailed instructions on creating a pipeline, please refer to the + [documentation](https://dlthub.com/docs/walkthroughs/create-a-pipeline). + +1. Set up a Dagster project, configure resources, and define the asset as follows: + + 1. 
To create a Dagster project: + ```sh + mkdir dagster_github_issues + cd dagster_github_issues + dagster project scaffold --name github-issues + ``` + 1. Define `dlt` as a Dagster resource: + ```py + from dagster import ConfigurableResource + import dlt + + class DltResource(ConfigurableResource): + pipeline_name: str + dataset_name: str + destination: str + + def create_pipeline(self, resource_data, table_name): + + # configure the pipeline with your destination details + pipeline = dlt.pipeline( + pipeline_name=self.pipeline_name, + destination=self.destination, + dataset_name=self.dataset_name + ) + + # run the pipeline with your parameters + load_info = pipeline.run(resource_data, table_name=table_name) + + return load_info + ``` + 1. Define the Asset: + ```py + @asset + def issues_pipeline(pipeline: DltResource): + + logger = get_dagster_logger() + results = pipeline.create_pipeline(github_issues_resource, table_name='github_issues') + logger.info(results) + ``` + > For more information, please refer to + > [Dagster’s documentation.](https://docs.dagster.io/getting-started/quickstart) 1. Next, define Dagster definitions, start the web server, and materialize the asset. + 1. Start the webserver: ```sh dagster dev - ``` + ``` + 1. View the populated metadata and data in the configured destination. -:::info -For a hands-on project on “Orchestrating unstructured data pipelines with dagster and `dlt`", read the [article](https://dagster.io/blog/dagster-dlt) provided. The author offers a detailed overview and steps for ingesting GitHub issue data from a repository and storing it in BigQuery. You can use a similar approach to build your pipelines. -::: +:::info For a hands-on project on “Orchestrating unstructured data pipelines with dagster and +`dlt`", read the [article](https://dagster.io/blog/dagster-dlt) provided.
The author offers a +detailed overview and steps for ingesting GitHub issue data from a repository and storing it in +BigQuery. You can use a similar approach to build your pipelines. ::: ### Additional Resources - -- A general configurable `dlt` resource orchestrated on Dagster: [dlt resource](https://github.com/dagster-io/dagster-open-platform/blob/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/resources/dlt_resource.py#L29). -- Configure `dlt` pipelines for Dagster: [dlt pipelines](https://github.com/dagster-io/dagster-open-platform/tree/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/assets/dlt_pipelines). -- Configure MongoDB source as an Asset factory: - >Dagster provides the feature of [@multi_asset](https://github.com/dlt-hub/dlt-dagster-demo/blob/21a8d18b6f0424f40f2eed5030989306af8b8edb/mongodb_dlt/mongodb_dlt/assets/__init__.py#L18) declaration that will allow us to convert each collection under a database into a separate asset. This will make our pipeline easy to debug in case of failure and the collections independent of each other. - - -:::note -These are external repositories and are subject to change. -::: \ No newline at end of file +- A general configurable `dlt` resource orchestrated on Dagster: + [dlt resource](https://github.com/dagster-io/dagster-open-platform/blob/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/resources/dlt_resource.py#L29). +- Configure `dlt` pipelines for Dagster: + [dlt pipelines](https://github.com/dagster-io/dagster-open-platform/tree/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/assets/dlt_pipelines). +- Configure MongoDB source as an Asset factory: + > Dagster provides the feature of + > [@multi_asset](https://github.com/dlt-hub/dlt-dagster-demo/blob/21a8d18b6f0424f40f2eed5030989306af8b8edb/mongodb_dlt/mongodb_dlt/assets/__init__.py#L18) + > declaration that will allow us to convert each collection under a database into a separate + > asset. 
This will make our pipeline easy to debug in case of failure and the collections + > independent of each other. + +:::note These are external repositories and are subject to change. ::: From 246e6771f554a471e47c3566c3c4562c5914eded Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Tue, 26 Mar 2024 08:29:33 +0000 Subject: [PATCH 10/15] Updated deploy with dagster --- .../deploy-a-pipeline/deploy-with-dagster.md | 65 ++++++++++++++----- 1 file changed, 47 insertions(+), 18 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index 2e3f096f21..615139ac32 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -39,23 +39,27 @@ by signing up for the trial. ## Building Data Pipelines with `dlt` `dlt` is an open-source Python library that allows you to declaratively load data sources into -well-structured tables or datasets through automatic schema inference and evolution. It simplifies -building data pipelines by providing functionality to support the entire extract and load process. +well-structured tables or datasets through automatic schema inference and evolution. It simplifies +building data pipelines with support for extract and load processes. -How does `dlt` integrate with Dagster for pipeline orchestration? +**How does `dlt` integrate with Dagster for pipeline orchestration?** `dlt` integrates with Dagster for pipeline orchestration, providing a streamlined process for building, enhancing, and managing data pipelines. This enables developers to leverage `dlt`'s capabilities for handling data extraction and load and Dagster's orchestration features to efficiently manage and monitor data pipelines. 
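The "automatic schema inference" that the paragraph above attributes to `dlt` can be pictured with a small, dependency-free sketch. This is a toy illustration only — the function name and logic below are not `dlt`'s actual API, and the real implementation additionally handles nested data, type coercion, and schema evolution:

```python
def infer_schema(rows):
    """Infer a column -> type-name mapping from a batch of record dicts.

    Toy stand-in for schema inference: the first non-null value seen for a
    column decides its type; a real library would also widen/merge types.
    """
    schema = {}
    for row in rows:
        for column, value in row.items():
            if value is not None and column not in schema:
                schema[column] = type(value).__name__
    return schema

rows = [
    {"id": 1, "title": "Fix CI", "closed": False},
    {"id": 2, "title": "Add docs", "labels": ["good-first-issue"]},
]
print(infer_schema(rows))
# {'id': 'int', 'title': 'str', 'closed': 'bool', 'labels': 'list'}
```

Columns missing from some records (like `labels` here) simply join the schema when they first appear — the same idea that lets `dlt` evolve destination tables as source data changes.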
+### Orchestrating `dlt` pipeline on Dagster + Here's a concise guide to orchestrating a `dlt` pipeline with Dagster, using the project "Ingesting -GitHub issue data from a repository and storing it in BigQuery" as an example, detailed in the -article +GitHub issue data from a repository and storing it in BigQuery" as an example. + +More details can be found in the article [“Orchestrating unstructured data pipelines with dagster and dlt."](https://dagster.io/blog/dagster-dlt) -1. Create a `dlt` pipeline. For detailed instructions on creating a pipeline, please refer to the - [documentation](https://dlthub.com/docs/walkthroughs/create-a-pipeline). +**The steps are as follows:** +1. Create a `dlt` pipeline. For more, please refer to the documentation: +[Creating a pipeline.](https://dlthub.com/docs/walkthroughs/create-a-pipeline) 1. Set up a Dagster project, configure resources, and define the asset as follows: @@ -65,6 +69,7 @@ article cd dagster_github_issues dagster project scaffold --name github-issues ``` + 1. Define `dlt` as a Dagster resource: ```py from dagster import ConfigurableResource @@ -90,7 +95,7 @@ article return load_info ``` - 1. Define the Asset: + 1. Define the asset as: ```py @asset def issues_pipeline(pipeline: DltResource): @@ -102,26 +107,48 @@ article > For more information, please refer to > [Dagster’s documentation.](https://docs.dagster.io/getting-started/quickstart) -1. Next, define Dagster definitions, start the web server, and materialize the asset. - - 1. Start the webserver: - ```sh - dagster dev - ``` +1. 
Next, define Dagster definitions as follows: + ```py + all_assets = load_assets_from_modules([assets]) + simple_pipeline = define_asset_job(name="simple_pipeline", selection= ['issues_pipeline']) + + defs = Definitions( + assets=all_assets, + jobs=[simple_pipeline], + resources={ + "pipeline": DltResource( + pipeline_name = "github_issues", + dataset_name = "dagster_github_issues", + destination = "bigquery", + table_name= "github_issues" + ), + } + ) + ``` + +1. Finally, start the web server as: + + ```sh + dagster dev + ``` 1. View the populated metadata and data in the configured destination. -:::info For a hands-on project on “Orchestrating unstructured data pipelines with dagster and -`dlt`", read the [article](https://dagster.io/blog/dagster-dlt) provided. The author offers a +:::info +For the complete hands-on project on “Orchestrating unstructured data pipelines with dagster and +`dlt`", please refer to [article](https://dagster.io/blog/dagster-dlt). The author offers a detailed overview and steps for ingesting GitHub issue data from a repository and storing it in -BigQuery. You can use a similar approach to build your pipelines. ::: +BigQuery. You can use a similar approach to build your pipelines. +::: ### Additional Resources - A general configurable `dlt` resource orchestrated on Dagster: [dlt resource](https://github.com/dagster-io/dagster-open-platform/blob/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/resources/dlt_resource.py#L29). + - Configure `dlt` pipelines for Dagster: [dlt pipelines](https://github.com/dagster-io/dagster-open-platform/tree/5030ff6828e2b001a557c6864f279c3b476b0ca0/dagster_open_platform/assets/dlt_pipelines). + - Configure MongoDB source as an Asset factory: > Dagster provides the feature of > [@multi_asset](https://github.com/dlt-hub/dlt-dagster-demo/blob/21a8d18b6f0424f40f2eed5030989306af8b8edb/mongodb_dlt/mongodb_dlt/assets/__init__.py#L18) @@ -129,4 +156,6 @@ BigQuery. 
You can use a similar approach to build your pipelines. ::: > asset. This will make our pipeline easy to debug in case of failure and the collections > independent of each other. -:::note These are external repositories and are subject to change. ::: +:::note +These are external repositories and are subject to change. +::: From ba3085f8f10664f521584b2ddd75b4fceda7783d Mon Sep 17 00:00:00 2001 From: Zaeem Athar Date: Wed, 27 Mar 2024 13:13:42 +0100 Subject: [PATCH 11/15] Update deploy-with-dagster.md Fixing typos --- .../deploy-a-pipeline/deploy-with-dagster.md | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index 615139ac32..35578952c7 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -8,10 +8,10 @@ keywords: [how to, deploy a pipeline, Dagster] ## Introduction to Dagster -Dagster is an orchestrator that's designed for developing and maintaining data assets, such as +Dagster is an orchestrator designed for developing and maintaining data assets, such as tables, data sets, machine learning models, and reports. Dagster ensures these processes are reliable and focuses on using software-defined assets (SDAs) to simplify complex data management, -enhance the ability to reuse code and provide a better understanding of data. +enhance the ability to reuse code, and provide a better understanding of data. To read more, please refer to Dagster’s [documentation.](https://docs.dagster.io/getting-started?_gl=1*19ikq9*_ga*NTMwNTUxNDAzLjE3MDg5Mjc4OTk.*_ga_84VRQZG7TV*MTcwOTkwNDY3MS4zLjEuMTcwOTkwNTYzNi41Ny4wLjA.*_gcl_au*OTM3OTU1ODMwLjE3MDg5Mjc5MDA.) 
@@ -28,7 +28,7 @@ The *serverless* option fully hosts the orchestration engine, while the *hybrid* flexibility to use your computing resources, with Dagster managing the control plane, reducing operational overhead and ensuring security. -For more info, please [refer.](https://dagster.io/cloud) +For more info, please refer to the Dagster Cloud [docs.](https://dagster.io/cloud) ### Using Dagster for Free @@ -46,13 +46,12 @@ building data pipelines with support for extract and load processes. `dlt` integrates with Dagster for pipeline orchestration, providing a streamlined process for building, enhancing, and managing data pipelines. This enables developers to leverage `dlt`'s -capabilities for handling data extraction and load and Dagster's orchestration features to -efficiently manage and monitor data pipelines. +capabilities for handling data extraction and load and Dagster's orchestration features to efficiently manage and monitor data pipelines. ### Orchestrating `dlt` pipeline on Dagster Here's a concise guide to orchestrating a `dlt` pipeline with Dagster, using the project "Ingesting -GitHub issue data from a repository and storing it in BigQuery" as an example. +GitHub issues data from a repository and storing it in BigQuery" as an example. More details can be found in the article [“Orchestrating unstructured data pipelines with dagster and dlt."](https://dagster.io/blog/dagster-dlt) @@ -132,8 +131,6 @@ More details can be found in the article dagster dev ``` -1. View the populated metadata and data in the configured destination. - :::info For the complete hands-on project on “Orchestrating unstructured data pipelines with dagster and `dlt`", please refer to [article](https://dagster.io/blog/dagster-dlt).
The author offers a From 42c238ec1d14368a98d3e7480c646d81d57085c9 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Wed, 27 Mar 2024 13:01:25 +0000 Subject: [PATCH 12/15] Updated sidebars.js --- docs/website/sidebars.js | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/website/sidebars.js b/docs/website/sidebars.js index 4fd6bfca6b..15c9c27512 100644 --- a/docs/website/sidebars.js +++ b/docs/website/sidebars.js @@ -219,6 +219,7 @@ const sidebars = { 'walkthroughs/deploy-a-pipeline/deploy-with-google-cloud-functions', 'walkthroughs/deploy-a-pipeline/deploy-gcp-cloud-function-as-webhook', 'walkthroughs/deploy-a-pipeline/deploy-with-kestra', + 'walkthroughs/deploy-a-pipeline/deploy-with-dagster', ] }, { From 54f6a04d1cf9b7460f3137cfc884c93f72be7783 Mon Sep 17 00:00:00 2001 From: Zaeem Athar Date: Tue, 2 Apr 2024 09:55:18 +0200 Subject: [PATCH 13/15] Update deploy-with-dagster.md Adding import DltResource in the Definition script. --- .../docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index 35578952c7..03f45ababb 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -108,6 +108,8 @@ More details can be found in the article 1. 
Next, define Dagster definitions as follows: ```py + import DltResource + all_assets = load_assets_from_modules([assets]) simple_pipeline = define_asset_job(name="simple_pipeline", selection= ['issues_pipeline']) From 79e456dc4288a6908a89e47ba6125cb7e2d104c9 Mon Sep 17 00:00:00 2001 From: Zaeem Athar Date: Tue, 2 Apr 2024 10:53:06 +0200 Subject: [PATCH 14/15] Update deploy-with-dagster.md Removing DltResource args --- .../walkthroughs/deploy-a-pipeline/deploy-with-dagster.md | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index 03f45ababb..c901b38e00 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -117,12 +117,7 @@ More details can be found in the article assets=all_assets, jobs=[simple_pipeline], resources={ - "pipeline": DltResource( - pipeline_name = "github_issues", - dataset_name = "dagster_github_issues", - destination = "bigquery", - table_name= "github_issues" - ), + "pipeline": DltResource(), } ) ``` From 17c15aafaab19bae200499fa402aeeee58cff056 Mon Sep 17 00:00:00 2001 From: Zaeem Athar Date: Tue, 2 Apr 2024 12:13:29 +0200 Subject: [PATCH 15/15] Update deploy-with-dagster.md Changing resource name from DltResource to DltPipeline --- .../deploy-a-pipeline/deploy-with-dagster.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md index c901b38e00..cca882ba38 100644 --- a/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md +++ b/docs/website/docs/walkthroughs/deploy-a-pipeline/deploy-with-dagster.md @@ -75,7 +75,7 @@ More details can be found in the article from dagster import 
ConfigurableResource import dlt - class DltResource(ConfigurableResource): + class DltPipeline(ConfigurableResource): pipeline_name: str dataset_name: str destination: str @@ -97,7 +97,7 @@ More details can be found in the article 1. Define the asset as: ```py @asset - def issues_pipeline(pipeline: DltResource): + def issues_pipeline(pipeline: DltPipeline): logger = get_dagster_logger() results = pipeline.create_pipeline(github_issues_resource, table_name='github_issues') @@ -108,8 +108,6 @@ More details can be found in the article 1. Next, define Dagster definitions as follows: ```py - import DltResource - all_assets = load_assets_from_modules([assets]) simple_pipeline = define_asset_job(name="simple_pipeline", selection= ['issues_pipeline']) @@ -117,7 +115,11 @@ More details can be found in the article assets=all_assets, jobs=[simple_pipeline], resources={ - "pipeline": DltResource(), + "pipeline": DltPipeline( + pipeline_name = "github_issues", + dataset_name = "dagster_github_issues", + destination = "bigquery", + ), } ) ```
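Across the patches above, the walkthrough converges on one pattern: a `ConfigurableResource` subclass (`DltPipeline`) wraps `dlt.pipeline(...).run(...)`, an `@asset` calls it, and `Definitions` wires the two together. (Note the snippets also rely on `asset`, `get_dagster_logger`, `load_assets_from_modules`, `define_asset_job`, and `Definitions` being imported from `dagster`.) The control flow can be sketched without Dagster or dlt installed — the classes below are illustrative stand-ins, not the real libraries:

```python
from dataclasses import dataclass


@dataclass
class FakeLoadInfo:
    """Stand-in for the LoadInfo object that dlt's pipeline.run() returns."""
    destination: str
    dataset_name: str
    table_name: str


@dataclass
class DltPipeline:
    """Mirrors the walkthrough's ConfigurableResource subclass (stub, no dlt)."""
    pipeline_name: str
    dataset_name: str
    destination: str

    def create_pipeline(self, resource_data, table_name):
        # The real method builds dlt.pipeline(...) and runs it with resource_data;
        # this stub only records where the rows would be loaded.
        return FakeLoadInfo(self.destination, self.dataset_name, table_name)


def issues_pipeline(pipeline: DltPipeline):
    # Mirrors the @asset body: hand the source data and target table to the resource.
    github_issues_resource = [{"id": 1, "title": "example issue"}]  # placeholder data
    return pipeline.create_pipeline(github_issues_resource, table_name="github_issues")


info = issues_pipeline(
    DltPipeline(
        pipeline_name="github_issues",
        dataset_name="dagster_github_issues",
        destination="bigquery",
    )
)
print(info.table_name)
```

In the real project, swapping these stubs for the libraries amounts to deriving `DltPipeline` from `dagster.ConfigurableResource`, decorating `issues_pipeline` with `@asset`, and registering the resource under the `"pipeline"` key in `Definitions`, as the final patch shows.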