From 0039700203033a0dffa97f42b3122ddf39cc89a4 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Wed, 27 Mar 2024 11:50:20 +0000 Subject: [PATCH] Changed the directory of all the images from "../img/images" to google storage (dlthub-analytics). --- .../2023-03-09-duckdb-1M-downloads-users.mdx | 2 +- ...3-03-16-is-duckdb-a-database-for-ducks.mdx | 2 +- .../2023-04-27-ga4-internal-dashboard-demo.md | 2 +- ...-05-15-hacker-news-gpt-4-dashboard-demo.md | 2 +- ...05-25-postgresql-bigquery-metabase-demo.md | 4 +- ...evolution-next-generation-data-platform.md | 2 +- ...oogle-sheets-to-data-warehouse-pipeline.md | 14 ++--- ...6-14-dlthub-gpt-accelerated learning_01.md | 6 +- .../2023-06-15-automating-data-engineers.md | 2 +- .../blog/2023-06-20-dlthub-gptquestion1-.md | 2 +- .../blog/2023-06-26-dlthub-gptquestion2.md | 4 +- .../blog/2023-08-14-dlt-motherduck-blog.md | 22 +++---- .../blog/2023-08-21-dlt-lineage-support.md | 2 +- docs/website/blog/2023-08-24-dlt-etlt.md | 2 +- docs/website/blog/2023-09-05-mongo-etl.md | 2 +- .../blog/2023-09-20-data-engineering-cv.md | 2 +- .../blog/2023-09-26-verba-dlt-zendesk.md | 8 +-- docs/website/blog/2023-10-06-dlt-holistics.md | 22 +++---- .../blog/2023-10-09-dlt-ops-startups.md | 10 ++-- .../blog/2023-10-10-data-product-docs.md | 4 +- .../blog/2023-10-16-first-data-warehouse.md | 4 +- docs/website/blog/2023-10-19-dbt-runners.md | 2 +- docs/website/blog/2023-10-23-arrow-loading.md | 2 +- docs/website/blog/2023-10-25-dlt-deepnote.md | 10 ++-- docs/website/blog/2023-10-26-dlt-prefect.md | 2 +- .../blog/2023-10-30-data-modelling-tools.md | 54 +++++++++--------- .../blog/2023-11-08-solving-ingestion.md | 6 +- docs/website/static/img/why-duckdb.png | Bin 388520 -> 0 bytes 28 files changed, 98 insertions(+), 98 deletions(-) delete mode 100644 docs/website/static/img/why-duckdb.png diff --git a/docs/website/blog/2023-03-09-duckdb-1M-downloads-users.mdx b/docs/website/blog/2023-03-09-duckdb-1M-downloads-users.mdx index 3bd5750e31..d862cb686f 100644 --- a/docs/website/blog/2023-03-09-duckdb-1M-downloads-users.mdx +++ b/docs/website/blog/2023-03-09-duckdb-1M-downloads-users.mdx @@ -20,7 +20,7 @@ Like so many others, we are excited about the project, too. Recently, we attende With our research, we aimed to identify the most popular reasons why people try out DuckDB. We found five perspectives that people commonly have when trying out DuckDB. -![Marcin watching a MotherDuck presentation](/img/Marcin-dltHub-DuckDB-DuckCon-Brussels.jpg) +![Marcin watching a MotherDuck presentation](https://storage.googleapis.com/dlt-blog-images/Marcin-dltHub-DuckDB-DuckCon-Brussels.jpg) dltHub co-founder Marcin watching a MotherDuck presentation at DuckCon in Brussels in February diff --git a/docs/website/blog/2023-03-16-is-duckdb-a-database-for-ducks.mdx b/docs/website/blog/2023-03-16-is-duckdb-a-database-for-ducks.mdx index 2f4d6a620f..a44da1807c 100644 --- a/docs/website/blog/2023-03-16-is-duckdb-a-database-for-ducks.mdx +++ b/docs/website/blog/2023-03-16-is-duckdb-a-database-for-ducks.mdx @@ -58,7 +58,7 @@ Check this out in the [Colab notebook](https://colab.research.google.com/drive/1 Okay. 
It’s called DuckDB because ducks are amazing and [@hannes](https://github.com/hannes) once had a pet duck 🤣 -![Why "Duck" DB?](/img/why-duckdb.png) +![Why "Duck" DB?](https://storage.googleapis.com/dlt-blog-images/why-duckdb.png) Source: [DuckDB: an Embeddable Analytical RDBMS](https://db.in.tum.de/teaching/ss19/moderndbs/duckdb-tum.pdf) ## Enjoy this blog post? Give data load tool (dlt) a ⭐ on GitHub [here](https://github.com/dlt-hub/dlt) 🤜🤛 diff --git a/docs/website/blog/2023-04-27-ga4-internal-dashboard-demo.md b/docs/website/blog/2023-04-27-ga4-internal-dashboard-demo.md index 24c70e4c7a..2d7b7de724 100644 --- a/docs/website/blog/2023-04-27-ga4-internal-dashboard-demo.md +++ b/docs/website/blog/2023-04-27-ga4-internal-dashboard-demo.md @@ -23,7 +23,7 @@ We decided to make a dashboard that helps us better understand data attribution ### Internal dashboard -![Dashboard 1](/img/g4_dashboard_screen_grab_1.jpg) ![Dashboard 2](/img/g4_dashboard_screen_grab_2.jpg) +![Dashboard 1](https://storage.googleapis.com/dlt-blog-images/g4_dashboard_screen_grab_1.jpg) ![Dashboard 2](https://storage.googleapis.com/dlt-blog-images/g4_dashboard_screen_grab_2.jpg) With the data loaded locally, we were able to build the dashboard on our system using Streamlit. You can also do this on your system by simply cloning [this repo](https://github.com/dlt-hub/ga4-internal-dashboard-demo) and following the steps listed [here](https://github.com/dlt-hub/ga4-internal-dashboard-demo/tree/main/intial-explorations). diff --git a/docs/website/blog/2023-05-15-hacker-news-gpt-4-dashboard-demo.md b/docs/website/blog/2023-05-15-hacker-news-gpt-4-dashboard-demo.md index d38af2e1d9..c9b2b2fcc3 100644 --- a/docs/website/blog/2023-05-15-hacker-news-gpt-4-dashboard-demo.md +++ b/docs/website/blog/2023-05-15-hacker-news-gpt-4-dashboard-demo.md @@ -29,7 +29,7 @@ Now that the comments were loaded, we were ready to use GPT-4 to create a one se Since these comments were posted in response to stories or other comments, we fed in the story title and any parent comments as context in the prompt. To avoid hitting rate-limit errors and losing all progress, we ran this for 100 comments at a time, saving the results to the CSV file each time. We then built a Streamlit app to load and display them in a dashboard. Here is what the dashboard looks like: -![dashboard.png](/img/hn_gpt_dashboard.png) +![dashboard.png](https://storage.googleapis.com/dlt-blog-images/hn_gpt_dashboard.png) ## Deploying the pipeline, Google Bigquery, and Streamlit app diff --git a/docs/website/blog/2023-05-25-postgresql-bigquery-metabase-demo.md b/docs/website/blog/2023-05-25-postgresql-bigquery-metabase-demo.md index a1a5b8d0ad..11fbc39f4d 100644 --- a/docs/website/blog/2023-05-25-postgresql-bigquery-metabase-demo.md +++ b/docs/website/blog/2023-05-25-postgresql-bigquery-metabase-demo.md @@ -40,10 +40,10 @@ With the database uploaded to BigQuery, we were now ready to build a dashboard. The DVD store database contains data on the products (film DVDs), product categories, existing inventory, customers, orders, order histories, etc. 
For the purpose of the dashboard, we decided to explore the question: *How many orders are being placed each month and which films and film categories are the highest selling?* -![orders_chart.png](/img/experiment3_dashboard_orders_chart.png) ![top_selling_tables.png](/img/experiment3_dashboard_top_selling_tables.png) +![orders_chart.png](https://storage.googleapis.com/dlt-blog-images/experiment3_dashboard_orders_chart.png) ![top_selling_tables.png](https://storage.googleapis.com/dlt-blog-images/experiment3_dashboard_top_selling_tables.png) In addition to this, we were also able to set up email alerts to get notified whenever the stock of a DVD was either empty or close to running out. -![low_stock_email_alert.png](/img/experiment3_low_stock_email_alert.png) +![low_stock_email_alert.png](https://storage.googleapis.com/dlt-blog-images/experiment3_low_stock_email_alert.png) ### 3. Deploying the pipeline diff --git a/docs/website/blog/2023-05-26-structured-data-lakes-through-schema-evolution-next-generation-data-platform.md b/docs/website/blog/2023-05-26-structured-data-lakes-through-schema-evolution-next-generation-data-platform.md index 10a397adf3..1192f47abf 100644 --- a/docs/website/blog/2023-05-26-structured-data-lakes-through-schema-evolution-next-generation-data-platform.md +++ b/docs/website/blog/2023-05-26-structured-data-lakes-through-schema-evolution-next-generation-data-platform.md @@ -103,7 +103,7 @@ To try out schema evolution with `dlt`, check out our [colab demo.](https://cola -![colab demo](/img/schema_evolution_colab_demo_light.png) +![colab demo](https://storage.googleapis.com/dlt-blog-images/schema_evolution_colab_demo_light.png) ### Want more? diff --git a/docs/website/blog/2023-06-05-google-sheets-to-data-warehouse-pipeline.md b/docs/website/blog/2023-06-05-google-sheets-to-data-warehouse-pipeline.md index 05c100bebc..e934f866ad 100644 --- a/docs/website/blog/2023-06-05-google-sheets-to-data-warehouse-pipeline.md +++ b/docs/website/blog/2023-06-05-google-sheets-to-data-warehouse-pipeline.md @@ -1,7 +1,7 @@ --- slug: google-sheets-to-data-warehouse-pipeline title: Using the Google Sheets `dlt` pipeline in analytics and ML workflows -image: /img/experiment4-blog-image.png +image: https://storage.googleapis.com/dlt-blog-images/experiment4-blog-image.png authors: name: Rahul Joshi title: Data Science Intern at dltHub @@ -23,13 +23,13 @@ As an example of such a use-case, consider this very common scenario: You're the To demonstrate this process, we created some sample data where we stored costs related to some campaigns in a Google Sheet and the rest of the related data in BigQuery. -![campaign-roi-google-sheets](/img/experiment4-campaign-roi-google-sheets.png) ![campaign-roi-data-warehouse](/img/experiment4-campaign-roi-datawarehouse.png) +![campaign-roi-google-sheets](https://storage.googleapis.com/dlt-blog-images/experiment4-campaign-roi-google-sheets.png) ![campaign-roi-data-warehouse](https://storage.googleapis.com/dlt-blog-images/experiment4-campaign-roi-datawarehouse.png) We then used the `dlt` google sheets pipeline by following [these](https://github.com/dlt-hub/google-sheets-bigquery-pipeline) simple steps to load the Google Sheets data into BigQuery. 
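For orientation, the load step condenses to roughly the sketch below — assuming the verified `google_sheets` source scaffolded with `dlt init google_sheets bigquery`; the spreadsheet ID and range name are placeholders, and the BigQuery credentials are read from `.dlt/secrets.toml`:

```python
import dlt
from google_sheets import google_spreadsheet  # verified source scaffolded by `dlt init`

# BigQuery credentials are picked up from .dlt/secrets.toml
pipeline = dlt.pipeline(
    pipeline_name="google_sheets_pipeline",
    destination="bigquery",
    dataset_name="marketing_data",
)

# "costs" is a placeholder for the tab holding the campaign costs
data = google_spreadsheet("your_spreadsheet_id", range_names=["costs"])
load_info = pipeline.run(data)
print(load_info)
```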
With the data loaded, we finally connected Metabase to the data warehouse and created a dashboard to understand the ROIs across each platform: -![campaign-roi-dashboard-1](/img/experiment4-campaign-roi-dashboard-1.png) -![campaign-roi-dashboard-2](/img/experiment4-campaign-roi-dashboard-2.png) +![campaign-roi-dashboard-1](https://storage.googleapis.com/dlt-blog-images/experiment4-campaign-roi-dashboard-1.png) +![campaign-roi-dashboard-2](https://storage.googleapis.com/dlt-blog-images/experiment4-campaign-roi-dashboard-2.png) ## Use-case #2: Evaluating the performance of your ML product using the google sheets pipeline @@ -37,13 +37,13 @@ Another use-case for Google Sheets that we've come across frequently is to store A very common example for such a workflow is with customer support platforms that use text classification models to categorize incoming customer support tickets into different issue categories for efficient routing and resolution of the tickets. To illustrate this example, we created a Google Sheet with issues manually annotated with a category. We also included other manually annotated features that might help measure the effectiveness of the platform, such as priority level for the tickets and customer feedback. -![customer-support-platform-google-sheets](/img/experiment4-customer-support-platform-google-sheets.png) +![customer-support-platform-google-sheets](https://storage.googleapis.com/dlt-blog-images/experiment4-customer-support-platform-google-sheets.png) We then populated a BigQuery dataset with potential product usage data, such as the status of the ticket (open or closed), response and resolution times, whether the ticket was escalated, etc. -![customer-support-platform-data-warehouse](/img/experiment4-customer-support-platform-data-warehouse.png) +![customer-support-platform-data-warehouse](https://storage.googleapis.com/dlt-blog-images/experiment4-customer-support-platform-data-warehouse.png) Then, as before, we loaded the google sheets data to the data warehouse using the `dlt` google sheets pipeline and following [these](https://github.com/dlt-hub/google-sheets-bigquery-pipeline) steps. Finally, we connected Metabase to it and built a dashboard measuring the performance of the model over the period of a month: -![customer-support-platform-dashboard](/img/experiment4-customer-support-platform-dashboard.png) \ No newline at end of file +![customer-support-platform-dashboard](https://storage.googleapis.com/dlt-blog-images/experiment4-customer-support-platform-dashboard.png) \ No newline at end of file diff --git a/docs/website/blog/2023-06-14-dlthub-gpt-accelerated learning_01.md b/docs/website/blog/2023-06-14-dlthub-gpt-accelerated learning_01.md index 394504dc64..4c1963acb0 100644 --- a/docs/website/blog/2023-06-14-dlthub-gpt-accelerated learning_01.md +++ b/docs/website/blog/2023-06-14-dlthub-gpt-accelerated learning_01.md @@ -1,7 +1,7 @@ --- slug: training-gpt-with-opensource-codebases title: "GPT-accelerated learning: Understanding open source codebases" -image: /img/blog_gpt_1.jpg +image: https://storage.googleapis.com/dlt-blog-images/blog_gpt_1.jpg authors: name: Tong Chen title: Data Engineer Intern at dltHub @@ -150,7 +150,7 @@ After the walkthrough, we can start to experiment different questions and it wil Here, I asked " why should data teams use dlt? " -![chatgptq1](\img\chatgptQ1.png) +![chatgptq1](https://storage.googleapis.com/dlt-blog-images/chatgptQ1.png) It outputted: @@ -160,7 +160,7 @@ Next, I asked " Who is dlt for? 
" -![chatgptq2](\img\chatgptQ2..png) +![chatgptq2](https://storage.googleapis.com/dlt-blog-images/chatgptQ2..png) It outputted: 1. `dlt` is meant to be accessible to every person on the data team, including data engineers, analysts, data scientists, and other stakeholders involved in data loading. It is designed to reduce knowledge requirements and enable collaborative working between engineers and analysts. diff --git a/docs/website/blog/2023-06-15-automating-data-engineers.md b/docs/website/blog/2023-06-15-automating-data-engineers.md index e29236c319..3247fd2638 100644 --- a/docs/website/blog/2023-06-15-automating-data-engineers.md +++ b/docs/website/blog/2023-06-15-automating-data-engineers.md @@ -12,7 +12,7 @@ tags: [data engineer shortage, structured data, schema evolution] # Automating the data engineer: Addressing the talent shortage -![automated pipeline automaton](/img/pipeline-automaton.png) +![automated pipeline automaton](https://storage.googleapis.com/dlt-blog-images/pipeline-automaton.png) ## Why is there a data engineer shortage? diff --git a/docs/website/blog/2023-06-20-dlthub-gptquestion1-.md b/docs/website/blog/2023-06-20-dlthub-gptquestion1-.md index 7b528fbf02..6a14013775 100644 --- a/docs/website/blog/2023-06-20-dlthub-gptquestion1-.md +++ b/docs/website/blog/2023-06-20-dlthub-gptquestion1-.md @@ -1,7 +1,7 @@ --- slug: trained-gpt-q&a title: "Hey GPT, tell me about dlthub!" -image: /img/traingptblog.jpg +image: https://storage.googleapis.com/dlt-blog-images/traingptblog.jpg authors: name: Tong Chen title: Data Engineer Intern at dltHub diff --git a/docs/website/blog/2023-06-26-dlthub-gptquestion2.md b/docs/website/blog/2023-06-26-dlthub-gptquestion2.md index c29a05a83e..75f10e362a 100644 --- a/docs/website/blog/2023-06-26-dlthub-gptquestion2.md +++ b/docs/website/blog/2023-06-26-dlthub-gptquestion2.md @@ -1,7 +1,7 @@ --- slug: trained-gpt-q&a-2 title: "dlt AI Assistant provides answers you need!" -image: /img/blog_gpt_QA2.jpg +image: https://storage.googleapis.com/dlt-blog-images/blog_gpt_QA2.jpg authors: name: Tong Chen title: Data Engineer Intern at dltHub @@ -100,7 +100,7 @@ Now we understand how `dlt` significantly improves our work efficiency! Want to ask your own questions to the `dlt` AI Assistant? Just click on the "Get Help" button located at the bottom right. -![dlthelp](\img\dlthelp.jpg) +![dlthelp](https://storage.googleapis.com/dlt-blog-images/dlthelp.jpg) *** [ What's more? ] diff --git a/docs/website/blog/2023-08-14-dlt-motherduck-blog.md b/docs/website/blog/2023-08-14-dlt-motherduck-blog.md index 9f48d808a5..2e023fce8b 100644 --- a/docs/website/blog/2023-08-14-dlt-motherduck-blog.md +++ b/docs/website/blog/2023-08-14-dlt-motherduck-blog.md @@ -1,7 +1,7 @@ --- slug: dlt-motherduck-demo title: "dlt-dbt-DuckDB-MotherDuck: My super simple and highly customizable approach to the Modern Data Stack in a box" -image: /img/dlt-motherduck-logos.png +image: https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-logos.png authors: name: Rahul Joshi title: Developer Relations at dltHub @@ -25,11 +25,11 @@ In my example, I wanted to customize reports on top of Google Analytics 4 (GA4) By first pulling all the data from different sources into DuckDB files in my laptop, I was able to do my development and customization locally. 
-![local-workflow](/img/dlt-motherduck-local-workflow.png) +![local-workflow](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-local-workflow.png) And then when I was ready to move to production, I was able to seamlessly switch from DuckDB to MotherDuck with almost no code re-writing! -![production-workflow](/img/dlt-motherduck-production-workflow.png) +![production-workflow](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-production-workflow.png) Thus I got a super simple and highly customizable MDS in a box that is also close to a company production setting. @@ -90,11 +90,11 @@ This is a perfect problem to test out my new super simple and highly customizabl `dlt` simplifies this process by automatically normalizing such nested data on load. - ![nested-bigquery](/img/dlt-motherduck-nested-bigquery.png) + ![nested-bigquery](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-nested-bigquery.png) Example of what the nested data in BigQuery looks like. - ![normalized-bigquery](/img/dlt-motherduck-normalized-bigquery.png) + ![normalized-bigquery](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-normalized-bigquery.png) `dlt` loads the main data into table `ga_events`, and creates another table `ga_events__event_params` for the nested data. @@ -109,7 +109,7 @@ This is a perfect problem to test out my new super simple and highly customizabl In this example, after running the BigQuery pipeline, the data was loaded into a locally created DuckDB file called ‘bigquery.duckdb’, and this allowed me to use python to explore the loaded data: - ![duckdb-explore](/img/dlt-motherduck-duckdb-explore.png) + ![duckdb-explore](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-duckdb-explore.png) The best thing about using DuckDB is that it provides a local testing and development environment. This means that you can quickly, and without any additional costs, test and validate your workflow before deploying it to production. @@ -127,13 +127,13 @@ This is a perfect problem to test out my new super simple and highly customizabl Metabase OSS has a DuckDB driver, which meant that I could simply point it to the DuckDB files in my system and build a dashboard on top of this data. - ![dashboard-1](/img/dlt-motherduck-dashboard-1.png) + ![dashboard-1](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-dashboard-1.png) - ![dashboard-2](/img/dlt-motherduck-dashboard-2.png) + ![dashboard-2](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-dashboard-2.png) - ![dashboard-3](/img/dlt-motherduck-dashboard-3.png) + ![dashboard-3](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-dashboard-3.png) - ![dashboard-4](/img/dlt-motherduck-dashboard-4.png) + ![dashboard-4](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-dashboard-4.png) 5. **Going to production: Using MotherDuck as the destination** @@ -169,7 +169,7 @@ This is a perfect problem to test out my new super simple and highly customizabl In my example, after I load the data to MotherDuck, I can provide access to my team just by clicking on ‘Share’ in the menu of their web UI. 
- ![motherduck-share](/img/dlt-motherduck-share.png) + ![motherduck-share](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-share.png) **Conclusion:** diff --git a/docs/website/blog/2023-08-21-dlt-lineage-support.md b/docs/website/blog/2023-08-21-dlt-lineage-support.md index a76f89ed6a..e64de3f0aa 100644 --- a/docs/website/blog/2023-08-21-dlt-lineage-support.md +++ b/docs/website/blog/2023-08-21-dlt-lineage-support.md @@ -1,7 +1,7 @@ --- slug: dlt-lineage-support title: "Trust your data! Column and row level lineages, an explainer and a recipe." -image: /img/eye_of_data_lineage.png +image: https://storage.googleapis.com/dlt-blog-images/eye_of_data_lineage.png authors: name: Adrian Brudaru title: Open source data engineer diff --git a/docs/website/blog/2023-08-24-dlt-etlt.md b/docs/website/blog/2023-08-24-dlt-etlt.md index 3e27a21338..0b0b86de53 100644 --- a/docs/website/blog/2023-08-24-dlt-etlt.md +++ b/docs/website/blog/2023-08-24-dlt-etlt.md @@ -1,7 +1,7 @@ --- slug: dlt-etlt title: "The return of ETL in the Python age" -image: /img/went-full-etltlt.png +image: https://storage.googleapis.com/dlt-blog-images/went-full-etltlt.png authors: name: Adrian Brudaru title: Open source data engineer diff --git a/docs/website/blog/2023-09-05-mongo-etl.md b/docs/website/blog/2023-09-05-mongo-etl.md index 19e1f18682..e49efdf38f 100644 --- a/docs/website/blog/2023-09-05-mongo-etl.md +++ b/docs/website/blog/2023-09-05-mongo-etl.md @@ -1,7 +1,7 @@ --- slug: mongo-etl title: "Dumpster diving for data: The MongoDB experience" -image: /img/data-dumpster.png +image: https://storage.googleapis.com/dlt-blog-images/data-dumpster.png authors: name: Adrian Brudaru title: Open source data engineer diff --git a/docs/website/blog/2023-09-20-data-engineering-cv.md b/docs/website/blog/2023-09-20-data-engineering-cv.md index 545d6f0ecb..7010434a81 100644 --- a/docs/website/blog/2023-09-20-data-engineering-cv.md +++ b/docs/website/blog/2023-09-20-data-engineering-cv.md @@ -1,7 +1,7 @@ --- slug: data-engineering-cv title: "How to write a data engineering CV for Europe and America - A hiring manager’s perspective" -image: /img/dall-e-de-cv.png +image: https://storage.googleapis.com/dlt-blog-images/dall-e-de-cv.png authors: name: Adrian Brudaru title: Open source data engineer diff --git a/docs/website/blog/2023-09-26-verba-dlt-zendesk.md b/docs/website/blog/2023-09-26-verba-dlt-zendesk.md index 1990a5df7f..a8a198d1fd 100644 --- a/docs/website/blog/2023-09-26-verba-dlt-zendesk.md +++ b/docs/website/blog/2023-09-26-verba-dlt-zendesk.md @@ -1,7 +1,7 @@ --- slug: verba-dlt-zendesk title: "Talk to your Zendesk tickets with Weaviate’s Verba and dlt: A Step by Step Guide" -image: /img/dlt-business-knowledge-retrieval-augmented-generation-diagram.png +image: https://storage.googleapis.com/dlt-blog-images/dlt-business-knowledge-retrieval-augmented-generation-diagram.png authors: name: Anton Burnashev title: Software Engineer @@ -16,7 +16,7 @@ As businesses scale and the volume of internal knowledge grows, it becomes incre With the latest advancements in large language models (LLMs) and [vector databases](https://weaviate.io/blog/what-is-a-vector-database), it's now possible to build a new class of tools that can help get insights from this data. One approach to do so is Retrieval-Augmented Generation (RAG). The idea behind RAGs is to retrieve relevant information from your database and use LLMs to generate a customised response to a question. 
Leveraging RAG enables the LLM to tailor its responses based on your proprietary data. -![Diagram illustrating the process of internal business knowledge retrieval and augmented generation (RAG), involving components like Salesforce, Zendesk, Asana, Jira, Notion, Slack and HubSpot, to answer user queries and generate responses.](/img/dlt-business-knowledge-retrieval-augmented-generation-diagram.png) +![Diagram illustrating the process of internal business knowledge retrieval and augmented generation (RAG), involving components like Salesforce, Zendesk, Asana, Jira, Notion, Slack and HubSpot, to answer user queries and generate responses.](https://storage.googleapis.com/dlt-blog-images/dlt-business-knowledge-retrieval-augmented-generation-diagram.png) One such source of internal knowledge is help desk software. It contains a wealth of information about the company's customers and their interactions with the support team. @@ -78,7 +78,7 @@ INFO: Application startup complete. Now, open your browser and navigate to [http://localhost:8000](http://localhost:8000/). -![A user interface screenshot showing Verba, retrieval and augmented generation chatbot, powered by Weaviate](/img/dlt-weaviate-verba-ui-1.png) +![A user interface screenshot showing Verba, retrieval and augmented generation chatbot, powered by Weaviate](https://storage.googleapis.com/dlt-blog-images/dlt-weaviate-verba-ui-1.png) Great! Verba is up and running. @@ -272,7 +272,7 @@ verba start Head back to [http://localhost:8000](http://localhost:8000/) and ask Verba a question. For example, "What are common issues our users report?". -![A user interface screenshot of Verba showing Zendesk tickets with different issues like API problems and update failures, with responses powered by Weaviate](/img/dlt-weaviate-verba-ui-2.png) +![A user interface screenshot of Verba showing Zendesk tickets with different issues like API problems and update failures, with responses powered by Weaviate](https://storage.googleapis.com/dlt-blog-images/dlt-weaviate-verba-ui-2.png) As you can see, Verba is able to retrieve relevant information from Zendesk Support and generate an answer to our question. It also displays the list of relevant documents for the question. You can click on them to see the full text. 
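For reference, the ingestion half of this guide condenses to roughly the sketch below — assuming the verified `zendesk` source scaffolded with `dlt init zendesk weaviate`, credentials in `.dlt/secrets.toml`, and illustrative field names for vectorization:

```python
import dlt
from dlt.destinations.adapters import weaviate_adapter
from zendesk import zendesk_support  # verified source scaffolded by `dlt init`

pipeline = dlt.pipeline(
    pipeline_name="zendesk_verba",
    destination="weaviate",
)

# load only the tickets resource; weaviate_adapter marks which
# free-text fields Weaviate should vectorize for semantic retrieval
tickets = zendesk_support().with_resources("tickets")
load_info = pipeline.run(
    weaviate_adapter(tickets, vectorize=["subject", "description"])
)
print(load_info)
```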
diff --git a/docs/website/blog/2023-10-06-dlt-holistics.md b/docs/website/blog/2023-10-06-dlt-holistics.md index b2791bd2ec..5e0bd20370 100644 --- a/docs/website/blog/2023-10-06-dlt-holistics.md +++ b/docs/website/blog/2023-10-06-dlt-holistics.md @@ -1,7 +1,7 @@ --- slug: MongoDB-dlt-Holistics title: "Modeling Unstructured Data for Self-Service Analytics with dlt and Holistics" -image: /img/dlt_holistics_overview.jpg +image: https://storage.googleapis.com/dlt-blog-images/dlt_holistics_overview.jpg authors: name: Zaeem Athar title: Junior Data Engineer @@ -19,7 +19,7 @@ In this blog, we will show you how you can combine `dlt` and **Holistics** and c ## An Overview of the MongoDB Modern Analytics Stack -![Diagram illustrating the inner workings of our Modern Analytics Stack](/img/dlt_holistics_overview.jpg) +![Diagram illustrating the inner workings of our Modern Analytics Stack](https://storage.googleapis.com/dlt-blog-images/dlt_holistics_overview.jpg) | Tool | Layer | Why it’s awesome | @@ -77,7 +77,7 @@ In addition to the transformation layer, Holistics provides advanced features su The overall Holistics workflow looks something like this: -![Holistics Overview](/img/holistics_overview.png) +![Holistics Overview](https://storage.googleapis.com/dlt-blog-images/holistics_overview.png) - Connect Holistics to an existing SQL data warehouse. - Data teams use Holistics Data Modeling to model and transform analytics data. This model layer is reusable across reports & datasets. @@ -286,7 +286,7 @@ To get a sense of what we accomplished let's examine what the unstructured data This is a typical way data is structured in a NoSQL database. The data is in a JSON-like format and contains nested data. Now, let's look at what is loaded in BigQuery. Below you can see the same data in BigQuery. -![BigQuery Data Overview](/img/dlt_holistics_bigquery_data.png) +![BigQuery Data Overview](https://storage.googleapis.com/dlt-blog-images/dlt_holistics_bigquery_data.png) The ddl (data definition language) for the movies table in BigQuery can be seen below: @@ -336,7 +336,7 @@ CREATE TABLE `dlthub-analytics.mongo_database.movies` If you compare the ddl against the sample document in MongoDB, you will notice that the nested arrays such as **CAST** are missing from the ddl in BigQuery. This is because of how dlt handles nested arrays. If we look at our database in BigQuery, you can see that **CAST** is loaded as a separate table. -![BigQuery Table Overview](/img/dlt_holistics_bigquery_table.png) +![BigQuery Table Overview](https://storage.googleapis.com/dlt-blog-images/dlt_holistics_bigquery_table.png) `dlt` normalises nested data by populating it in separate tables and creates relationships between the tables, so they can be combined using normal SQL joins. All this is taken care of by `dlt` and we need not worry about how transformations are handled. In short, the transformation steps we discussed in [Why is dlt useful when you want to ingest data from a production database such as MongoDB?](#why-is-dlt-useful-when-you-want-to-ingest-data-from-a-production-database-such-as-mongodb) are taken care of by dlt, making the data analyst's life easier. @@ -375,7 +375,7 @@ In Holistics, go to the **Modelling 4.0** section from the top bar. We will be g Under the Models folder, let's add the MongoDB data from BigQuery as Table Models. 
Hover over the Models folder and click on the (+) sign, then select **Add Table Model.** In **Data Sources**, select the BigQuery Source we created before and then select the relevant table models to import into Holistics. In this case, we are importing the `movies`, `movies_cast` and `movies_directors` tables. -![Holistics Add Model](/img/holistics_add_model.png) +![Holistics Add Model](https://storage.googleapis.com/dlt-blog-images/holistics_add_model.png) #### **Adding Holistics Dataset(s) and Relationships:** @@ -389,11 +389,11 @@ Datasets works like a data marts, except that it exists only on the semantic lay Hover over the Datasets folder, click on the (+) sign, and then select **Add Datasets.** Select the previously created Table Models under this dataset, and **Create Dataset**. -![Holistics Create Dataset](/img/holistics_add_dataset.png) +![Holistics Create Dataset](https://storage.googleapis.com/dlt-blog-images/holistics_add_dataset.png) We will then be asked to create relationships between the models. We create a **Many-to-one (n - 1)** relationship between the `cast` and the `movies` models. -![Add Relationship between Models](/img/holistics_add_relationship.png) +![Add Relationship between Models](https://storage.googleapis.com/dlt-blog-images/holistics_add_relationship.png) The resulting relationship can be seen as code using the Holistics 4.0 Analytics as Code feature. To activate this feature, click on the newly created dataset and select the **View as Code** option from the top right. For more detailed instructions on setting up relationships between models, refer to the model relationship [guide](https://docs.holistics.io/docs/relationships#automatic-relationship-creation). @@ -435,7 +435,7 @@ Dataset movies { The corresponding view for the `dataset.aml` file in the GUI looks like this: -![Add Relationship GUI](/img/holistics_relationship_gui.png) +![Add Relationship GUI](https://storage.googleapis.com/dlt-blog-images/holistics_relationship_gui.png) Once the relationships between the tables have been defined, we are all set to create some visualizations. We can select the **Preview** option next to the View as Code toggle to create some visualization in the development mode. This comes in handy if we have connected an external git repository to track our changes; this way, we can test out the dataset in preview mode before committing and pushing changes, and deploying the dataset to production. @@ -447,11 +447,11 @@ The Movies dataset should now be available in the Reporting section. We will cre The visualization part is pretty self-explanatory and is mostly drag and drop, as we took the time to define the relationships between the tables. Below we create a simple table in Holistics that shows the actors that have appeared in the most movies since the year 2000. -![Holistics Create Visualization](/img/Holistics_new.gif) +![Holistics Create Visualization](https://storage.googleapis.com/dlt-blog-images/Holistics_new.gif) Similarly, we can add other reports and combine them into a dashboard. 
The resulting dashboard can be seen below: -![Holistics Dashboard](/img/holistics_dashboard.png) +![Holistics Dashboard](https://storage.googleapis.com/dlt-blog-images/holistics_dashboard.png) ## Conclusion diff --git a/docs/website/blog/2023-10-09-dlt-ops-startups.md b/docs/website/blog/2023-10-09-dlt-ops-startups.md index 94c1ff662b..33e5090b06 100644 --- a/docs/website/blog/2023-10-09-dlt-ops-startups.md +++ b/docs/website/blog/2023-10-09-dlt-ops-startups.md @@ -1,7 +1,7 @@ --- slug: dlt-ops-startups title: "PDF invoices → Real-time financial insights: How I stopped relying on an engineer to automate my workflow and learnt to do it myself" -image: /img/invoice_flowchart.png +image: https://storage.googleapis.com/dlt-blog-images/invoice_flowchart.png authors: name: Anna Hoffmann title: COO @@ -20,7 +20,7 @@ So, I often end up doing manual tasks such as budgeting, cost estimation, updati For example, I need to analyze expenses in order to prepare a budget estimation. I get numerous PDFs daily in a dedicated Gmail group inbox. I was wondering to what extent [dlt](https://github.com/dlt-hub/dlt) can help fulfill my automation dream. I decided to work with Alena from our data team on an internal project. -![invoice flow chart](/img/invoice_flowchart.png) +![invoice flow chart](https://storage.googleapis.com/dlt-blog-images/invoice_flowchart.png) ## Use Case @@ -130,16 +130,16 @@ Now you can [deploy this script with GitHub Actions](https://dlthub.com/docs/wal Here’s what the result looks like in BigQuery: -![screenshot 1](/img/pdf_parse_outcome_1.png) +![screenshot 1](https://storage.googleapis.com/dlt-blog-images/pdf_parse_outcome_1.png) …and as a Google Sheet. You can easily export this table from BigQuery to Google Sheets using the Export button in the top right corner. -![screenshot 2](/img/pdf_parse_outcome_2.png) +![screenshot 2](https://storage.googleapis.com/dlt-blog-images/pdf_parse_outcome_2.png) Bonus: In order to have a Google Sheet with live updates, you can go to the Data tab in your Spreadsheet → Data Connectors → BigQuery → choose your database and voila, your data will be updated automatically. -![screenshot 3](/img/pdf_parse_outcome_3.png) +![screenshot 3](https://storage.googleapis.com/dlt-blog-images/pdf_parse_outcome_3.png) # **Conclusion:** diff --git a/docs/website/blog/2023-10-10-data-product-docs.md b/docs/website/blog/2023-10-10-data-product-docs.md index d5d71ade54..2adfd42675 100644 --- a/docs/website/blog/2023-10-10-data-product-docs.md +++ b/docs/website/blog/2023-10-10-data-product-docs.md @@ -1,7 +1,7 @@ --- slug: data-product-docs title: "The role of docs in data products" -image: /img/parrot-baby.gif +image: https://storage.googleapis.com/dlt-blog-images/parrot-baby.gif authors: name: Adrian Brudaru title: Open source data engineer @@ -54,7 +54,7 @@ Examples of data products: The term product assumes more than just some code. A "quick and dirty" pipeline is what you would call a "proof of concept" in the product world and is far from a product. -![Who the duck wrote this garbage??? Ah nvm… it was me…](/img/parrot-baby.gif) +![Who the duck wrote this garbage??? Ah nvm… it was me…](https://storage.googleapis.com/dlt-blog-images/parrot-baby.gif) > Who the duck wrote this trash??? Ahhhhh it was me :( ... To create a product, you need to consider how it will be used, by whom, and enable that usage by others. 
diff --git a/docs/website/blog/2023-10-16-first-data-warehouse.md b/docs/website/blog/2023-10-16-first-data-warehouse.md index 79186fd267..e1e1c759ca 100644 --- a/docs/website/blog/2023-10-16-first-data-warehouse.md +++ b/docs/website/blog/2023-10-16-first-data-warehouse.md @@ -1,7 +1,7 @@ --- slug: first-data-warehouse title: "Your first data warehouse: A practical approach" -image: /img/oil-painted-dashboard.png +image: https://storage.googleapis.com/dlt-blog-images/oil-painted-dashboard.png authors: name: Adrian Brudaru title: Open source data engineer @@ -15,7 +15,7 @@ tags: [first data warehouse] Building a data warehouse is a complex endeavor, often too intricate to navigate flawlessly in the initial attempt. In this article, we'll provide insights and pointers to guide you in choosing the right stack for your data warehouse. -![hard coded dashboard](/img/oil-painted-dashboard.png) +![hard coded dashboard](https://storage.googleapis.com/dlt-blog-images/oil-painted-dashboard.png) diff --git a/docs/website/blog/2023-10-19-dbt-runners.md b/docs/website/blog/2023-10-19-dbt-runners.md index 713815abb0..7d2c1cc6f3 100644 --- a/docs/website/blog/2023-10-19-dbt-runners.md +++ b/docs/website/blog/2023-10-19-dbt-runners.md @@ -1,7 +1,7 @@ --- slug: dbt-runners-usage title: "Running dbt Cloud or core from python - use cases and simple solutions" -image: /img/purple-python-spoderweb.png +image: https://storage.googleapis.com/dlt-blog-images/purple-python-spoderweb.png authors: name: Adrian Brudaru title: Open source data engineer diff --git a/docs/website/blog/2023-10-23-arrow-loading.md b/docs/website/blog/2023-10-23-arrow-loading.md index 978586fa76..0a33929577 100644 --- a/docs/website/blog/2023-10-23-arrow-loading.md +++ b/docs/website/blog/2023-10-23-arrow-loading.md @@ -1,7 +1,7 @@ --- slug: dlt-arrow-loading title: "Get 30x speedups when reading databases with ConnectorX + Arrow + dlt" -image: /img/arrow_30x_faster.png +image: https://storage.googleapis.com/dlt-blog-images/arrow_30x_faster.png authors: name: Marcin Rudolf title: dltHub CTO diff --git a/docs/website/blog/2023-10-25-dlt-deepnote.md b/docs/website/blog/2023-10-25-dlt-deepnote.md index 864353a36d..d04b65f278 100644 --- a/docs/website/blog/2023-10-25-dlt-deepnote.md +++ b/docs/website/blog/2023-10-25-dlt-deepnote.md @@ -1,7 +1,7 @@ --- slug: deepnote-women-wellness-violence-tends title: "DLT & Deepnote in women's wellness and violence trends: A Visual Analysis" -image: /img/blog_deepnote_improved_flow.png +image: https://storage.googleapis.com/dlt-blog-images/blog_deepnote_improved_flow.png authors: name: Hiba Jamal title: Data Science intern at dlthub @@ -49,7 +49,7 @@ like, let’s list down the steps we usually undergo. ### The usual flow of data for data science projects -![usual flow](/img/blog_deepnote_usual_flow.png) +![usual flow](https://storage.googleapis.com/dlt-blog-images/blog_deepnote_usual_flow.png) We sign up for our jobs because we enjoy the last two activities the most. 
These parts have all the pretty charts, the flashy animations, and, if the stars align, include watching your @@ -109,7 +109,7 @@ Our next step could be using a visualization package like `matplotlib`, and othe We can reimagine the flow of data with dlt and Deepnote in the following way: -![revised flow](/img/blog_deepnote_improved_flow.png) +![revised flow](https://storage.googleapis.com/dlt-blog-images/blog_deepnote_improved_flow.png) We leave the loading of the raw data to dlt, while we leave the data exploration and visualization to the Deepnote interface. @@ -170,7 +170,7 @@ Take your average Notebook experience, and combine it with the powers of a colla At this point, we would probably move towards a `plt.plot` or `plt.bar` function. However, with Deepnote, the little Visualize button on top of any data frame will help us jump straight to an easy figure. Clicking on the Visualize button takes you to a new cell block, where you can choose your parameters, types of charts, and customization settings in the sidebar. The following chart is built from the `joined` data frame we defined above. -![chart](/img/blog_deepnote_chart.png) +![chart](https://storage.googleapis.com/dlt-blog-images/blog_deepnote_chart.png) And a stacked bar chart came into existence! A little note about the query results: the **value** column corresponds to how much (in %) a person justifies violence against women. An interesting yet disturbing insight from the above plot: in many countries, women condone violence against women as often, if not more often, than men do! @@ -206,7 +206,7 @@ Lastly, based on these indicators of wellness and violence about women, let’s Within these countries, the KMeans algorithm converges to 4 clusters. -![clustering](/img/blog_deepnote_animation.gif) +![clustering](https://storage.googleapis.com/dlt-blog-images/blog_deepnote_animation.gif) The color bar shows us which color is associated with which cluster. Namely: 1: purple, 2: blue, 3: green, and 4: yellow. diff --git a/docs/website/blog/2023-10-26-dlt-prefect.md b/docs/website/blog/2023-10-26-dlt-prefect.md index 8bd6321489..3f778b0648 100644 --- a/docs/website/blog/2023-10-26-dlt-prefect.md +++ b/docs/website/blog/2023-10-26-dlt-prefect.md @@ -4,7 +4,7 @@ title: "Building resilient pipelines in minutes with dlt + Prefect" meta: - name: canonical content: https://www.prefect.io/blog/building-resilient-data-pipelines-in-minutes-with-dlt-prefect -image: /img/prefect-dlt.png +image: https://storage.googleapis.com/dlt-blog-images/prefect-dlt.png authors: name: Dylan Hughes & Chris Reuter title: Engineering & Community at Prefect.io diff --git a/docs/website/blog/2023-10-30-data-modelling-tools.md b/docs/website/blog/2023-10-30-data-modelling-tools.md index e5839ee66e..ca0c6e19f2 100644 --- a/docs/website/blog/2023-10-30-data-modelling-tools.md +++ b/docs/website/blog/2023-10-30-data-modelling-tools.md @@ -1,7 +1,7 @@ --- slug: semantic-modeling-tools-comparison title: "Semantic Modeling Capabilities of Power BI, GoodData & Metabase: A Comparison" -image: /img/people-stuck-with-tables-2.jpg +image: https://storage.googleapis.com/dlt-blog-images/people-stuck-with-tables-2.jpg authors: name: Hiba Jamal title: Data Science intern at dlthub @@ -10,7 +10,7 @@ authors: tags: [data modelling] --- -![cover](/img/people-stuck-with-tables-2.jpg) +![cover](https://storage.googleapis.com/dlt-blog-images/people-stuck-with-tables-2.jpg) DeepAI Image with prompt: People stuck with tables.