
Changed directory of all the blog images to google cloud storage. #1156

Merged: 2 commits, Apr 8, 2024
2 changes: 1 addition & 1 deletion docs/website/blog/2023-03-09-duckdb-1M-downloads-users.mdx
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@ Like so many others, we are excited about the project, too. Recently, we attende

With our research, we aimed to identify the most common reasons people try out DuckDB. We found five perspectives that people commonly have when trying it out.

-![Marcin watching a MotherDuck presentation](/img/Marcin-dltHub-DuckDB-DuckCon-Brussels.jpg)
+![Marcin watching a MotherDuck presentation](https://storage.googleapis.com/dlt-blog-images/Marcin-dltHub-DuckDB-DuckCon-Brussels.jpg)

dltHub co-founder Marcin watching a MotherDuck presentation at DuckCon in Brussels in February

@@ -58,7 +58,7 @@ Check this out in the [Colab notebook](https://colab.research.google.com/drive/1

Okay. It’s called DuckDB because ducks are amazing and [@hannes](https://github.com/hannes) once had a pet duck 🤣

-![Why "Duck" DB?](/img/why-duckdb.png)
+![Why "Duck" DB?](https://storage.googleapis.com/dlt-blog-images/why-duckdb.png)
Source: [DuckDB: an Embeddable Analytical RDBMS](https://db.in.tum.de/teaching/ss19/moderndbs/duckdb-tum.pdf)

## Enjoy this blog post? Give data load tool (dlt) a ⭐ on GitHub [here](https://github.com/dlt-hub/dlt) 🤜🤛
@@ -23,7 +23,7 @@ We decided to make a dashboard that helps us better understand data attribution

### Internal dashboard

-![Dashboard 1](/img/g4_dashboard_screen_grab_1.jpg) ![Dashboard 2](/img/g4_dashboard_screen_grab_2.jpg)
+![Dashboard 1](https://storage.googleapis.com/dlt-blog-images/g4_dashboard_screen_grab_1.jpg) ![Dashboard 2](https://storage.googleapis.com/dlt-blog-images/g4_dashboard_screen_grab_2.jpg)

With the data loaded locally, we were able to build the dashboard on our system using Streamlit. You can also do this on your system by simply cloning [this repo](https://github.com/dlt-hub/ga4-internal-dashboard-demo) and following the steps listed [here](https://github.com/dlt-hub/ga4-internal-dashboard-demo/tree/main/intial-explorations).

@@ -29,7 +29,7 @@ Now that the comments were loaded, we were ready to use GPT-4 to create a one se

Since these comments were posted in response to stories or other comments, we fed in the story title and any parent comments as context in the prompt. To avoid hitting rate-limit errors and losing all progress, we ran this for 100 comments at a time, saving the results to the CSV file after each batch. We then built a Streamlit app to load and display them in a dashboard. Here is what the dashboard looks like:

-![dashboard.png](/img/hn_gpt_dashboard.png)
+![dashboard.png](https://storage.googleapis.com/dlt-blog-images/hn_gpt_dashboard.png)
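The batch-and-checkpoint pattern the post describes (summarize 100 comments at a time, appending results to a CSV so a rate-limit failure does not lose earlier progress) can be sketched as follows. This is a minimal illustration, not the post's actual code: `summarize_comment` is a hypothetical stand-in for the real GPT-4 call.

```python
import csv
import os


def summarize_comment(comment, context):
    # Hypothetical stand-in for the actual GPT-4 API call, which would
    # take the comment plus story title / parent comments as context.
    return f"Summary of: {comment[:40]}"


def process_in_batches(comments, out_path, batch_size=100):
    # Count rows already written so a re-run resumes where it left off
    # instead of re-summarizing (and re-paying for) finished comments.
    done = 0
    if os.path.exists(out_path):
        with open(out_path, newline="") as f:
            done = sum(1 for _ in csv.reader(f))

    for start in range(done, len(comments), batch_size):
        batch = comments[start:start + batch_size]
        rows = [(c["id"], summarize_comment(c["text"], c.get("parent", "")))
                for c in batch]
        # Append each finished batch immediately; a rate-limit error
        # mid-run then only loses the current batch.
        with open(out_path, "a", newline="") as f:
            csv.writer(f).writerows(rows)
```

A Streamlit app can then simply read the CSV on each page load and render the accumulated summaries.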

## Deploying the pipeline, Google Bigquery, and Streamlit app

@@ -40,10 +40,10 @@ With the database uploaded to BigQuery, we were now ready to build a dashboard.

The DVD store database contains data on the products (film DVDs), product categories, existing inventory, customers, orders, order histories etc. For the purpose of the dashboard, we decided to explore the question: *How many orders are being placed each month and which films and film categories are the highest selling?*

-![orders_chart.png](/img/experiment3_dashboard_orders_chart.png) ![top_selling_tables.png](/img/experiment3_dashboard_top_selling_tables.png)
+![orders_chart.png](https://storage.googleapis.com/dlt-blog-images/experiment3_dashboard_orders_chart.png) ![top_selling_tables.png](https://storage.googleapis.com/dlt-blog-images/experiment3_dashboard_top_selling_tables.png)
In addition, we set up email alerts to get notified whenever a DVD was out of stock or running low.

-![low_stock_email_alert.png](/img/experiment3_low_stock_email_alert.png)
+![low_stock_email_alert.png](https://storage.googleapis.com/dlt-blog-images/experiment3_low_stock_email_alert.png)

### 3. Deploying the pipeline

@@ -103,7 +103,7 @@ To try out schema evolution with `dlt`, check out our [colab demo.](https://cola



-![colab demo](/img/schema_evolution_colab_demo_light.png)
+![colab demo](https://storage.googleapis.com/dlt-blog-images/schema_evolution_colab_demo_light.png)

### Want more?

@@ -1,7 +1,7 @@
---
slug: google-sheets-to-data-warehouse-pipeline
title: Using the Google Sheets `dlt` pipeline in analytics and ML workflows
-image: /img/experiment4-blog-image.png
+image: https://storage.googleapis.com/dlt-blog-images/experiment4-blog-image.png
authors:
name: Rahul Joshi
title: Data Science Intern at dltHub
Expand All @@ -23,27 +23,27 @@ As an example of such a use-case, consider this very common scenario: You're the

To demonstrate this process, we created some sample data where we stored costs related to some campaigns in a Google Sheet and the rest of the related data in BigQuery.

-![campaign-roi-google-sheets](/img/experiment4-campaign-roi-google-sheets.png) ![campaign-roi-data-warehouse](/img/experiment4-campaign-roi-datawarehouse.png)
+![campaign-roi-google-sheets](https://storage.googleapis.com/dlt-blog-images/experiment4-campaign-roi-google-sheets.png) ![campaign-roi-data-warehouse](https://storage.googleapis.com/dlt-blog-images/experiment4-campaign-roi-datawarehouse.png)

We then used the `dlt` google sheets pipeline by following [these](https://github.com/dlt-hub/google-sheets-bigquery-pipeline) simple steps to load the Google Sheets data into BigQuery.

With the data loaded, we finally connected Metabase to the data warehouse and created a dashboard to understand the ROIs across each platform:
-![campaign-roi-dashboard-1](/img/experiment4-campaign-roi-dashboard-1.png)
-![campaign-roi-dashboard-2](/img/experiment4-campaign-roi-dashboard-2.png)
+![campaign-roi-dashboard-1](https://storage.googleapis.com/dlt-blog-images/experiment4-campaign-roi-dashboard-1.png)
+![campaign-roi-dashboard-2](https://storage.googleapis.com/dlt-blog-images/experiment4-campaign-roi-dashboard-2.png)

## Use-case #2: Evaluating the performance of your ML product using google sheets pipeline

Another use-case for Google Sheets that we've come across frequently is to store annotated training data for building machine learning (ML) products. This process usually involves a human first manually doing the annotation and creating the training set in Google Sheets. Once there is sufficient data, the next step is to train and deploy the ML model. After the ML model is ready and deployed, the final step would be to create a workflow to measure its performance, which, depending on the data and product, might involve combining the manually annotated Google Sheets data with product usage data that is typically stored in some data warehouse.

A very common example of such a workflow is customer support platforms that use text classification models to categorize incoming customer support tickets into different issue categories for efficient routing and resolution. To illustrate this example, we created a Google Sheet with issues manually annotated with a category. We also included other manually annotated features that might help measure the effectiveness of the platform, such as priority level for the tickets and customer feedback.
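The categorization step that such platforms automate can be illustrated with a toy classifier. A real platform would use a trained text-classification model; the keyword rules and category names below are purely hypothetical, but they show the routing idea:

```python
# Hypothetical keyword rules standing in for a trained classifier.
CATEGORY_KEYWORDS = {
    "billing": ["invoice", "charge", "refund", "payment"],
    "bug": ["error", "crash", "broken", "fails"],
    "account": ["login", "password", "access"],
}


def categorize_ticket(text):
    # Return the first category whose keywords appear in the ticket;
    # unmatched tickets fall through to a catch-all queue.
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"
```

Each predicted category can then be compared against the human-annotated category in the Google Sheet to measure the model's accuracy.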

-![customer-support-platform-google-sheets](/img/experiment4-customer-support-platform-google-sheets.png)
+![customer-support-platform-google-sheets](https://storage.googleapis.com/dlt-blog-images/experiment4-customer-support-platform-google-sheets.png)

We then populated a BigQuery dataset with potential product usage data, such as: the status of the ticket (open or closed), response and resolution times, whether the ticket was escalated etc.
-![customer-support-platform-data-warehouse](/img/experiment4-customer-support-platform-data-warehouse.png)
+![customer-support-platform-data-warehouse](https://storage.googleapis.com/dlt-blog-images/experiment4-customer-support-platform-data-warehouse.png)

Then, as before, we loaded the google sheets data to the data warehouse using the `dlt` google sheets pipeline and following [these](https://github.com/dlt-hub/google-sheets-bigquery-pipeline) steps.

Finally, we connected Metabase to it and built a dashboard measuring the performance of the model over the period of a month:

-![customer-support-platform-dashboard](/img/experiment4-customer-support-platform-dashboard.png)
+![customer-support-platform-dashboard](https://storage.googleapis.com/dlt-blog-images/experiment4-customer-support-platform-dashboard.png)
@@ -1,7 +1,7 @@
---
slug: training-gpt-with-opensource-codebases
title: "GPT-accelerated learning: Understanding open source codebases"
-image: /img/blog_gpt_1.jpg
+image: https://storage.googleapis.com/dlt-blog-images/blog_gpt_1.jpg
authors:
name: Tong Chen
title: Data Engineer Intern at dltHub
@@ -150,7 +150,7 @@ After the walkthrough, we can start to experiment different questions and it wil

Here, I asked: "Why should data teams use dlt?"

-![chatgptq1](\img\chatgptQ1.png)
+![chatgptq1](https://storage.googleapis.com/dlt-blog-images/chatgptQ1.png)

It outputted:

Expand All @@ -160,7 +160,7 @@ It outputted:

Next, I asked: "Who is dlt for?"

-![chatgptq2](\img\chatgptQ2..png)
+![chatgptq2](https://storage.googleapis.com/dlt-blog-images/chatgptQ2..png)

It outputted:
1. `dlt` is meant to be accessible to every person on the data team, including data engineers, analysts, data scientists, and other stakeholders involved in data loading. It is designed to reduce knowledge requirements and enable collaborative working between engineers and analysts.
2 changes: 1 addition & 1 deletion docs/website/blog/2023-06-15-automating-data-engineers.md
@@ -12,7 +12,7 @@ tags: [data engineer shortage, structured data, schema evolution]
# Automating the data engineer: Addressing the talent shortage


-![automated pipeline automaton](/img/pipeline-automaton.png)
+![automated pipeline automaton](https://storage.googleapis.com/dlt-blog-images/pipeline-automaton.png)


## Why is there a data engineer shortage?
2 changes: 1 addition & 1 deletion docs/website/blog/2023-06-20-dlthub-gptquestion1-.md
@@ -1,7 +1,7 @@
---
slug: trained-gpt-q&a
title: "Hey GPT, tell me about dlthub!"
-image: /img/traingptblog.jpg
+image: https://storage.googleapis.com/dlt-blog-images/traingptblog.jpg
authors:
name: Tong Chen
title: Data Engineer Intern at dltHub
4 changes: 2 additions & 2 deletions docs/website/blog/2023-06-26-dlthub-gptquestion2.md
@@ -1,7 +1,7 @@
---
slug: trained-gpt-q&a-2
title: "dlt AI Assistant provides answers you need!"
-image: /img/blog_gpt_QA2.jpg
+image: https://storage.googleapis.com/dlt-blog-images/blog_gpt_QA2.jpg
authors:
name: Tong Chen
title: Data Engineer Intern at dltHub
@@ -100,7 +100,7 @@ Now we understand how `dlt` significantly improves our work efficiency!

Want to ask your own questions to the `dlt` AI Assistant? Just click on the "Get Help" button located at the bottom right.

-![dlthelp](\img\dlthelp.jpg)
+![dlthelp](https://storage.googleapis.com/dlt-blog-images/dlthelp.jpg)

***
[ What's more? ]
22 changes: 11 additions & 11 deletions docs/website/blog/2023-08-14-dlt-motherduck-blog.md
@@ -1,7 +1,7 @@
---
slug: dlt-motherduck-demo
title: "dlt-dbt-DuckDB-MotherDuck: My super simple and highly customizable approach to the Modern Data Stack in a box"
-image: /img/dlt-motherduck-logos.png
+image: https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-logos.png
authors:
name: Rahul Joshi
title: Developer Relations at dltHub
Expand All @@ -25,11 +25,11 @@ In my example, I wanted to customize reports on top of Google Analytics 4 (GA4)

By first pulling all the data from different sources into DuckDB files on my laptop, I was able to do my development and customization locally.

-![local-workflow](/img/dlt-motherduck-local-workflow.png)
+![local-workflow](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-local-workflow.png)

And then when I was ready to move to production, I was able to seamlessly switch from DuckDB to MotherDuck with almost no code re-writing!

-![production-workflow](/img/dlt-motherduck-production-workflow.png)
+![production-workflow](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-production-workflow.png)

Thus I got a super simple and highly customizable MDS in a box that is also close to a company production setting.

@@ -90,11 +90,11 @@ This is a perfect problem to test out my new super simple and highly customizabl

`dlt` simplifies this process by automatically normalizing such nested data on load.

-![nested-bigquery](/img/dlt-motherduck-nested-bigquery.png)
+![nested-bigquery](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-nested-bigquery.png)

Example of what the nested data in BigQuery looks like.

-![normalized-bigquery](/img/dlt-motherduck-normalized-bigquery.png)
+![normalized-bigquery](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-normalized-bigquery.png)

`dlt` loads the main data into table `ga_events`, and creates another table `ga_events__event_params` for the nested data.
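What this parent/child split looks like can be sketched in a few lines of plain Python. This is a simplified illustration of the shape of the output, not dlt's actual implementation (which also handles typing, naming conventions, and arbitrarily deep nesting); the `_dlt_id` / `_dlt_parent_id` key names mirror dlt's linking columns:

```python
def normalize_events(events):
    """Split nested event_params into a separate child table,
    the way dlt produces ga_events and ga_events__event_params."""
    parents, children = [], []
    for i, event in enumerate(events):
        event = dict(event)                 # don't mutate the input
        params = event.pop("event_params", [])
        event["_dlt_id"] = i                # synthetic parent key
        parents.append(event)
        for param in params:
            # Each nested record becomes a row linked to its parent.
            children.append({"_dlt_parent_id": i, **param})
    return parents, children
```

Loading `parents` and `children` into two tables gives ordinary flat rows that any SQL engine can join back together.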

Expand All @@ -109,7 +109,7 @@ This is a perfect problem to test out my new super simple and highly customizabl

In this example, after running the BigQuery pipeline, the data was loaded into a locally created DuckDB file called ‘bigquery.duckdb’, and this allowed me to use Python to explore the loaded data:

-![duckdb-explore](/img/dlt-motherduck-duckdb-explore.png)
+![duckdb-explore](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-duckdb-explore.png)

The best thing about using DuckDB is that it provides a local testing and development environment. This means that you can quickly and without any additional costs test and validate your workflow before deploying it to production.

Expand All @@ -127,13 +127,13 @@ This is a perfect problem to test out my new super simple and highly customizabl

Metabase OSS has a DuckDB driver, which meant that I could simply point it to the DuckDB files in my system and build a dashboard on top of this data.

-![dashboard-1](/img/dlt-motherduck-dashboard-1.png)
+![dashboard-1](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-dashboard-1.png)

-![dashboard-2](/img/dlt-motherduck-dashboard-2.png)
+![dashboard-2](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-dashboard-2.png)

-![dashboard-3](/img/dlt-motherduck-dashboard-3.png)
+![dashboard-3](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-dashboard-3.png)

-![dashboard-4](/img/dlt-motherduck-dashboard-4.png)
+![dashboard-4](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-dashboard-4.png)

5. **Going to production: Using MotherDuck as the destination**

@@ -169,7 +169,7 @@ This is a perfect problem to test out my new super simple and highly customizabl

In my example, after I load the data to MotherDuck, I can provide access to my team just by clicking on ‘Share’ in the menu of their web UI.

-![motherduck-share](/img/dlt-motherduck-share.png)
+![motherduck-share](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-share.png)

**Conclusion:**

2 changes: 1 addition & 1 deletion docs/website/blog/2023-08-21-dlt-lineage-support.md
@@ -1,7 +1,7 @@
---
slug: dlt-lineage-support
title: "Trust your data! Column and row level lineages, an explainer and a recipe."
-image: /img/eye_of_data_lineage.png
+image: https://storage.googleapis.com/dlt-blog-images/eye_of_data_lineage.png
authors:
name: Adrian Brudaru
title: Open source data engineer
2 changes: 1 addition & 1 deletion docs/website/blog/2023-08-24-dlt-etlt.md
@@ -1,7 +1,7 @@
---
slug: dlt-etlt
title: "The return of ETL in the Python age"
-image: /img/went-full-etltlt.png
+image: https://storage.googleapis.com/dlt-blog-images/went-full-etltlt.png
authors:
name: Adrian Brudaru
title: Open source data engineer
2 changes: 1 addition & 1 deletion docs/website/blog/2023-09-05-mongo-etl.md
@@ -1,7 +1,7 @@
---
slug: mongo-etl
title: "Dumpster diving for data: The MongoDB experience"
-image: /img/data-dumpster.png
+image: https://storage.googleapis.com/dlt-blog-images/data-dumpster.png
authors:
name: Adrian Brudaru
title: Open source data engineer
2 changes: 1 addition & 1 deletion docs/website/blog/2023-09-20-data-engineering-cv.md
@@ -1,7 +1,7 @@
---
slug: data-engineering-cv
title: "How to write a data engineering CV for Europe and America - A hiring manager’s perspective"
-image: /img/dall-e-de-cv.png
+image: https://storage.googleapis.com/dlt-blog-images/dall-e-de-cv.png
authors:
name: Adrian Brudaru
title: Open source data engineer
8 changes: 4 additions & 4 deletions docs/website/blog/2023-09-26-verba-dlt-zendesk.md
@@ -1,7 +1,7 @@
---
slug: verba-dlt-zendesk
title: "Talk to your Zendesk tickets with Weaviate’s Verba and dlt: A Step by Step Guide"
-image: /img/dlt-business-knowledge-retrieval-augmented-generation-diagram.png
+image: https://storage.googleapis.com/dlt-blog-images/dlt-business-knowledge-retrieval-augmented-generation-diagram.png
authors:
name: Anton Burnashev
title: Software Engineer
Expand All @@ -16,7 +16,7 @@ As businesses scale and the volume of internal knowledge grows, it becomes incre

With the latest advancements in large language models (LLMs) and [vector databases](https://weaviate.io/blog/what-is-a-vector-database), it's now possible to build a new class of tools that can help get insights from this data. One approach to do so is Retrieval-Augmented Generation (RAG). The idea behind RAG is to retrieve relevant information from your database and use LLMs to generate a customised response to a question. Leveraging RAG enables the LLM to tailor its responses based on your proprietary data.

-![Diagram illustrating the process of internal business knowledge retrieval and augmented generation (RAG), involving components like Salesforce, Zendesk, Asana, Jira, Notion, Slack and HubSpot, to answer user queries and generate responses.](/img/dlt-business-knowledge-retrieval-augmented-generation-diagram.png)
+![Diagram illustrating the process of internal business knowledge retrieval and augmented generation (RAG), involving components like Salesforce, Zendesk, Asana, Jira, Notion, Slack and HubSpot, to answer user queries and generate responses.](https://storage.googleapis.com/dlt-blog-images/dlt-business-knowledge-retrieval-augmented-generation-diagram.png)
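The retrieve-then-generate loop behind RAG can be sketched in plain Python. This is only an illustration of the shape of the pipeline: keyword overlap here is a hypothetical stand-in for the vector-database similarity search a system like Weaviate performs, and the prompt template is made up.

```python
def retrieve(query, documents, k=2):
    # Rank documents by word overlap with the query, a crude stand-in
    # for embedding-based similarity search in a vector database.
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(query, documents):
    # The retrieved context is prepended so the LLM grounds its answer
    # in the proprietary data instead of answering from memory alone.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

A real RAG system would then send this prompt to an LLM; the key point is that only the top-k retrieved documents travel with the question.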

One such source of internal knowledge is help desk software. It contains a wealth of information about the company's customers and their interactions with the support team.

@@ -78,7 +78,7 @@ INFO: Application startup complete.

Now, open your browser and navigate to [http://localhost:8000](http://localhost:8000/).

-![A user interface screenshot showing Verba, retrieval and augmented generation chatbot, powered by Weaviate](/img/dlt-weaviate-verba-ui-1.png)
+![A user interface screenshot showing Verba, retrieval and augmented generation chatbot, powered by Weaviate](https://storage.googleapis.com/dlt-blog-images/dlt-weaviate-verba-ui-1.png)

Great! Verba is up and running.

@@ -272,7 +272,7 @@ verba start

Head back to [http://localhost:8000](http://localhost:8000/) and ask Verba a question. For example, "What are common issues our users report?".

-![A user interface screenshot of Verba showing Zendesk tickets with different issues like API problems and update failures, with responses powered by Weaviate](/img/dlt-weaviate-verba-ui-2.png)
+![A user interface screenshot of Verba showing Zendesk tickets with different issues like API problems and update failures, with responses powered by Weaviate](https://storage.googleapis.com/dlt-blog-images/dlt-weaviate-verba-ui-2.png)

As you can see, Verba is able to retrieve relevant information from Zendesk Support and generate an answer to our question. It also displays the list of relevant documents for the question. You can click on them to see the full text.
