
Changed directory of all the blog images to google cloud storage. #1156

Merged: 2 commits, Apr 8, 2024
2 changes: 1 addition & 1 deletion docs/website/blog/2023-03-09-duckdb-1M-downloads-users.mdx
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@ Like so many others, we are excited about the project, too. Recently, we attende

With our research, we aimed to identify the most common reasons people try out DuckDB. We found five perspectives that people commonly have when trying it out.

-![Marcin watching a MotherDuck presentation](/img/Marcin-dltHub-DuckDB-DuckCon-Brussels.jpg)
+![Marcin watching a MotherDuck presentation](https://storage.googleapis.com/dlt-blog-images/Marcin-dltHub-DuckDB-DuckCon-Brussels.jpg)

dltHub co-founder Marcin watching a MotherDuck presentation at DuckCon in Brussels in February

@@ -58,7 +58,7 @@ Check this out in the [Colab notebook](https://colab.research.google.com/drive/1

Okay. It’s called DuckDB because ducks are amazing and [@hannes](https://github.com/hannes) once had a pet duck 🤣

-![Why "Duck" DB?](/img/why-duckdb.png)
+![Why "Duck" DB?](https://storage.googleapis.com/dlt-blog-images/why-duckdb.png)
Source: [DuckDB: an Embeddable Analytical RDBMS](https://db.in.tum.de/teaching/ss19/moderndbs/duckdb-tum.pdf)

## Enjoy this blog post? Give data load tool (dlt) a ⭐ on GitHub [here](https://github.com/dlt-hub/dlt) 🤜🤛
@@ -23,7 +23,7 @@ We decided to make a dashboard that helps us better understand data attribution

### Internal dashboard

-![Dashboard 1](/img/g4_dashboard_screen_grab_1.jpg) ![Dashboard 2](/img/g4_dashboard_screen_grab_2.jpg)
+![Dashboard 1](https://storage.googleapis.com/dlt-blog-images/g4_dashboard_screen_grab_1.jpg) ![Dashboard 2](https://storage.googleapis.com/dlt-blog-images/g4_dashboard_screen_grab_2.jpg)

With the data loaded locally, we were able to build the dashboard on our system using Streamlit. You can also do this on your system by simply cloning [this repo](https://github.com/dlt-hub/ga4-internal-dashboard-demo) and following the steps listed [here](https://github.com/dlt-hub/ga4-internal-dashboard-demo/tree/main/intial-explorations).

@@ -29,7 +29,7 @@ Now that the comments were loaded, we were ready to use GPT-4 to create a one se

Since these comments were posted in response to stories or other comments, we fed in the story title and any parent comments as context in the prompt. To avoid hitting rate-limit errors and losing all progress, we ran this for 100 comments at a time, saving the results to the CSV file after each batch. We then built a Streamlit app to load and display them in a dashboard. Here is what the dashboard looks like:

-![dashboard.png](/img/hn_gpt_dashboard.png)
+![dashboard.png](https://storage.googleapis.com/dlt-blog-images/hn_gpt_dashboard.png)
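The batch-and-checkpoint pattern the post describes (summarize 100 comments at a time, appending results to a CSV so a rate-limit failure does not lose earlier progress) can be sketched as follows. This is a minimal illustration, not the post's actual code: `summarize_comment` is a hypothetical stand-in for the real GPT-4 call.

```python
import csv
import os


def summarize_comment(comment, context):
    # Hypothetical stand-in for the actual GPT-4 API call, which would
    # take the comment plus story title / parent comments as context.
    return f"Summary of: {comment[:40]}"


def process_in_batches(comments, out_path, batch_size=100):
    # Count rows already written so a re-run resumes where it left off
    # instead of re-summarizing (and re-paying for) finished comments.
    done = 0
    if os.path.exists(out_path):
        with open(out_path, newline="") as f:
            done = sum(1 for _ in csv.reader(f))

    for start in range(done, len(comments), batch_size):
        batch = comments[start:start + batch_size]
        rows = [(c["id"], summarize_comment(c["text"], c.get("parent", "")))
                for c in batch]
        # Append each finished batch immediately; a rate-limit error
        # mid-run then only loses the current batch.
        with open(out_path, "a", newline="") as f:
            csv.writer(f).writerows(rows)
```

A Streamlit app can then simply read the CSV on each page load and render the accumulated summaries.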

## Deploying the pipeline, Google Bigquery, and Streamlit app

@@ -40,10 +40,10 @@ With the database uploaded to BigQuery, we were now ready to build a dashboard.

The DVD store database contains data on the products (film DVDs), product categories, existing inventory, customers, orders, order histories etc. For the purpose of the dashboard, we decided to explore the question: *How many orders are being placed each month and which films and film categories are the highest selling?*

-![orders_chart.png](/img/experiment3_dashboard_orders_chart.png) ![top_selling_tables.png](/img/experiment3_dashboard_top_selling_tables.png)
+![orders_chart.png](https://storage.googleapis.com/dlt-blog-images/experiment3_dashboard_orders_chart.png) ![top_selling_tables.png](https://storage.googleapis.com/dlt-blog-images/experiment3_dashboard_top_selling_tables.png)
In addition, we set up email alerts to get notified whenever a DVD was out of stock or running low.

-![low_stock_email_alert.png](/img/experiment3_low_stock_email_alert.png)
+![low_stock_email_alert.png](https://storage.googleapis.com/dlt-blog-images/experiment3_low_stock_email_alert.png)

### 3. Deploying the pipeline

@@ -103,7 +103,7 @@ To try out schema evolution with `dlt`, check out our [colab demo.](https://cola



-![colab demo](/img/schema_evolution_colab_demo_light.png)
+![colab demo](https://storage.googleapis.com/dlt-blog-images/schema_evolution_colab_demo_light.png)

### Want more?

@@ -1,7 +1,7 @@
---
slug: google-sheets-to-data-warehouse-pipeline
title: Using the Google Sheets `dlt` pipeline in analytics and ML workflows
-image: /img/experiment4-blog-image.png
+image: https://storage.googleapis.com/dlt-blog-images/experiment4-blog-image.png
authors:
name: Rahul Joshi
title: Data Science Intern at dltHub
Expand All @@ -23,27 +23,27 @@ As an example of such a use-case, consider this very common scenario: You're the

To demonstrate this process, we created some sample data where we stored costs related to some campaigns in a Google Sheet and the rest of the related data in BigQuery.

-![campaign-roi-google-sheets](/img/experiment4-campaign-roi-google-sheets.png) ![campaign-roi-data-warehouse](/img/experiment4-campaign-roi-datawarehouse.png)
+![campaign-roi-google-sheets](https://storage.googleapis.com/dlt-blog-images/experiment4-campaign-roi-google-sheets.png) ![campaign-roi-data-warehouse](https://storage.googleapis.com/dlt-blog-images/experiment4-campaign-roi-datawarehouse.png)

We then used the `dlt` google sheets pipeline by following [these](https://github.com/dlt-hub/google-sheets-bigquery-pipeline) simple steps to load the Google Sheets data into BigQuery.

With the data loaded, we finally connected Metabase to the data warehouse and created a dashboard to understand the ROIs across each platform:
-![campaign-roi-dashboard-1](/img/experiment4-campaign-roi-dashboard-1.png)
-![campaign-roi-dashboard-2](/img/experiment4-campaign-roi-dashboard-2.png)
+![campaign-roi-dashboard-1](https://storage.googleapis.com/dlt-blog-images/experiment4-campaign-roi-dashboard-1.png)
+![campaign-roi-dashboard-2](https://storage.googleapis.com/dlt-blog-images/experiment4-campaign-roi-dashboard-2.png)

## Use-case #2: Evaluating the performance of your ML product using google sheets pipeline

Another use-case for Google Sheets that we've come across frequently is to store annotated training data for building machine learning (ML) products. This process usually involves a human first manually doing the annotation and creating the training set in Google Sheets. Once there is sufficient data, the next step is to train and deploy the ML model. After the ML model is ready and deployed, the final step would be to create a workflow to measure its performance, which, depending on the data and product, might involve combining the manually annotated Google Sheets data with product usage data that is typically stored in some data warehouse.

A very common example of such a workflow is customer support platforms that use text classification models to categorize incoming customer support tickets into different issue categories for efficient routing and resolution. To illustrate this example, we created a Google Sheet with issues manually annotated with a category. We also included other manually annotated features that might help measure the effectiveness of the platform, such as priority level for the tickets and customer feedback.
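The categorization step that such platforms automate can be illustrated with a toy classifier. A real platform would use a trained text-classification model; the keyword rules and category names below are purely hypothetical, but they show the routing idea:

```python
# Hypothetical keyword rules standing in for a trained classifier.
CATEGORY_KEYWORDS = {
    "billing": ["invoice", "charge", "refund", "payment"],
    "bug": ["error", "crash", "broken", "fails"],
    "account": ["login", "password", "access"],
}


def categorize_ticket(text):
    # Return the first category whose keywords appear in the ticket;
    # unmatched tickets fall through to a catch-all queue.
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"
```

Each predicted category can then be compared against the human-annotated category in the Google Sheet to measure the model's accuracy.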

-![customer-support-platform-google-sheets](/img/experiment4-customer-support-platform-google-sheets.png)
+![customer-support-platform-google-sheets](https://storage.googleapis.com/dlt-blog-images/experiment4-customer-support-platform-google-sheets.png)

We then populated a BigQuery dataset with potential product usage data, such as: the status of the ticket (open or closed), response and resolution times, whether the ticket was escalated etc.
-![customer-support-platform-data-warehouse](/img/experiment4-customer-support-platform-data-warehouse.png)
+![customer-support-platform-data-warehouse](https://storage.googleapis.com/dlt-blog-images/experiment4-customer-support-platform-data-warehouse.png)

Then, as before, we loaded the google sheets data to the data warehouse using the `dlt` google sheets pipeline and following [these](https://github.com/dlt-hub/google-sheets-bigquery-pipeline) steps.

Finally, we connected Metabase to it and built a dashboard measuring the performance of the model over the period of a month:

-![customer-support-platform-dashboard](/img/experiment4-customer-support-platform-dashboard.png)
+![customer-support-platform-dashboard](https://storage.googleapis.com/dlt-blog-images/experiment4-customer-support-platform-dashboard.png)
@@ -1,7 +1,7 @@
---
slug: training-gpt-with-opensource-codebases
title: "GPT-accelerated learning: Understanding open source codebases"
-image: /img/blog_gpt_1.jpg
+image: https://storage.googleapis.com/dlt-blog-images/blog_gpt_1.jpg
authors:
name: Tong Chen
title: Data Engineer Intern at dltHub
@@ -150,7 +150,7 @@ After the walkthrough, we can start to experiment different questions and it wil

Here, I asked: "Why should data teams use dlt?"

-![chatgptq1](\img\chatgptQ1.png)
+![chatgptq1](https://storage.googleapis.com/dlt-blog-images/chatgptQ1.png)

It outputted:

Expand All @@ -160,7 +160,7 @@ It outputted:

Next, I asked: "Who is dlt for?"

-![chatgptq2](\img\chatgptQ2..png)
+![chatgptq2](https://storage.googleapis.com/dlt-blog-images/chatgptQ2..png)

It outputted:
1. `dlt` is meant to be accessible to every person on the data team, including data engineers, analysts, data scientists, and other stakeholders involved in data loading. It is designed to reduce knowledge requirements and enable collaborative working between engineers and analysts.
2 changes: 1 addition & 1 deletion docs/website/blog/2023-06-15-automating-data-engineers.md
@@ -12,7 +12,7 @@ tags: [data engineer shortage, structured data, schema evolution]
# Automating the data engineer: Addressing the talent shortage


-![automated pipeline automaton](/img/pipeline-automaton.png)
+![automated pipeline automaton](https://storage.googleapis.com/dlt-blog-images/pipeline-automaton.png)


## Why is there a data engineer shortage?
2 changes: 1 addition & 1 deletion docs/website/blog/2023-06-20-dlthub-gptquestion1-.md
@@ -1,7 +1,7 @@
---
slug: trained-gpt-q&a
title: "Hey GPT, tell me about dlthub!"
-image: /img/traingptblog.jpg
+image: https://storage.googleapis.com/dlt-blog-images/traingptblog.jpg
authors:
name: Tong Chen
title: Data Engineer Intern at dltHub
4 changes: 2 additions & 2 deletions docs/website/blog/2023-06-26-dlthub-gptquestion2.md
@@ -1,7 +1,7 @@
---
slug: trained-gpt-q&a-2
title: "dlt AI Assistant provides answers you need!"
-image: /img/blog_gpt_QA2.jpg
+image: https://storage.googleapis.com/dlt-blog-images/blog_gpt_QA2.jpg
authors:
name: Tong Chen
title: Data Engineer Intern at dltHub
@@ -100,7 +100,7 @@ Now we understand how `dlt` significantly improves our work efficiency!

Want to ask your own questions to the `dlt` AI Assistant? Just click on the "Get Help" button located at the bottom right.

-![dlthelp](\img\dlthelp.jpg)
+![dlthelp](https://storage.googleapis.com/dlt-blog-images/dlthelp.jpg)

***
[ What's more? ]
22 changes: 11 additions & 11 deletions docs/website/blog/2023-08-14-dlt-motherduck-blog.md
@@ -1,7 +1,7 @@
---
slug: dlt-motherduck-demo
title: "dlt-dbt-DuckDB-MotherDuck: My super simple and highly customizable approach to the Modern Data Stack in a box"
-image: /img/dlt-motherduck-logos.png
+image: https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-logos.png
authors:
name: Rahul Joshi
title: Developer Relations at dltHub
Expand All @@ -25,11 +25,11 @@ In my example, I wanted to customize reports on top of Google Analytics 4 (GA4)

By first pulling all the data from different sources into DuckDB files on my laptop, I was able to do my development and customization locally.

-![local-workflow](/img/dlt-motherduck-local-workflow.png)
+![local-workflow](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-local-workflow.png)

And then when I was ready to move to production, I was able to seamlessly switch from DuckDB to MotherDuck with almost no code re-writing!

-![production-workflow](/img/dlt-motherduck-production-workflow.png)
+![production-workflow](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-production-workflow.png)

Thus I got a super simple and highly customizable MDS in a box that is also close to a company production setting.

@@ -90,11 +90,11 @@ This is a perfect problem to test out my new super simple and highly customizabl

`dlt` simplifies this process by automatically normalizing such nested data on load.

-![nested-bigquery](/img/dlt-motherduck-nested-bigquery.png)
+![nested-bigquery](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-nested-bigquery.png)

Example of what the nested data in BigQuery looks like.

-![normalized-bigquery](/img/dlt-motherduck-normalized-bigquery.png)
+![normalized-bigquery](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-normalized-bigquery.png)

`dlt` loads the main data into table `ga_events`, and creates another table `ga_events__event_params` for the nested data.
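What this parent/child split looks like can be sketched in a few lines of plain Python. This is a simplified illustration of the shape of the output, not dlt's actual implementation (which also handles typing, naming conventions, and arbitrarily deep nesting); the `_dlt_id` / `_dlt_parent_id` key names mirror dlt's linking columns:

```python
def normalize_events(events):
    """Split nested event_params into a separate child table,
    the way dlt produces ga_events and ga_events__event_params."""
    parents, children = [], []
    for i, event in enumerate(events):
        event = dict(event)                 # don't mutate the input
        params = event.pop("event_params", [])
        event["_dlt_id"] = i                # synthetic parent key
        parents.append(event)
        for param in params:
            # Each nested record becomes a row linked to its parent.
            children.append({"_dlt_parent_id": i, **param})
    return parents, children
```

Loading `parents` and `children` into two tables gives ordinary flat rows that any SQL engine can join back together.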

Expand All @@ -109,7 +109,7 @@ This is a perfect problem to test out my new super simple and highly customizabl

In this example, after running the BigQuery pipeline, the data was loaded into a locally created DuckDB file called ‘bigquery.duckdb’, and this allowed me to use Python to explore the loaded data:

-![duckdb-explore](/img/dlt-motherduck-duckdb-explore.png)
+![duckdb-explore](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-duckdb-explore.png)

The best thing about using DuckDB is that it provides a local testing and development environment. This means that you can quickly and without any additional costs test and validate your workflow before deploying it to production.

Expand All @@ -127,13 +127,13 @@ This is a perfect problem to test out my new super simple and highly customizabl

Metabase OSS has a DuckDB driver, which meant that I could simply point it to the DuckDB files in my system and build a dashboard on top of this data.

-![dashboard-1](/img/dlt-motherduck-dashboard-1.png)
+![dashboard-1](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-dashboard-1.png)

-![dashboard-2](/img/dlt-motherduck-dashboard-2.png)
+![dashboard-2](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-dashboard-2.png)

-![dashboard-3](/img/dlt-motherduck-dashboard-3.png)
+![dashboard-3](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-dashboard-3.png)

-![dashboard-4](/img/dlt-motherduck-dashboard-4.png)
+![dashboard-4](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-dashboard-4.png)

5. **Going to production: Using MotherDuck as the destination**

@@ -169,7 +169,7 @@ This is a perfect problem to test out my new super simple and highly customizabl

In my example, after I load the data to MotherDuck, I can provide access to my team just by clicking on ‘Share’ in the menu of their web UI.

-![motherduck-share](/img/dlt-motherduck-share.png)
+![motherduck-share](https://storage.googleapis.com/dlt-blog-images/dlt-motherduck-share.png)

**Conclusion:**

2 changes: 1 addition & 1 deletion docs/website/blog/2023-08-21-dlt-lineage-support.md
@@ -1,7 +1,7 @@
---
slug: dlt-lineage-support
title: "Trust your data! Column and row level lineages, an explainer and a recipe."
-image: /img/eye_of_data_lineage.png
+image: https://storage.googleapis.com/dlt-blog-images/eye_of_data_lineage.png
authors:
name: Adrian Brudaru
title: Open source data engineer
2 changes: 1 addition & 1 deletion docs/website/blog/2023-08-24-dlt-etlt.md
@@ -1,7 +1,7 @@
---
slug: dlt-etlt
title: "The return of ETL in the Python age"
-image: /img/went-full-etltlt.png
+image: https://storage.googleapis.com/dlt-blog-images/went-full-etltlt.png
authors:
name: Adrian Brudaru
title: Open source data engineer
2 changes: 1 addition & 1 deletion docs/website/blog/2023-09-05-mongo-etl.md
@@ -1,7 +1,7 @@
---
slug: mongo-etl
title: "Dumpster diving for data: The MongoDB experience"
-image: /img/data-dumpster.png
+image: https://storage.googleapis.com/dlt-blog-images/data-dumpster.png
authors:
name: Adrian Brudaru
title: Open source data engineer
2 changes: 1 addition & 1 deletion docs/website/blog/2023-09-20-data-engineering-cv.md
@@ -1,7 +1,7 @@
---
slug: data-engineering-cv
title: "How to write a data engineering CV for Europe and America - A hiring manager’s perspective"
-image: /img/dall-e-de-cv.png
+image: https://storage.googleapis.com/dlt-blog-images/dall-e-de-cv.png
authors:
name: Adrian Brudaru
title: Open source data engineer
8 changes: 4 additions & 4 deletions docs/website/blog/2023-09-26-verba-dlt-zendesk.md
@@ -1,7 +1,7 @@
---
slug: verba-dlt-zendesk
title: "Talk to your Zendesk tickets with Weaviate’s Verba and dlt: A Step by Step Guide"
-image: /img/dlt-business-knowledge-retrieval-augmented-generation-diagram.png
+image: https://storage.googleapis.com/dlt-blog-images/dlt-business-knowledge-retrieval-augmented-generation-diagram.png
authors:
name: Anton Burnashev
title: Software Engineer
Expand All @@ -16,7 +16,7 @@ As businesses scale and the volume of internal knowledge grows, it becomes incre

With the latest advancements in large language models (LLMs) and [vector databases](https://weaviate.io/blog/what-is-a-vector-database), it's now possible to build a new class of tools that can help get insights from this data. One approach to do so is Retrieval-Augmented Generation (RAG). The idea behind RAG is to retrieve relevant information from your database and use LLMs to generate a customised response to a question. Leveraging RAG enables the LLM to tailor its responses based on your proprietary data.

-![Diagram illustrating the process of internal business knowledge retrieval and augmented generation (RAG), involving components like Salesforce, Zendesk, Asana, Jira, Notion, Slack and HubSpot, to answer user queries and generate responses.](/img/dlt-business-knowledge-retrieval-augmented-generation-diagram.png)
+![Diagram illustrating the process of internal business knowledge retrieval and augmented generation (RAG), involving components like Salesforce, Zendesk, Asana, Jira, Notion, Slack and HubSpot, to answer user queries and generate responses.](https://storage.googleapis.com/dlt-blog-images/dlt-business-knowledge-retrieval-augmented-generation-diagram.png)
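The retrieve-then-generate loop behind RAG can be sketched in plain Python. This is only an illustration of the shape of the pipeline: keyword overlap here is a hypothetical stand-in for the vector-database similarity search a system like Weaviate performs, and the prompt template is made up.

```python
def retrieve(query, documents, k=2):
    # Rank documents by word overlap with the query, a crude stand-in
    # for embedding-based similarity search in a vector database.
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(query, documents):
    # The retrieved context is prepended so the LLM grounds its answer
    # in the proprietary data instead of answering from memory alone.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

A real RAG system would then send this prompt to an LLM; the key point is that only the top-k retrieved documents travel with the question.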

One such source of internal knowledge is help desk software. It contains a wealth of information about the company's customers and their interactions with the support team.

@@ -78,7 +78,7 @@ INFO: Application startup complete.

Now, open your browser and navigate to [http://localhost:8000](http://localhost:8000/).

-![A user interface screenshot showing Verba, retrieval and augmented generation chatbot, powered by Weaviate](/img/dlt-weaviate-verba-ui-1.png)
+![A user interface screenshot showing Verba, retrieval and augmented generation chatbot, powered by Weaviate](https://storage.googleapis.com/dlt-blog-images/dlt-weaviate-verba-ui-1.png)

Great! Verba is up and running.

@@ -272,7 +272,7 @@ verba start

Head back to [http://localhost:8000](http://localhost:8000/) and ask Verba a question. For example, "What are common issues our users report?".

-![A user interface screenshot of Verba showing Zendesk tickets with different issues like API problems and update failures, with responses powered by Weaviate](/img/dlt-weaviate-verba-ui-2.png)
+![A user interface screenshot of Verba showing Zendesk tickets with different issues like API problems and update failures, with responses powered by Weaviate](https://storage.googleapis.com/dlt-blog-images/dlt-weaviate-verba-ui-2.png)

As you can see, Verba is able to retrieve relevant information from Zendesk Support and generate an answer to our question. It also displays the list of relevant documents for the question. You can click on them to see the full text.
