Commit b6922d5: first 15 pages

sh-rp committed Sep 17, 2024 (1 parent: 466fdf4)
Showing 17 changed files with 808 additions and 1,136 deletions.
3 changes: 2 additions & 1 deletion docs/website/docs/_book-onboarding-call.md
<a href="https://calendar.app.google/EMZRS6YhM11zTGQw7">Book a call</a> with a dltHub Solutions Engineer

204 changes: 68 additions & 136 deletions docs/website/docs/build-a-pipeline-tutorial.md

Large diffs are not rendered by default.

---
title: User-agent device data enrichment
description: Enriching the user-agent device data with average device price.
keywords: [data enrichment, user-agent data, device enrichment]
---

# Data enrichment part one: User-agent device data enrichment

Data enrichment enhances raw data with valuable information from multiple sources, increasing its analytical and decision-making value.

This part covers enriching sample data with device price. Understanding the price segment of the device that the user used to access your service can be helpful in personalized marketing, customer segmentation, and many more.

This documentation will discuss how to enrich the user device information with the average market price.

## Setup guide

We use SerpAPI to retrieve device prices using Google Shopping, but alternative services or APIs are viable.

:::note
SerpAPI's free tier offers 100 free calls monthly. For production, consider upgrading to a higher plan.
:::


## Creating data enrichment pipeline

You can either follow the example in the linked Colab notebook or follow this documentation to create the user-agent device data enrichment pipeline.

### A. Colab notebook

The Colab notebook combines three data enrichment processes for a sample dataset, starting with "Data enrichment part one: User-agent device data."

Here's the link to the notebook:
**[Colab Notebook](https://colab.research.google.com/drive/1ZKEkf1LRSld7CWQFS36fUXjhJKPAon7P?usp=sharing).**

### B. Create a pipeline

Alternatively, to create a data enrichment pipeline, you can start by creating the following directory structure:

```text
user_device_enrichment/
├── .dlt/
│   ├── config.toml
│   └── secrets.toml
└── device_enrichment_pipeline.py
```

### 1. Create a resource

`dlt` works on the principle of [sources](https://dlthub.com/docs/general-usage/source) and [resources.](https://dlthub.com/docs/general-usage/resource)

This data resource yields data typical of what many web analytics and tracking tools can collect. However, the specifics of what data is collected and how it's used can vary significantly among different tracking services.

Let's examine a synthetic dataset created for this article. It includes:

`user_id`: Web trackers typically assign a unique ID to users for tracking their journeys and interactions over time.

`device_name`: User device information helps in understanding the user base's device.

`page_refer`: The referer URL is tracked to analyze traffic sources and user navigation behavior.

Here's the resource that yields the sample data as discussed above:

```py
import dlt

@dlt.resource(write_disposition="append")
def tracked_data():
    """
    A generator function that yields a series of dictionaries, each
    representing user tracking data.

    This function is decorated with `dlt.resource` to integrate it into the
    `dlt` pipeline. The `write_disposition` parameter is set to "append" so
    that data from this generator is appended to the existing data in the
    destination table.

    Yields:
        dict: A dictionary with keys 'user_id', 'device_name', and
        'page_referer', representing the user's tracking data, including
        their device and the page they were referred from.
    """
    # Sample data representing tracked user data
    sample_data = [
        {"user_id": 1, "device_name": "Sony Experia XZ",
         "page_referer": "https://b2venture.lightning.force.com/"},
        {"user_id": 2, "device_name": "Samsung Galaxy S23 Ultra 5G",
         "page_referer": "https://techcrunch.com/2023/07/20/can-dlthub-solve-the-python-library-problem-for-ai-dig-ventures-thinks-so/"},
        {"user_id": 3, "device_name": "Apple iPhone 14 Pro Max",
         "page_referer": "https://dlthub.com/success-stories/freelancers-perspective/"},
        {"user_id": 4, "device_name": "OnePlus 11R",
         "page_referer": "https://www.reddit.com/r/dataengineering/comments/173kp9o/ideas_for_data_validation_on_data_ingestion/"},
        {"user_id": 5, "device_name": "Google Pixel 7 Pro",
         "page_referer": "https://pypi.org/"},
    ]

    # Yield each user's data as a dictionary
    for user_data in sample_data:
        yield user_data
```

### 2. Create `fetch_average_price` function

This particular function retrieves the average price of a device by utilizing SerpAPI and Google shopping listings. To filter the data, the function uses `dlt` state and only fetches prices from SerpAPI for devices that have not been updated in the most recent run or for those that were loaded more than 180 days in the past.
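The SerpAPI call itself appears in the full function; the freshness gate described above can be sketched in plain Python (the function and field names here are ours, not necessarily the tutorial's):

```python
import datetime

REFRESH_DAYS = 180  # re-fetch prices loaded more than 180 days ago

def needs_price_refresh(device_name, device_state, now=None):
    """Return True if `device_name` has no cached price, or the cached
    price is older than REFRESH_DAYS and should be re-fetched from SerpAPI."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    entry = device_state.get(device_name)
    if entry is None:
        return True  # never fetched before
    return (now - entry["loaded_at"]).days > REFRESH_DAYS
```

In the real function, `device_state` would come from `dlt` resource state, so the cache of fetched prices survives across pipeline runs.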

The first step is to register on [SerpAPI](https://serpapi.com/) and obtain the API token key.

1. In the `.dlt` folder, there's a file called `secrets.toml`. It's where you store sensitive information securely, like access tokens. Keep this file safe. Here's its format for service account authentication:

```toml
[sources]
api_key = "Please set me up!" # SerpAPI key
```

1. Replace the value of the `api_key`.
- Add map function
- Transformer function

The `dlt` library's `transformer` and `add_map` functions serve distinct purposes in data processing.


`Transformers` are used to process a resource and are ideal for post-load data transformations in a pipeline, compatible with tools like `dbt`, the `dlt SQL client`, or Pandas for intricate data manipulation. To read more:
[Click here.](../../general-usage/resource#process-resources-with-dlttransformer)

Conversely, `add_map` is used to customize a resource and applies transformations at an item level within a resource. It's useful for tasks like anonymizing individual data records. More on this can be found under
[Customize resources](../../general-usage/resource#customize-resources) in the documentation.
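As a plain-Python analogy (no `dlt` involved), `add_map` resembles applying a function to each yielded item, while a transformer resembles a generator that consumes another generator; the helper names below are illustrative:

```python
def source():
    # stands in for the tracked_data resource
    yield {"device_name": "Google Pixel 7 Pro"}
    yield {"device_name": "OnePlus 11R"}

def enrich(item):
    # stands in for fetch_average_price; the real function calls SerpAPI
    return {**item, "average_price": 0.0}

# add_map style: transform each item as it streams through
mapped = [enrich(item) for item in source()]

# transformer style: a step that consumes the upstream resource
def enrich_transformer(items):
    for item in items:
        yield enrich(item)

piped = list(enrich_transformer(source()))

assert mapped == piped  # same result, different composition
```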

1. Here, we create the pipeline and use the `add_map` functionality:

```py
pipeline = dlt.pipeline(
    pipeline_name="data_enrichment_one",
    destination="duckdb",
)

load_info = pipeline.run(
    tracked_data.add_map(fetch_average_price)
)
print(load_info)
```

:::info
Please note that the same outcome can be achieved by using the transformer function. To do so, you need to add the transformer decorator at the top of the `fetch_average_price` function. For `pipeline.run`, you can use the following code:

```py
# using fetch_average_price as a transformer function
load_info = pipeline.run(
    tracked_data | fetch_average_price
)
```

This will execute the `fetch_average_price` function with the tracked data and return the average price.
:::

### Run the pipeline

1. Install the necessary dependencies for the preferred [destination](https://dlthub.com/docs/dlt-ecosystem/destinations/), for example, duckdb:

```sh
pip install "dlt[duckdb]"
```

1. Run the pipeline with the following command:

```sh
python device_enrichment_pipeline.py
```

1. To ensure that everything loads as expected, use the command:

```sh
dlt pipeline <pipeline_name> show
```

For example, the "pipeline_name" for the above pipeline example is `data_enrichment_one`; you can use any custom name instead.
