Commit: run grammar checker
sh-rp committed Nov 25, 2024
1 parent 460020c commit 87f50cc
Showing 6 changed files with 78 additions and 78 deletions.
@@ -6,8 +6,7 @@ keywords: [destination, schema, data, monitoring, testing, quality]

# Data quality dashboards

After deploying a `dlt` pipeline, you might ask yourself: How can we know if the data is and remains
high quality?
After deploying a `dlt` pipeline, you might ask yourself: How can we know if the data is and remains high quality?

There are two ways to catch errors:

@@ -16,9 +15,7 @@ There are two ways to catch errors:

## Tests

The first time you load data from a pipeline you have built, you will likely want to test it. Plot
the data on time series line charts and look for any interruptions or spikes, which will highlight
any gaps or loading issues.
The first time you load data from a pipeline you have built, you will likely want to test it. Plot the data on time series line charts and look for any interruptions or spikes, which will highlight any gaps or loading issues.

### Data usage as monitoring

75 changes: 38 additions & 37 deletions docs/website/docs/general-usage/dataset-access/dataset.md
@@ -4,13 +4,13 @@ description: Conveniently accessing the data loaded to any destination in python
keywords: [destination, schema, data, access, retrieval]
---

# Accessing Loaded Data in Python
# Accessing loaded data in Python

This guide explains how to access and manipulate data that has been loaded into your destination using the `dlt` Python library. After running your pipelines and loading data, you can use the `ReadableDataset` and `ReadableRelation` classes to interact with your data programmatically.

**Note:** The `ReadableDataset` and `ReadableRelation` objects are **lazy-loading**. They will only query and retrieve data when you perform an action that requires it, such as fetching data into a DataFrame or iterating over the data. This means that simply creating these objects does not load data into memory, making your code more efficient.

## Quick Start Example
## Quick start example

Here's a full example of how to retrieve data from a pipeline and load it into a Pandas DataFrame or a PyArrow Table.

@@ -31,7 +31,7 @@ df = items_relation.df()
arrow_table = items_relation.arrow()
```

## Getting Started
## Getting started

Assuming you have a `Pipeline` object (let's call it `pipeline`), you can obtain a `ReadableDataset` and access your tables as `ReadableRelation` objects.

@@ -42,7 +42,7 @@ Assuming you have a `Pipeline` object (let's call it `pipeline`), you can obtain
dataset = pipeline._dataset()
```

### Access Tables as `ReadableRelation`
### Access tables as `ReadableRelation`

You can access tables in your dataset using either attribute access or item access.

@@ -54,11 +54,11 @@ items_relation = dataset.items
items_relation = dataset["items"]
```

## Reading Data
## Reading data

Once you have a `ReadableRelation`, you can read data in various formats and sizes.

### Fetch the Entire Table
### Fetch the entire table

:::caution
Loading full tables into memory without limiting or iterating over them can consume a large amount of memory and may cause your program to crash if the table is too large. It's recommended to use chunked iteration or apply limits when dealing with large datasets.
@@ -76,17 +76,17 @@ df = items_relation.df()
arrow_table = items_relation.arrow()
```

#### As a List of Python Tuples
#### As a list of Python tuples

```py
items_list = items_relation.fetchall()
```

## Lazy Loading Behavior
## Lazy loading behavior

The `ReadableDataset` and `ReadableRelation` objects are **lazy-loading**. This means that they do not immediately fetch data when you create them. Data is only retrieved when you perform an action that requires it, such as calling `.df()`, `.arrow()`, or iterating over the data. This approach optimizes performance and reduces unnecessary data loading.
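
A minimal sketch of this behavior, assuming a `pipeline` that has already loaded an `items` table:

```py
# No data is fetched from the destination on these two lines;
# the objects only describe what will be read.
dataset = pipeline._dataset()
items_relation = dataset.items

# Data is only retrieved here, when .df() materializes it into a DataFrame.
df = items_relation.df()
```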

## Iterating Over Data in Chunks
## Iterating over data in chunks

To handle large datasets efficiently, you can process data in smaller chunks.

@@ -106,48 +106,48 @@ for arrow_chunk in items_relation.iter_arrow(chunk_size=500):
pass
```

### Iterate as Lists of Tuples
### Iterate as lists of tuples

```py
for items_chunk in items_relation.iter_fetch(chunk_size=500):
# Process each chunk of tuples
pass
```

The methods availableon the ReadableRelation correspond to the methods available on the cursor returned by the sql client. Please refer to the [sql client](./sql-client.md#supported-methods-on-the-cursor) guide for more information.
The methods available on the ReadableRelation correspond to the methods available on the cursor returned by the SQL client. Please refer to the [SQL client](./sql-client.md#supported-methods-on-the-cursor) guide for more information.

## Modifying Queries
## Modifying queries

You can refine your data retrieval by limiting the number of records, selecting specific columns, or chaining these operations.

### Limit the Number of Records
### Limit the number of records

```py
# Get the first 50 items as a PyArrow table
arrow_table = items_relation.limit(50).arrow()
```

#### Using `head()` to Get the First 5 Records
#### Using `head()` to get the first 5 records

```py
df = items_relation.head().df()
```

### Select Specific Columns
### Select specific columns

```py
# Select only 'col1' and 'col2' columns
items_list = items_relation.select("col1", "col2").fetchall()

# alternate notation with brackets
# Alternate notation with brackets
items_list = items_relation[["col1", "col2"]].fetchall()

# only get one column
# Only get one column
items_list = items_relation["col1"].fetchall()

```

### Chain Operations
### Chain operations

You can combine `select`, `limit`, and other methods.

@@ -156,47 +156,47 @@ You can combine `select`, `limit`, and other methods.
arrow_table = items_relation.select("col1", "col2").limit(50).arrow()
```

## Supported Destinations
## Supported destinations

All SQL and filesystem destinations supported by `dlt` can utilize this data access interface. For filesystem destinations, `dlt` [uses **DuckDB** under the hood](./sql-client.md#the-filesystem-sql-client) to create views from Parquet or JSONL files dynamically. This allows you to query data stored in files using the same interface as you would with SQL databases. If you plan on accessing data in buckets or the filesystem a lot this way, it is adviced to load data as parquet instead of jsonl, as **DuckDB** is able to only load the parts of the data actually needed for the query to work.
All SQL and filesystem destinations supported by `dlt` can utilize this data access interface. For filesystem destinations, `dlt` [uses **DuckDB** under the hood](./sql-client.md#the-filesystem-sql-client) to create views from Parquet or JSONL files dynamically. This allows you to query data stored in files using the same interface as you would with SQL databases. If you plan on accessing data in buckets or the filesystem a lot this way, it is advised to load data as parquet instead of jsonl, as **DuckDB** is able to only load the parts of the data actually needed for the query to work.
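
A short sketch of loading as Parquet, assuming a pipeline that targets the `filesystem` destination (the source `my_source` is a placeholder; `loader_file_format` is the `pipeline.run()` argument that controls the file format):

```py
import dlt

# Writing Parquet instead of JSONL lets the DuckDB-backed views read only
# the row groups and columns a given query actually needs.
pipeline = dlt.pipeline(pipeline_name="my_pipeline", destination="filesystem", dataset_name="my_data")
info = pipeline.run(my_source(), loader_file_format="parquet")
```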

## Examples

### Fetch One Record as a Tuple
### Fetch one record as a tuple

```py
record = items_relation.fetchone()
```

### Fetch Many Records as Tuples
### Fetch many records as tuples

```py
records = items_relation.fetchmany(chunk_size=10)
```

### Iterate Over Data with Limit and Column Selection
### Iterate over data with limit and column selection

**Note:** When iterating over filesystem tables, the underlying DuckDB may give you a different chunksize depending on the size of the parquet files the table is based on.
**Note:** When iterating over filesystem tables, the underlying DuckDB may give you a different chunk size depending on the size of the parquet files the table is based on.

```py

# dataframes
# Dataframes
for df_chunk in items_relation.select("col1", "col2").limit(100).iter_df(chunk_size=20):
...

# arrow tables
# Arrow tables
for arrow_table in items_relation.select("col1", "col2").limit(100).iter_arrow(chunk_size=20):
...

# python tuples
# Python tuples
for records in items_relation.select("col1", "col2").limit(100).iter_fetch(chunk_size=20):
# Process each modified DataFrame chunk
...
```

## Advanced Usage
## Advanced usage

### Using custom sql queries to create `ReadableRelations`
### Using custom SQL queries to create `ReadableRelations`

You can use custom SQL queries directly on the dataset to create a `ReadableRelation`:

@@ -211,27 +211,28 @@ arrow_table = custom_relation.arrow()

### Loading a `ReadableRelation` into a pipeline table

Since the iter_arrow and iter_df methods are generators that iterate over the full ReadableRelation in chunks, you can load use them as a resource for another (or even the same) dlt pipeline:
Since the iter_arrow and iter_df methods are generators that iterate over the full ReadableRelation in chunks, you can use them as a resource for another (or even the same) dlt pipeline:

```py
# create a readable relation with a limit of 1m rows
# Create a readable relation with a limit of 1m rows
limited_items_relation = dataset.items.limit(1_000_000)

# create a new pipeline
# Create a new pipeline
other_pipeline = ...

# we can now load these 1m rows into this pipeline in 10k chunks
# We can now load these 1m rows into this pipeline in 10k chunks
other_pipeline.run(limited_items_relation.iter_arrow(chunk_size=10_000), table_name="limited_items")
```

### Using `ibis` to query the data

Visit the [Native Ibis integration](./ibis-backend.md) guide to learn more.

## Important Considerations
## Important considerations

- **Memory Usage:** Loading full tables into memory without iterating or limiting can consume significant memory, potentially leading to crashes if the dataset is large. Always consider using limits or chunked iteration.
- **Memory usage:** Loading full tables into memory without iterating or limiting can consume significant memory, potentially leading to crashes if the dataset is large. Always consider using limits or chunked iteration.

- **Lazy Evaluation:** `ReadableDataset` and `ReadableRelation` objects delay data retrieval until necessary. This design improves performance and resource utilization.
- **Lazy evaluation:** `ReadableDataset` and `ReadableRelation` objects delay data retrieval until necessary. This design improves performance and resource utilization.

- **Custom SQL queries:** When executing custom SQL queries, remember that additional methods like `limit()` or `select()` won't modify the query. Include all necessary clauses directly in your SQL statement.

- **Custom SQL Queries:** When executing custom SQL queries, remember that additional methods like `limit()` or `select()` won't modify the query. Include all necessary clauses directly in your SQL statement.
11 changes: 6 additions & 5 deletions docs/website/docs/general-usage/dataset-access/ibis-backend.md
@@ -8,10 +8,10 @@ keywords: [data, dataset, ibis]

Ibis is a powerful portable Python dataframe library. Learn more about what it is and how to use it in the [official documentation](https://ibis-project.org/).

`dlt` provides an easy way to handoveor your loaded dataset to an Ibis backend connection.
`dlt` provides an easy way to hand over your loaded dataset to an Ibis backend connection.

:::tip
Not all destinations supported by `dlt` have an equivalent Ibis backend. Natively supported destinations include DuckDB (including Motherduck), Postgres, Redshift, Snowflake, Clickhouse, MSSQL (including Synapse) and BigQuery. The filesystem destination is supported via the [Filesystem SQL client](./sql-client#the-filesystem-sql-client), please install the duckdb backend for ibis to use it. Mutating data with ibis on the filesystem will not result in any actual changes to the persisted files.
Not all destinations supported by `dlt` have an equivalent Ibis backend. Natively supported destinations include DuckDB (including Motherduck), Postgres, Redshift, Snowflake, Clickhouse, MSSQL (including Synapse), and BigQuery. The filesystem destination is supported via the [Filesystem SQL client](./sql-client#the-filesystem-sql-client); please install the duckdb backend for ibis to use it. Mutating data with ibis on the filesystem will not result in any actual changes to the persisted files.
:::

## Prerequisites
@@ -24,15 +24,15 @@ pip install ibis-framework[duckdb]

## Get an ibis connection from your dataset

Dlt datasets have a helper method to return an ibis connection to the destination they live on. The returned object is a native ibis connection to the destination which you can use to read and even transform data. Please consult the [ibis documentation](https://ibis-project.org/docs/backends/) to learn more about what you can do with ibis.
dlt datasets have a helper method to return an ibis connection to the destination they live on. The returned object is a native ibis connection to the destination, which you can use to read and even transform data. Please consult the [ibis documentation](https://ibis-project.org/docs/backends/) to learn more about what you can do with ibis.

```py

# get the dataset from the pipeline
dataset = pipeline._dataset()
dataset_name = pipeline.dataset_name

# get the native ibis connection form the dataset
# get the native ibis connection from the dataset
ibis_connection = dataset.ibis()

# list all tables in the dataset
@@ -46,4 +46,5 @@ table = ibis_connection.table("items", database=dataset_name)
print(table.limit(10).execute())

# Visit the ibis docs to learn more about the available methods
```
```

6 changes: 3 additions & 3 deletions docs/website/docs/general-usage/dataset-access/index.md
@@ -10,9 +10,9 @@ import DocCardList from '@theme/DocCardList';
After one or more successful runs of your pipeline, you can inspect or access the loaded data in various ways:

* We have a simple [`streamlit` app](./streamlit.md) that you can use to view your data locally in your webapp.
* We have a [python interface](./dataset.md) that allows you to access your data in python as python tuples, `arrow` tables or `pandas` dataframes with a simple dataset object or an sql interface. You can even run sql commands on the filesystem destination via `DuckDB` or forward data from any table into another pipeline.
* We have an [`ibis` interface](./ibis-backend.md) that allows you to use hand over your loaded data to the powerful [ibis-framework](https://ibis-project.org/) library.
* Lastly we have some advice for [monitoring and ensuring the quality of your data](./data-quality-dashboard.md).
* We have a [Python interface](./dataset.md) that allows you to access your data in Python as Python tuples, `arrow` tables, or `pandas` dataframes with a simple dataset object or an SQL interface. You can even run SQL commands on the filesystem destination via `DuckDB` or forward data from any table into another pipeline.
* We have an [`ibis` interface](./ibis-backend.md) that allows you to hand over your loaded data to the powerful [ibis-framework](https://ibis-project.org/) library.
* Lastly, we have some advice for [monitoring and ensuring the quality of your data](./data-quality-dashboard.md).

# Learn more
<DocCardList />