
Commit

fixing grammar files 60-80
sh-rp committed Sep 24, 2024
1 parent 12a314e commit 07979f8
Showing 20 changed files with 350 additions and 438 deletions.
docs/website/docs/dlt-ecosystem/file-formats/insert-format.md (7 changes: 4 additions & 3 deletions)
@@ -5,7 +5,7 @@ keywords: [insert values, file formats]
---
import SetTheFormat from './_set_the_format.mdx';

-# SQL INSERT File Format
+# SQL INSERT file format

This file format contains an INSERT...VALUES statement to be executed on the destination during the `load` stage.

@@ -18,12 +18,13 @@ Additional data types are stored as follows:

This file format is [compressed](../../reference/performance.md#disabling-and-enabling-file-compression) by default.

-## Supported Destinations
+## Supported destinations

This format is used by default by: **DuckDB**, **Postgres**, **Redshift**, **Synapse**, **MSSQL**, **Motherduck**

-It is also supported by: **Filesystem** if you'd like to store INSERT VALUES statements for some reason
+It is also supported by: **Filesystem** if you'd like to store INSERT VALUES statements for some reason.

## How to configure

<SetTheFormat file_type="insert_values"/>
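
As a point of reference, here is a minimal sketch of forcing this format for a single load. The pipeline name and sample rows are placeholders, and it assumes the standard `loader_file_format` argument of `pipeline.run()`:

```py
import dlt

# placeholder pipeline writing to DuckDB, which uses insert_values by default anyway
pipeline = dlt.pipeline(pipeline_name="insert_format_demo", destination="duckdb")

# loader_file_format explicitly selects the file format for this run
info = pipeline.run(
    [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}],
    table_name="users",
    loader_file_format="insert_values",
)
print(info)
```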

docs/website/docs/dlt-ecosystem/file-formats/jsonl.md (11 changes: 5 additions & 6 deletions)
@@ -5,10 +5,9 @@ keywords: [jsonl, file formats]
---
import SetTheFormat from './_set_the_format.mdx';

-# jsonl - JSON Delimited
+# jsonl - JSON delimited

-JSON Delimited is a file format that stores several JSON documents in one file. The JSON
-documents are separated by a new line.
+JSON delimited is a file format that stores several JSON documents in one file. The JSON documents are separated by a new line.
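
For illustration, two made-up documents as they would appear in a single jsonl file, one per line:

```jsonl
{"id": 1, "name": "alice", "created_at": "2024-09-24T10:00:00+00:00"}
{"id": 2, "name": "bob", "created_at": "2024-09-24T10:05:00+00:00"}
```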

Additional data types are stored as follows:

@@ -18,13 +17,13 @@ Additional data types are stored as follows:
- `HexBytes` is stored as a hex encoded string;
- `json` is serialized as a string.

-This file format is
-[compressed](../../reference/performance.md#disabling-and-enabling-file-compression) by default.
+This file format is [compressed](../../reference/performance.md#disabling-and-enabling-file-compression) by default.

-## Supported Destinations
+## Supported destinations

This format is used by default by: **BigQuery**, **Snowflake**, **Filesystem**.

## How to configure

<SetTheFormat file_type="jsonl"/>

docs/website/docs/dlt-ecosystem/file-formats/parquet.md (47 changes: 23 additions & 24 deletions)
@@ -9,21 +9,21 @@ import SetTheFormat from './_set_the_format.mdx';

[Apache Parquet](https://en.wikipedia.org/wiki/Apache_Parquet) is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. `dlt` is capable of storing data in this format when configured to do so.

-To use this format, you need a `pyarrow` package. You can get this package as a `dlt` extra as well:
+To use this format, you need the `pyarrow` package. You can get this package as a `dlt` extra as well:

```sh
pip install "dlt[parquet]"
```

-## Supported Destinations
+## Supported destinations

Supported by: **BigQuery**, **DuckDB**, **Snowflake**, **Filesystem**, **Athena**, **Databricks**, **Synapse**

## How to configure

<SetTheFormat file_type="parquet"/>

-## Destination AutoConfig
+## Destination autoconfig
`dlt` uses [destination capabilities](../../walkthroughs/create-new-destination.md#3-set-the-destination-capabilities) to configure the parquet writer:
* It uses decimal and wei precision to pick the right **decimal type** and sets precision and scale.
* It uses timestamp precision to pick the right **timestamp type** resolution (seconds, micro, or nano).
@@ -32,17 +32,17 @@ Supported by: **BigQuery**, **DuckDB**, **Snowflake**, **Filesystem**, **Athena*

Under the hood, `dlt` uses the [pyarrow parquet writer](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html) to create the files. The following options can be used to change the behavior of the writer:

-- `flavor`: Sanitize schema or set other compatibility options to work with various target systems. Defaults to None which is **pyarrow** default.
+- `flavor`: Sanitize schema or set other compatibility options to work with various target systems. Defaults to None, which is the **pyarrow** default.
- `version`: Determine which Parquet logical types are available for use, whether the reduced set from the Parquet 1.x.x format or the expanded logical types added in later format versions. Defaults to "2.6".
-- `data_page_size`: Set a target threshold for the approximate encoded size of data pages within a column chunk (in bytes). Defaults to None which is **pyarrow** default.
+- `data_page_size`: Set a target threshold for the approximate encoded size of data pages within a column chunk (in bytes). Defaults to None, which is the **pyarrow** default.
- `row_group_size`: Set the number of rows in a row group. [See here](#row-group-size) how this can optimize parallel processing of queries on your destination over the default setting of `pyarrow`.
-- `timestamp_timezone`: A string specifying timezone, default is UTC.
-- `coerce_timestamps`: resolution to which coerce timestamps, choose from **s**, **ms**, **us**, **ns**
-- `allow_truncated_timestamps` - will raise if precision is lost on truncated timestamp.
+- `timestamp_timezone`: A string specifying the timezone, default is UTC.
+- `coerce_timestamps`: resolution to which to coerce timestamps, choose from **s**, **ms**, **us**, **ns**
+- `allow_truncated_timestamps` - will raise if precision is lost on truncated timestamps.
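
For concreteness, a minimal sketch of overriding a few of these options in `config.toml`. The `[normalize.data_writer]` section mirrors environment variables such as `NORMALIZE__DATA_WRITER__TIMESTAMP_TIMEZONE`, and the specific values are illustrative assumptions, not recommendations:

```toml
[normalize.data_writer]
# sanitize the schema for Spark-based readers
flavor = "spark"
# keep the expanded set of Parquet logical types
version = "2.6"
# target roughly 1 MiB encoded data pages
data_page_size = 1048576
# aim for larger row groups to help engines that parallelize per row group
row_group_size = 100000
# write timestamps adjusted to a specific timezone instead of the default UTC
timestamp_timezone = "Europe/Berlin"
```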

:::tip
-Default parquet version used by `dlt` is 2.4. It coerces timestamps to microseconds and truncates nanoseconds silently. Such setting
-provides best interoperability with database systems, including loading panda frames which have nanosecond resolution by default
+The default parquet version used by `dlt` is 2.4. It coerces timestamps to microseconds and truncates nanoseconds silently. Such a setting
+provides the best interoperability with database systems, including loading pandas frames, which have nanosecond resolution by default.
:::

Read the [pyarrow parquet docs](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html) to learn more about these settings.
@@ -68,28 +68,27 @@ NORMALIZE__DATA_WRITER__TIMESTAMP_TIMEZONE
```

### Timestamps and timezones
-`dlt` adds timezone (UTC adjustment) to all timestamps regardless of a precision (from seconds to nanoseconds). `dlt` will also create TZ aware timestamp columns in
-the destinations. [duckdb is an exception here](../destinations/duckdb.md#supported-file-formats)
+`dlt` adds timezone (UTC adjustment) to all timestamps regardless of the precision (from seconds to nanoseconds). `dlt` will also create TZ-aware timestamp columns in
+the destinations. [DuckDB is an exception here](../destinations/duckdb.md#supported-file-formats).

-### Disable timezones / utc adjustment flags
+### Disable timezones / UTC adjustment flags
You can generate parquet files without timezone adjustment information in two ways:
-1. Set the **flavor** to spark. All timestamps will be generated via deprecated `int96` physical data type, without the logical one
-2. Set the **timestamp_timezone** to empty string (ie. `DATA_WRITER__TIMESTAMP_TIMEZONE=""`) to generate logical type without UTC adjustment.
+1. Set the **flavor** to spark. All timestamps will be generated via the deprecated `int96` physical data type, without the logical one.
+2. Set the **timestamp_timezone** to an empty string (i.e., `DATA_WRITER__TIMESTAMP_TIMEZONE=""`) to generate a logical type without UTC adjustment.

-To our best knowledge, arrow will convert your timezone aware DateTime(s) to UTC and store them in parquet without timezone information.
+To our best knowledge, Arrow will convert your timezone-aware DateTime(s) to UTC and store them in parquet without timezone information.
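
A short sketch of the second option in TOML form, again assuming the `[normalize.data_writer]` section that corresponds to `DATA_WRITER__TIMESTAMP_TIMEZONE`:

```toml
[normalize.data_writer]
# empty string: emit the logical timestamp type without UTC adjustment
timestamp_timezone = ""
```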


### Row group size
-The `pyarrow` parquet writer writes each item, i.e. table or record batch, in a separate row group.
-This may lead to many small row groups which may not be optimal for certain query engines. For example, `duckdb` parallelizes on a row group.
-`dlt` allows controlling the size of the row group by
-[buffering and concatenating tables](../../reference/performance.md#controlling-in-memory-buffers) and batches before they are written. The concatenation is done as a zero-copy to save memory.
-You can control the size of the row group by setting the maximum number of rows kept in the buffer.
+The `pyarrow` parquet writer writes each item, i.e., table or record batch, in a separate row group. This may lead to many small row groups, which may not be optimal for certain query engines. For example, `duckdb` parallelizes on a row group. `dlt` allows controlling the size of the row group by [buffering and concatenating tables](../../reference/performance.md#controlling-in-memory-buffers) and batches before they are written. The concatenation is done as a zero-copy to save memory. You can control the size of the row group by setting the maximum number of rows kept in the buffer.

```toml
[extract.data_writer]
buffer_max_items=10e6
```
-Mind that `dlt` holds the tables in memory. Thus, 1,000,000 rows in the example above may consume a significant amount of RAM.
+Keep in mind that `dlt` holds the tables in memory. Thus, 1,000,000 rows in the example above may consume a significant amount of RAM.

-`row_group_size` configuration setting has limited utility with `pyarrow` writer. It may be useful when you write single very large pyarrow tables
-or when your in memory buffer is really large.
+The `row_group_size` configuration setting has limited utility with the `pyarrow` writer. It may be useful when you write single very large pyarrow tables or when your in-memory buffer is really large.

@@ -1,6 +1,7 @@
import Admonition from "@theme/Admonition";
import Link from '../../_book-onboarding-call.md';

-<Admonition title="Need help deploying these sources, or figuring out how to run them in your data stack?">
+<Admonition title="Need help deploying these sources or figuring out how to run them in your data stack?">
<a href="https://dlthub.com/community">Join our Slack community</a> or <Link/>.
</Admonition>
</Admonition>

