Docs: fix capitalization of some terms, fix typos (#1988)
burnash authored Oct 24, 2024
1 parent 774ad5c commit 00f631d
Showing 21 changed files with 80 additions and 71 deletions.
14 changes: 8 additions & 6 deletions docs/website/docs/dlt-ecosystem/destinations/athena.md
@@ -100,7 +100,9 @@ Athena tables store timestamps with millisecond precision, and with that precisi

Athena does not support JSON fields, so JSON is stored as a string.

> **Athena does not support TIME columns in parquet files**. `dlt` will fail such jobs permanently. Convert `datetime.time` objects to `str` or `datetime.datetime` to load them.
:::caution
**Athena does not support TIME columns in parquet files**. `dlt` will fail such jobs permanently. Convert `datetime.time` objects to `str` or `datetime.datetime` to load them.
:::
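
For example, a resource can cast `datetime.time` values to ISO strings before yielding them (a minimal sketch; the resource and field names are illustrative):

```py
import datetime

import dlt

@dlt.resource(name="shifts")
def shifts():
    rows = [{"id": 1, "start": datetime.time(9, 30)}]
    for row in rows:
        # cast the TIME value to a string so the parquet file can be loaded by Athena
        row["start"] = row["start"].isoformat()
        yield row
```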

### Table and column identifiers

@@ -137,9 +139,10 @@ For every table created as an Iceberg table, the Athena destination will create

The `merge` write disposition is supported for Athena when using Iceberg tables.

> Note that:
> 1. There is a risk of tables ending up in an inconsistent state in case a pipeline run fails mid-flight because Athena doesn't support transactions, and `dlt` uses multiple DELETE/UPDATE/INSERT statements to implement `merge`.
> 2. `dlt` creates additional helper tables called `insert_<table name>` and `delete_<table name>` in the staging schema to work around Athena's lack of temporary tables.
:::note
1. There is a risk of tables ending up in an inconsistent state in case a pipeline run fails mid-flight because Athena doesn't support transactions, and `dlt` uses multiple DELETE/UPDATE/INSERT statements to implement `merge`.
2. `dlt` creates additional helper tables called `insert_<table name>` and `delete_<table name>` in the staging schema to work around Athena's lack of temporary tables.
:::
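
A minimal sketch of a `merge` load into an Iceberg table (the resource, primary key, and dataset names are illustrative, and an S3 bucket is assumed to be configured for the `filesystem` staging destination):

```py
import dlt

@dlt.resource(
    table_format="iceberg",     # create the table as an Iceberg table so `merge` is supported
    write_disposition="merge",
    primary_key="id",
)
def users():
    yield [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

pipeline = dlt.pipeline(
    pipeline_name="users_pipeline",
    destination="athena",
    staging="filesystem",       # Athena loads from files staged in a bucket
    dataset_name="users_data",
)
pipeline.run(users)
```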

### dbt support

@@ -156,8 +159,7 @@ aws_data_catalog="awsdatacatalog"

## Supported file formats

You can choose the following file formats:
* [parquet](../file-formats/parquet.md) is used by default
* [Parquet](../file-formats/parquet.md) is used by default.

## Athena adapter

20 changes: 11 additions & 9 deletions docs/website/docs/dlt-ecosystem/destinations/bigquery.md
@@ -146,8 +146,8 @@ this moment (they are stored as JSON), may be created. You can select certain re
[destination.bigquery]
autodetect_schema=true
```
We recommend yielding [arrow tables](../verified-sources/arrow-pandas.md) from your resources and using the `parquet` file format to load the data. In that case, the schemas generated by `dlt` and BigQuery
will be identical. BigQuery will also preserve the column order from the generated parquet files. You can convert `json` data into arrow tables with [pyarrow or duckdb](../verified-sources/arrow-pandas.md#loading-json-documents).
We recommend yielding [Arrow tables](../verified-sources/arrow-pandas.md) from your resources and using the Parquet file format to load the data. In that case, the schemas generated by `dlt` and BigQuery
will be identical. BigQuery will also preserve the column order from the generated parquet files. You can convert JSON data into Arrow tables with [pyarrow or duckdb](../verified-sources/arrow-pandas.md#loading-json-documents).

```py
import pyarrow.json as paj
@@ -187,25 +187,25 @@ pipeline.run(
In the example below, we represent JSON data as tables up to nesting level 1. Above this nesting level, we let BigQuery create nested fields.

:::caution
If you yield data as Python objects (dicts) and load this data as `parquet`, the nested fields will be converted into strings. This is one of the consequences of
If you yield data as Python objects (dicts) and load this data as Parquet, the nested fields will be converted into strings. This is one of the consequences of
`dlt` not being able to infer nested fields.
:::

## Supported file formats

You can configure the following file formats to load data to BigQuery:

* [jsonl](../file-formats/jsonl.md) is used by default.
* [parquet](../file-formats/parquet.md) is supported.
* [JSONL](../file-formats/jsonl.md) is used by default.
* [Parquet](../file-formats/parquet.md) is supported.

When staging is enabled:

* [jsonl](../file-formats/jsonl.md) is used by default.
* [parquet](../file-formats/parquet.md) is supported.
* [JSONL](../file-formats/jsonl.md) is used by default.
* [Parquet](../file-formats/parquet.md) is supported.

:::caution
**BigQuery cannot load JSON columns from Parquet files**. `dlt` will fail such jobs permanently. Instead:
* Switch to `jsonl` to load and parse JSON properly.
* Switch to JSONL to load and parse JSON properly.
* Use schema [autodetect and nested fields](#use-bigquery-schema-autodetect-for-nested-fields)
:::
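
For instance, the format can be switched per run (a sketch; the pipeline and dataset names are illustrative):

```py
import dlt

data = [{"id": 1, "payload": {"answers": [1, 2, 3]}}]

pipeline = dlt.pipeline(
    pipeline_name="events_pipeline",
    destination="bigquery",
    dataset_name="events_data",
)
# load as jsonl so JSON data is parsed by BigQuery instead of failing from parquet
info = pipeline.run(data, loader_file_format="jsonl")
```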

@@ -344,7 +344,8 @@ Some things to note with the adapter's behavior:
- You can cluster on as many columns as you would like.
- Sequential adapter calls on the same resource accumulate parameters, akin to an OR operation, for a unified execution.

> ❗ At the time of writing, table level options aren't supported for `ALTER` operations.
:::caution
At the time of writing, table-level options aren't supported for `ALTER` operations.

Note that `bigquery_adapter` updates the resource *in place*, but returns the resource for convenience, i.e., both the following are valid:

@@ -354,6 +355,7 @@ my_resource = bigquery_adapter(my_resource, partition="partition_column_name")
```

Refer to the [full API specification](../../api_reference/destinations/impl/bigquery/bigquery_adapter) for more details.
:::

<!--@@@DLT_TUBA bigquery-->

21 changes: 10 additions & 11 deletions docs/website/docs/dlt-ecosystem/destinations/clickhouse.md
@@ -24,7 +24,7 @@ Let's start by initializing a new `dlt` project as follows:
dlt init chess clickhouse
```

> 💡 This command will initialize your pipeline with chess as the source and ClickHouse as the destination.
The `dlt init` command will initialize your pipeline with chess as the source and ClickHouse as the destination.

The above command generates several files and directories, including `.dlt/secrets.toml` and a requirements file for ClickHouse. You can install the necessary dependencies specified in the requirements file by executing it as follows:

@@ -118,29 +118,28 @@ Data is loaded into ClickHouse using the most efficient method depending on the

## Datasets

`Clickhouse` does not support multiple datasets in one database; dlt relies on datasets to exist for multiple reasons.
To make `clickhouse` work with `dlt`, tables generated by `dlt` in your `clickhouse` database will have their names prefixed with the dataset name, separated by
ClickHouse does not support multiple datasets in one database; dlt relies on datasets to exist for multiple reasons.
To make ClickHouse work with `dlt`, tables generated by `dlt` in your ClickHouse database will have their names prefixed with the dataset name, separated by
the configurable `dataset_table_separator`.
Additionally, a special sentinel table that doesn't contain any data will be created, so dlt knows which virtual datasets already exist in a ClickHouse destination.
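
For example, with `dataset_name="chess_data"` and the default separator, a `players` table will be created under a name like `chess_data___players`. A minimal sketch of overriding the separator in code (the exact config key path is an assumption; it can also be set in `config.toml`):

```py
import dlt

# assumption: the separator is read from the clickhouse destination configuration
dlt.config["destination.clickhouse.dataset_table_separator"] = "__"

pipeline = dlt.pipeline(
    pipeline_name="chess_pipeline",
    destination="clickhouse",
    dataset_name="chess_data",
)
```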

## Supported file formats

- [jsonl](../file-formats/jsonl.md) is the preferred format for both direct loading and staging.
- [parquet](../file-formats/parquet.md) is supported for both direct loading and staging.
- [JSONL](../file-formats/jsonl.md) is the preferred format for both direct loading and staging.
- [Parquet](../file-formats/parquet.md) is supported for both direct loading and staging.

The `clickhouse` destination has a few specific deviations from the default SQL destinations:

1. `Clickhouse` has an experimental `object` datatype, but we've found it to be a bit unpredictable, so the dlt clickhouse destination will load the `json` datatype to a `text` column.
1. ClickHouse has an experimental `object` datatype, but we've found it to be a bit unpredictable, so the dlt `clickhouse` destination will load the `json` datatype to a `text` column.
If you need
this feature, get in touch with our Slack community, and we will consider adding it.
2. `Clickhouse` does not support the `time` datatype. Time will be loaded to a `text` column.
3. `Clickhouse` does not support the `binary` datatype. Binary will be loaded to a `text` column. When loading from `jsonl`, this will be a base64 string; when loading from parquet, this will be
2. ClickHouse does not support the `time` datatype. Time will be loaded to a `text` column.
3. ClickHouse does not support the `binary` datatype. Binary will be loaded to a `text` column. When loading from JSONL, this will be a base64 string; when loading from Parquet, this will be
the `binary` object converted to `text`.
4. `Clickhouse` accepts adding columns to a populated table that aren’t null.
5. `Clickhouse` can produce rounding errors under certain conditions when using the float/double datatype. Make sure to use decimal if you can’t afford to have rounding errors. Loading the value
12.7001 to a double column with the loader file format jsonl set will predictably produce a rounding error, for example.
4. ClickHouse accepts adding columns to a populated table that aren’t null.
5. ClickHouse can produce rounding errors under certain conditions when using the float/double datatype. Make sure to use the decimal datatype if you can’t afford rounding errors. For example, loading the value 12.7001 into a double column with the `jsonl` loader file format will predictably produce a rounding error (see the sketch below).
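
A minimal sketch of declaring a decimal column to sidestep float rounding (the resource and column names are illustrative):

```py
from decimal import Decimal

import dlt

@dlt.resource(columns={"amount": {"data_type": "decimal"}})
def payments():
    # `amount` is declared as decimal, so 12.7001 is stored exactly rather than as a lossy double
    yield [{"id": 1, "amount": Decimal("12.7001")}]
```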

## Supported column hints

2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/databricks.md
@@ -143,7 +143,7 @@ The JSONL format has some limitations when used with Databricks:

1. Compression must be disabled to load JSONL files in Databricks. Set `data_writer.disable_compression` to `true` in the dlt config when using this format (see the sketch after this list).
2. The following data types are not supported when using the JSONL format with `databricks`: `decimal`, `json`, `date`, `binary`. Use `parquet` if your data contains these types.
3. The `bigint` data type with precision is not supported with the `jsonl` format.
3. The `bigint` data type with precision is not supported with the JSONL format.
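
A sketch of a JSONL load with compression disabled, assuming the `data_writer.disable_compression` key named in point 1 (it can equally be set in `config.toml`); pipeline and dataset names are illustrative and staging configuration is omitted:

```py
import dlt

# uncompressed jsonl files, as required by Databricks
dlt.config["data_writer.disable_compression"] = True

data = [{"id": 1, "event": "signup"}]

pipeline = dlt.pipeline(
    pipeline_name="events_pipeline",
    destination="databricks",
    dataset_name="events_data",
)
info = pipeline.run(data, loader_file_format="jsonl")
```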

## Staging support

8 changes: 6 additions & 2 deletions docs/website/docs/dlt-ecosystem/destinations/dremio.md
@@ -74,13 +74,17 @@ profile_name="dlt-ci-user"
- `replace`
- `merge`

> The `merge` write disposition uses the default DELETE/UPDATE/INSERT strategy to merge data into the destination. Be aware that Dremio does not support transactions, so a partial pipeline failure can result in the destination table being in an inconsistent state. The `merge` write disposition will eventually be implemented using [MERGE INTO](https://docs.dremio.com/current/reference/sql/commands/apache-iceberg-tables/apache-iceberg-merge/) to resolve this issue.
:::note
The `merge` write disposition uses the default DELETE/UPDATE/INSERT strategy to merge data into the destination. Be aware that Dremio does not support transactions, so a partial pipeline failure can result in the destination table being in an inconsistent state. The `merge` write disposition will eventually be implemented using [MERGE INTO](https://docs.dremio.com/current/reference/sql/commands/apache-iceberg-tables/apache-iceberg-merge/) to resolve this issue.
:::

## Data loading

Data loading happens by copying staged parquet files from an object storage bucket to the destination table in Dremio using [COPY INTO](https://docs.dremio.com/cloud/reference/sql/commands/copy-into-table/) statements. The destination table format is specified by the storage format for the data source in Dremio. Typically, this will be Apache Iceberg.

> **Dremio cannot load `fixed_len_byte_array` columns from `parquet` files**.
:::caution
Dremio cannot load `fixed_len_byte_array` columns from Parquet files.
:::

## Dataset creation

8 changes: 4 additions & 4 deletions docs/website/docs/dlt-ecosystem/destinations/duckdb.md
@@ -33,7 +33,7 @@ python3 chess_pipeline.py
All write dispositions are supported.

## Data loading
`dlt` will load data using large INSERT VALUES statements by default. Loading is multithreaded (20 threads by default). If you are okay with installing `pyarrow`, we suggest switching to `parquet` as the file format. Loading is faster (and also multithreaded).
`dlt` will load data using large INSERT VALUES statements by default. Loading is multithreaded (20 threads by default). If you are okay with installing `pyarrow`, we suggest switching to Parquet as the file format. Loading is faster (and also multithreaded).
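
For example, the format can be switched per run (a sketch; requires `pyarrow` to be installed):

```py
import dlt

data = [{"id": 1, "name": "alice"}]

pipeline = dlt.pipeline(
    pipeline_name="chess_pipeline",
    destination="duckdb",
    dataset_name="chess_data",
)
# write parquet files instead of INSERT VALUES statements
info = pipeline.run(data, loader_file_format="parquet")
```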

### Data types
`duckdb` supports various [timestamp types](https://duckdb.org/docs/sql/data_types/timestamp.html). These can be configured using the column flags `timezone` and `precision` in the `dlt.resource` decorator or the `pipeline.run` method.
@@ -95,11 +95,11 @@ dlt.config["schema.naming"] = "duck_case"
## Supported file formats
You can configure the following file formats to load data into duckdb:
* [insert-values](../file-formats/insert-format.md) is used by default.
* [parquet](../file-formats/parquet.md) is supported.
* [Parquet](../file-formats/parquet.md) is supported.
:::note
`duckdb` cannot COPY many parquet files to a single table from multiple threads. In this situation, `dlt` serializes the loads. Still, that may be faster than INSERT.
`duckdb` cannot COPY many Parquet files to a single table from multiple threads. In this situation, dlt serializes the loads. Still, that may be faster than INSERT.
:::
* [jsonl](../file-formats/jsonl.md)
* [JSONL](../file-formats/jsonl.md)

:::tip
`duckdb` has [timestamp types](https://duckdb.org/docs/sql/data_types/timestamp.html) with resolutions from milliseconds to nanoseconds. However,
10 changes: 5 additions & 5 deletions docs/website/docs/dlt-ecosystem/destinations/filesystem.md
@@ -612,9 +612,9 @@ Adopting this layout offers several advantages:
## Supported file formats

You can choose the following file formats:
* [jsonl](../file-formats/jsonl.md) is used by default
* [parquet](../file-formats/parquet.md) is supported
* [csv](../file-formats/csv.md) is supported
* [JSONL](../file-formats/jsonl.md) is used by default
* [Parquet](../file-formats/parquet.md) is supported
* [CSV](../file-formats/csv.md) is supported

## Supported table formats

@@ -643,7 +643,7 @@ def my_delta_resource():
...
```

> `dlt` always uses `parquet` as `loader_file_format` when using the `delta` table format. Any setting of `loader_file_format` is disregarded.
> `dlt` always uses Parquet as `loader_file_format` when using the `delta` table format. Any setting of `loader_file_format` is disregarded.
#### Delta table partitioning
A Delta table can be partitioned ([Hive-style partitioning](https://delta.io/blog/pros-cons-hive-style-partionining/)) by specifying one or more `partition` column hints. This example partitions the Delta table by the `foo` column:
@@ -709,7 +709,7 @@ When a load generates a new state, for example when using incremental loads, a n
When running your pipeline, you might encounter an error like `[Errno 36] File name too long Error`. This error occurs because the generated file name exceeds the maximum allowed length on your filesystem.

To prevent the file name length error, set the `max_identifier_length` parameter for your destination. This truncates all identifiers (including filenames) to a specified maximum length.
For example:
For example:

```py
from dlt.destinations import duckdb
16 changes: 9 additions & 7 deletions docs/website/docs/dlt-ecosystem/destinations/redshift.md
@@ -75,16 +75,18 @@ All [write dispositions](../../general-usage/incremental-loading#choosing-a-writ
[SQL Insert](../file-formats/insert-format) is used by default.

When staging is enabled:
* [jsonl](../file-formats/jsonl.md) is used by default.
* [parquet](../file-formats/parquet.md) is supported.
* [JSONL](../file-formats/jsonl.md) is used by default.
* [Parquet](../file-formats/parquet.md) is supported.

> **Redshift cannot load `VARBYTE` columns from `json` files**. `dlt` will fail such jobs permanently. Switch to `parquet` to load binaries.
:::caution
- **Redshift cannot load `VARBYTE` columns from JSON files**. `dlt` will fail such jobs permanently. Switch to Parquet to load binaries.

> **Redshift cannot load `TIME` columns from `json` or `parquet` files**. `dlt` will fail such jobs permanently. Switch to direct `insert_values` to load time columns.
- **Redshift cannot load `TIME` columns from JSON or Parquet files**. `dlt` will fail such jobs permanently. Switch to direct `insert_values` to load time columns.

> **Redshift cannot detect compression type from `json` files**. `dlt` assumes that `jsonl` files are gzip compressed, which is the default.
- **Redshift cannot detect compression type from JSON files**. `dlt` assumes that JSONL files are gzip compressed, which is the default.

> **Redshift loads `json` types as strings into SUPER with `parquet`**. Use `jsonl` format to store JSON in SUPER natively or transform your SUPER columns with `PARSE_JSON`.
- **Redshift loads JSON types as strings into SUPER with Parquet**. Use JSONL format to store JSON in SUPER natively or transform your SUPER columns with `PARSE_JSON`.
:::
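
A minimal sketch of a staged Parquet load that avoids the `VARBYTE`-from-JSON limitation (assumes an S3 bucket is configured for the `filesystem` staging destination; names are illustrative):

```py
import dlt

data = [{"id": 1, "blob": b"\x00\x01"}]

pipeline = dlt.pipeline(
    pipeline_name="events_pipeline",
    destination="redshift",
    staging="filesystem",       # stage load files in the configured S3 bucket
    dataset_name="events_data",
)
# parquet staging lets Redshift load binary (VARBYTE) columns
info = pipeline.run(data, loader_file_format="parquet")
```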

## Supported column hints

@@ -147,7 +149,7 @@ pipeline = dlt.pipeline(

## Supported loader file formats

Supported loader file formats for Redshift are `sql` and `insert_values` (default). When using a staging location, Redshift supports `parquet` and `jsonl`.
Supported loader file formats for Redshift are `sql` and `insert_values` (default). When using a staging location, Redshift supports Parquet and JSONL.

<!--@@@DLT_TUBA redshift-->

14 changes: 7 additions & 7 deletions docs/website/docs/dlt-ecosystem/destinations/snowflake.md
@@ -170,17 +170,17 @@ pipeline.run(events())

## Supported file formats
* [insert-values](../file-formats/insert-format.md) is used by default.
* [parquet](../file-formats/parquet.md) is supported.
* [jsonl](../file-formats/jsonl.md) is supported.
* [csv](../file-formats/csv.md) is supported.
* [Parquet](../file-formats/parquet.md) is supported.
* [JSONL](../file-formats/jsonl.md) is supported.
* [CSV](../file-formats/csv.md) is supported.

When staging is enabled:
* [jsonl](../file-formats/jsonl.md) is used by default.
* [parquet](../file-formats/parquet.md) is supported.
* [csv](../file-formats/csv.md) is supported.
* [JSONL](../file-formats/jsonl.md) is used by default.
* [Parquet](../file-formats/parquet.md) is supported.
* [CSV](../file-formats/csv.md) is supported.

:::caution
When loading from `parquet`, Snowflake will store `json` types (JSON) in `VARIANT` as a string. Use the `jsonl` format instead or use `PARSE_JSON` to update the `VARIANT` field after loading.
When loading from Parquet, Snowflake will store `json` types (JSON) in `VARIANT` as a string. Use the JSONL format instead or use `PARSE_JSON` to update the `VARIANT` field after loading.
:::

### Custom CSV formats
2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/destinations/sqlalchemy.md
@@ -154,7 +154,7 @@ For example, SQLite does not have `DATETIME` or `TIMESTAMP` types, so `timestamp
## Supported file formats

* [typed-jsonl](../file-formats/jsonl.md) is used by default. JSON-encoded data with typing information included.
* [parquet](../file-formats/parquet.md) is supported.
* [Parquet](../file-formats/parquet.md) is supported.

## Supported column hints

4 changes: 2 additions & 2 deletions docs/website/docs/dlt-ecosystem/destinations/synapse.md
@@ -138,10 +138,10 @@ Data is loaded via `INSERT` statements by default.
## Supported file formats
* [insert-values](../file-formats/insert-format.md) is used by default
* [parquet](../file-formats/parquet.md) is used when [staging](#staging-support) is enabled
* [Parquet](../file-formats/parquet.md) is used when [staging](#staging-support) is enabled

## Data type limitations
* **Synapse cannot load `TIME` columns from `parquet` files**. `dlt` will fail such jobs permanently. Use the `insert_values` file format instead, or convert `datetime.time` objects to `str` or `datetime.datetime` to load `TIME` columns.
* **Synapse cannot load `TIME` columns from Parquet files**. `dlt` will fail such jobs permanently. Use the `insert_values` file format instead, or convert `datetime.time` objects to `str` or `datetime.datetime` to load `TIME` columns.
* **Synapse does not have a nested/JSON/struct data type**. The `dlt` `json` data type is mapped to the `nvarchar` type in Synapse.

## Table index type
2 changes: 1 addition & 1 deletion docs/website/docs/dlt-ecosystem/file-formats/csv.md
@@ -11,7 +11,7 @@ import SetTheFormat from './_set_the_format.mdx';
`dlt` uses it for specific use cases - mostly for performance and compatibility reasons.

Internally, we use two implementations:
- **pyarrow** csv writer - a very fast, multithreaded writer for [arrow tables](../verified-sources/arrow-pandas.md)
- **pyarrow** CSV writer - a very fast, multithreaded writer for [Arrow tables](../verified-sources/arrow-pandas.md)
- **python stdlib writer** - a csv writer included in the Python standard library for Python objects

## Supported destinations