Merge pull request #883 from dlt-hub/dlt-issue-881
Update field type coercion with more specific examples
sultaniman authored Jan 9, 2024
2 parents 7d9baf1 + 0e77855 commit 4d16a1f
Showing 1 changed file with 40 additions and 1 deletion.
41 changes: 40 additions & 1 deletion docs/website/docs/general-usage/schema.md
@@ -127,7 +127,46 @@ Postgres ignore it when creating tables.
### Variant columns

Variant columns are generated by a normalizer when it encounters a data item with a type that cannot be
coerced into an existing column. See [`coerce_row`](https://github.com/dlt-hub/dlt/blob/7d9baf1b8fdf2813bcf7f1afe5bb3558993305ca/dlt/common/schema/schema.py#L205) if you are interested in how this works internally.

Let's consider our [getting started](../getting-started#quick-start) example with a slightly different approach,
where `id` is an integer at the beginning:

```py
data = [
{"id": 1, "human_name": "Alice"}
]
```

Once the pipeline runs, we will have the following schema:

| name | data_type | nullable |
| ------------- | ------------- | -------- |
| id | bigint | true |
| human_name | text | true |

Now imagine the data has changed and the `id` field also contains strings:

```py
data = [
{"id": 1, "human_name": "Alice"},
{"id": "idx-nr-456", "human_name": "Bob"}
]
```

After you run the pipeline, `dlt` will automatically detect the type change and add a new field, `id__v_text`, to the schema
to hold the new data type for `id`. In the same way, it will create a new field for any type that is not compatible with integer.

| name | data_type | nullable |
| ------------- | ------------- | -------- |
| id | bigint | true |
| human_name | text | true |
| id__v_text | text | true |

On the other hand, if the `id` field was already a string, then introducing new data with `id` containing other types
will not change the schema, because those values can be coerced to string.

Now go ahead and try to add a new record where `id` is a float number. You should see a new field `id__v_double` in the schema.
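
To make the naming rule above concrete, here is a minimal toy model in plain Python. It is not `dlt`'s actual `coerce_row` implementation: the `PY_TO_DATA_TYPE` mapping and the `detect_data_type` and `target_column` helpers are hypothetical names introduced only for illustration, and the sketch assumes that a `text` column accepts any value while every other type mismatch produces a variant column. The real coercion rules are more permissive in places, so treat this only as a sketch of the `<column>__v_<type>` naming.

```py
# Toy model of variant column naming; NOT dlt's actual implementation.

# Hypothetical mapping from Python types to the data type names used in the tables above.
PY_TO_DATA_TYPE = {bool: "bool", int: "bigint", float: "double", str: "text"}


def detect_data_type(value):
    # type(value) is used instead of isinstance so that True is not treated as an int.
    return PY_TO_DATA_TYPE[type(value)]


def target_column(schema, name, value):
    """Return the column that `value` lands in, adding columns to `schema` as needed."""
    value_type = detect_data_type(value)
    column_type = schema.setdefault(name, value_type)  # the first value fixes the column type
    if column_type == value_type or column_type == "text":
        # Same type, or the column is text and (by assumption) accepts anything.
        return name
    # Incompatible type: route the value to a variant column.
    variant = f"{name}__v_{value_type}"
    schema.setdefault(variant, value_type)
    return variant


schema = {}
for row in [{"id": 1}, {"id": "idx-nr-456"}, {"id": 1.5}]:
    print(target_column(schema, "id", row["id"]))
print(schema)
```

Starting from an empty schema, the three rows land in `id`, `id__v_text`, and `id__v_double` respectively. If the first `id` had been a string, all later values would stay in `id` under this model.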

### Data types
