add the dat tables to the readme
MrPowers committed Jan 27, 2024
1 parent 96e1950 commit 2ba721b
Showing 2 changed files with 178 additions and 0 deletions.
173 changes: 173 additions & 0 deletions README.md
@@ -60,6 +60,179 @@ For an example implementation of this, see the example PySpark tests in `tests/p

TBD.

## Generated tables

**all_primitive_types**

Table containing all non-nested types.

```
+----+-----+-----+-----+----+-------+-------+-----+-------------+-------+----------+-------------------+
|utf8|int64|int32|int16|int8|float32|float64| bool|       binary|decimal|    date32|          timestamp|
+----+-----+-----+-----+----+-------+-------+-----+-------------+-------+----------+-------------------+
|   0|    0|    0|    0|   0|    0.0|    0.0| true|           []| 10.000|1970-01-01|1970-01-01 00:00:00|
|   1|    1|    1|    1|   1|    1.0|    1.0|false|         [00]| 11.000|1970-01-02|1970-01-01 01:00:00|
|   2|    2|    2|    2|   2|    2.0|    2.0| true|      [00 00]| 12.000|1970-01-03|1970-01-01 02:00:00|
|   3|    3|    3|    3|   3|    3.0|    3.0|false|   [00 00 00]| 13.000|1970-01-04|1970-01-01 03:00:00|
|   4|    4|    4|    4|   4|    4.0|    4.0| true|[00 00 00 00]| 14.000|1970-01-05|1970-01-01 04:00:00|
+----+-----+-----+-----+----+-------+-------+-----+-------------+-------+----------+-------------------+
```
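The rows above follow a simple pattern in the row index `i`. As a minimal sketch in plain Python (not the generator this repository uses; the function name `primitive_row` is hypothetical, and the value rules are inferred only from the displayed output), one row could be derived like this:

```python
from datetime import date, datetime, timedelta
from decimal import Decimal


def primitive_row(i: int) -> dict:
    """Build one row matching the pattern shown for all_primitive_types.

    The rules are inferred from the printed table, not taken from the generator.
    """
    epoch = datetime(1970, 1, 1)
    return {
        "utf8": str(i),
        "int64": i,
        "int32": i,
        "int16": i,
        "int8": i,
        "float32": float(i),
        "float64": float(i),
        "bool": i % 2 == 0,  # true for even rows
        "binary": bytes(i),  # i zero bytes, e.g. [00 00] for i = 2
        "decimal": Decimal(10 + i).quantize(Decimal("0.001")),  # three decimal places
        "date32": date(1970, 1, 1) + timedelta(days=i),
        "timestamp": epoch + timedelta(hours=i),
    }


rows = [primitive_row(i) for i in range(5)]
```

Python integers and floats do not carry a bit width, so the distinction between `int8`, `int16`, `int32`, and `int64` (and between `float32` and `float64`) would have to come from an explicit schema when the rows are written.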

**basic_append**

A basic table with two append writes.

```
+------+------+-------+
|letter|number|a_float|
+------+------+-------+
|     a|     1|    1.1|
|     b|     2|    2.2|
|     c|     3|    3.3|
|     d|     4|    4.4|
|     e|     5|    5.5|
+------+------+-------+
```

**basic_partitioned**

A basic partitioned table.

```
+------+------+-------+
|letter|number|a_float|
+------+------+-------+
|     b|     2|    2.2|
|  NULL|     6|    6.6|
|     c|     3|    3.3|
|     a|     1|    1.1|
|     a|     4|    4.4|
|     e|     5|    5.5|
+------+------+-------+
```
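To illustrate what partitioning means for the rows above, here is a minimal sketch that groups them into Hive-style `column=value` directories. It assumes the partition column is `letter`, which this README does not state, and it uses `__HIVE_DEFAULT_PARTITION__` as the directory value for `NULL`, following the convention Spark normally applies. The helper name `partition_dir` is hypothetical.

```python
from collections import defaultdict

# Rows as displayed above: (letter, number, a_float). None stands for NULL.
rows = [
    ("b", 2, 2.2),
    (None, 6, 6.6),
    ("c", 3, 3.3),
    ("a", 1, 1.1),
    ("a", 4, 4.4),
    ("e", 5, 5.5),
]

# Directory value conventionally used by Spark for a NULL partition value.
NULL_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def partition_dir(letter):
    """Return the Hive-style directory name for a value of the assumed partition column."""
    return f"letter={NULL_PARTITION if letter is None else letter}"


# Group the non-partition columns by partition directory.
partitions = defaultdict(list)
for letter, number, a_float in rows:
    partitions[partition_dir(letter)].append((number, a_float))

for directory in sorted(partitions):
    print(directory, partitions[directory])
```

Under these assumptions the two rows with `letter = a` land in the same directory, and the `NULL` row gets a directory of its own.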

**multi_partitioned**

Multiple levels of partitioning, with boolean, timestamp, and decimal partition columns.

```
+-----+-------------------+--------------------+---+
| bool|               time|              amount|int|
+-----+-------------------+--------------------+---+
|false|1970-01-02 08:45:00|12.00000000000000...|  3|
| true|1970-01-01 00:00:00|200.0000000000000...|  1|
| true|1970-01-01 12:30:00|200.0000000000000...|  2|
+-----+-------------------+--------------------+---+
```

**multi_partitioned_2**

Multiple levels of partitioning, with boolean, timestamp, and decimal partition columns.

```
+-----+-------------------+--------------------+---+
| bool|               time|              amount|int|
+-----+-------------------+--------------------+---+
|false|1970-01-02 08:45:00|12.00000000000000...|  3|
| true|1970-01-01 00:00:00|200.0000000000000...|  1|
| true|1970-01-01 12:30:00|200.0000000000000...|  2|
+-----+-------------------+--------------------+---+
```

**nested_types**

Table containing various nested types.

```
+---+------------+---------------+--------------------+
| pk|      struct|          array|                 map|
+---+------------+---------------+--------------------+
|  0| {0.0, true}|            [0]|                  {}|
|  1|{1.0, false}|         [0, 1]|            {0 -> 0}|
|  2| {2.0, true}|      [0, 1, 2]|    {0 -> 0, 1 -> 1}|
|  3|{3.0, false}|   [0, 1, 2, 3]|{0 -> 0, 1 -> 1, ...|
|  4| {4.0, true}|[0, 1, 2, 3, 4]|{0 -> 0, 1 -> 1, ...|
+---+------------+---------------+--------------------+
```
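The nested values also appear to follow a pattern in `pk`. The sketch below reproduces that pattern in plain Python under several assumptions: the struct field names (`float64`, `bool`) are hypothetical because the output shows only values, the map key type is assumed to be an integer, and the truncated map cells in rows 3 and 4 are assumed to continue the same pattern.

```python
def nested_row(pk: int) -> dict:
    """Build one row matching the pattern shown for nested_types.

    Struct field names and the map key type are assumptions; the printed
    table shows only the values.
    """
    return {
        "pk": pk,
        "struct": {"float64": float(pk), "bool": pk % 2 == 0},
        "array": list(range(pk + 1)),       # [0], [0, 1], [0, 1, 2], ...
        "map": {j: j for j in range(pk)},   # {}, {0: 0}, {0: 0, 1: 1}, ...
    }


rows = [nested_row(pk) for pk in range(5)]
```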

**no_replay**

Table with a checkpoint and prior commits cleaned up.

```
+------+---+----------+
|letter|int|      date|
+------+---+----------+
|     a| 93|1975-06-01|
|     b|753|2012-05-01|
|     c|620|1983-10-01|
|     a|595|2013-03-01|
|  NULL|653|1995-12-01|
+------+---+----------+
```

**no_stats**

Table with no stats.

```
+------+---+----------+
|letter|int|      date|
+------+---+----------+
|     a| 93|1975-06-01|
|     b|753|2012-05-01|
|     c|620|1983-10-01|
|     a|595|2013-03-01|
|  NULL|653|1995-12-01|
+------+---+----------+
```

**stats_as_structs**

Table with a checkpoint in which stats are written only as structs (not JSON).

```
+------+---+----------+
|letter|int|      date|
+------+---+----------+
|     a| 93|1975-06-01|
|     b|753|2012-05-01|
|     c|620|1983-10-01|
|     a|595|2013-03-01|
|  NULL|653|1995-12-01|
+------+---+----------+
```
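For readers unfamiliar with what "stats" covers, the sketch below computes the per-file statistics fields defined by the Delta protocol (`numRecords`, `minValues`, `maxValues`, `nullCount`) for the rows shown above. It treats all five rows as a single data file, which is an assumption; a real table may spread them over several files, each with its own statistics. The function name `file_stats` is hypothetical.

```python
from datetime import date

# Rows as displayed above. None stands for NULL.
columns = ["letter", "int", "date"]
rows = [
    ("a", 93, date(1975, 6, 1)),
    ("b", 753, date(2012, 5, 1)),
    ("c", 620, date(1983, 10, 1)),
    ("a", 595, date(2013, 3, 1)),
    (None, 653, date(1995, 12, 1)),
]


def file_stats(rows, columns):
    """Compute numRecords, minValues, maxValues, and nullCount over the given rows."""
    stats = {"numRecords": len(rows), "minValues": {}, "maxValues": {}, "nullCount": {}}
    for idx, name in enumerate(columns):
        values = [row[idx] for row in rows]
        non_null = [v for v in values if v is not None]
        stats["nullCount"][name] = len(values) - len(non_null)
        if non_null:  # min/max are only defined when a non-null value exists
            stats["minValues"][name] = min(non_null)
            stats["maxValues"][name] = max(non_null)
    return stats


stats = file_stats(rows, columns)
```

The same information can be serialized as a JSON string or stored as a typed struct; this table exercises readers that must rely on the struct form.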

**with_checkpoint**

Table with a checkpoint.

```
+------+---+----------+
|letter|int|      date|
+------+---+----------+
|     a| 93|1975-06-01|
|     b|753|2012-05-01|
|     c|620|1983-10-01|
|     a|595|2013-03-01|
|  NULL|653|1995-12-01|
+------+---+----------+
```

**with_schema_change**

Table whose schema was changed with `overwriteSchema=True`.

```
+----+----+
|num1|num2|
+----+----+
|  22|  33|
|  44|  55|
|  66|  77|
+----+----+
```
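A schema change of this kind can be sketched in PySpark roughly as follows. This is an illustration, not the script that generated the table: the function name `overwrite_with_new_schema` is hypothetical, the original schema (`letter`, `number`) is invented because this README shows only the final state, and `spark` is assumed to be a `SparkSession` already configured for Delta Lake.

```python
def overwrite_with_new_schema(spark, path):
    """Write a Delta table, then replace both its data and its schema.

    `spark` is assumed to be a SparkSession configured for Delta Lake.
    """
    # First write: a hypothetical original schema (not shown in this README).
    original = spark.createDataFrame([("a", 1), ("b", 2)], ["letter", "number"])
    original.write.format("delta").save(path)

    # Second write: overwrite the data and allow the schema to be replaced.
    replacement = spark.createDataFrame(
        [(22, 33), (44, 55), (66, 77)], ["num1", "num2"]
    )
    (
        replacement.write.format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .save(path)
    )
```

Without the `overwriteSchema` option, an overwrite with incompatible columns would normally be rejected by Delta Lake's schema enforcement.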

## Models

The test cases contain several JSON files to be read by connector tests. To make it easier to read them, we provide [JSON schemas](https://json-schema.org/) for each of the file types in `out/schemas/`. They can be read to understand
5 changes: 5 additions & 0 deletions tests/pyspark_delta/test_pyspark_delta.py
@@ -65,6 +65,11 @@ def test_readers_dat(spark_session, case: ReadCase):
            query.load(str(case.delta_root))
    else:
        actual_df = query.load(str(case.delta_root))
        print("***")
        print(case.delta_root)
        print(case.description)
        print(actual_df.show())
        print("***")

        expected_df = spark_session.read.format('parquet').load(
            str(case.parquet_root) + '/*.parquet')