Add Variant Type #1453

Merged
merged 6 commits into from
Jan 10, 2025

Conversation

SpencerTorres
Member

@SpencerTorres SpencerTorres commented Dec 20, 2024

Summary

Implement Variant column type. Partially resolves #1430.
closes #1195

Implementation

This implementation adds 2 major types to the module:

  • column.Variant - the column implementation for (de)serialization
  • clickhouse.Variant - a container to hold variant values (optional for (de)serialization). It can also carry a preferred type for cases where the existing column type detection is ambiguous (such as Array(UInt8) vs String)

column.Variant

Serialization

// Variant(Array(Map(String, String)), Array(UInt8), Bool, Int64, String)
batch, err := conn.PrepareBatch(ctx, "INSERT INTO test_variant (c)")
require.NoError(t, err)
require.NoError(t, batch.Append(int64(42))) // Accepts primitives
require.NoError(t, batch.Append(clickhouse.NewVariantWithType("test", "String"))) // Accepts Variants with type preference
require.NoError(t, batch.Append(true))
require.NoError(t, batch.Append(clickhouse.NewVariant([]uint8{0xA, 0xB, 0xC}).WithType("Array(UInt8)"))) 
require.NoError(t, batch.Append(nil)) // Accepts nil
require.NoError(t, batch.Append([]map[string]string{{"key1": "val1"}, {"key2": "val2"}})) // Accepts complex types

When values are appended via col.AppendRow(), the input v interface{} type is checked. If it is nil, a Null discriminator is appended. If it is a clickhouse.Variant with a preferred type, then the specified column type will be appended along with its matching discriminator. The underlying column's AppendRow function is re-used so that we don't need to re-implement its logic.

As a catch-all, the input value is tested against each column type in order until one accepts it. For example, Variant(Bool, Int64, String) will try to append as bool, then int64, then string. If the value does not fit any column type, the append returns an error.

Sometimes types will conflict. Because the variant's types are sorted alphabetically, Array(UInt8) is tried before String, and since the Array column accepts string input, a Go string ends up in Array(UInt8). I researched different solutions to this, including a type priority system, but they would be complex to implement. For now, it is easiest to let the user pass NewVariantWithType(int64(42), "Int64") or NewVariant(int64(42)).WithType("Int64") when they want a specific type within the variant. For complex types like maps, reflection is used if a type isn't specified.
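The selection order described above can be sketched roughly as follows. This is not the code from this PR: member, typed, pick, and exampleMembers are hypothetical names, the accept callbacks only imitate the behavior described here (the Array(UInt8) stand-in accepts Go strings), and the Null discriminator value of 255 is an assumption.

```go
package main

import "fmt"

// nullDiscriminator marks a NULL row. The value 255 is an assumption.
const nullDiscriminator uint8 = 255

// member is a hypothetical stand-in for one nested column of a Variant.
type member struct {
	chType string
	accept func(v any) bool // reports whether this column can store v
}

// typed is a hypothetical stand-in for a value with a preferred type.
type typed struct {
	value     any
	preferred string
}

// pick returns the discriminator (index into members) chosen for v.
func pick(members []member, v any) (uint8, error) {
	if v == nil {
		return nullDiscriminator, nil
	}
	if t, ok := v.(typed); ok {
		// A preferred type skips detection and must match a member exactly.
		for i, m := range members {
			if m.chType == t.preferred {
				if !m.accept(t.value) {
					return 0, fmt.Errorf("value does not fit %s", m.chType)
				}
				return uint8(i), nil
			}
		}
		return 0, fmt.Errorf("type %q is not part of the variant", t.preferred)
	}
	// Catch-all: the first member (in sorted order) that accepts v wins.
	for i, m := range members {
		if m.accept(v) {
			return uint8(i), nil
		}
	}
	return 0, fmt.Errorf("value of type %T does not fit any variant type", v)
}

// exampleMembers models Variant(Array(UInt8), Int64, String), sorted by name.
func exampleMembers() []member {
	isString := func(v any) bool { _, ok := v.(string); return ok }
	return []member{
		{"Array(UInt8)", func(v any) bool {
			_, isBytes := v.([]uint8)
			return isBytes || isString(v)
		}},
		{"Int64", func(v any) bool { _, ok := v.(int64); return ok }},
		{"String", isString},
	}
}

func main() {
	ms := exampleMembers()
	d, _ := pick(ms, "test")
	fmt.Println(ms[d].chType) // Array(UInt8)
	d, _ = pick(ms, typed{value: "test", preferred: "String"})
	fmt.Println(ms[d].chType) // String
}
```

The plain string lands in Array(UInt8) only because that member comes first and accepts it, which is the conflict the preferred type is meant to resolve.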

After all rows are appended, the Native format is used to serialize the data into the buffer: first the serializationVersion, then the uint8 array of discriminators, then each column's data via its existing Encode function (similar to Tuple).

Deserialization

The Native format deserializes the discriminators and builds a set of offsets for each column. This allows for storing multiple columns with mixed lengths. When the user wants to read a row, we can index into the correct row of each column to get the corresponding type.
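A minimal sketch of how per-row offsets can be derived from the discriminators. The function name buildOffsets is hypothetical, and the Null discriminator value of 255 and the use of -1 for NULL rows are assumptions of this sketch, not details confirmed by the PR.

```go
package main

import "fmt"

// nullDiscriminator marks a NULL row. The value 255 is an assumption.
const nullDiscriminator uint8 = 255

// buildOffsets returns, for each row, the index of that row's value inside
// its nested column. NULL rows get -1 because no nested column stores them.
func buildOffsets(discriminators []uint8, numColumns int) ([]int, error) {
	next := make([]int, numColumns) // next free index per nested column
	offsets := make([]int, len(discriminators))
	for row, d := range discriminators {
		if d == nullDiscriminator {
			offsets[row] = -1
			continue
		}
		if int(d) >= numColumns {
			return nil, fmt.Errorf("row %d: discriminator %d out of range", row, d)
		}
		offsets[row] = next[d]
		next[d]++
	}
	return offsets, nil
}

func main() {
	// Variant(Bool, Int64, String) has discriminators 0, 1, and 2.
	discriminators := []uint8{1, 2, 0, 255, 1}
	offsets, err := buildOffsets(discriminators, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(offsets) // [0 0 0 -1 1]
}
```

Row 4 reads index 1 of the Int64 column because row 0 already used index 0, which is how nested columns of different lengths map back to a single row sequence.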

In practice, it looks like this:

var row clickhouse.Variant // Scan into variant

require.True(t, rows.Next())
err = rows.Scan(&row)
require.NoError(t, err)
require.Equal(t, int64(42), row.Any())

Or, if you know your types ahead of time, you can also scan directly into it:

var i int64 // Scan directly into int64
require.True(t, rows.Next())
err = rows.Scan(&i)
require.NoError(t, err)
require.Equal(t, int64(84), i)

This pattern works by simply calling the underlying column's ScanRow function. However, it is safest to scan into Variant.
If you need to switch on the Go type for your own type detection, variantRow.Any() returns the value as any.
You can also switch on ClickHouse type strings by using variantRow.Type().
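As a rough illustration, a type switch over the value returned by Any() might look like the sketch below. The helper describe is hypothetical, it takes a plain any so the example runs without the driver, and the Go types listed for each ClickHouse type are assumptions.

```go
package main

import "fmt"

// describe switches on the dynamic Go type of a scanned value, such as the
// result of Any(). The ClickHouse names in the output are illustrative.
func describe(v any) string {
	switch x := v.(type) {
	case nil:
		return "NULL"
	case bool:
		return fmt.Sprintf("Bool %t", x)
	case int64:
		return fmt.Sprintf("Int64 %d", x)
	case string:
		return fmt.Sprintf("String %q", x)
	case []uint8:
		return fmt.Sprintf("Array(UInt8) of length %d", len(x))
	default:
		return fmt.Sprintf("unhandled Go type %T", x)
	}
}

func main() {
	fmt.Println(describe(int64(42)))               // Int64 42
	fmt.Println(describe("test"))                  // String "test"
	fmt.Println(describe([]uint8{0xA, 0xB, 0xC})) // Array(UInt8) of length 3
	fmt.Println(describe(nil))                     // NULL
}
```

Keeping a default branch avoids silently dropping values if the Variant column later gains a type the switch does not handle.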

clickhouse.Variant

clickhouse.Variant is simply a wrapper around any that also includes an optional string to represent the preferred ClickHouse type. It implements the standard library SQL interfaces driver.Valuer and sql.Scanner. If you need to access the underlying value, you can use Any(). This type can be constructed with the clickhouse.NewVariant(v) function, or with clickhouse.NewVariantWithType(v, "Array(String)") if you need to provide a preferred type.

The clickhouse.Variant type should be used in structs and when scanning from column.Variant. It can also be used for insertion, although a preferred type may be required if the column's types overlap.

You can use the preferred type for insertion when the Variant column has types that overlap. For example, if you had Variant(Array(UInt8), String), a Go string would be inserted as an Array(UInt8). If you wanted to force this to be a ClickHouse String, you could use clickhouse.NewVariantWithType(v, "String") to provide the preferred type. If the preferred type is not present in the Variant, the row will fail to append to the block. A preferred type can also be set on an existing Variant by calling exampleVariant.WithType(t string), which returns a new clickhouse.Variant with the preferred type set and leaves the original unchanged.
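The copy-on-WithType behavior can be pictured with a small stand-in type. The names variant, newVariant, newVariantWithType, and withType below are hypothetical and only mirror the shape of the API described above, not its actual implementation.

```go
package main

import "fmt"

// variant is a hypothetical stand-in for clickhouse.Variant: a value plus an
// optional preferred ClickHouse type (empty means no preference).
type variant struct {
	value     any
	preferred string
}

func newVariant(v any) variant { return variant{value: v} }

func newVariantWithType(v any, t string) variant {
	return variant{value: v, preferred: t}
}

// withType uses a value receiver, so it modifies and returns a copy.
func (v variant) withType(t string) variant {
	v.preferred = t
	return v
}

func main() {
	base := newVariant("test")
	forced := base.withType("String")
	fmt.Printf("%q %q\n", base.preferred, forced.preferred) // "" "String"
}
```

Returning a copy means one base value can be reused with different preferred types without one call affecting another.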

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added

@SpencerTorres SpencerTorres mentioned this pull request Dec 20, 2024
3 tasks
Contributor

@jkaflik jkaflik left a comment

1st review

lib/chcol/variant.go (outdated)
lib/chcol/variant.go (outdated)
lib/column/variant.go (outdated)
lib/column/variant.go (outdated)
lib/column/variant_test.go (outdated)
tests/variant_test.go
@SpencerTorres SpencerTorres mentioned this pull request Dec 26, 2024
1 task
Contributor

@jkaflik jkaflik left a comment

Overall LGTM. Let's push ongoing conversations before merging.

@SpencerTorres
Member Author

Updated to resolve all review notes. Thanks! 👍

I will apply these changes to the other PRs too

@SpencerTorres SpencerTorres merged commit 308ff76 into main Jan 10, 2025
12 checks passed
@SpencerTorres SpencerTorres deleted the variant_type branch January 10, 2025 17:57
Successfully merging this pull request may close these issues.

  • Add support for new JSON type
  • Add support for experimental Variant data type