Add Variant Type #1453
1st review
Overall LGTM. Let's push ongoing conversations before merging.
…to variant_type
…dlib tests, fix nits
Updated to resolve all review notes. Thanks! 👍 I will apply these changes to the other PRs too.
Summary
Implement `Variant` column type. Partially resolves #1430. Closes #1195.
Implementation
This implementation adds 2 major types to the module:
- `column.Variant` - the column implementation for (de)serialization
- `clickhouse.Variant` - a container to hold variant values (optional for (de)serialization). This type can also carry a preferred type for cases that are ambiguous to the existing column type detection (such as `Array(UInt8)` vs `String`)
column.Variant
Serialization
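A minimal sketch of the append order this section describes (nil, then preferred type, then catch-all), written against stand-in types rather than the driver itself. The names `variant`, `member`, `variantColumn`, `tryAppend` and `newExample` are hypothetical, the `accept` callback stands in for each underlying column's own `AppendRow`, and the NULL discriminator value of 255 is an assumption of the sketch.

```go
package main

import "fmt"

// nullDiscriminator marks a NULL row. The value 255 is an assumption of this sketch.
const nullDiscriminator uint8 = 255

// variant is a hypothetical stand-in for clickhouse.Variant.
type variant struct {
	value     any
	preferred string // optional preferred ClickHouse type
}

// member is a hypothetical stand-in for one underlying column.
type member struct {
	chType string
	accept func(v any) bool // stands in for the column's own AppendRow check
	rows   []any
}

// variantColumn models only the append side of column.Variant.
type variantColumn struct {
	members        []*member // kept in type-name order
	discriminators []uint8
}

// tryAppend appends v to member i and records its discriminator if v fits.
func (c *variantColumn) tryAppend(i int, v any) bool {
	m := c.members[i]
	if !m.accept(v) {
		return false
	}
	m.rows = append(m.rows, v)
	c.discriminators = append(c.discriminators, uint8(i))
	return true
}

// AppendRow checks nil first, then a preferred type, then every member in order.
func (c *variantColumn) AppendRow(v any) error {
	preferred := ""
	if pv, ok := v.(variant); ok {
		v, preferred = pv.value, pv.preferred
	}
	if v == nil {
		c.discriminators = append(c.discriminators, nullDiscriminator)
		return nil
	}
	if preferred != "" {
		for i, m := range c.members {
			if m.chType == preferred && c.tryAppend(i, v) {
				return nil
			}
		}
		return fmt.Errorf("cannot append %T as preferred type %s", v, preferred)
	}
	for i := range c.members {
		if c.tryAppend(i, v) {
			return nil
		}
	}
	return fmt.Errorf("value of type %T does not fit any type in the variant", v)
}

// newExample models Variant(Array(UInt8), String), where both members accept a Go string.
func newExample() *variantColumn {
	return &variantColumn{members: []*member{
		{chType: "Array(UInt8)", accept: func(v any) bool {
			switch v.(type) {
			case []uint8, string:
				return true
			}
			return false
		}},
		{chType: "String", accept: func(v any) bool { _, ok := v.(string); return ok }},
	}}
}

func main() {
	col := newExample()
	_ = col.AppendRow(nil)                                       // NULL row
	_ = col.AppendRow("hi")                                      // catch-all picks the first member that fits
	_ = col.AppendRow(variant{value: "hi", preferred: "String"}) // preferred type wins
	fmt.Println(col.discriminators, col.AppendRow(3.14))
}
```

In this model the plain string lands in `Array(UInt8)` because that member is tried first, while the same string wrapped with a preferred type lands in `String`.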
When values are appended via `col.AppendRow()`, the input `v interface{}` type is checked. If it is `nil`, a `Null` discriminator is appended. If it is a `clickhouse.Variant` with a preferred type, then the specified column type will be appended along with its matching discriminator. The underlying column's `AppendRow` function is re-used so that we don't need to re-implement its logic.
As a catch-all, the input value will be tested against each column type until it succeeds. For example, `Variant(Bool, Int64, String)` will try to append as `bool`, `int64`, then `string`. If a value does not fit into any column type, it will return an error.
Sometimes types will conflict. Due to alphabetical sorting of the types, `Array(UInt8)` would be used before `String` since `Array` allows for `string` input. I have researched different solutions to this, including a type priority system, but it would be complex to implement. For now it is easiest to let the user simply input `NewVariantWithType(int64(42), "Int64")` or `NewVariant(int64(42)).WithType("Int64")` if they want a specific type within the variant. For complex types like maps, reflection will be used if a type isn't specified.
After all rows are appended, the Native format is used to serialize the data into the buffer. The buffer starts with `serializationVersion`, followed by the `uint8` array of `discriminators`; after that, each column's `Encode` function is re-used as usual (similar to `Tuple`).
Deserialization
The Native format deserializes the `discriminators` and builds a set of `offsets` for each column. This allows for storing multiple columns with mixed lengths. When the user wants to read a row, we can index into the correct row of each column to get the corresponding type.
In practice this looks like this:
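As a driver-independent illustration of that lookup, the sketch below models a decoded `Variant(Bool, Int64, String)` column in memory. The `decoded` type, `buildOffsets`, `row`, `describe` and `example` are hypothetical names, and the NULL discriminator value of 255 is an assumption of the sketch; none of this is the driver's code.

```go
package main

import "fmt"

// nullDiscriminator marks a NULL row. The value 255 is an assumption of this sketch.
const nullDiscriminator uint8 = 255

// decoded is a hypothetical stand-in for a deserialized Variant column.
type decoded struct {
	types          []string // ClickHouse type name of each member column
	columns        [][]any  // values of each member column; lengths differ
	discriminators []uint8  // per row: which member column holds the value
	offsets        []int    // per row: index of the value inside that member column
}

// buildOffsets walks the discriminators once and assigns each row the next
// free index of the member column it points at. NULL rows keep offset 0, unused.
func buildOffsets(discriminators []uint8, members int) []int {
	next := make([]int, members)
	offsets := make([]int, len(discriminators))
	for row, d := range discriminators {
		if d == nullDiscriminator {
			continue
		}
		offsets[row] = next[d]
		next[d]++
	}
	return offsets
}

// row returns the value and ClickHouse type name of row i; NULL rows yield (nil, "").
func (c *decoded) row(i int) (any, string) {
	d := c.discriminators[i]
	if d == nullDiscriminator {
		return nil, ""
	}
	return c.columns[d][c.offsets[i]], c.types[d]
}

// describe switches on the Go type of a row value.
func describe(v any) string {
	switch t := v.(type) {
	case nil:
		return "NULL"
	case bool:
		return fmt.Sprintf("bool %t", t)
	case int64:
		return fmt.Sprintf("int64 %d", t)
	case string:
		return fmt.Sprintf("string %q", t)
	default:
		return fmt.Sprintf("unhandled %T", t)
	}
}

// example holds five rows: 42, "a", NULL, true, "b".
func example() *decoded {
	c := &decoded{
		types:          []string{"Bool", "Int64", "String"},
		columns:        [][]any{{true}, {int64(42)}, {"a", "b"}},
		discriminators: []uint8{1, 2, nullDiscriminator, 0, 2},
	}
	c.offsets = buildOffsets(c.discriminators, len(c.columns))
	return c
}

func main() {
	c := example()
	for i := range c.discriminators {
		v, chType := c.row(i)
		fmt.Printf("row %d: %s (type %q)\n", i, describe(v), chType)
	}
}
```

The member columns here have lengths 1, 1 and 2 for five rows, which is why a per-row offset is needed instead of using the row number directly.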
Or, if you know your types ahead of time, you can also scan directly into it:
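A rough sketch of what a direct scan amounts to, assuming the destination is simply handed to the underlying column. `scanValue` is a hypothetical stand-in for that column's `ScanRow`, and the `*any` case stands in for scanning into a `Variant`; the real driver supports far more destination types.

```go
package main

import "fmt"

// scanValue is a hypothetical stand-in for an underlying column's ScanRow:
// it copies value into dest only when dest points at a compatible Go type.
func scanValue(dest any, value any) error {
	switch d := dest.(type) {
	case *any: // stands in for scanning into a Variant: accepts every row
		*d = value
		return nil
	case *int64:
		if v, ok := value.(int64); ok {
			*d = v
			return nil
		}
	case *string:
		if v, ok := value.(string); ok {
			*d = v
			return nil
		}
	}
	return fmt.Errorf("cannot scan %T into %T", value, dest)
}

func main() {
	rows := []any{int64(42), "hello"} // two rows of a Variant(Int64, String) column

	var n int64
	fmt.Println(scanValue(&n, rows[0]), n) // typed scan, row holds an Int64
	fmt.Println(scanValue(&n, rows[1]))    // typed scan, row holds a String: error

	var anyRow any
	fmt.Println(scanValue(&anyRow, rows[1]), anyRow) // untyped scan accepts any row
}
```

In this model a typed destination only works for rows that actually hold that type, so it suits queries where the row types are known in advance.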
This pattern works by simply calling the underlying column's `ScanRow` function. It is safest to scan into `Variant`, however.
If you need to switch types on `Variant` for your own type detection, you can use `variantRow.Any()` to return `any`.
You can also switch on ClickHouse type strings by using `variantRow.Type()`.
clickhouse.Variant
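As a quick orientation for this section, the wrapper can be modeled in a few lines. The `Variant` type below is a hypothetical stand-in that only mirrors the constructor and method names used in this PR; its fields and method bodies are assumptions, and it leaves out the stdlib sql interfaces.

```go
package main

import "fmt"

// Variant is a hypothetical stand-in mirroring the names used in this PR;
// it is not the driver's implementation.
type Variant struct {
	value  any
	chType string // optional preferred ClickHouse type, empty when unset
}

// NewVariant wraps a value without a preferred type.
func NewVariant(v any) Variant { return Variant{value: v} }

// NewVariantWithType wraps a value together with a preferred ClickHouse type.
func NewVariantWithType(v any, chType string) Variant {
	return Variant{value: v, chType: chType}
}

// WithType returns a copy with the preferred type set; the receiver is unchanged.
func (v Variant) WithType(chType string) Variant {
	v.chType = chType
	return v
}

// Any returns the underlying value.
func (v Variant) Any() any { return v.value }

// Type returns the ClickHouse type string, or "" when none is set.
func (v Variant) Type() string { return v.chType }

func main() {
	plain := NewVariant("hello")
	typed := plain.WithType("String")
	fmt.Printf("plain type: %q, typed type: %q\n", plain.Type(), typed.Type())

	// Switch on the Go type of the underlying value.
	switch val := typed.Any().(type) {
	case string:
		fmt.Println("string value:", val)
	case int64:
		fmt.Println("int64 value:", val)
	}

	// Switch on the ClickHouse type string instead.
	switch typed.Type() {
	case "String":
		fmt.Println("ClickHouse String")
	case "Array(UInt8)":
		fmt.Println("ClickHouse Array(UInt8)")
	}
}
```

Using a value receiver for `WithType` is one simple way to get the "returns a new Variant" behavior: the method edits its own copy and the caller's value stays as it was.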
`clickhouse.Variant` is simply a wrapper around `any` that also includes an optional `string` to represent the preferred ClickHouse type. It implements stdlib sql interfaces such as `driver.Value` and `Scan`. If you need to access the underlying value you can use `Any()`. This type can be constructed with the `clickhouse.NewVariant(v)` function, or with `clickhouse.NewVariantWithType(v, "Array(String)")` if you need to provide a preferred type.
The `clickhouse.Variant` type should be used in structs and when scanning from `column.Variant`. It can also be used for insertion, although a preferred type may be required if there's overlap between types.
You can use the preferred type for insertion when the Variant column has types that overlap. For example, if you had `Variant(Array(UInt8), String)`, a Go `string` would be inserted as an `Array(UInt8)`. If you wanted to force this to be a ClickHouse `String`, you could use `clickhouse.NewVariantWithType(v, "String")` to provide the preferred type. If the preferred type is not present in the Variant, the row will fail to append to the block. A preferred type can be set on an existing `Variant` by calling `exampleVariant.WithType(t string)`, which will return a new `clickhouse.Variant` with the preferred type set.
Checklist
Delete items not relevant to your PR: