You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm using pguint package (can be found in github) for supporting unsigned integers in postgres. pguint package adds following postgres data types:
int1 (8 bits signed integer)
uint1 (8 bits unsigned integer)
uint2 (16 bits unsigned integer)
uint4 (32 bits unsigned integer)
uint8 (64 bits unsigned integer)
I modified parquet_fdw code to support these new datatypes.
My proposal is to make TypeInfo to behave more as a C++ class than a C structure. It eliminates the C switches on parquet datatypes (maybe will work a little bit faster) and makes it easier to add support for new datatypes.
TypeInfo will be a base class. There will be derived classes from it for each parquet data type, for example: BoolTypeInfo, Int8TypeInfo, DoubleTypeInfo, TimestampTypeInfo, etc. TypeInfo base class will have (virtual) interface methods that are implemented in the derived classes.
Example on read_primitive_type function:
Add following interface:
virtual Datum TypeInfo::read_primitive_type(arrow::Array *array, int64_t i) const = 0;
Implement it in derived classes:
virtual Datum BoolTypeInfo::read_primitive_type(arrow::Array *array, int64_t i) const
{
arrow::BooleanArray *boolarray = (arrow::BooleanArray *) array;
Datum res = BoolGetDatum(boolarray->Value(i));
return res;
}
And modify ParquetReader::read_primitive_type accordingly:
Datum ParquetReader::read_primitive_type(arrow::Array *array, const TypeInfo *typinfo, int64_t i)
{
if (typinfo == nullptr)
{
throw Error("parquet_fdw: unsupported column type");
}
Datum res = typinfo->read_primitive_type(array, i);
...
}
A "special" code that does not seem to fit above proposal is row_group_matches_filter function where it calls bytes_to_postgres_type function since there is no TypeInfo involved there (maybe it can be changed).
Thanks,
Maoz
The text was updated successfully, but these errors were encountered:
thanks for the suggestion. TBH i doubt this will work faster. This would require using virtual functions which implies extra memory access (to vtable). And one extra function call. Which may be significant for such trivial work, that read_primitive_type does.
Anyway this requires benchmarking. Unfortunately i currently don't have much time as there are higher priority tasks. Maybe later.
Hi,
I'm using pguint package (can be found in github) for supporting unsigned integers in postgres. pguint package adds following postgres data types:
I modified parquet_fdw code to support these new datatypes.
My proposal is to make TypeInfo to behave more as a C++ class than a C structure. It eliminates the C switches on parquet datatypes (maybe will work a little bit faster) and makes it easier to add support for new datatypes.
TypeInfo will be a base class. There will be derived classes from it for each parquet data type, for example: BoolTypeInfo, Int8TypeInfo, DoubleTypeInfo, TimestampTypeInfo, etc. TypeInfo base class will have (virtual) interface methods that are implemented in the derived classes.
Example on read_primitive_type function:
Add following interface:
Implement it in derived classes:
And modify ParquetReader::read_primitive_type accordingly:
A "special" code that does not seem to fit above proposal is row_group_matches_filter function where it calls bytes_to_postgres_type function since there is no TypeInfo involved there (maybe it can be changed).
Thanks,
Maoz
The text was updated successfully, but these errors were encountered: