Add support for Avro's timestamp-millis LogicalType in DataReader #12395

@armitage420

Feature Request / Improvement

Description
Iceberg's Avro DataReader currently lacks support for Avro's timestamp-millis logical type. This breaks the migration of Avro tables created with Hive 4, which may use the timestamp-millis logical type, to Iceberg tables. Adding timestamp-millis support would improve compatibility and ease migration for users.

Current behavior
After an in-place migration of a Hive 4 Avro table containing a timestamp column to an Iceberg table, SELECT operations throw an IllegalArgumentException. The error occurs when Iceberg attempts to map the Avro schema to the Iceberg table schema.

Error message
IllegalArgumentException: Unknown logical type: org.apache.hive.iceberg.org.apache.avro.LogicalTypes$TimestampMillis

Steps to reproduce

  1. Create an Avro table in Hive with a timestamp column:
CREATE EXTERNAL TABLE hive_test(`id` int, `name` string, `dt` timestamp) STORED AS AVRO;
  2. Insert test data:
INSERT INTO hive_test VALUES (1, "test name", CAST('2024-08-09 14:08:26.326107' AS TIMESTAMP));
  3. Verify the data:
SELECT * FROM hive_test;
  4. Migrate the table to Iceberg:
ALTER TABLE hive_test SET TBLPROPERTIES ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler', 'format-version' = '2');
  5. Attempt to query the migrated table:
SELECT * FROM hive_test;

Step 5 results in the IllegalArgumentException mentioned above.
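
For reference, the dt value inserted in step 2 carries microsecond precision, while a timestamp-millis column stores only milliseconds since the epoch. The sketch below shows both encodings for that value. It assumes the timestamp is encoded relative to 1970-01-01T00:00:00 with no time zone shift, which may not match what Hive's Avro writer actually does; the class and method names are illustrative only.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

final class SampleTimestampEncoding {
  // Reference point shared by Avro's timestamp-millis and timestamp-micros types.
  private static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

  // Value a timestamp-millis column would hold (sub-millisecond digits are dropped).
  public static long toMillis(LocalDateTime ts) {
    return ChronoUnit.MILLIS.between(EPOCH, ts);
  }

  // Value a timestamp-micros column would hold.
  public static long toMicros(LocalDateTime ts) {
    return ChronoUnit.MICROS.between(EPOCH, ts);
  }

  public static void main(String[] args) {
    // Same literal as the INSERT in step 2, in ISO-8601 form.
    LocalDateTime dt = LocalDateTime.parse("2024-08-09T14:08:26.326107");
    System.out.println("timestamp-millis: " + toMillis(dt));
    System.out.println("timestamp-micros: " + toMicros(dt));
  }
}
```

The two values differ by a factor of 1000 plus the trailing 107 microseconds, which a millisecond encoding cannot represent.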

Additional context
Debugging the Iceberg code shows that DataReader supports timestamps in microseconds (timestamp-micros) but not in milliseconds (timestamp-millis).
In Iceberg's TypeToSchema.java, timestamps are always converted to the timestamp-micros logical type:

private static final Schema TIMESTAMP_SCHEMA =
      LogicalTypes.timestampMicros().addToSchema(Schema.create(Schema.Type.LONG));
private static final Schema TIMESTAMPTZ_SCHEMA =
      LogicalTypes.timestampMicros().addToSchema(Schema.create(Schema.Type.LONG));

This issue may not occur for tables originally created in Iceberg, since those are written with timestamp-micros, but it affects migration from Hive Avro tables to Iceberg.
Engines that read migrated tables through Iceberg connectors (e.g., Hive) may encounter this issue during table migration.
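
One possible shape for the requested support is to scale timestamp-millis values up to microseconds at read time, so the rest of the read path keeps working in microseconds. The following is a minimal JDK-only sketch of that idea, not the Iceberg implementation: TimestampReaderSketch and toIcebergMicros are hypothetical names, and the real DataReader dispatches on Avro LogicalType objects rather than on name strings. The strings "timestamp-millis" and "timestamp-micros" are the logical type names defined by the Avro specification.

```java
import java.util.function.LongUnaryOperator;

final class TimestampReaderSketch {
  // Returns a function that turns the stored long into microseconds since the epoch.
  // Hypothetical helper for illustration; not part of Iceberg or Avro.
  public static LongUnaryOperator toIcebergMicros(String logicalTypeName) {
    switch (logicalTypeName) {
      case "timestamp-micros":
        // Already in the unit the reader expects.
        return LongUnaryOperator.identity();
      case "timestamp-millis":
        // Scale to microseconds; multiplyExact fails loudly on overflow.
        return millis -> Math.multiplyExact(millis, 1000L);
      default:
        // Mirrors the failure mode reported in this issue for unhandled types.
        throw new IllegalArgumentException("Unknown logical type: " + logicalTypeName);
    }
  }

  public static void main(String[] args) {
    long storedMillis = 1_723_212_506_326L; // example value read from a timestamp-millis column
    long micros = toIcebergMicros("timestamp-millis").applyAsLong(storedMillis);
    System.out.println(micros);
  }
}
```

Converting on read keeps a single internal unit and leaves existing files untouched, at the cost of one multiplication per value.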

Query engine

Hive

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time

Labels

improvement (PR that improves existing functionality)