API: implement types timestamp_ns and timestamptz_ns #9008

Merged: rdblue merged 42 commits into apache:main from jgm-timestamp-nanos-api on Sep 3, 2024

jacobmarble
Contributor

@jacobmarble jacobmarble commented Nov 8, 2023

Closes #8657
Closes #10775

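This change adds field ChronoUnit unit to TimestampType, such that TimestampType now represents four specified types:
- timestamp (existing)
- timestamptz (existing)
- timestamp_ns (new #8683)
- timestamptz_ns (new #8683)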

Note that TimestampType.with[out]Zone() are marked as deprecated in this change. In future PRs, I'll remove usage of these static methods.
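
As a rough illustration of the idea in this description (and not the actual Iceberg code), a single timestamp type that carries a zone flag and a unit can cover the type names timestamp, timestamptz, timestamp_ns, and timestamptz_ns. The class `SketchTimestampType`, its `Unit` enum, the factory method `of`, and the method `typeName` are hypothetical names chosen for this sketch.

```java
// Hypothetical sketch: NOT the Iceberg TimestampType implementation.
final class SketchTimestampType {
  // Unit of the stored value; the constant names are assumptions for this sketch.
  enum Unit {
    MICROS,
    NANOS
  }

  private final boolean adjustToUTC;
  private final Unit unit;

  private SketchTimestampType(boolean adjustToUTC, Unit unit) {
    this.adjustToUTC = adjustToUTC;
    this.unit = unit;
  }

  // Single factory that replaces separate "with zone" / "without zone" factories.
  static SketchTimestampType of(boolean adjustToUTC, Unit unit) {
    return new SketchTimestampType(adjustToUTC, unit);
  }

  // Maps the (zone, unit) pair onto one of the four type names.
  String typeName() {
    String base = adjustToUTC ? "timestamptz" : "timestamp";
    return unit == Unit.NANOS ? base + "_ns" : base;
  }

  public static void main(String[] args) {
    // Prints one type name per (unit, zone) combination.
    for (Unit unit : Unit.values()) {
      for (boolean zone : new boolean[] {false, true}) {
        System.out.println(of(zone, unit).typeName());
      }
    }
  }
}
```

Keeping the unit as a field rather than adding two new classes means code that only cares about "is this a timestamp" can stay unchanged, while code that reads or writes values has to consult the unit.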

@github-actions github-actions bot added the API label Nov 8, 2023
@jacobmarble
Contributor Author

Do these need to be addressed in this PR?

TestSpark3Util > testDescribeSortOrder FAILED
    org.junit.ComparisonFailure: Sort order isn't correct. expected:<[hours(time) DESC NULLS FIRST]> but was:<[]>
        at org.junit.Assert.assertEquals(Assert.java:117)
        at org.apache.iceberg.spark.TestSpark3Util.testDescribeSortOrder(TestSpark3Util.java:84)

TestFilteredScan > [format = parquet, vectorized = false, planningMode = LOCAL] > testHourPartitionedTimestampFilters[format = parquet, vectorized = false, planningMode = LOCAL] FAILED
    java.lang.AssertionError: Primitive value should be equal to expected expected:<8> but was:<5>
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:120)
        at org.apache.iceberg.spark.data.GenericsHelpers.assertEqualsSafe(GenericsHelpers.java:119)
        at org.apache.iceberg.spark.data.GenericsHelpers.assertEqualsSafe(GenericsHelpers.java:68)
        at org.apache.iceberg.spark.source.TestFilteredScan.assertEqualsSafe(TestFilteredScan.java:573)
        at org.apache.iceberg.spark.source.TestFilteredScan.testHourPartitionedTimestampFilters(TestFilteredScan.java:374)

@nastra
Contributor

nastra commented Nov 14, 2023

> Do these need to be addressed in this PR?
>
> TestSpark3Util > testDescribeSortOrder FAILED
>     org.junit.ComparisonFailure: Sort order isn't correct. expected:<[hours(time) DESC NULLS FIRST]> but was:<[]>
>         at org.junit.Assert.assertEquals(Assert.java:117)
>         at org.apache.iceberg.spark.TestSpark3Util.testDescribeSortOrder(TestSpark3Util.java:84)
>
> TestFilteredScan > [format = parquet, vectorized = false, planningMode = LOCAL] > testHourPartitionedTimestampFilters[format = parquet, vectorized = false, planningMode = LOCAL] FAILED
>     java.lang.AssertionError: Primitive value should be equal to expected expected:<8> but was:<5>
>         at org.junit.Assert.fail(Assert.java:89)
>         at org.junit.Assert.failNotEquals(Assert.java:835)
>         at org.junit.Assert.assertEquals(Assert.java:120)
>         at org.apache.iceberg.spark.data.GenericsHelpers.assertEqualsSafe(GenericsHelpers.java:119)
>         at org.apache.iceberg.spark.data.GenericsHelpers.assertEqualsSafe(GenericsHelpers.java:68)
>         at org.apache.iceberg.spark.source.TestFilteredScan.assertEqualsSafe(TestFilteredScan.java:573)
>         at org.apache.iceberg.spark.source.TestFilteredScan.testHourPartitionedTimestampFilters(TestFilteredScan.java:374)

@jacobmarble are you sure those failures aren't caused by the changes introduced in this PR?

@epgif epgif left a comment

Hi all!

I've recently joined Jacob's team at Influx and will be taking over this pull request. I've started by addressing the test failures and asking some questions about the review comments. I'd like to address everything this week.

Thanks!

@epgif epgif force-pushed the jgm-timestamp-nanos-api branch 4 times, most recently from 547aadd to f3dad15 Compare January 30, 2024 19:45
@jacobmarble jacobmarble requested a review from rdblue February 2, 2024 19:42
@epgif

epgif commented Feb 3, 2024

https://github.com/apache/iceberg/actions/runs/7717197684/job/21172136322?pr=9008
looks like a spurious failure? All the rest passed, and even
spark-3x-java-17-tests (3.5, 2.13) passes when I run locally:

% JAVA_HOME=/usr/lib64/jvm/java-17-openjdk-17 SPARK_LOCAL_IP=localhost ./gradlew -DsparkVersions=3.5 -DscalaVersion=2.12 -DhiveVersions= -DflinkVersions= :iceberg-spark:iceberg-spark-3.5_2.12:check  -Pquick=true -x javadoc
> Task :iceberg-data:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-spark:iceberg-spark-3.5_2.12:scalastyleMainCheck
Processed 6 file(s)
Found 0 errors
Found 0 warnings
Finished in 802 ms

> Task :iceberg-spark:iceberg-spark-3.5_2.12:compileScala
Unexpected javac output: warning: [options] bootstrap class path not set in conjunction with -source 8
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
1 warning.

> Task :iceberg-spark:iceberg-spark-3.5_2.12:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-spark:iceberg-spark-3.5_2.12:test
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

See https://docs.gradle.org/8.1.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 37m 54s
40 actionable tasks: 11 executed, 29 up-to-date

@jbonofre
Member

jbonofre commented Feb 5, 2024

I wonder why we don't use something similar to what we have for decimal, with (P,S), for timestamp?
If we want "open precision" for timestamps, we could imagine having second/millisecond/microsecond/nanosecond/picosecond/...

For backward compatibility, we can keep timestamp/timestamptz and add `timestampp`/`timestampptz` in Spec V3.

Just wondering 😄
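
One way to picture the suggestion above: a precision parameter, analogous to decimal's (P,S), would select the fractional-second unit. The following is only a sketch of that idea. The class `TimestampPrecisionSketch`, the methods `precisionToUnit` and `ticksPerSecond`, and the digit-to-unit mapping are assumptions made for illustration; none of them are part of Iceberg or of this PR.

```java
// Hypothetical sketch of a precision-parameterized timestamp; not Iceberg code.
final class TimestampPrecisionSketch {
  // Assumed mapping from the number of fractional-second digits to a unit name.
  static String precisionToUnit(int digits) {
    switch (digits) {
      case 0:
        return "second";
      case 3:
        return "millisecond";
      case 6:
        return "microsecond";
      case 9:
        return "nanosecond";
      case 12:
        return "picosecond";
      default:
        throw new IllegalArgumentException("Unsupported precision: " + digits);
    }
  }

  // Number of ticks per second at the given precision, i.e. 10^digits.
  // Math.multiplyExact throws ArithmeticException if the result overflows a long.
  static long ticksPerSecond(int digits) {
    long ticks = 1L;
    for (int i = 0; i < digits; i++) {
      ticks = Math.multiplyExact(ticks, 10L);
    }
    return ticks;
  }

  public static void main(String[] args) {
    for (int digits : new int[] {0, 3, 6, 9, 12}) {
      System.out.println(
          digits + " digits: " + precisionToUnit(digits) + ", " + ticksPerSecond(digits) + " ticks/s");
    }
  }
}
```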

@epgif epgif force-pushed the jgm-timestamp-nanos-api branch 2 times, most recently from b1e18f1 to d8688f2 Compare February 14, 2024 22:19
@epgif epgif left a comment

Please take another look @rdblue -- thanks!

@epgif epgif force-pushed the jgm-timestamp-nanos-api branch from d8688f2 to 0edf1d8 Compare February 20, 2024 15:03
Helps apache#8657

This change adds field `TimestampType.Unit unit` to `TimestampType`,
such that `TimestampType` now represents four specified types:
- `timestamp` (existing)
- `timestamptz` (existing)
- `timestamp_ns` (new apache#8683)
- `timestamptz_ns` (new apache#8683)

Note that TimestampType.with[out]Zone() are marked as deprecated in this
change. In future PRs, I'll remove usage of these static methods.
@epgif epgif force-pushed the jgm-timestamp-nanos-api branch from 0edf1d8 to 0e098f0 Compare February 20, 2024 15:20
Review fixes for timestamp_ns API changes
@jacobmarble
Contributor Author

Thanks for the direct feedback and added commits @rdblue!

@rdblue
Contributor

rdblue commented Aug 26, 2024

@jacobmarble looks like we need to run spotless.

@epgif

epgif commented Aug 26, 2024

> @jacobmarble looks like we need to run spotless.

@rdblue Done.

@rdblue
Contributor

rdblue commented Sep 2, 2024

I opened jacobmarble#2 to fix the remaining issue, which is that there was no check that prevented the new type from being used in v1 or v2 tables.

Prevent creating table metadata with nanosecond timestamps before v3
@github-actions github-actions bot added the core label Sep 3, 2024
   */
  public static void checkCompatibility(Schema schema, int formatVersion) {
    // check the type in each field
    for (NestedField field : schema.lazyIdToField().values()) {
Contributor

Hm. Now that I'm thinking about this more, we may want to accumulate a full set of problems and then show them in one message. That can be done as a follow-up though.
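
The accumulate-then-report idea in the comment above could look roughly like the following. This is a sketch under assumptions, not the merged code: the class `CompatibilityCheckSketch`, the `MIN_FORMAT_VERSION` table, the use of a plain field-name to type-name map in place of a real `Schema`, and the message wording are all hypothetical. The only fact taken from this PR is that nanosecond timestamps must not be used before v3.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collects every incompatible field before failing once.
final class CompatibilityCheckSketch {
  // Assumed minimum format version per type name. Only the nanosecond
  // timestamp types are listed; unlisted types are treated as always allowed.
  private static final Map<String, Integer> MIN_FORMAT_VERSION = new LinkedHashMap<>();

  static {
    MIN_FORMAT_VERSION.put("timestamp_ns", 3);
    MIN_FORMAT_VERSION.put("timestamptz_ns", 3);
  }

  // fields maps field name -> type name, standing in for a real schema.
  static void checkCompatibility(Map<String, String> fields, int formatVersion) {
    List<String> problems = new ArrayList<>();

    // Check the type of each field, recording problems instead of throwing.
    for (Map.Entry<String, String> field : fields.entrySet()) {
      Integer minVersion = MIN_FORMAT_VERSION.get(field.getValue());
      if (minVersion != null && formatVersion < minVersion) {
        problems.add(
            String.format(
                "%s: %s is not supported until v%d",
                field.getKey(), field.getValue(), minVersion));
      }
    }

    // Report all problems in a single exception message.
    if (!problems.isEmpty()) {
      throw new IllegalStateException(
          String.format(
              "Invalid schema for v%d:\n- %s", formatVersion, String.join("\n- ", problems)));
    }
  }

  public static void main(String[] args) {
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("id", "long");
    fields.put("created", "timestamp_ns");
    fields.put("updated", "timestamptz_ns");

    checkCompatibility(fields, 3); // passes: v3 allows nanosecond timestamps
    try {
      checkCompatibility(fields, 2);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // lists both offending fields
    }
  }
}
```

Collecting the problems first costs one extra list, but a user with several offending columns sees all of them in one failure instead of fixing them one at a time.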

@rdblue rdblue merged commit 113c6e7 into apache:main Sep 3, 2024
47 checks passed
@rdblue
Contributor

rdblue commented Sep 3, 2024

Thanks, @jacobmarble and @epgif!

@jacobmarble jacobmarble deleted the jgm-timestamp-nanos-api branch September 3, 2024 16:12
@jacobmarble
Contributor Author

Thank you for helping us get across the finish line @rdblue!
Thank you for all the effort reviewing @nastra @Fokko @amogh-jahagirdar!

nk1506 pushed a commit to nk1506/iceberg that referenced this pull request Sep 17, 2024
jenbaldwin pushed a commit to Teradata/iceberg that referenced this pull request Sep 17, 2024
* main: (208 commits)
  Docs: Fix Flink 1.20 support versions (apache#11065)
  Flink: Fix compile warning (apache#11072)
  Docs: Initial committer guidelines and requirements for merging (apache#10780)
  Core: Refactor ZOrderByteUtils (apache#10624)
  API: implement types timestamp_ns and timestamptz_ns (apache#9008)
  Build: Bump com.google.errorprone:error_prone_annotations (apache#11055)
  Build: Bump mkdocs-material from 9.5.33 to 9.5.34 (apache#11062)
  Flink: Backport PR apache#10526 to v1.18 and v1.20 (apache#11018)
  Kafka Connect: Disable publish tasks in runtime project (apache#11032)
  Flink: add unit tests for range distribution on bucket partition column (apache#11033)
  Spark 3.5: Use FileGenerationUtil in PlanningBenchmark (apache#11027)
  Core: Add benchmark for appending files (apache#11029)
  Build: Ignore benchmark output folders across all modules (apache#11030)
  Spec: Add RemovePartitionSpecsUpdate REST update type (apache#10846)
  Docs: bump latest version to 1.6.1 (apache#11036)
  OpenAPI, Build: Apply spotless to testFixtures source code (apache#11024)
  Core: Generate realistic bounds in benchmarks (apache#11022)
  Add REST Compatibility Kit (apache#10908)
  Flink: backport PR apache#10832 of inferring parallelism in FLIP-27 source (apache#11009)
  Docs: Add Druid docs url to sidebar (apache#10997)
  ...
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
Labels: API, core, Specification (issues that may introduce spec changes)
Successfully merging this pull request may close these issues.

- New types can break older Iceberg agents
- add type: Timestamp with nanosecond units
8 participants