HIVE-29040: Add UUIDV7 built in function #5891

BsoBird · 2025-06-24T04:27:40Z

What changes were proposed in this pull request?

support uuid_v7 function
https://issues.apache.org/jira/browse/HIVE-29040

Why are the changes needed?

Does this PR introduce any user-facing change?

No.

How was this patch tested?

See TestUDFUUID.class

BsoBird · 2025-06-24T04:41:37Z

@zhangbutao hello,Sir.
Can you help check this?tks.

ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUUIDV7.java

deniskuzZ · 2025-06-24T16:04:54Z

ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUUIDV7.java

+  }
+
+
+  private UUID randomUUIDV7() {


@BsoBird, where did you get the UUIDv7 impl?

https://bugs.openjdk.org/browse/JDK-8357251, 8334015: Add Support for UUID Version 7 (UUIDv7) defined in RFC 9562 openjdk/jdk#25303

https://antonz.org/uuidv7/#java

import https://github.com/f4b6a3/uuid-creator mvn dependency

UUID uuid = UuidCreator.getTimeOrderedEpoch();

@deniskuzZ https://github.com/robsonkades/uuidv7
This is a fast method for calculating UUIDV7.

https://github.com/f4b6a3/uuid-creator seems to have higher rating score

@deniskuzZ Okay,let me see.

@deniskuzZ I reviewed the code and conducted some simple tests. Currently, I have selected a UUIDv7 generation algorithm with relatively the best performance (approximately generating 18 million UUIDs per second in a single thread). Of course, if we need to switch to a different implementation library, I have no objections. What do you think? After all, the UUIDv7 algorithm isn't very complex, so perhaps there's no need to introduce a complicated library?After all, the performance of using SecureRandom + ByteBuffer to generate UUIDV7 is quite poor.

UUID users expect every generated ID to be unique. The random generator needs to be as secure as we can guarantee the uniqueness in my opinion.
As for the dependency, using mvn would make it easier to maintain licenses.

It really depends if we are talking about adding a new dependency to a huge library or if we are adding a dependency with a single Java class. Copying code is not a good practice and has various disadvantages.

If for instance you copy the code in Hive then we cannot take advantage of improvements/bug fixes/security patches that will land in the dependent library. Nobody, will care to bring these changes to Hive and it may be too late when we realize that somethings is needed.

Moreover, when you copy code from open-source projects and you have to follow the ASF policies (i.e., https://www.apache.org/legal/src-headers.html#3party) that I don't think you did in this case. This has legal implications that make our life more difficult.

Dependencies come and go so its fine to add/remove things when it makes sense to do so.

@zabetak @deniskuzZ @okumin
I replaced the UUIDv7 generation code with the uuid-creator dependency introduced via Maven.
Currently, I am still using the high-performance code to generate UUIDv7: Guid.v7().toString().
If we need to enforce the use of cryptographically secure UUIDv7, please let me know.

-- Meters ---------------------------------------------------------------------- uuid-creator:use-threadlocal-random count = 165457936 mean rate = 18360701.04 events/second 1-minute rate = 17878586.40 events/second 5-minute rate = 17878586.40 events/second 15-minute rate = 17878586.40 events/second

UUID users expect every generated ID to be unique. The random generator needs to be as secure as we can guarantee the uniqueness in my opinion. As for the dependency, using mvn would make it easier to maintain licenses.

@okumin If the generation speed of UUIDV7 using security-random could be a bit faster (for example, 100,000 pieces of data per second), I wouldn't mind using it. The biggest problem now is that its generation speed is too slow.

In the vast majority of database/data warehouse cases, uuid_v7 is only used as a unique key for records. In this scenario, even if the UUID is predicted, it doesn't seem to have much impact.

Assuming we generate 20 billion uuid_v7 ids, with 300 running in parallel, it would take a full 3.7 hours to complete. However, using the original uuid (uuidv4) function only takes 0.009 hours. This gap is simply too large. I can hardly imagine how the inefficient UUID_V7 could be effectively used in a production environment.

uuid-creator dependency should be good and project has also multiple stars

okumin · 2025-06-25T14:06:09Z

ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUUIDv7.java

+                  > SELECT _FUNC_();
+                  '36718a53-84f5-45d6-8796-4f79983ad49d'""")
+@UDFType(deterministic = false)
+public class UDFUUIDv7 extends UDF {


I'd like to use GenericUDF for new one because UDF is deprecated.

okumin · 2025-06-25T14:19:51Z

ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUUIDV7.java

+  }
+
+
+  private UUID randomUUIDV7() {


UUID users expect every generated ID to be unique. The random generator needs to be as secure as we can guarantee the uniqueness in my opinion.
As for the dependency, using mvn would make it easier to maintain licenses.

ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFUUID.java

ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUUIDv7.java

deniskuzZ

+1

deniskuzZ · 2025-06-25T15:46:53Z

ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFUUID.java

+  public void testUUIDv7() throws Exception {
+    UDFUUIDv7 udf = new UDFUUIDv7();
+    String id1 = udf.evaluate(null).toString();
+    TimeUnit.MILLISECONDS.sleep(1);


@BsoBird why do you need sleep? please see https://sonarcloud.io/project/issues?impactSeverities=MEDIUM&sinceLeakPeriod=true&issueStatuses=OPEN%2CCONFIRMED&pullRequest=5891&id=apache_hive

@deniskuzZ Because UUIDV7 is incrementing over time, I want to prove that the UUID generated after 1 millisecond must have a larger lexicographical order than the previous UUID.

@deniskuzZ There is no such thing as asynchronously waiting for the previous task to complete before executing the next task here. I just want to delay the time by 1 millisecond before generating the second UUIDV7 and ensure that the latter UUID is definitely greater than the former one. If this condition is not met, then the UUID we generated must not comply with the UUIDV7 specification.

String v7_uuid_id1 = udf.evaluate(null).toString(); TimeUnit.MILLISECONDS.sleep(1); String v7_uuid_id2 = udf.evaluate(null).toString(); assertTrue(v7_uuid_id2.compareTo(v7_uuid_id1)>0); //id2 must be greater than ID1

If we don't verify the time-sequential characteristic of UUIDV7 in unit tests, that seems too strange.

I am not even sure we need such kind of tests in this repo. This seems to be testing the correctness of the underlying implementation in com.github.f4b6a3 so such tests should be contributed there if they don't exist already.

+1 on the above point

Removed the test code for verifying UUIDV7 increment

@deniskuzZ @zabetak
Sir, I have replaced the test cases and removed Thread.sleep(). However, I still personally believe that some basic test cases should be added. Not adding any tests feels very strange. Based on consistent experience, we can assume that code without test cases simply does not work properly. Therefore, not adding test cases makes me feel very uncomfortable. (Maybe I have OCD.)😝

@BsoBird, I think we need a functional test that validates the actual UDF invocation end-to-end.
I've only found 1 q tests (ppd2.q) that uses the uuid().
Can we extend one of the existing udf*.q tests or create a new one?

An end-to-end test in a .q file is the way to go.

Given that we depend on an external library I would also suggest to run a smoke test by starting a Hive docker container (with these changes), set tez.local.mode=false, and execute a simple query making use of the new function. It's quite common to see ClassNotFoundException when new libraries are added in the ql module.

sonarqubecloud · 2025-06-26T13:55:40Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

asf-ci-hive added the tests pending label Jun 24, 2025

BsoBird changed the title ~~support uuid_v7 function~~ HIVE-29040: Add UUIDV7 built in function Jun 24, 2025

asf-ci-hive added tests failed tests pending tests unstable and removed tests pending tests failed tests unstable labels Jun 24, 2025

BsoBird force-pushed the add-uuidv7 branch from c1526b8 to c6a8c62 Compare June 24, 2025 14:55

asf-ci-hive added tests failed and removed tests pending labels Jun 24, 2025

deniskuzZ reviewed Jun 24, 2025

View reviewed changes

ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUUIDV7.java Outdated Show resolved Hide resolved

deniskuzZ reviewed Jun 24, 2025

View reviewed changes

asf-ci-hive added tests pending tests passed and removed tests failed tests pending labels Jun 24, 2025

okumin reviewed Jun 25, 2025

View reviewed changes

asf-ci-hive added tests pending and removed tests passed labels Jun 25, 2025

BsoBird force-pushed the add-uuidv7 branch from 0e2005f to 6c9a617 Compare June 25, 2025 15:12

deniskuzZ reviewed Jun 25, 2025

View reviewed changes

ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFUUID.java Outdated Show resolved Hide resolved

deniskuzZ reviewed Jun 25, 2025

View reviewed changes

ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUUIDv7.java Outdated Show resolved Hide resolved

deniskuzZ approved these changes Jun 25, 2025

View reviewed changes

deniskuzZ reviewed Jun 25, 2025

View reviewed changes

asf-ci-hive added tests failed and removed tests pending labels Jun 25, 2025

asf-ci-hive added tests pending tests passed and removed tests failed tests pending tests passed labels Jun 25, 2025

use uuid-creator and use ThreadLocalRandom

8b4e3af

BsoBird force-pushed the add-uuidv7 branch from edad468 to 8b4e3af Compare June 26, 2025 05:43

asf-ci-hive added tests failed tests pending and removed tests pending tests failed labels Jun 26, 2025

BsoBird added 3 commits June 26, 2025 22:21

fix sonarcloud issue

45479da

add uuid_v7() sql test

151d261

add uuid_v7() sql test

4a1cada

asf-ci-hive added tests passed tests pending and removed tests pending tests passed labels Jun 26, 2025

aturoczy approved these changes Jun 26, 2025

View reviewed changes

HIVE-29040: Add UUIDV7 built in function #5891

Are you sure you want to change the base?

HIVE-29040: Add UUIDV7 built in function #5891

Conversation

BsoBird commented Jun 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

BsoBird commented Jun 24, 2025

Uh oh!

Uh oh!

deniskuzZ Jun 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BsoBird Jun 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BsoBird Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BsoBird Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

deniskuzZ Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

deniskuzZ left a comment

Choose a reason for hiding this comment

Uh oh!

deniskuzZ Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BsoBird Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

deniskuzZ Jun 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud bot commented Jun 26, 2025

BsoBird commented Jun 24, 2025 •

edited

Loading

deniskuzZ Jun 24, 2025 •

edited

Loading

BsoBird Jun 24, 2025 •

edited

Loading

BsoBird Jun 25, 2025 •

edited

Loading

BsoBird Jun 25, 2025 •

edited

Loading

deniskuzZ Jun 25, 2025 •

edited

Loading

deniskuzZ Jun 25, 2025 •

edited

Loading

BsoBird Jun 25, 2025 •

edited

Loading

deniskuzZ Jun 26, 2025 •

edited

Loading