Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect Sorting with UUID Data Type in MariaDB #51883

Closed
Karem-sobhy opened this issue Jun 21, 2024 · 18 comments
Closed

Incorrect Sorting with UUID Data Type in MariaDB #51883

Karem-sobhy opened this issue Jun 21, 2024 · 18 comments

Comments

@Karem-sobhy
Copy link
Contributor

Karem-sobhy commented Jun 21, 2024

Laravel Version

11.11.1

PHP Version

8.3.8

Database Driver & Version

MariaDB 11.4.2

Description

When using the new UUID data type in MariaDB, the sorting is incorrect due to byte swapping performed by MariaDB to optimize UUIDv1 storage and indexing.

Details:

MariaDB stores UUID in an index-friendly manner. A UUID of the form llllllll-mmmm-Vhhh-vsss-nnnnnnnnnnnn is stored as:
nnnnnnnnnnnn-vsss-Vhhh-mmmm-llllllll

This format provides a sorting order assuming a UUIDv1 (node and timestamp) is used, where the node is followed by the timestamp. The key aspect is that the timestamps are sequential.

MariaDB does not perform this byte-swapping for UUID versions >= 6:

MariaDB starting with 10.10
Starting from MariaDB 10.10.6 and MariaDB 10.11.5, considering that UUIDv7 and other versions are designed around time ordering, UUIDs version >= 6 are now stored without byte-swapping. Additionally, UUIDs with version >= 8 and variant=0 are now considered invalid (as the standard expects).

Problem:

When Laravel uses Str::orderedUuid() to generate a new ID for a model and insert it, MariaDB sees the version as < 6 and inserts it swapped. Consequently, the UUID is not ordered and not index-friendly, even though the purpose of the ordered UUID function is to ensure order and End of the table insert (Index friendly insert).

Conclusion:

  • Every insert now is "Out of order insert", making it slower.
  • Sorting by the UUID column will not work because it's stored in a different way than the sorted string format.
  • This issue makes the new UUID data type in MariaDB useless and not compatible with Laravel.
  • This makes the old CHAR(36) preferable over the new UUID type, at least when storage wasn't an issue (since UUID is BIN(16)).
  • This behavior is not mentioned in the documentation.

Steps To Reproduce

  1. in .env DB_CONNECTION=mariadb
  2. make model with migration that contains $table->uuid('id')->primary();
  3. use the HasUuids trait inside the model
  4. create some of that model and observe the wrong order that it's stored in the table
@Karem-sobhy Karem-sobhy changed the title HasUuid wrong sorting behaviour with the new native uuid datatype in maraidb Incorrect Sorting with UUID Data Type in MariaDB Jun 21, 2024
@driesvints
Copy link
Member

Thanks @Karem-sobhy. Pinging you @staudenmeir as you were the author of #50192. Seems UUID columns aren't compatible with Laravel? I think we'll need more extensive tests in our MariaDB test suite for this one.

Copy link

Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

@staudenmeir
Copy link
Contributor

Interesting issue. I don't order by UUID columns in my apps and so haven't run into it.

This issue makes the new UUID data type in MariaDB useless and not compatible with Laravel.

Not a very constructive statement. The issue is not that Laravel supports a native database data type. When you don't need your UUIDs to be sortable or already use v7 UUIDs, the native type is a great benefit. MariaDB added the type for a reason.

IMO, the best solution would be using v7 UUIDs instead the current of the v4 ordered UUIDs, but altering the HasUuids trait's default behavior is basically an unsolvable breaking change.

Instead, I would say that this needs to be addressed by MariaDB users in their applications:

  • We add a helper method for generating v7 UUIDs and explain in the documentation that users should override newUniqueId() in their models and generate v7 UUIDs.
  • Or we recommend to use $table->char('uuid', 36); columns when records need to be sortable.

@driesvints
Copy link
Member

Thanks @staudenmeir. I do feel we need some documentation around this and would be grateful if anyone could provide a PR.

@Karem-sobhy please see @staudenmeir's reply around this topic.

@Karem-sobhy
Copy link
Contributor Author

@staudenmeir using the old char(36) may be a fix but using the orderedUuid with mariadb is just bad for indexing and performance of the insertion and also non informed users maybe don't know all this because the documentation says clearly the uuid will be ordered which is not true when using mariadb default uuids

I can help with a pr but don't know what will be best action to do so can you help me to chose from:

  • if the driver is mariadb then use v7 by default
  • make another HasV7Uuid trait
  • make new Str helper for v7 uuid and ask users to implement it in the newUniqueId method
    @driesvints @staudenmeir I think you are more informed and can chose the best solution for this to implement it.

@driesvints
Copy link
Member

I honestly don't know. We can't make the breaking change and having a HasV7Uuid which should technically be the better default sits a bit ill with me. I just wouldn't change anything atm.

@staudenmeir
Copy link
Contributor

Thanks for the PR @Karem-sobhy. IMO, the HasV7Uuid trait is the best solution to the MariaDB issue right now.

For Laravel 12, I think we should consider again to switch to UUID v7 by default. Just last week, there was a tweet about the performance gains from v7 on PostgreSQL.

@driesvints Do you remember why #44210 got reverted in the end?

@driesvints
Copy link
Member

Hey @staudenmeir. I guess it's mainly this remark from @tillkruss:

IMO it makes no sense to change the default Str::uuid() to v7, which will always reveal the timestamp (minor data leak),

Seems v7 has a data leak that's unwanted?

@staudenmeir
Copy link
Contributor

Thanks @driesvints. I was thinking about "only" switching the HasUuids trait to UUID v7 while leaving Str::uuid() unchanged. This would improve performance (at least with the DBMS we've checked so far) but not introduce any data leaks.

When upgrading to Laravel 12, existing users of the HasUuids trait that rely on the sortability would have to override newUniqueId() and continue using Str::orderedUuid() to avoid the breaking change.

@driesvints
Copy link
Member

@staudenmeir thanks for explaining. Right now I don't have the time to dig deeper into this matter. The best way forward is that you attempt a PR with whatever you want to try so Taylor can review it. Thanks 👍

@Karem-sobhy
Copy link
Contributor Author

@staudenmeir exactly what I was thinking about when creating the PR we leave Str::Uuid as is and have Str:Uuid7 instead but like you i think that HasUuid trait should use uuid7 by default because that if uuid7 leaks date so ordered uuid4 leaks the same data and is not a standard uuid also.

@hafezdivandari
Copy link
Contributor

@staudenmeir @Karem-sobhy I just saw PR #52029 with the new HasVersion7Uuids trait and I wonder how it's gonna solve the ordering issue? I mean assuming a model with HasUuid now switches to the new HasVersion7Uuids trait, the records will have wrong order because Str::uuid7() generates lower values than Str::orderedUuid right?

I mean this test is failing on MySQL / MariaDB:

public function testUuidOrder()
{
    Schema::create('orders', function (Blueprint $table) {
        $table->id();
        $table->uuid();
    });

    // we were using orderedUuid
    DB::table('orders')->insert(['uuid' => (string) Str::orderedUuid()]);

    // now switching to uuid7
    DB::table('orders')->insert(['uuid' => (string) Str::uuid7()]);

    // trying to get records in order, the actual value is [2, 1] and the test fails!
    $this->assertEquals([1, 2], DB::table('orders')->orderBy('uuid')->pluck('id')->toArray());
}

What am I missing here?

@staudenmeir
Copy link
Contributor

@hafezdivandari The new trait can only be used for models that don't have any records yet. You can't switch to it.

@hafezdivandari
Copy link
Contributor

@staudenmeir thank you for explaining, I was so confused.

@Karem-sobhy
Copy link
Contributor Author

@hafezdivandari as @staudenmeir said
The uuid7 is not compatible with the "ordered" uuid4
They can't be used interchangeabley but if you really want to you can change all you old model ids first
You can get them all chunked and oredered by created at and assign a new id for them by the new Str::uuid7 method then you can use the new trait
P.S you should be careful with the relationship have the old ordered uuid4 as the foreign key

@hafezdivandari
Copy link
Contributor

Thanks @Karem-sobhy, actually I was looking for a better solution to be used here on upcoming Passport 13.x. However this model's sortability doesn't rely on it's ID and I think we can use new uuid7 instead!

@Karem-sobhy
Copy link
Contributor Author

@hafezdivandari
UUID7 is the standard ordered UUID implementation, making it more advantageous than the non-standard ordered UUID4. This is particularly beneficial when used with the new MariaDB UUID datatype, as it is stored as BIN(16) instead of CHAR(36). Although ordered UUID4 will provide similar advantages in terms of storage efficiency with MariaDB, it does not offer optimized ordering. Consequently, every insert will be out of order, adding a slight overhead, although not significant.

If this approach is applied to Passport, it won't significantly impact current users because the structure remains the same, and it is not used for ordering. However, new Passport users will benefit from optimized order insertion and the ability to sort by ID when using MariaDB.

@staudenmeir
Copy link
Contributor

I created a PR for Laravel 12: #52433

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants