findLastChangeRevision() returns the wrong result in a multi-node env #3643

wacker · 2024-10-16T11:55:07Z

There is a regression in v3.3.4 introduced by #3579, which leads to the wrong revision returned by a call to findLastChangeRevision().

The assumption of strictly increasing revision numbers over time holds true for a single instance but not in a multi-node environment, as Hibernate sequence numbers are allocated per node in batches of 50.

Cases where the latest revision actually has the highest revision number become increasingly rare as the number of nodes increases. To illustrate this, here are real-world example contents of a revinfo table:

In this light, whether two transactions are from two differing nodes of your application or from two differing sessions within the same application instance is in fact not as irrelevant as the forum answer (referred to in the other issue thread) suggests in its conclusion.

The 'order by' inside the implementation should probably be composed of timestamp AND rev in order to handle the issue of exactly the same timestamps.

The text was updated successfully, but these errors were encountered:

mp911de · 2024-10-17T12:47:00Z

Thanks for reaching out. If neither timestamp nor the revision number (when batched) are good candidates to get the latest change, what is then the solution?

In the same sense as timestamps might deviate across multiple nodes, neither is ideal. It seems that the previous variant has been less of an issue, still with a batched revision number, we cannot guarantee that the first item sorted by timestamp and number is the most recent one.

quaff · 2024-10-18T02:41:37Z

as Hibernate sequence numbers are allocated per node in batches of 50.

Have you tried to disable pre-allocating sequence numbers?

wacker · 2024-10-18T07:38:02Z

If neither timestamp nor the revision number (when batched) are good candidates to get the latest change, what is then the solution?

To undo the change.

Previously, the result of findLastChangeRevision() was basically correct, except for the corner case where two nodes make changes at exactly the same time and the accuracy of the timestamps is not sufficient to distinguish the changes correctly. As long as the workload is well partitioned between the nodes and one entity is usually processed by the same node, this corner case is unlikely to occur.

Currently, findLastChangeRevision() only returns the correct change revision if no node that made an earlier change did have a more recent revision sequence number batch allocated at the time, which is a gamble. In a distributed environment, the return value is therefore essentially useless unless pre-allocation/pooling of the revision sequence numbers is disabled, which has performance implications.

Sorting by revision number alone failed for distributed systems using batched sequences for revision numbers. Closes #3643 See #3579

schauder · 2024-10-21T12:44:16Z

The assumption of strictly increasing revision numbers over time holds true for a single instance but not in a multi-node environment, as Hibernate sequence numbers are allocated per node in batches of 50.

It's not really an assumption, but something that probably should be a requirement and might actually be intended as a requirement, and one might argue that use of a batched sequence is a mistake on your/Hibernates side.

As long as the workload is well partitioned between the nodes and one entity is usually processed by the same node, this corner case is unlikely to occur.

Here you are kinda arguing against yourself. if and entity is processed by the same node, it should work with the same batch of ids and not have a problem.
But since you obviously have a problem, this does not hold.
This isn't surprising since there really isn't a reason why an entity should be processed by the same node.
Even when one uses some partitioning, e.g. by users, they will still often access entity that cross these borders.
For examples users might place orders which then affect stock levels of products.

An additional potential complication is that clocks on different nodes aren't synchronized. But AFAIK nobody has complained about that so far, so I guess it's not an issue for most users.

All that said: We decided to go with sorting by timestamp first and revision second. Given that sorting by timestamp worked for a long time without complains, I'm somewhat confident that this solution is good enough

wacker · 2024-10-21T14:52:12Z

Thanks for taking care of the issue. To be honest, I was surprised when I discovered non-continuous revision numbers with Envers and only then learned about pooling sequence numbers in Hibernate.

It's not really an assumption, but something that probably should be a requirement and might actually be intended as a requirement, and one might argue that use of a batched sequence is a mistake on your/Hibernates side.

I don't see how this 'mistake' could be avoided as the mapping of the revinfo table is not explicitly defined or configurable(?) Sure, hibernate.id.optimizer.pooled.preferred=none would probably help, but this affects the whole application and not just Envers.

Anyway, IMO it should be mentioned in the Envers docs.

spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Oct 16, 2024

mp911de added the status: waiting-for-feedback We need additional information before we can continue label Oct 17, 2024

spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Oct 18, 2024

mp911de added type: regression A regression from a previous release and removed status: waiting-for-triage An issue we've not yet triaged status: feedback-provided Feedback has been provided labels Oct 21, 2024

mp911de assigned schauder Oct 21, 2024

schauder closed this as completed in c4b1c18 Oct 21, 2024

schauder added a commit that referenced this issue Oct 21, 2024

Sort revisions by timestamp and revision number.

4a02924

Sorting by revision number alone failed for distributed systems using batched sequences for revision numbers. Closes #3643 See #3579

schauder added a commit that referenced this issue Oct 21, 2024

Sort revisions by timestamp and revision number.

57bf1fb

Sorting by revision number alone failed for distributed systems using batched sequences for revision numbers. Closes #3643 See #3579

schauder added this to the 3.2.12 (2023.1.12) milestone Oct 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

findLastChangeRevision() returns the wrong result in a multi-node env #3643

findLastChangeRevision() returns the wrong result in a multi-node env #3643

wacker commented Oct 16, 2024 •

edited

Loading

mp911de commented Oct 17, 2024

quaff commented Oct 18, 2024

wacker commented Oct 18, 2024 •

edited

Loading

schauder commented Oct 21, 2024

wacker commented Oct 21, 2024 •

edited

Loading

findLastChangeRevision() returns the wrong result in a multi-node env #3643

findLastChangeRevision() returns the wrong result in a multi-node env #3643

Comments

wacker commented Oct 16, 2024 • edited Loading

mp911de commented Oct 17, 2024

quaff commented Oct 18, 2024

wacker commented Oct 18, 2024 • edited Loading

schauder commented Oct 21, 2024

wacker commented Oct 21, 2024 • edited Loading

wacker commented Oct 16, 2024 •

edited

Loading

wacker commented Oct 18, 2024 •

edited

Loading

wacker commented Oct 21, 2024 •

edited

Loading