Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing to elementId from Id causes a more than 6x increase in query traffic #2957

Closed
gliwka opened this issue Oct 14, 2024 · 5 comments
Closed
Assignees
Labels
status: waiting-for-feedback We need additional information before we can continue

Comments

@gliwka
Copy link

gliwka commented Oct 14, 2024

We've updated our Spring Data Neo4j version from 7.0.0 to the latest version 7.3.4. After the update we've noticed network issues, connection read timeouts and a CPU-saturated AuraDB Enterprise instance on regular repository.findById() query on an entity with many nested relationships.

Further investigation showed, that we suddenly had more than 6x outgoing traffic to our Neo4j instance. A few megabytes per second suddenly turned into 60-80MB/s. That's outgoing traffic (queries) - not the data we fetch, which is much lower.

Enabling logging on the driver level revealed, that the sent queries are indeed multiples larger. The issue seems to be with the queries SDN generates once it wants to fetch a graph with a structure possibly containing cycles, thus SDN falling back on the cascading / N+1 query generation for each level as documented here.

In our case those queries often contain thousands of IDs. Those IDs before (7.0.0) were 64-Bit Longs, occupying only ~8 Bytes - i.e. "2023663". With the introduction of the elementId (i.e. "4:2da1ca6d-a09f-41be-8dff-230d14780c2b:2023663"), which is a string with ~45 bytes on average, the queries are significantly larger since the hundreds of ids are now the large part of the payload. (no. of bytes is just a naive approximation, the bolt protocol level might handle those differently, but the same principle applies).

The bad thing is that all ids for all nodes in our database start with the prefix "4:2da1ca6d-a09f-41be-8dff-230d14780c2b" - so this information is highly redundant in our query.

We could furthermore verify this by changing the Dialect from "Dialect.NEO4J_5" to "Dialect.NEO4J_4" to trigger the logic here switching back to the Id function:

public static Function<Dialect, Function<Named, FunctionInvocation>> elementIdOrIdFunction = dialect -> {
if (dialect == Dialect.NEO4J_5) {
return SpringDataCypherDsl::elementId;
} else if (dialect == Dialect.NEO4J_4) {
return SpringDataCypherDsl::id;

This has the desired effect, our bandwidth usage is now more than 6x lower and the database and network connection is happy again.

Here's one of the more problematic queries:
https://gist.github.com/gliwka/96415da55e69429f9c4a5033c54ab124

I understand, that id() is deprecated, however the elementId() in combination with the N+1 query model for potentially circular data models (ours certainly is an DAG, but that's another issue - #2840) and the resulting transfer of hundreds or thousands of elementsIds, that are now multiple times larger, is a bad combination, that does not bring any additional value to us.

For the meantime we've downgraded to the Neo4j 4 dialect and are investigating the use of projections to fetch the data in a different way to be able to upgrade back to Neo4j v5, however we're thankful for any other pointers.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Oct 14, 2024
@michael-simons
Copy link
Collaborator

Hello. We will answer here in more detail soon, but right now just a couple of things

  • I hear you in terms of traffic and overhead. It’s terrible, but I can’t change it
  • id() is deprecated and as such you bring yourself into a terrible situation using the old dialect. We from SDN can’t and won’t give any guarantees how long it will be available in the database still. Neo4j next will soon be out
  • Another solution is using externally generated UUIDs, SDN provides a generator for that. They can be used as String or UUID on the model. I would go with them. It’s generally the way better idea than using the database physical id

Please be aware that your log will be swamped now with depreciation warnings. You might wanna adapt the level for those notifications https://docs.spring.io/spring-data/neo4j/reference/appendix/logging.html

@gliwka
Copy link
Author

gliwka commented Oct 15, 2024

Looking forward to it, let me know if any more details would be helpful 😄

We're already using external generated UUIDs on our entities as identifiers.

The id/elementId discussion is just part of the generated query by SDN for those cascading queries whenever we try to load the entity using the finder methods. They're not part of our business logic or entities - there we use exclusively UUIDs. However using our UUIDs for the same purposes would pose the same challenge.

Staying on the older dialect is not a long term solution, just one that allows us to update our Spring Boot Version and also get the fix for cascading updates=false on the Relationships to solve another problem we have with SDN + Optimistic Locking (currently we have a lot of projections in our codebase for the write path to stop cascading updates).

We're trying to migrate to projections for the read path and/or custom queries get those generated queries under control, if no other solution should become available to switch the dialect back to 5 as soon as possible, however we would appreciate another solution.

@meistermeier
Copy link
Collaborator

The cascading queries are using the internal id reference (id or elementId, depending on the selected dialect) to fetch all defined and connected nodes. This has been proven to be logical-wise the best solution to move through the graph.
If we would start to respect each defined id type, the query creation would get more complex than it is already, and foremost the collecting query at the end would explode because we would fetch all related nodes with a set of ids of their types.
You said, that you are using UUIDs. With such a solution your queries (more concrete the parameters are the ones we are talking about) would also become much larger. Yes, it's true that the elementId itself adds a little bit more because it also includes the actual (old) id (at the moment!).
I can only second Michael for the identifier representation here:
id() was deprecated and elementId() is its replacement in Neo4j 5. The CypherDSL with its dialect respects this. And also in SDN we need to ensure that users have a path forward and will have a working library with future versions of Neo4j that might or might not contain the deprecated id() function. In this terms we are just the messengers of the changed Cypher behaviour.
This does not mean that I don't agree with you that the implicit effect of having more traffic isn't something to consider, it is just that we cannot do anything to work around this limitation if we want to play nice with the database in terms of query language usage and our users in terms of giving them an option to not be limited to deprecated functions (and the flood of warning coming from the database).
I am sorry, that I don't have a solution for you. It's even more like that I don't even have an idea how to mitigate this in the longer run within SDN. Of course, defining non-cyclic domain models e.g. by using projections, as you have already mentioned might help because you won't get into this cascading queries.

@meistermeier meistermeier self-assigned this Oct 24, 2024
@meistermeier meistermeier added status: waiting-for-feedback We need additional information before we can continue and removed status: waiting-for-triage An issue we've not yet triaged labels Oct 24, 2024
@gliwka
Copy link
Author

gliwka commented Oct 25, 2024

@meistermeier Thanks for looking into this. I understand SDN has to work within the Neo4j ecosystem constraints. We will be looking if we want to mitigate this with projections and the impact of that on our architecture.

It might be valuable to document this significant performance impact for others who might encounter similar issues with wide graph traversals.

Perhaps there's room for protocol-level / database-level optimizations in the future to handle these cases more efficiently? Where could that be addressed?

@gliwka gliwka closed this as not planned Won't fix, can't repro, duplicate, stale Oct 25, 2024
@michael-simons
Copy link
Collaborator

We'll forward this to Core Database PM's @gliwka !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: waiting-for-feedback We need additional information before we can continue
Projects
None yet
Development

No branches or pull requests

4 participants