Description
We've updated our Spring Data Neo4j version from 7.0.0 to the latest version, 7.3.4. After the update we noticed network issues, connection read timeouts and a CPU-saturated AuraDB Enterprise instance on a regular repository.findById() query for an entity with many nested relationships.
Further investigation showed that our outgoing traffic to the Neo4j instance had suddenly grown by more than 6x. A few megabytes per second turned into 60-80 MB/s. That's outgoing traffic (queries) - not the data we fetch, which is much lower.
Enabling logging at the driver level revealed that the queries being sent are indeed several times larger. The issue seems to be with the queries SDN generates when it fetches a graph whose structure may contain cycles. In that case SDN falls back to the cascading / N+1 query generation for each level, as documented here.
In our case those queries often contain thousands of IDs. Previously (7.0.0) those IDs were 64-bit longs, occupying only ~8 bytes each - e.g. "2023663". With the introduction of the elementId (e.g. "4:2da1ca6d-a09f-41be-8dff-230d14780c2b:2023663"), which is a string of ~45 bytes on average, the queries are significantly larger, since the hundreds or thousands of IDs now make up most of the payload. (The byte counts are just a naive approximation; the Bolt protocol might encode these values differently, but the same principle applies.)
Worse, the IDs of all nodes in our database start with the same prefix, "4:2da1ca6d-a09f-41be-8dff-230d14780c2b", so this information is highly redundant in our queries.
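To make the naive estimate above concrete, here is a small sketch that compares the raw size of a batch of IDs in both forms. It assumes 8 bytes per long and the UTF-8 length of the elementId string, and it ignores Bolt/PackStream encoding entirely. The class and method names are made up for illustration; only the prefix and the starting ID come from the example above.

```java
import java.nio.charset.StandardCharsets;

public class IdPayloadEstimate {

    // Shared elementId prefix from the example above, including the trailing colon.
    static final String PREFIX = "4:2da1ca6d-a09f-41be-8dff-230d14780c2b:";

    // Naive size of `count` numeric IDs: 8 bytes per 64-bit long.
    static long longIdBytes(int count) {
        return (long) count * Long.BYTES;
    }

    // Naive size of `count` consecutive elementIds, measured as UTF-8 bytes.
    static long elementIdBytes(long firstId, int count) {
        long total = 0;
        for (long id = firstId; id < firstId + count; id++) {
            total += (PREFIX + id).getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Bytes spent on the repeated prefix alone.
    static long prefixBytes(int count) {
        return (long) count * PREFIX.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int count = 1000; // assumed batch size, in line with "thousands of IDs"
        long asLong = longIdBytes(count);
        long asElementId = elementIdBytes(2_023_663L, count);
        System.out.printf("long ids:      %d bytes%n", asLong);
        System.out.printf("elementIds:    %d bytes%n", asElementId);
        System.out.printf("prefix only:   %d bytes%n", prefixBytes(count));
        System.out.printf("growth factor: %.2f%n", (double) asElementId / asLong);
    }
}
```

Under these assumptions, 1000 IDs grow from 8,000 to 46,000 bytes (roughly 5.75x), and 39,000 of those bytes are the repeated prefix.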
We verified this by changing the dialect from "Dialect.NEO4J_5" to "Dialect.NEO4J_4", which triggers the logic here that switches back to the id() function.
This has the desired effect: our bandwidth usage is now more than 6x lower, and the database and network connection are happy again.
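For anyone who wants to reproduce the workaround, this is roughly how the dialect can be overridden. It is a minimal sketch, assuming a Spring Java configuration that exposes the Cypher-DSL Configuration bean SDN picks up; the class name Neo4jDialectConfig is hypothetical and our actual setup may differ in detail.

```java
import org.neo4j.cypherdsl.core.renderer.Configuration;
import org.neo4j.cypherdsl.core.renderer.Dialect;
import org.springframework.context.annotation.Bean;

// Fully qualified annotation to avoid a name clash with the Cypher-DSL Configuration type.
@org.springframework.context.annotation.Configuration
public class Neo4jDialectConfig { // hypothetical class name

    // Render queries with the Neo4j 4 dialect so that id() is used instead of elementId().
    @Bean
    Configuration cypherDslConfiguration() {
        return Configuration.newConfig()
                .withDialect(Dialect.NEO4J_4)
                .build();
    }
}
```

Switching the same bean back to Dialect.NEO4J_5 restores the elementId()-based queries.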
Here's one of the more problematic queries:
https://gist.github.com/gliwka/96415da55e69429f9c4a5033c54ab124
I understand that id() is deprecated. However, elementId() combined with the N+1 query model for potentially circular data models (ours certainly is a DAG, but that's another issue - #2840) results in the transfer of hundreds or thousands of elementIds that are now several times larger. That is a bad combination that does not bring any additional value to us.
In the meantime we've downgraded to the Neo4j 4 dialect and are investigating the use of projections to fetch the data in a different way, so that we can upgrade back to Neo4j v5. We'd be thankful for any other pointers.