Cache issues #148

greg-rychlewski · 2022-01-24T17:20:07Z

If you guys still want to keep the cache changes from the 2 reverted commits, this branch could be tested by the affected people. Though if you'd rather just get rid of them I completely understand. The first 2 commits here are the revert of the reverts and the last is the fix.

~~Since the test suite passes and I can't replicate the issues locally, I really feel like it's a multi-node setup issue. To summarize from #147:~~

In multi-node setups prepared statements with the same name can be saved with different refs. If it was prepared on one node then executed on another it will be re-prepared and the next time it's executed on the original node, the refs won't match and it will be re-prepared again.

~~Before the cache change commit, executions would check for the name and not the ref. So each node would only be able to re-prepare an existing statement once.~~

~~Though there are still some complications with multi-node set-ups for re-preparing statements with the same name but a different query. But that happens with or without the reverted commits.~~

Edit: actually thinking more about the multi-node setup. I think it's important for cache names too be able to be re-prepared. Say you have 2 nodes both with the same prepared query cached on them. Then this sequence happens:

1. Close the prepared statement on node 1
2. Re-prepare a new statement with the same name on node 2. It's still in the cache so the old one will be picked up and returned.

~~So always closing the old statements on re-prepare circumvents that issue.~~

^ don't believe this is correct anymore. i gave a detailed comment below of what i believe the issue is, with tests

…o#144)"" This reverts commit 989111e.

…xir-ecto#146)"" This reverts commit 919eabe.

josevalim · 2022-01-24T17:44:59Z

Yeah, I think we need to wait for more general feedback before we can merge it, for sure!

greg-rychlewski · 2022-01-25T04:14:20Z

I looked into this more and did a bunch of testing and feel like I know what's going on. It looks like the MyXQL cache and the Ecto cache are becoming out of sync. I'll give you my spiel in 2 parts:

my explanation of what's going on
the tests I ran to back it up

Based on what I'm going to say below, I believe this PR solves the issue. But of course you guys should do whatever you feel is best.

The main culprits are:

0.6.0 started comparing ref for cache lookups in the execute callback.
0.6.0 is not closing the previous query on a cache miss in the execute callback
The MyXQL adapter doesn't update the Ecto cache if the ref changes (https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/myxql/connection.ex#L37.).

Here is an example sequence that might happen when the same query is called concurrently before it has been cached in Ecto:

Query A enters MyXQL.prepare_execute and sets the MyXQL cache with refA
Query B enters MyXQL.prepare_execute and sets the MyXQL cache with refB
Query A exits MyXQL.prepare_execute and sets the Ecto cache with refA
Query C enters MyXQL.execute with refA (picked up from the Ecto cache)
There is a cache conflict between Ecto and MyXQL and the query is re-prepared. The MyXQL cache is set with refC.
Query B exits MyXQL.prepare_execute and sets the Ecto cache with refB
Query C exits MyXQL.execute and doesn't set the Ecto cache with the new reference.
Ecto cache is stuck with refB and the MyXQL cache can never have that reference again. So new executions are re-prepared indefinitely without closing any of the old ones.

I ran the test below on 0.6.0 and printed out the Ecto/MyXQL refs when execute is ran. You can see the behaviour I described above in the results.

test "why" do
    one = TestRepo.insert!(%Post{title: "one"})
    two = TestRepo.insert!(%Post{title: "two"})

    IO.inspect TestRepo.query("show global status like 'Prepared_stmt_count'")

    t = Task.async_stream(1..20, fn _ -> TestRepo.all(Post) end)
    Enum.each(t ,fn _ -> nil end)

    IO.inspect TestRepo.query("show global status like 'Prepared_stmt_count'")
  end

Beginning prepared statement count: 4
"--------------------------"
Execute 1:
"ecto: #Reference<0.4038475985.365953025.207354>"
"myxql: #Reference<0.4038475985.365953030.204170>"
"--------------------------"
Execute 2:
"ecto: #Reference<0.4038475985.365953029.204483>"
"myxql: #Reference<0.4038475985.365953030.204171>"
"--------------------------"
Execute 3:
"ecto: #Reference<0.4038475985.365953030.204156>"
"myxql: #Reference<0.4038475985.365953030.204181>"
"--------------------------"
Execute 4:
"ecto: #Reference<0.4038475985.365953031.207247>"
"myxql: #Reference<0.4038475985.365953030.204192>"
"--------------------------"
Execute 5:
"ecto: #Reference<0.4038475985.365953030.204159>"
"myxql: #Reference<0.4038475985.365953030.204198>"
"--------------------------"
Execute 6:
"ecto: #Reference<0.4038475985.365953030.204163>"
"myxql: #Reference<0.4038475985.365953030.204204>"
"--------------------------"
Execute 7:
"ecto: #Reference<0.4038475985.365953030.204168>"
"myxql: #Reference<0.4038475985.365953029.204520>"
"--------------------------"
Execute 8:
"ecto: #Reference<0.4038475985.365953030.204170>" <---- all prepare executes done, Echo cache stabilizes
"myxql: #Reference<0.4038475985.365953030.204217>"
"--------------------------"
Execute 9:
"ecto: #Reference<0.4038475985.365953030.204170>"
"myxql: #Reference<0.4038475985.365953030.204224>"
"--------------------------"
Execute 10:
"ecto: #Reference<0.4038475985.365953030.204170>"
"myxql: #Reference<0.4038475985.365953029.204523>"
"--------------------------"
Execute 11:
"ecto: #Reference<0.4038475985.365953030.204170>"
"myxql: #Reference<0.4038475985.365953029.204526>"
"--------------------------"
Execute 12:
"ecto: #Reference<0.4038475985.365953030.204170>"
"myxql: #Reference<0.4038475985.365953030.204238>"
"--------------------------"
Ending prepared statement count: 17

If I run the same test on the changes in this PR, the execute cache always hits because it's comparing by name and only 1 prepared statement is added.

Postgrex compares by ref and gets around it by updating the Ecto cache when the ref changes (https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/postgres/connection.ex#L101). But when I made MyXQL update the Ecto cache and ran the test, it seems like if the requests come too fast without a lull you can be perpetually out of sync. This can happen if one of the processes doesn't get to update the MyXQL cache and the Ecto cache before another execution begins:

"ecto: #Reference<0.2753011.4138467329.182193>"
"myxql: #Reference<0.2753011.4138467336.176849>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467336.176838>"
"myxql: #Reference<0.2753011.4138467329.182223>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467335.176871>"
"myxql: #Reference<0.2753011.4138467329.182233>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467336.176842>"
"myxql: #Reference<0.2753011.4138467329.182243>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182210>"
"myxql: #Reference<0.2753011.4138467329.182249>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467336.176845>"
"myxql: #Reference<0.2753011.4138467333.179038>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467333.179028>"
"myxql: #Reference<0.2753011.4138467329.182260>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467336.176849>"
"myxql: #Reference<0.2753011.4138467329.182262>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182223>"
"myxql: #Reference<0.2753011.4138467333.179044>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182233>"
"myxql: #Reference<0.2753011.4138467329.182269>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182243>"
"myxql: #Reference<0.2753011.4138467329.182275>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182249>"
"myxql: #Reference<0.2753011.4138467329.182281>"

If I run this modified test where there is a lull before the last query I get a cache hit:

test "why" do
    one = TestRepo.insert!(%Post{title: "one"})
    two = TestRepo.insert!(%Post{title: "two"})

    t = Task.async_stream(1..20, fn x -> TestRepo.all(Post) end)


    Enum.each(t ,fn _ -> nil end)

    # Put this Repo.one query so that the Repo.all query does not hit the "last query" cache
    TestRepo.one(from p in Post, limit: 1)
    TestRepo.all(Post)
  end


"ecto: #Reference<0.3998318333.654311427.208239>"
"myxql: #Reference<0.3998318333.654311430.207635>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207599>"
"myxql: #Reference<0.3998318333.654311427.208259>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207607>"
"myxql: #Reference<0.3998318333.654311427.208265>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311428.207665>"
"myxql: #Reference<0.3998318333.654311427.208273>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207613>"
"myxql: #Reference<0.3998318333.654311427.208283>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207622>"
"myxql: #Reference<0.3998318333.654311427.208288>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208249>"
"myxql: #Reference<0.3998318333.654311427.208294>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207635>"
"myxql: #Reference<0.3998318333.654311430.207650>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208259>"
"myxql: #Reference<0.3998318333.654311427.208301>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208265>"
"myxql: #Reference<0.3998318333.654311427.208307>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208273>"
"myxql: #Reference<0.3998318333.654311427.208313>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208283>"
"myxql: #Reference<0.3998318333.654311428.207682>"
"--------------------------"
"cache hit"

If you made it through all this, thank you for listening :)

josevalim · 2022-01-25T09:28:00Z

Thank you @greg-rychlewski, i think the analysis is on point. I am not sure what is the solution though. We can probably keep one of the two commits, but I will ship 0.6.1 first without them to have a more stable base.

sigurdsa · 2022-12-12T14:19:04Z

We had a problem "ER_MAX_PREPARED_STMT_COUNT_REACHED" with code like this:

query = MyXQL.prepare!(conn, "", ..)
result = MyXQL.execute(conn, query ..)

but when I changed the code to:

result = MyQXL.prepare_execute(conn, "", ...)

The problem went away..

This was on MySQL 5.6.24-72.2-log

Greg Rychlewski added 3 commits January 24, 2022 11:18

Revert "Revert "Allow a given cache name to be reprepared (elixir-ect…

378c762

…o#144)"" This reverts commit 989111e.

Revert "Revert "Reuse prepared statements in prepare: :unnamed (eli…

46bdf09

…xir-ecto#146)"" This reverts commit 919eabe.

fix cache for multi-node setup

de038d5

leak test

372a1ba

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cache issues #148

Cache issues #148

Uh oh!

greg-rychlewski commented Jan 24, 2022 •

edited

Loading

Uh oh!

josevalim commented Jan 24, 2022

Uh oh!

greg-rychlewski commented Jan 25, 2022 •

edited

Loading

Uh oh!

josevalim commented Jan 25, 2022

Uh oh!

sigurdsa commented Dec 12, 2022 •

edited

Loading

Uh oh!

Uh oh!

Cache issues #148

Are you sure you want to change the base?

Cache issues #148

Uh oh!

Conversation

greg-rychlewski commented Jan 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josevalim commented Jan 24, 2022

Uh oh!

greg-rychlewski commented Jan 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

josevalim commented Jan 25, 2022

Uh oh!

sigurdsa commented Dec 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

greg-rychlewski commented Jan 24, 2022 •

edited

Loading

greg-rychlewski commented Jan 25, 2022 •

edited

Loading

sigurdsa commented Dec 12, 2022 •

edited

Loading