Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache issues #148

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

greg-rychlewski
Copy link
Member

@greg-rychlewski greg-rychlewski commented Jan 24, 2022

If you guys still want to keep the cache changes from the 2 reverted commits, this branch could be tested by the affected people. Though if you'd rather just get rid of them I completely understand. The first 2 commits here are the revert of the reverts and the last is the fix.

Since the test suite passes and I can't replicate the issues locally, I really feel like it's a multi-node setup issue. To summarize from #147:

In multi-node setups prepared statements with the same name can be saved with different refs. If it was prepared on one node then executed on another it will be re-prepared and the next time it's executed on the original node, the refs won't match and it will be re-prepared again.

Before the cache change commit, executions would check for the name and not the ref. So each node would only be able to re-prepare an existing statement once.

Though there are still some complications with multi-node set-ups for re-preparing statements with the same name but a different query. But that happens with or without the reverted commits.

Edit: actually thinking more about the multi-node setup. I think it's important for cache names too be able to be re-prepared. Say you have 2 nodes both with the same prepared query cached on them. Then this sequence happens:

1. Close the prepared statement on node 1
2. Re-prepare a new statement with the same name on node 2. It's still in the cache so the old one will be picked up and returned.

So always closing the old statements on re-prepare circumvents that issue.

^ don't believe this is correct anymore. i gave a detailed comment below of what i believe the issue is, with tests

@josevalim
Copy link
Member

Yeah, I think we need to wait for more general feedback before we can merge it, for sure!

@greg-rychlewski
Copy link
Member Author

greg-rychlewski commented Jan 25, 2022

I looked into this more and did a bunch of testing and feel like I know what's going on. It looks like the MyXQL cache and the Ecto cache are becoming out of sync. I'll give you my spiel in 2 parts:

  • my explanation of what's going on
  • the tests I ran to back it up

Based on what I'm going to say below, I believe this PR solves the issue. But of course you guys should do whatever you feel is best.

The main culprits are:

  1. 0.6.0 started comparing ref for cache lookups in the execute callback.
  2. 0.6.0 is not closing the previous query on a cache miss in the execute callback
  3. The MyXQL adapter doesn't update the Ecto cache if the ref changes (https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/myxql/connection.ex#L37.).

Here is an example sequence that might happen when the same query is called concurrently before it has been cached in Ecto:

  1. Query A enters MyXQL.prepare_execute and sets the MyXQL cache with refA
  2. Query B enters MyXQL.prepare_execute and sets the MyXQL cache with refB
  3. Query A exits MyXQL.prepare_execute and sets the Ecto cache with refA
  4. Query C enters MyXQL.execute with refA (picked up from the Ecto cache)
  5. There is a cache conflict between Ecto and MyXQL and the query is re-prepared. The MyXQL cache is set with refC.
  6. Query B exits MyXQL.prepare_execute and sets the Ecto cache with refB
  7. Query C exits MyXQL.execute and doesn't set the Ecto cache with the new reference.
  8. Ecto cache is stuck with refB and the MyXQL cache can never have that reference again. So new executions are re-prepared indefinitely without closing any of the old ones.

I ran the test below on 0.6.0 and printed out the Ecto/MyXQL refs when execute is ran. You can see the behaviour I described above in the results.

test "why" do
    one = TestRepo.insert!(%Post{title: "one"})
    two = TestRepo.insert!(%Post{title: "two"})

    IO.inspect TestRepo.query("show global status like 'Prepared_stmt_count'")

    t = Task.async_stream(1..20, fn _ -> TestRepo.all(Post) end)
    Enum.each(t ,fn _ -> nil end)

    IO.inspect TestRepo.query("show global status like 'Prepared_stmt_count'")
  end
Beginning prepared statement count: 4
"--------------------------"
Execute 1:
"ecto: #Reference<0.4038475985.365953025.207354>"
"myxql: #Reference<0.4038475985.365953030.204170>"
"--------------------------"
Execute 2:
"ecto: #Reference<0.4038475985.365953029.204483>"
"myxql: #Reference<0.4038475985.365953030.204171>"
"--------------------------"
Execute 3:
"ecto: #Reference<0.4038475985.365953030.204156>"
"myxql: #Reference<0.4038475985.365953030.204181>"
"--------------------------"
Execute 4:
"ecto: #Reference<0.4038475985.365953031.207247>"
"myxql: #Reference<0.4038475985.365953030.204192>"
"--------------------------"
Execute 5:
"ecto: #Reference<0.4038475985.365953030.204159>"
"myxql: #Reference<0.4038475985.365953030.204198>"
"--------------------------"
Execute 6:
"ecto: #Reference<0.4038475985.365953030.204163>"
"myxql: #Reference<0.4038475985.365953030.204204>"
"--------------------------"
Execute 7:
"ecto: #Reference<0.4038475985.365953030.204168>"
"myxql: #Reference<0.4038475985.365953029.204520>"
"--------------------------"
Execute 8:
"ecto: #Reference<0.4038475985.365953030.204170>" <---- all prepare executes done, Echo cache stabilizes
"myxql: #Reference<0.4038475985.365953030.204217>"
"--------------------------"
Execute 9:
"ecto: #Reference<0.4038475985.365953030.204170>"
"myxql: #Reference<0.4038475985.365953030.204224>"
"--------------------------"
Execute 10:
"ecto: #Reference<0.4038475985.365953030.204170>"
"myxql: #Reference<0.4038475985.365953029.204523>"
"--------------------------"
Execute 11:
"ecto: #Reference<0.4038475985.365953030.204170>"
"myxql: #Reference<0.4038475985.365953029.204526>"
"--------------------------"
Execute 12:
"ecto: #Reference<0.4038475985.365953030.204170>"
"myxql: #Reference<0.4038475985.365953030.204238>"
"--------------------------"
Ending prepared statement count: 17

If I run the same test on the changes in this PR, the execute cache always hits because it's comparing by name and only 1 prepared statement is added.

Postgrex compares by ref and gets around it by updating the Ecto cache when the ref changes (https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/postgres/connection.ex#L101). But when I made MyXQL update the Ecto cache and ran the test, it seems like if the requests come too fast without a lull you can be perpetually out of sync. This can happen if one of the processes doesn't get to update the MyXQL cache and the Ecto cache before another execution begins:

"ecto: #Reference<0.2753011.4138467329.182193>"
"myxql: #Reference<0.2753011.4138467336.176849>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467336.176838>"
"myxql: #Reference<0.2753011.4138467329.182223>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467335.176871>"
"myxql: #Reference<0.2753011.4138467329.182233>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467336.176842>"
"myxql: #Reference<0.2753011.4138467329.182243>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182210>"
"myxql: #Reference<0.2753011.4138467329.182249>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467336.176845>"
"myxql: #Reference<0.2753011.4138467333.179038>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467333.179028>"
"myxql: #Reference<0.2753011.4138467329.182260>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467336.176849>"
"myxql: #Reference<0.2753011.4138467329.182262>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182223>"
"myxql: #Reference<0.2753011.4138467333.179044>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182233>"
"myxql: #Reference<0.2753011.4138467329.182269>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182243>"
"myxql: #Reference<0.2753011.4138467329.182275>"
"--------------------------"
"ecto: #Reference<0.2753011.4138467329.182249>"
"myxql: #Reference<0.2753011.4138467329.182281>"

If I run this modified test where there is a lull before the last query I get a cache hit:

test "why" do
    one = TestRepo.insert!(%Post{title: "one"})
    two = TestRepo.insert!(%Post{title: "two"})

    t = Task.async_stream(1..20, fn x -> TestRepo.all(Post) end)


    Enum.each(t ,fn _ -> nil end)

    # Put this Repo.one query so that the Repo.all query does not hit the "last query" cache
    TestRepo.one(from p in Post, limit: 1)
    TestRepo.all(Post)
  end

"ecto: #Reference<0.3998318333.654311427.208239>"
"myxql: #Reference<0.3998318333.654311430.207635>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207599>"
"myxql: #Reference<0.3998318333.654311427.208259>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207607>"
"myxql: #Reference<0.3998318333.654311427.208265>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311428.207665>"
"myxql: #Reference<0.3998318333.654311427.208273>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207613>"
"myxql: #Reference<0.3998318333.654311427.208283>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207622>"
"myxql: #Reference<0.3998318333.654311427.208288>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208249>"
"myxql: #Reference<0.3998318333.654311427.208294>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311430.207635>"
"myxql: #Reference<0.3998318333.654311430.207650>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208259>"
"myxql: #Reference<0.3998318333.654311427.208301>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208265>"
"myxql: #Reference<0.3998318333.654311427.208307>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208273>"
"myxql: #Reference<0.3998318333.654311427.208313>"
"--------------------------"
"ecto: #Reference<0.3998318333.654311427.208283>"
"myxql: #Reference<0.3998318333.654311428.207682>"
"--------------------------"
"cache hit"

If you made it through all this, thank you for listening :)

@josevalim
Copy link
Member

Thank you @greg-rychlewski, i think the analysis is on point. I am not sure what is the solution though. We can probably keep one of the two commits, but I will ship 0.6.1 first without them to have a more stable base.

@sigurdsa
Copy link

sigurdsa commented Dec 12, 2022

We had a problem "ER_MAX_PREPARED_STMT_COUNT_REACHED" with code like this:

query = MyXQL.prepare!(conn, "", ..)
result = MyXQL.execute(conn, query ..)

but when I changed the code to:

result = MyQXL.prepare_execute(conn, "", ...)

The problem went away..

This was on MySQL 5.6.24-72.2-log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants