Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ssl recv timeout on mysql/percona 5.7 connection with TLS enabled #249

Open
kalymero opened this issue Mar 7, 2019 · 3 comments
Open

ssl recv timeout on mysql/percona 5.7 connection with TLS enabled #249

kalymero opened this issue Mar 7, 2019 · 3 comments

Comments

@kalymero
Copy link

kalymero commented Mar 7, 2019

Environment

  • Elixir 1.8.1
  • Ecto 2/3 (all the versions)
  • Mariaex latest (but all versions affected)
  • OTP >= 21.2.0 (older OTP versions work fine)

Description

Starting from OTP-21.2.0, when trying to connect to a mysql instance with TLS enabled from an Elixir/Ecto application the :ssl.recv() call fails due a timeout with the following stacktrace:

** (EXIT) an exception was raised:
--
** (Mariaex.Error) [ssl] `recv` failed with: :timeout
(ecto) lib/ecto/adapters/sql.ex:431: Ecto.Adapters.SQL.execute_and_cache/7
(ecto) lib/ecto/repo/queryable.ex:133: Ecto.Repo.Queryable.execute/5
(ecto) lib/ecto/repo/queryable.ex:37: Ecto.Repo.Queryable.all/4
(custom_lib) lib/data/cache.ex:nnn: Cache.seed_table/3
(elixir) lib/enum.ex:769: Enum."-each/2-lists^foreach/1-0-"/2
(elixir) lib/enum.ex:769: Enum.each/2
(custom_lib) lib/data/cache.ex:nn: Cache.init/1
(stdlib) gen_server.erl:374: :gen_server.init_it/2
(stdlib) gen_server.erl:342: :gen_server.init_it/6
(stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
** (Mix) Could not start application app: Application.start(:normal, []) returned an error: shutdown: failed to start child:

For the test we used Percona and Mysql 5.7 official docker images.

Please note that it works fine with older OTP versions (for instance OTP-21.1.4)

Also the Erlang layer looks good, it's possible to connect and query the db in pure Erlang

1> {ok, Pid} = mysql:start_link( [ {host, "localhost"}, {user, "root"}, {password, ""}, {database, "test_db"},  {ssl, [ {version, 'tlsv1.2'}, {verify, verify_none} ] } ] ).
{ok,<0.143.0>}
2> mysql:query(Pid, <<"SHOW DATABASES;">>, []).
{ok,[<<"Database">>],
[[<<"information_schema">>],
[<<"mysql">>],
[<<"performance_schema">>],
[<<"sys">>],
[<<"test_db">>]]}
3> mysql:query(Pid, <<"SELECT * FROM test_table">>, []).
{ok,[<<"id">>,<<"name">>],[[1,<<"it works!">>]]}

An educated guess is that in the new OTP versions some lib changed behaviour and it breaks the Mariaex flow

How to reproduce

  1. Start a Percona docker container with SSL enabled:

$ docker run --rm -d -p 3306:3306/tcp -e MYSQL_ALLOW_EMPTY_PASSWORD=yes percona:5.7 --require-secure-transport=ON

  1. Create an empty db

$ mysql -h 127.0.0.1 -u root -p -e "CREATE DATABASE test_db;"

  1. Try to connect to the 127.0.0.1:3306 socket with tlsv1.2 to the test_db database.

(you can also use mysql:5.7 image and tlsv1.1)

The easiest way is to create a new ecto project

$ mix phx.new . --app test_ecto --database mysql --no-webpack --no-html

configure Ecto

config :test_ecto, TestEcto.Repo,
  ssl: true,
  ssl_opts: [versions: [:"tlsv1.2"], ciphers: :ssl.cipher_suites(:all)],
  username: "root",
  password: "",
  database: "test_db",
  hostname: "localhost",
  pool_size: 10

run it

$ mix deps.get && iex -S mix

Erlang/OTP 21 [erts-10.2.3] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe] [dtrace]
--
 
Interactive Elixir (1.8.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> TestEcto.Repo.query!("select * from test;")
[error] Mariaex.Protocol (#PID<0.277.0>) disconnected: ** (DBConnection.ConnectionError) ssl recv: timeout
[error] Mariaex.Protocol (#PID<0.281.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.283.0> timed out because it queued and checked out the connection for longer than 15000ms
[error] Mariaex.Protocol (#PID<0.273.0>) disconnected: ** (DBConnection.ConnectionError) ssl recv: timeout
[error] Mariaex.Protocol (#PID<0.276.0>) disconnected: ** (DBConnection.ConnectionError) ssl recv: timeout
[error] Mariaex.Protocol (#PID<0.282.0>) disconnected: ** (DBConnection.ConnectionError) ssl recv: timeout
[error] Mariaex.Protocol (#PID<0.275.0>) disconnected: ** (DBConnection.ConnectionError) ssl recv: timeout
[error] Mariaex.Protocol (#PID<0.280.0>) disconnected: ** (DBConnection.ConnectionError) ssl recv: timeout
[error] Mariaex.Protocol (#PID<0.278.0>) disconnected: ** (DBConnection.ConnectionError) ssl recv: timeout
[error] Mariaex.Protocol (#PID<0.274.0>) disconnected: ** (DBConnection.ConnectionError) ssl recv: timeout
[error] Mariaex.Protocol (#PID<0.279.0>) disconnected: ** (DBConnection.ConnectionError) ssl recv: timeout
@hauntedhost
Copy link

hauntedhost commented Mar 12, 2019

I'm also experiencing this same issue. Downgrading to Erlang 21.1.4 worked.

@burnison
Copy link

After running a git bisect on OTP, it seems that this is related to a change in the SSL implementation:

6168cf2f5f8b5839b1a56ce870658d76faf3c22f is the first bad commit
commit 6168cf2f5f8b5839b1a56ce870658d76faf3c22f
Author: Ingela Anderton Andin <[email protected]>
Date:   Mon Nov 19 13:50:35 2018 +0100

    ssl: Internaly use {active, N}
    
    Make next_record an internal help function to next_event and avoid
    duplicate calls to tls_socket:setopts for setting the active option.

I have been able to restore original behaviour with the following change to mariaex:

diff --git a/lib/mariaex/protocol.ex b/lib/mariaex/protocol.ex
index 7b24c62..ecdfca9 100644
--- a/lib/mariaex/protocol.ex
+++ b/lib/mariaex/protocol.ex
@@ -311,7 +311,7 @@ defmodule Mariaex.Protocol do
   DBConnection callback
   """
   def checkout(%{buffer: :active_once, sock: {sock_mod, sock}} = s) do
-    case setopts(s, [active: :false], :active_once) do
+    case setopts(s, [active: 0], :active_once) do
       :ok                       -> sock_mod.recv_active(sock, 0, "") |> handle_recv_buffer(s)
       {:disconnect, _, _} = dis -> dis
     end

However, I am only passingly familiar with Elixir and even more unfamiliar with Erlang. As such, I do not know what implications this change may have. From what I can make of the code, it seems that this works for similar reasons as described in the line 814 up the upstream change: erlang/otp@6168cf2?diff=unified#diff-f053ce3c6b9ffc18f497e4bcbc9661f2R814.

Can any maintainers validate this would be a safe change? I'd be happy to turn it into a pull request.

@burnison
Copy link

Setting active, 0 seems to have worked by coincidence, and I have proposed a change in #253 that seems to "more-correctly" address the actual underlying issue. However, I am not sure that the proposed change retains the exact same semantics as the original code and a thorough review should be done before using or merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants