Issues running in an environment without OS level certificates. #141

Open
thiagogsr opened this issue Nov 18, 2024 · 22 comments

Comments

@thiagogsr

thiagogsr commented Nov 18, 2024

** For future users that encounter issues **

gun 2.1.0 attempts to load CA certificates from the OS, and it does so even if you disable verification.

This appears to trigger something similar to: erlang/otp#7303

To work around this issue there are a couple of options.

  1. Add certificates to the OS.
  2. Specify the cacerts option (for instance, your own list of certificates, an empty list, or the certifi bundle).
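
A rough sketch of option 2, assuming the `http_options` / `tls_options` map shape used in the calls later in this thread. The `sdk_key` value and the `:default` instance name are placeholders, and it is an assumption that `tls_basic_certifi_options/0` is exported from `:ldclient_config` in the same way as `tls_basic_options/0`.

```elixir
# Placeholder value, not a real credential.
sdk_key = ~c"your-sdk-key"

# Variant A: verification disabled, plus an explicit (empty) cacerts list so
# that gun 2.1.0 does not call public_key:cacerts_get/0 on its own.
no_verify = %{http_options: %{tls_options: [verify: :verify_none, cacerts: []]}}

# Variant B: the certifi-backed helper discussed in this thread.
# Assumption: it lives in :ldclient_config next to tls_basic_options/0.
with_certifi = %{http_options: %{tls_options: :ldclient_config.tls_basic_certifi_options()}}

# Pick one of the two option maps when starting the instance.
_unused = no_verify
:ldclient.start_instance(sdk_key, :default, with_certifi)
```

Variant A only makes sense for setups that already run without certificate verification; variant B keeps verification against the certifi bundle.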

Describe the bug
The library cannot connect to LaunchDarkly on 3.5.0.

To reproduce
Install LaunchDarkly 3.5.0

Expected behavior
It should connect.

Logs

:gen_statem #PID<0.6585.0> terminating
** (MatchError) no match of right hand side value: :undefined
    (public_key 1.15.1.1) pubkey_os_cacerts.erl:39: :pubkey_os_cacerts.get/0
    (gun 2.1.0) /build/deps/gun/src/gun.erl:1129: :gun.ensure_tls_opts/3
    (gun 2.1.0) /build/deps/gun/src/gun.erl:1103: :gun.initial_tls_handshake/3
    (stdlib 5.2.3) gen_statem.erl:1395: :gen_statem.loop_state_callback/11
    (stdlib 5.2.3) proc_lib.erl:241: :proc_lib.init_p_do_apply/3
Queue: [internal: {:retries, 0, #Port<0.24>}]
Postponed: []
State: :initial_tls_handshake
Data: {:state, #PID<0.6584.0>, {:up, #Reference<0.4255606911.3075735553.131477>}, ~c"stream.launchdarkly.com", 443, "https", ~c"stream.launchdarkly.com", 443, [], %{protocols: [:http], retry: 0, transport: :tls, connect_timeout: 2000, retry_timeout: 1, tcp_opts: [], tls_opts: [verify: :verify_none]}, :undefined, :undefined, :gun_tls, true, {:ssl, :ssl_closed, :ssl_error}, :undefined, :undefined, :undefined, :gun_default_event_h, :undefined}
Callback mode: :state_functions, state_enter: false

SDK version
3.5.0

Language version, developer tools
ERLANG_VERSION=26.2.5.1
ELIXIR_VERSION=1.17.1

OS/platform
DEBIAN_VERSION=bookworm-20240701

Additional context
It fails with both:

:ldclient.start_instance(api_key, instance_name)

and

:ldclient.start_instance(api_key, instance_name, %{http_options: %{tls_options: [{:verify, :verify_none}]}})

Dependencies versions

:certifi, "2.13.0"
:eredis, "1.7.1"
:jsx, "3.1.0"
:lru, "2.4.0"
:shotgun, "1.1.0"
:uuid_erl, "2.0.7"
:verl, "1.0.1"
:yamerl, "0.10.0"

UPDATE:
I've tried with:

:ldclient.start_instance(api_key, instance_name, %{http_options: %{tls_options: :ldclient_config.tls_basic_options()}})

and it connected, however, the variation function does not work due to:

** (exit) exited in: :gen_server.call(instance_name, {:add_event, %{data: %{default: "default", value: "default", version: :null, debug: false, key: "key-name", variation: :null, prereq_of: :null, trackEvents: :null, debugEventsUntilDate: :null, include_reason: false, eval_reason: {:error, :exception}}, timestamp: 1731943495975, type: :feature_request, context: %{key: "my-key", kind: "my-kind"}}, :my-app, %{}})
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (stdlib 5.2.3) gen_server.erl:404: :gen_server.call/2
    (stdlib 5.2.3) lists.erl:1686: :lists.foreach_1/2
    (ldclient 3.5.0) /build/deps/ldclient/src/ldclient.erl:147: :ldclient.variation/4
@kinyoklion
Member

Hello @thiagogsr,

Thank you for the report. I will look into the issue. The only update in 3.5.0 is to use shotgun 1.1.0, so if you do not require shotgun 1.1.0, you can use the 3.4.0 version in the interim.

Would you mind checking your package manager lock file to determine the version of gun and cowlib?

Thank you,
Ryan

@thiagogsr
Author

Hello @kinyoklion, thanks for the quick response.

I'm currently using 3.3.1 with no issues, I noticed the issue when I tried to update it to 3.5.0, so I'm not blocked at the moment. I've tested the 3.4.0 and it's working great as well.

@kinyoklion
Member

Hello @thiagogsr,

So far no luck in reproduction.

This is the lockfile with the versions that I get resolved.

I am curious about any differences in "gun" and "cowlib" specifically. The error messages sound like a mismatch somewhere there.

%{
  "certifi": {:hex, :certifi, "2.13.0", "e52be248590050b2dd33b0bb274b56678f9068e67805dca8aa8b1ccdb016bbf6", [:rebar3], [], "hexpm", "8f3d9533a0f06070afdfd5d596b32e21c6580667a492891851b0e2737bc507a1"},
  "cowlib": {:hex, :cowlib, "2.13.0", "db8f7505d8332d98ef50a3ef34b34c1afddec7506e4ee4dd4a3a266285d282ca", [:make, :rebar3], [], "hexpm", "e1e1284dc3fc030a64b1ad0d8382ae7e99da46c3246b815318a4b848873800a4"},
  "eredis": {:hex, :eredis, "1.7.1", "39e31aa02adcd651c657f39aafd4d31a9b2f63c6c700dc9cece98d4bc3c897ab", [:mix, :rebar3], [], "hexpm", "7c2b54c566fed55feef3341ca79b0100a6348fd3f162184b7ed5118d258c3cc1"},
  "gun": {:hex, :gun, "2.1.0", "b4e4cbbf3026d21981c447e9e7ca856766046eff693720ba43114d7f5de36e87", [:make, :rebar3], [{:cowlib, "2.13.0", [hex: :cowlib, repo: "hexpm", optional: false]}], "hexpm", "52fc7fc246bfc3b00e01aea1c2854c70a366348574ab50c57dfe796d24a0101d"},
  "jsx": {:hex, :jsx, "3.1.0", "d12516baa0bb23a59bb35dccaf02a1bd08243fcbb9efe24f2d9d056ccff71268", [:rebar3], [], "hexpm", "0c5cc8fdc11b53cc25cf65ac6705ad39e54ecc56d1c22e4adb8f5a53fb9427f3"},
  "ldclient": {:hex, :launchdarkly_server_sdk, "3.5.0", "8c5653396f63e9875e11818bff7d0a383198b94b7959c71cd0be4bd0d9be20a1", [:rebar3], [{:certifi, "2.12.0", [hex: :certifi, repo: "hexpm", optional: false]}, {:eredis, "1.7.1", [hex: :eredis, repo: "hexpm", optional: false]}, {:jsx, "3.1.0", [hex: :jsx, repo: "hexpm", optional: false]}, {:lru, "2.4.0", [hex: :lru, repo: "hexpm", optional: false]}, {:shotgun, "1.1.0", [hex: :shotgun, repo: "hexpm", optional: false]}, {:uuid, "~> 2.0.2", [hex: :uuid_erl, repo: "hexpm", optional: false]}, {:verl, "1.0.1", [hex: :verl, repo: "hexpm", optional: false]}, {:yamerl, "0.10.0", [hex: :yamerl, repo: "hexpm", optional: false]}], "hexpm", "5dbb8021a2858d21d5fab4a71908aa22a8503c4c638c011158a3e1b2cba87703"},
  "lru": {:hex, :lru, "2.4.0", "a8f9967ca9b6f260baa19e2efb2aeb3853a3f5bd5f8416f537a672294b38c1bc", [:rebar3], [], "hexpm", "4fcf77e882b5e57eca068999acba4386a20dbce8e446c98c6a0f8fb3d170afeb"},
  "quickrand": {:hex, :quickrand, "2.0.7", "d2bd76676a446e6a058d678444b7fda1387b813710d1af6d6e29bb92186c8820", [:rebar3], [], "hexpm", "b8acbf89a224bc217c3070ca8bebc6eb236dbe7f9767993b274084ea044d35f0"},
  "shotgun": {:hex, :shotgun, "1.1.0", "e2dd26d138b91e8fe53555a803f8244f719c25080a57319d2357dc0adb1d83a2", [:rebar3], [{:gun, "2.1.0", [hex: :gun, repo: "hexpm", optional: false]}], "hexpm", "5c005f7f87e967b28894220b488b8a9664e0e151b9e9bc53bf3ff440ced3a715"},
  "uuid": {:hex, :uuid_erl, "2.0.7", "b2078d2cc814f53afa52d36c91e08962c7e7373585c623f4c0ea6dfb04b2af94", [:rebar3], [{:quickrand, ">= 2.0.7", [hex: :quickrand, repo: "hexpm", optional: false]}], "hexpm", "4e4c5ca3461dc47c5e157ed42aa3981a053b7a186792af972a27b14a9489324e"},
  "verl": {:hex, :verl, "1.0.1", "1c29fa29ce071820318d04f42850d23b6e0c8b1d6ca513b1b66312240d0690e1", [:rebar3], [], "hexpm", "f1939291d6b04ed7c20b4cf83cb0ebfc9141f692bb6fb51970429dfd145fec9d"},
  "yamerl": {:hex, :yamerl, "0.10.0", "4ff81fee2f1f6a46f1700c0d880b24d193ddb74bd14ef42cb0bcf46e81ef2f8e", [:rebar3], [], "hexpm", "346adb2963f1051dc837a2364e4acf6eb7d80097c0f53cbdc3046ec8ec4b4e6e"},
}

Thank you,
Ryan

@thiagogsr
Author

I was using the same version as you

:gun, "2.1.0"
:cowlib, "2.13.0"

@thiagogsr
Author

I've compared all package versions and everything looks the same.

%{
  "certifi": {:hex, :certifi, "2.13.0", "e52be248590050b2dd33b0bb274b56678f9068e67805dca8aa8b1ccdb016bbf6", [:rebar3], [], "hexpm", "8f3d9533a0f06070afdfd5d596b32e21c6580667a492891851b0e2737bc507a1"},
  "cowlib": {:hex, :cowlib, "2.13.0", "db8f7505d8332d98ef50a3ef34b34c1afddec7506e4ee4dd4a3a266285d282ca", [:make, :rebar3], [], "hexpm", "e1e1284dc3fc030a64b1ad0d8382ae7e99da46c3246b815318a4b848873800a4"},
  "eredis": {:hex, :eredis, "1.7.1", "39e31aa02adcd651c657f39aafd4d31a9b2f63c6c700dc9cece98d4bc3c897ab", [:mix, :rebar3], [], "hexpm", "7c2b54c566fed55feef3341ca79b0100a6348fd3f162184b7ed5118d258c3cc1"},
  "gun": {:hex, :gun, "2.1.0", "b4e4cbbf3026d21981c447e9e7ca856766046eff693720ba43114d7f5de36e87", [:make, :rebar3], [{:cowlib, "2.13.0", [hex: :cowlib, repo: "hexpm", optional: false]}], "hexpm", "52fc7fc246bfc3b00e01aea1c2854c70a366348574ab50c57dfe796d24a0101d"},
  "jsx": {:hex, :jsx, "3.1.0", "d12516baa0bb23a59bb35dccaf02a1bd08243fcbb9efe24f2d9d056ccff71268", [:rebar3], [], "hexpm", "0c5cc8fdc11b53cc25cf65ac6705ad39e54ecc56d1c22e4adb8f5a53fb9427f3"},
  "ldclient": {:hex, :launchdarkly_server_sdk, "3.5.0", "8c5653396f63e9875e11818bff7d0a383198b94b7959c71cd0be4bd0d9be20a1", [:rebar3], [{:certifi, "2.12.0", [hex: :certifi, repo: "hexpm", optional: false]}, {:eredis, "1.7.1", [hex: :eredis, repo: "hexpm", optional: false]}, {:jsx, "3.1.0", [hex: :jsx, repo: "hexpm", optional: false]}, {:lru, "2.4.0", [hex: :lru, repo: "hexpm", optional: false]}, {:shotgun, "1.1.0", [hex: :shotgun, repo: "hexpm", optional: false]}, {:uuid, "~> 2.0.2", [hex: :uuid_erl, repo: "hexpm", optional: false]}, {:verl, "1.0.1", [hex: :verl, repo: "hexpm", optional: false]}, {:yamerl, "0.10.0", [hex: :yamerl, repo: "hexpm", optional: false]}], "hexpm", "5dbb8021a2858d21d5fab4a71908aa22a8503c4c638c011158a3e1b2cba87703"},
  "lru": {:hex, :lru, "2.4.0", "a8f9967ca9b6f260baa19e2efb2aeb3853a3f5bd5f8416f537a672294b38c1bc", [:rebar3], [], "hexpm", "4fcf77e882b5e57eca068999acba4386a20dbce8e446c98c6a0f8fb3d170afeb"},
  "quickrand": {:hex, :quickrand, "2.0.7", "d2bd76676a446e6a058d678444b7fda1387b813710d1af6d6e29bb92186c8820", [:rebar3], [], "hexpm", "b8acbf89a224bc217c3070ca8bebc6eb236dbe7f9767993b274084ea044d35f0"},
  "shotgun": {:hex, :shotgun, "1.1.0", "e2dd26d138b91e8fe53555a803f8244f719c25080a57319d2357dc0adb1d83a2", [:rebar3], [{:gun, "2.1.0", [hex: :gun, repo: "hexpm", optional: false]}], "hexpm", "5c005f7f87e967b28894220b488b8a9664e0e151b9e9bc53bf3ff440ced3a715"},
  "uuid": {:hex, :uuid_erl, "2.0.7", "b2078d2cc814f53afa52d36c91e08962c7e7373585c623f4c0ea6dfb04b2af94", [:rebar3], [{:quickrand, ">= 2.0.7", [hex: :quickrand, repo: "hexpm", optional: false]}], "hexpm", "4e4c5ca3461dc47c5e157ed42aa3981a053b7a186792af972a27b14a9489324e"},
  "verl": {:hex, :verl, "1.0.1", "1c29fa29ce071820318d04f42850d23b6e0c8b1d6ca513b1b66312240d0690e1", [:rebar3], [], "hexpm", "f1939291d6b04ed7c20b4cf83cb0ebfc9141f692bb6fb51970429dfd145fec9d"},
  "yamerl": {:hex, :yamerl, "0.10.0", "4ff81fee2f1f6a46f1700c0d880b24d193ddb74bd14ef42cb0bcf46e81ef2f8e", [:rebar3], [], "hexpm", "346adb2963f1051dc837a2364e4acf6eb7d80097c0f53cbdc3046ec8ec4b4e6e"},
}

@kinyoklion
Member

@thiagogsr Thank you.

That is a little problematic.

My initialization is approximately equivalent as well:

    :ldclient.start_instance(
      String.to_charlist(Application.get_env(:hello_elixir, :sdk_key)),
      :default,
      %{
        :http_options => %{
          :tls_options => :ldclient_config.tls_basic_options()
        }
      }
    )

Are you getting any additional logs before the variation call?

09:58:48.407 [notice] Starting instance supervisor for :default with name :ldclient_instance_default

09:58:48.415 [notice] Starting event supervisor for :default with name :ldclient_instance_events_default

09:58:48.419 [notice] Starting event storage server for :default with name :ldclient_event_server_default

09:58:48.423 [notice] Starting event processor for :default with name :ldclient_event_process_server_default

09:58:48.431 [notice] Starting ets server with name :ldclient_storage_ets_server_default

09:58:48.434 [notice] Starting streaming update server for :default

09:58:48.439 [notice] Starting streaming connection to URL: ~c"https://stream.launchdarkly.com/all"
{:ok, #PID<0.222.0>}

09:58:50.088 [notice] Received event with 172 flags and 6 segments

I would expect a series of logs similar to these.

Thank you,
Ryan

@thiagogsr
Author

Ah yes, with the tls_basic_options it initializes, but I can't call the :ldclient.variation/4 function.

@kinyoklion
Member

We've had some problems since OTP 25 with tls_options. The default behavior of OTP has changed a few times, which has made it difficult to keep defaults compatible with both old and new versions. (So at this point I generally recommend always setting explicit TLS options.)

That said I am still curious about any additional logging that would help to explain why the variation call is failing.

If we start each supervisor, and none of them are logging any failures, then I would expect the process to be there.

One thing strange to me is the instance_name:

** (exit) exited in: :gen_server.call(instance_name, {:add_event, %{data: %{default: "default", value: "default", version: :null, debug: false, key: "key-name", variation: :null, prereq_of: :null, trackEvents: :null, debugEventsUntilDate: :null, include_reason: false, eval_reason: {:error, :exception}}, timestamp: 1731943495975, type: :feature_request, context: %{key: "my-key", kind: "my-kind"}}, :my-app, %{}})
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (stdlib 5.2.3) gen_server.erl:404: :gen_server.call/2
    (stdlib 5.2.3) lists.erl:1686: :lists.foreach_1/2
    (ldclient 3.5.0) /build/deps/ldclient/src/ldclient.erl:147: :ldclient.variation/4

I would expect that log to show the atom for the instance_name. :something.

When I manually change mine to an instance name that doesn't exist I get:

:gen_server.call(:ldclient_event_server_name

Where name is the input atom.

(I am more familiar with Erlang than Elixir, so there may be something I am misunderstanding there.)

Thank you,
Ryan

@thiagogsr
Author

@kinyoklion it's indeed an atom, I replaced it before sending it here. I will run more tests on it, but there might be something off as the v3.4.0 works well and just the update to 3.5.0 breaks it. We have been running it without tls verification since we migrated to OTP 26 some months ago.

:ldclient.start_instance(api_key, instance_name, %{http_options: %{tls_options: [{:verify, :verify_none}]}})

@kinyoklion
Member

Ok. I do just want to verify then, that in your logs
09:58:48.419 [notice] Starting event storage server for :default with name :ldclient_event_server_default
The equivalent of that line is matching the atom in the variation error message.

Regarding [{:verify, :verify_none}], I will check. I recall there being some gun option changes.

Thank you,
Ryan

@kinyoklion
Member

@thiagogsr

Mine works with the following configuration as well:

    :ldclient.start_instance(
      String.to_charlist(Application.get_env(:hello_elixir, :sdk_key)),
      :default,
      %{http_options: %{tls_options: [{:verify, :verify_none}]}}
    )

Is this failure something you are experiencing just on a local development instance from testing the upgrade? I am curious if there are any build remnants interfering with things.

Thank you,
Ryan

@thiagogsr
Author

I'm experiencing this error when I deploy it to Kubernetes, the project and the docker image are built just fine. Are you testing on those versions?

Language version, developer tools
ERLANG_VERSION=26.2.5.1
ELIXIR_VERSION=1.17.1

OS/platform
DEBIAN_VERSION=bookworm-20240701

@thiagogsr
Author

We build the project on hexpm/elixir:1.17.1-erlang-26.2.5.1-debian-bookworm-20240701, and once it is compiled we run it on debian:bookworm-20240701-slim.

@kinyoklion
Member

I am testing those tool versions, but not that OS. I can try those specific containers as well.

@thiagogsr
Author

thiagogsr commented Nov 18, 2024

It seems to be related to the OS. The following command works locally (OSX), but it does not work on my remote container.

:public_key.cacerts_get()
:public_key.cacerts_get()
** (MatchError) no match of right hand side value: :undefined
    (public_key 1.15.1.1) pubkey_os_cacerts.erl:39: :pubkey_os_cacerts.get/0
    iex:1: (file)

It's used by gun on the new version.

ensure_tls_opts(Protocols0, TransOpts0, OriginHost) ->
	%% CA certificates.
	TransOpts1 = case lists:keymember(cacerts, 1, TransOpts0) of
		true ->
			TransOpts0;
		false ->
			case lists:keymember(cacertfile, 1, TransOpts0) of
				true ->
					TransOpts0;
				false ->
					%% This function was added in OTP-25. We use it when it is
					%% available and keep the previous behavior when it isn't.
					case erlang:function_exported(public_key, cacerts_get, 0) of
						true ->
							[{cacerts, public_key:cacerts_get()}|TransOpts0]; %% <-- here
						false ->
							TransOpts0
					end
			end
	end,
	%% ALPN.
	Protocols = lists:foldl(fun
		(http, Acc) -> [<<"http/1.1">>|Acc];
		({http, _}, Acc) -> [<<"http/1.1">>|Acc];
		(http2, Acc) -> [<<"h2">>|Acc];
		({http2, _}, Acc) -> [<<"h2">>|Acc];
		(_, Acc) -> Acc
	end, [], Protocols0),
	TransOpts = [
		{alpn_advertised_protocols, Protocols}
	|TransOpts1],
	%% SNI.
	%%
	%% Normally only DNS hostnames are supported for SNI. However, the ssl
	%% application itself allows any string through so we do the same.
	%%
	%% Only add SNI if not already present and OriginHost isn't an IP address.
	case lists:keymember(server_name_indication, 1, TransOpts) of
		false when is_list(OriginHost) ->
			[{server_name_indication, OriginHost}|TransOpts];
		false when is_atom(OriginHost) ->
			[{server_name_indication, atom_to_list(OriginHost)}|TransOpts];
		_ ->
			TransOpts
	end.

@thiagogsr
Author

I'm going to test it with the latest OS version.

@kinyoklion
Member

@thiagogsr

This seems very similar to this: erlang/otp#7321

I wonder if this is the underlying change:
ninenines/gun@8b5f160

Which makes me think that maybe explicitly using the certifi options (tls_basic_certifi_options) would override that behavior.

Thank you,
Ryan

@thiagogsr
Author

It seems so, I just don't understand why it tries to get cacerts when it's configured to :verify_none, like this other guy commented here. Looks like a regression of erlang/otp#7303. I will test the other TLS options anyway

@thiagogsr
Author

The tls_basic_certifi_options was the only one that worked. I still think it should work with :verify_none, but I'm not sure whether that is something that should be addressed here or somewhere else.

@kinyoklion
Member

So, tls_basic_certifi_options defines cacerts, so gun doesn't attempt to use public_key:cacerts_get(). The code in gun doesn't seem to be conditional on verify_none so it is going to use public_key:cacerts_get() regardless.

Our helper for basic tls options will also attempt to use public_key:cacerts_get() and then fall back to certifi.
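
The "try the OS store, then fall back" idea can be sketched roughly as follows. This is not the SDK's actual helper: `CacertsSketch` and `resolve/2` are hypothetical names, and the loader and fallback are passed in as functions so the sketch does not depend on certifi being installed.

```elixir
defmodule CacertsSketch do
  # Hypothetical helper, not part of ldclient or gun.
  # Tries the OS certificate store first; if that raises (as in the
  # MatchError shown earlier in this thread) or yields nothing usable,
  # it calls the fallback instead.
  def resolve(os_loader \\ &:public_key.cacerts_get/0, fallback \\ fn -> [] end) do
    try do
      case os_loader.() do
        certs when is_list(certs) and certs != [] -> certs
        _ -> fallback.()
      end
    rescue
      # Covers the badmatch seen on images without OS certificates, and
      # an undefined function on OTP releases older than 25.
      _ -> fallback.()
    end
  end
end

# With the default fallback this yields [] when no OS certificates are found.
tls_options = [verify: :verify_none, cacerts: CacertsSketch.resolve()]
```

Passing `&:certifi.cacerts/0` as the second argument would fall back to the certifi bundle instead of an empty list, provided certifi is available in the release.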

Theoretically you could also specify cacerts as an empty list and that should bypass that code as well.

%{http_options: %{tls_options: [{:verify, :verify_none}, {:cacerts, []}]}}

I am hesitant to incorporate a workaround, as it seems likely to cause more unexpected behavior. (It is already somewhat unexpected that gun adds cacerts when none are provided in the options, so an explicitly empty list is now required to match the previous behavior.)

Thanks,
Ryan

@thiagogsr
Author

thiagogsr commented Nov 19, 2024

I understand @kinyoklion, I'm sure you know what's best for the library. Thanks for the help with it. One last thing: to make tls_basic_certifi_options work, I had to install the certifi library. It's already a dependency of ldclient; however, I think this module is not loaded. Is that expected?

@kinyoklion
Member

@thiagogsr

I will look. I do not think that is expected.

Did it not work with {:cacerts, []}? I will also try to find some time to determine if gun can be changed to prevent this issue.

Thank you,
Ryan

@kinyoklion kinyoklion changed the title Latest version cannot connect to LaunchDarkly Issues running in an environment without OS level certificates. Nov 19, 2024
@kinyoklion kinyoklion pinned this issue Nov 19, 2024