Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A first proposal to fix the no-sni section. #594

Merged
merged 1 commit into from
Feb 24, 2024

Conversation

taddhar
Copy link
Contributor

@taddhar taddhar commented Nov 15, 2023

In this part of the text there would be a lot to say.

First for non-English native speakers who are not familiar with the topic and in particular in operations or even to some degree in system architecture or network security domains, the text will be very difficult to read.

In fact think about someone in a Small and Medium Enterprises organization that has to ensure the security of its organization it will be close to impossible to connect the dots. How many dozen of millions of SMEs on the planet?

Anyway, it is VERY hard to be more explicit in the context of this draft though I would be tempted to add examples and be more explicit on sub-contexts and on use cases e.g. put in hard BYOD will fail. but anyway this links to OUR Internet Draft on Deployment Considerations that suddenly makes a lot of sense as an expansion of this section from 1 page to 20+.

One could argue that there are many types of Middleboxes and they all will be affected by that (DPI, Firewalls, Proxies, etc.)

However there is ONE section which is simply 1) wrong and 2) completely irrelevant is the last section which I removed. The only alternative would be to make a significant fix in that section and even if I fix it, it will add nothing to the fact that when a number of devices won't be able to access the SNI, they will break, period.

So I am happy to rework a section but the section will say that whilst in general the SNI is unreliable, in practice there are essential sub-context where is is not only VERY reliable but as well is a very valuable information for in-path actors who have many ways to check and validate and handle the unreliable part.

@ekr
Copy link
Collaborator

ekr commented Nov 15, 2023

So I am happy to rework a section but the section will say that whilst in general the SNI is unreliable, in practice there are essential sub-context where is is not only VERY reliable but as well is a very valuable information for in-path actors who have many ways to check and validate and handle the unreliable part.

You're going to have to explain more about why you think this is the case. Validating the certificate is insufficient. You need to ensure that the site has the corresponding private key, which you can't do passively with TLS 1.2 in non-PFS mode.

@ekr
Copy link
Collaborator

ekr commented Nov 15, 2023

I appreciate that people use the SNI the way you indicate. But the point of this text is that this practice is not secure if the endpoint is compromised.

@roelfdutoit
Copy link

Network security middleboxes utilize various techniques to improve the reliability of the control in an enterprise environment, inter alia:

  1. DNS hostname lookup of SNI and optional destination redirect or failure on mismatch.
  2. TLS "probe" sessions to validate not only the certificate and SAN entries, but also the ability of the destination to complete the TLS handshake.

@ekr
Copy link
Collaborator

ekr commented Nov 15, 2023

I don't understand what (1) means. Re (2) this is not reliable because the attacker can simply forward the TLS connection to the true endpoint when being probed and it's straightforward to arrange that the probe connection doesn't look like a connection from their own compromised clients.

@roelfdutoit
Copy link

roelfdutoit commented Nov 15, 2023

I don't understand what (1) means

dstip_client = destination as observed on client side of middlebox
dstip_sni = gethostbyname(SNI)
Option 1: if (dstip_sni != dstip_client) then reset connection.
Option 2: dstip = dstip_sni, i.e. redirect the connection to dstip_sni, where "redirect" is implementation specific.

@roelfdutoit
Copy link

(2) this is not reliable because the attacker can simply forward the TLS connection to the true endpoint

Sure, I agree, but do they really? Implementing (2) forces the attacker to implement this relay, and when combined with (1) gives the middlebox a mechanism to detect an anomaly where the true endpoint dstip seems to have "jumped".

@ekr
Copy link
Collaborator

ekr commented Nov 15, 2023

I don't understand what (1) means

dstip_client = destination as observed on client side of middlebox dstip_sni = gethostbyname(SNI) Option 1: if (dstip_sni != dstip_client) then reset connection. Option 2: dstip = dstip_sni, i.e. redirect the connection to dstip_sni, where "redirect" is implementation specific.

Wait, you're talking about comparing the IP address? It's incredibly common to get different IPs for the same DNS name, so (1) is a non-starter (2) is really just like fighting with the load balancing on the server, so is bad news.

@roelfdutoit
Copy link

Wait, you're talking about comparing the IP address? It's incredibly common to get different IPs for the same DNS name, so (1) is a non-starter (2) is really just like fighting with the load balancing on the server, so is bad news.

The context of my comments is "enterprise environment". Calling (1) a non-starter ignores the level of control an enterprise has over DNS (including DoH requests) and gethostbyname in my pseudo-code is not referring to a simple lookup.

As for (2).. the attacker tries to bypass the middlebox by using www.example.org as SNI on a TLS session to c2.attacker.xxx. The middlebox will resolve www.example.org and connect upstream to www.example.org, not to c2.attacker.xxx.

@sftcd
Copy link
Collaborator

sftcd commented Nov 15, 2023 via email

@ekr
Copy link
Collaborator

ekr commented Nov 16, 2023

Wait, you're talking about comparing the IP address? It's incredibly common to get different IPs for the same DNS name, so (1) is a non-starter (2) is really just like fighting with the load balancing on the server, so is bad news.

The context of my comments is "enterprise environment". Calling (1) a non-starter ignores the level of control an enterprise has over DNS (including DoH requests) and gethostbyname in my pseudo-code is not referring to a simple lookup.

As for (2).. the attacker tries to bypass the middlebox by using www.example.org as SNI on a TLS session to c2.attacker.xxx. The middlebox will resolve www.example.org and connect upstream to www.example.org, not to c2.attacker.xxx.

The relevant case here is one where the endpoint is doing its own DNS resolution invisibly to the enterprise, because otherwise the enterprise can just strip the ECHConfigs.

@roelfdutoit
Copy link

The relevant case here is one where the endpoint is doing its own DNS resolution invisibly to the enterprise, because otherwise the enterprise can just strip the ECHConfigs.

Understood, but I thought we were discussing the reliabillity of SNI as control irrespective of ECH as stated in the text this PR is trying to modify: "use as a control even in the absence of ECH".

In an enterprise environment where the client device is compromised the network security device will use the techniques I have described above to improve the reliability of SNI as a control. I understand that the attacker can also improve its chances of not being detected, but it has to move up the Pyramid of Pain - just spoofing the SNI is an unreliable way to bypass advanced network security devices.

@taddhar
Copy link
Contributor Author

taddhar commented Nov 16, 2023

There is no disagreement on the goal of this draft, the disagreement is that the clause we are debating is wrong but more importantly it is irrelevant to this draft, so we are not even trying to fix it, this will confuse anyone and adds nothing to the benefit of ECH.

@sftcd
Copy link
Collaborator

sftcd commented Nov 16, 2023 via email

@ekr
Copy link
Collaborator

ekr commented Feb 17, 2024

OK, given that this is not in security considerations and (1) the text here makes a claim which is disputed and (2) the effect of removing the text is to make the impact of ECH seem bigger than it did in the previous text which seems to be erring on the safe side, I'm going to merge this PR unless someone objects by 2/24.

@ekr ekr merged commit 9f5c3f1 into tlswg:master Feb 24, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants