Remove the behavior of silent dropping replay packets #556

Merged
merged 1 commit on Jun 29, 2021

Conversation

Mygod
Contributor

@Mygod Mygod commented Jun 19, 2021

This removes a carefully engineered feature (proposed in shadowsocks/shadowsocks-org#44 (comment)) that can be used to detect Shadowsocks, as per net4people/bbs#22. Testing in progress.

@Mygod
Contributor Author

Mygod commented Jun 19, 2021

CC @madeye Feel free to test.

@zonyitoo
Collaborator

Hmm, but why remove it?

@Mygod
Contributor Author

Mygod commented Jun 19, 2021

Because the Bloom filter itself is a feature that can be utilized by the GFW. See the post I linked above.

EDIT: Yes. I am saying that having a Bloom filter is even worse than not doing anything, and I am actively testing this right now.

@database64128
Contributor

The report you linked as a reference contradicts your point. Shadowsocks servers need protection against replay attacks. The current Bloom-filter-based implementation might not be good enough (limited capacity, no persistence, etc.), but removing the protection is not the solution. We need protocol changes to achieve full protection.

Since you have consistently given downvotes/objections to proposals of any change (shadowsocks/shadowsocks-org#177, shadowsocks/shadowsocks-org#178, shadowsocks/shadowsocks-org#183 (comments)), I'm giving your PR a 👎.

@zonyitoo
Collaborator

Since there is no better solution for now, leaving it as it is should be better than removing it completely. net4people/bbs#22 actually indicates that the main detection method is replay.

@Mygod
Contributor Author

Mygod commented Jun 19, 2021

Let me clarify by giving an explicit attack to distinguish/detect shadowsocks traffic. Most protocols that look random allow replay of the first packet. The only counterexample I can think of is TLS 1.3 0-RTT, but even there, the behavior is immediately different. shadowsocks-libev and shadowsocks-rust disallow replay within a certain period (until the server restarts or the ping-pong Bloom filter resets). This by itself is a feature that an active probing adversary can utilize to distinguish shadowsocks-rust from TLS by doing replay. To perform this distinguishing attack, you need:

  1. Immediate replay of the packet, and verify that the connection is silently dropped.
  2. Replay of the packet after a random amount of time, and verify that the server eventually accepts and returns some data.
  3. Replay of the packet (also after a random amount of time), with some bytes changed, and verify that the connection is silently dropped.

Given that there is enough traffic to the server, shadowsocks-libev and shadowsocks-rust eventually pass this test with some noticeable probability. Compare this to TLS (not 0-RTT), which will never pass the first test since TLS always accepts a replay of the ClientHello message. Even with TLS 0-RTT, the server returns HTTP 425 when it detects a replay instead of silently dropping the connection, and therefore also fails the first test. A firewall can therefore perform this test randomly and block the server whenever all three tests pass. (The distribution of the delay can be chosen so that the amortized space complexity to carry out the attack is low.)
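
(For concreteness, here is a rough sketch of what a single probe in tests 1 and 2 boils down to on the wire. It is purely illustrative: the function is made up for this comment, and a real probe would also need the captured first packet of a genuine client connection.)

use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

// Replays `captured` to `addr` and reports whether the server ever sent
// anything back within `wait`. A server that silently drops replays (the
// Bloom filter behavior) yields false; a TLS-like server that accepts a
// replayed ClientHello yields true, because it answers with a ServerHello.
fn replay_and_observe(addr: &str, captured: &[u8], wait: Duration) -> std::io::Result<bool> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(wait))?;
    stream.write_all(captured)?;

    let mut buf = [0u8; 1];
    match stream.read(&mut buf) {
        Ok(n) if n > 0 => Ok(true), // server responded with data
        Ok(_) => Ok(false),         // orderly close without data
        Err(e) if matches!(e.kind(), std::io::ErrorKind::WouldBlock | std::io::ErrorKind::TimedOut) => {
            Ok(false) // silence: consistent with the drop/drain behavior
        }
        Err(e) => Err(e),
    }
}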

In fact, if the distinguisher/censor keeps track of a counter for all the new connections made to the server, he can in principle know exactly when to replay the packet so that it will pass the verification, since, funnily enough, the maximum number of entries for the Bloom filter is hardcoded in shadowsocks-libev/shadowsocks-rust:

// Entries for server's bloom filter
//
// Borrowed from shadowsocks-libev's default value
const BF_NUM_ENTRIES_FOR_SERVER: usize = 1_000_000;

Given the data from net4people/bbs#22, it is likely that this attack is already deployed. There are a few alternative ways to protect against this attack, but this PR is the only way without changing the protocol, and I am testing its effectiveness on a popular VPS. By removing the Bloom filter, shadowsocks-rust now fails the first test and passes the other two, which is consistent with the behavior of TLS. This backs up my claim that "doing nothing is better than having a Bloom filter," as per #556 (comment).
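
(For readers unfamiliar with the mechanism under discussion, a minimal sketch of a ping-pong replay filter follows, with HashSets standing in for the real Bloom filters. It is illustrative only, not the project's actual code, but it shows why any remembered salt becomes replayable again once its filter is rotated out.)

use std::collections::HashSet;

const CAPACITY: usize = 1_000_000; // cf. BF_NUM_ENTRIES_FOR_SERVER above

struct PingPongFilter {
    current: HashSet<Vec<u8>>,
    previous: HashSet<Vec<u8>>,
}

impl PingPongFilter {
    fn new() -> Self {
        Self { current: HashSet::new(), previous: HashSet::new() }
    }

    // Returns true if `salt` was seen recently, i.e. the connection would be
    // treated as a replay and silently dropped.
    fn check_and_insert(&mut self, salt: &[u8]) -> bool {
        if self.current.contains(salt) || self.previous.contains(salt) {
            return true;
        }
        if self.current.len() >= CAPACITY {
            // "Ping-pong": the oldest filter is discarded wholesale, so every
            // salt it remembered becomes replayable again from this point on.
            std::mem::swap(&mut self.current, &mut self.previous);
            self.current.clear();
        }
        self.current.insert(salt.to_vec());
        false
    }
}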

@zonyitoo
Collaborator

Well, sounds reasonable. What do you think, @madeye?

@Mygod
Contributor Author

Mygod commented Jun 19, 2021

This also fixes shadowsocks/shadowsocks-org#184.

Since you have consistently given downvotes/objections to proposals of any change (shadowsocks/shadowsocks-org#177, shadowsocks/shadowsocks-org#178, shadowsocks/shadowsocks-org#183 (comments)), I'm giving your PR a 👎.

@database64128 You mean I have consistently given downvotes/objections to proposals of any useless or even harmful change*. You are welcome.

*Also I have supported the change of adding AAD to the protocol, as per shadowsocks/shadowsocks-org#183 (comment). However, since it is a breaking change, it is not worth the upgrade for now.

Also, since you mentioned it, you can see that I have already proposed some similar ideas in shadowsocks/shadowsocks-org#183 (comment). I think this Bloom filter thing is a good example of how various proposals for "protecting against replay attacks" or "forward secrecy" or [insert your other unnecessary demands on Shadowsocks and questionable protocol "upgrades"] can actually harm the main goal of Shadowsocks: being an access tool.

@database64128
Contributor

This by itself is a feature that an active probing adversary can utilize to distinguish shadowsocks-rust from TLS by doing replay.

By removing the Bloom filter, shadowsocks-rust now fails the first test and passes the other two, which is consistent with the behavior of TLS.

This makes no sense at all. Even a simple DPI system can tell that Shadowsocks AEAD traffic is not TLS. Not mimicking TLS doesn't make Shadowsocks unique. But a half-baked "fake" TLS will almost certainly stand out.

@madeye
Contributor

madeye commented Jun 19, 2021

@zonyitoo Instead of removing the bloomfilter, I think we can disable its "drop replay connection" behavior by default.

Let's still report the replay in the log, which should help us understand the replay probes in the future.

@zonyitoo
Collaborator

zonyitoo commented Jun 20, 2021

Well, that sounds like a good idea.

ss-go2 also has a Bloom filter for detecting replayed connections. What do you think? @riobard

I think it can also be a configuration feature, simply:

{
    // A switch for enabling silent drop of replay connections
    // It is "true" by default, for backward compatiblity.
    "silent_drop_replay": false
}
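
(A hypothetical sketch of how such a flag might gate the behavior on the server side; the field name, types, and log message here are invented for illustration and are not the project's actual code.)

struct ServerConfig {
    silent_drop_replay: bool, // proposed default: true, matching current behavior
}

enum ReplayAction {
    SilentDrop, // keep draining the connection but never act on it (current behavior)
    LogOnly,    // warn in the log, then handle the stream as if it were fresh
}

fn on_replay_detected(cfg: &ServerConfig, peer: std::net::SocketAddr) -> ReplayAction {
    eprintln!("warning: possible replayed salt from {}", peer);
    if cfg.silent_drop_replay {
        ReplayAction::SilentDrop
    } else {
        ReplayAction::LogOnly
    }
}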

@zonyitoo
Collaborator

https://github.com/Jigsaw-Code/outline-ss-server/blob/master/service/PROBES.md

This is how outline-ss-server handles replay connections.

@RPRX

RPRX commented Jun 20, 2021

Note that once the replay filter is removed, shadowsocks/shadowsocks-org#183 will allow even more tricks.

@Mygod
Contributor Author

Mygod commented Jun 20, 2021

@zonyitoo There is no backwards compatibility to worry about here. Honest users should not be affected by this change at all.

@RPRX Any concrete efficient attack for detecting Shadowsocks traffic if we remove the filter?

@RPRX

RPRX commented Jun 20, 2021

@Mygod

You mean I have consistently given downvotes/objections to proposals of any useless or even harmful change*. You are welcome.

shadowsocks/shadowsocks-org#178 (comment)

Any concrete efficient attack for detecting Shadowsocks traffic if we remove the filter?

For example, the replay against the client mentioned in shadowsocks/shadowsocks-org#183, as well as feeding the client's request back to the client, feeding the server's response back to the server, plus some byte-by-byte probing techniques, and things like redirecting UDP DNS onto TCP and comparing response lengths... the combinations are hard to enumerate exhaustively. These all become attacks that the encryption layer can neither recognize nor defend against, and the specific response-behavior patterns of Shadowsocks under each attack would need to be measured one by one.

@dev4u

dev4u commented Jun 20, 2021

I think it can also be a configuration feature, simply:

{
    // A switch for enabling silent drop of replay connections
    // It is "true" by default, for backward compatiblity.
    "silent_drop_replay": false
}

Could it detect that it is running in plugin mode and switch automatically?

@zonyitoo
Collaborator

I think it can also be a configuration feature, simply:

{
    // A switch for enabling silent drop of replay connections
    // It is "true" by default, for backward compatiblity.
    "silent_drop_replay": false
}

Could it detect that it is running in plugin mode and switch automatically?

Ah. That’s a good point.

@riobard

riobard commented Jun 21, 2021

@zonyitoo What exactly is "silent_drop_replay" in this case? Currently go-ss2 just keeps reading from the connection after detecting replay without actively dropping it. Does ss-rust behave the same?

@zonyitoo
Collaborator

@zonyitoo What exactly is "silent_drop_replay" in this case? Currently go-ss2 just keeps reading from the connection after detecting replay without actively dropping it. Does ss-rust behave the same?

Yes, they work the same. Here Mygod proposed a change to remove it completely, and madeye suggested changing the behavior to detect-only. What do you think?

@riobard

riobard commented Jun 21, 2021

@zonyitoo It's definitely better to keep the replay detection at least to inform server admins that they're being probed.

I'm undecided as to whether or not to disable silent drop. If the end goal is to make probing more troublesome, maybe a chaotic approach would be better? E.g. after detecting a replay (or any invalid attempt to connect), we could respond randomly, i.e. sometimes drop the connection after a random timeout, sometimes keep the connection but reply with bogus data, sometimes just keep draining as if nothing is wrong. Hopefully it would confuse probes. The probabilities of each response would ideally differ per server deployment in order to avoid leaking statistical patterns.
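
(A rough sketch of that idea, using the rand crate; the strategy names and weights below are invented, and a real deployment would want its own, ideally unique, distribution.)

use rand::Rng;
use std::time::Duration;

enum ProbeResponse {
    DropAfter(Duration), // close after a random delay
    BogusReply(usize),   // send this many bytes of garbage, then close
    DrainForever,        // keep reading as if nothing is wrong
}

fn choose_response<R: Rng>(rng: &mut R) -> ProbeResponse {
    // Per-deployment weights; a fixed global distribution would itself become
    // a statistical fingerprint, as noted above.
    match rng.gen_range(0..100u32) {
        0..=29 => ProbeResponse::DropAfter(Duration::from_millis(rng.gen_range(100..10_000))),
        30..=59 => ProbeResponse::BogusReply(rng.gen_range(1..4096)),
        _ => ProbeResponse::DrainForever,
    }
}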

@zonyitoo
Collaborator

zonyitoo commented Jun 22, 2021

If we respond randomly, then the attacker could try the same data stream multiple times and check whether the server responds randomly.

If we can persist the IV filter for a longer time, then the current issue should be avoidable.

@Mygod
Contributor Author

Mygod commented Jun 22, 2021

If we can persist the IV filter for a longer time, then the current issue should be avoidable.

Do you want to use a database holding all the past IVs, make your server super slow, and have users complaining that their disks are full?

@zonyitoo
Collaborator

Storing the past Bloom filters should be enough. And we don't need to store all of them.

@Mygod
Contributor Author

Mygod commented Jun 22, 2021

No, and the reason is left as an exercise.

@Mygod
Contributor Author

Mygod commented Jun 22, 2021

Updated the patch to keep the error logging as per #556 (comment).

P.S. The answer to the exercise above is that your running time still scales linearly, and, even worse, your false-positive rate also goes up linearly.
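
(Spelling out the arithmetic, under the usual independence assumption: with k retained filters, each tuned for a per-filter false-positive rate p, every incoming salt must be checked against all k of them, and the combined false-positive rate is 1 - (1 - p)^k ≈ k·p for small p. So both the work per lookup and the false-positive rate grow roughly linearly in the number of filters kept.)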

@Mygod changed the title from "Remove Bloom filter" to "Remove the behavior of silent dropping replay packets" on Jun 22, 2021
@riobard

riobard commented Jun 22, 2021

@zonyitoo

If we respond randomly, then the attacker could try the same data stream multiple times and check whether the server responds randomly.

This is easy to counter: choose one of many responses based on a hash of the IV, so that any particular stream always gets the same response.
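
(A sketch of that counter-measure: derive the choice from a hash of the salt plus a per-server secret, so a replayed stream always maps to the same response. The hasher here is std's non-cryptographic DefaultHasher, purely for illustration; a real implementation would want a keyed cryptographic hash.)

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Maps a salt/IV to one of `num_responses` fixed strategies, deterministically
// per stream but unpredictably to anyone who does not know `server_secret`.
fn response_index(server_secret: &[u8], salt: &[u8], num_responses: u64) -> u64 {
    let mut h = DefaultHasher::new();
    server_secret.hash(&mut h);
    salt.hash(&mut h);
    h.finish() % num_responses
}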

@gfw-report

I think we need to figure out whether long-term replay protection is, in general, good or bad for fingerprinting Shadowsocks servers. If you have spare resources, may I suggest that you do an analysis of existing popular non-TLS encrypted traffic and see what percentage of it is replay-resistant? Ideally we'd like to blend in with that traffic and respond to replays in similar ways.

Great suggestions, @riobard! We have been taking action since you suggested this.

@database64128
Contributor

I want to apologize for my previous statements against Mygod on replay protection. After reevaluating the situation with @zonyitoo, I now agree that the current implementation is inherently broken and should probably be removed completely. I have retracted my downvotes and hidden some of my previous statements.

@riobard

riobard commented Sep 28, 2021

What's the latest development?

@database64128
Contributor

What's the latest development?

It has been completely removed in 90f05a5.

@riobard

riobard commented Sep 28, 2021

I saw it. I'm wondering what changed your mind.

@zonyitoo
Collaborator

zonyitoo commented Sep 28, 2021

  1. sslocal in this project supports multiple remote servers. It "pings" all the remote servers every 10 seconds, so if you have many servers, the Bloom filters quickly fill up and start resetting themselves.
  2. Many websites and apps use QUIC. Every UDP packet needs a unique salt/IV, which also fills up the Bloom filters in a very short time (a couple of minutes).
  3. If an ssserver is shared by multiple clients, individual clients may generate the same salt/IV and be rejected by the ssserver.
  4. A TFO SYN (carrying data) may be retransmitted; the ssserver then sees it as two separate clients and rejects the second one (the real one).
  5. ...

To conclude: the current implementation of replay-attack protection with ping-pong Bloom filters is actually broken. It cannot provide the protection it was designed for, and it may also reject legitimate clients, causing unexpected connection failures.

@database64128
Contributor

I saw it. I'm wondering what changed your mind.

We first observed a high false-positive rate on TFO-enabled servers. This was probably partially caused by duplicate SYNs carrying data. It led us to do a little math on the real-world effectiveness of the replay protection implementation.

Both shadowsocks-rust and Outline use 10000 as the IV cache capacity by default. It turns out you can easily exhaust it in as little as a few minutes, even on a personal server. Just watch a YouTube video via QUIC (it's the default transport on both desktop and mobile). Halfway through the video, the cached IVs have probably already been rotated out.

Even if you give up on filtering UDP traffic (as Outline does), new TCP connections from normal household Internet usage can exhaust your IV cache capacity in less than a few hours. Let's also not forget that most of us share servers with family, friends, or a community.

Clients also can't guarantee the uniqueness of their generated salts due to the limited capacity, leading to a higher false-positive rate on the server side. With 250 servers in ss-rust's ping balancer, it takes exactly 5 minutes to fill up the salt cache.

@riobard

riobard commented Sep 28, 2021

UDP is an afterthought anyway, so it's fine to ignore it. 10k capacity is clearly insufficient. IIRC the libev port defaults to 1m? You should raise the capacity on busy servers.

I'm wondering if there are any behavior leaks. The previous discussion seems to have reached no clear conclusion.

@zonyitoo
Collaborator

Both shadowsocks-rust and Outline use 10000 as the IV cache capacity by default.

Correct: shadowsocks-rust has the same capacity as shadowsocks-libev. But even 10k capacity is not enough for the QUIC case.

Also, the Bloom filter really only works in a one-client-to-one-server configuration. But in reality, users have mobile phones, PCs, smart routers, relays, ... The current implementation requires all clients to generate globally unique salts/IVs, which is impossible in most cases.

We also found that TFO significantly reduces latency for shadowsocks connections, but the "duplicate SYN" behavior causes shadowsocks servers to reject clients and break connections, or to print lots of warning logs in shadowsocks-rust's previous implementation (which did not reject, but just printed logs).

@riobard

riobard commented Sep 28, 2021

Please don't complicate this issue by mixing up several related issues. I'll address each of @zonyitoo's previous 5 points below:

  1. This is a capacity issue. Raising it to 1m or 10m will solve the problem. It won't cost too much memory.
  2. Still a capacity issue, but UDP makes it much more severe. Disabling anti-replay for UDP is an acceptable workaround.
  3. Statistically, how often does this happen?
  4. This is a legitimate concern, but I'd argue that TFO is rather broken anyway and it's clearly a minority on the Internet. If we're worried about leaking behavior signals to observers, maybe TFO should be disabled, no?

Don't get me wrong. I'm not defending replay protection per se, but let's separate configuration issues from fundamental brokenness.

@riobard

riobard commented Sep 28, 2021

Also, the Bloom filter really only works in a one-client-to-one-server configuration. But in reality, users have mobile phones, PCs, smart routers, relays, ... The current implementation requires all clients to generate globally unique salts/IVs, which is impossible in most cases.

What are you even thinking? Which part of UUID have you forgotten about? :D

@zonyitoo
Collaborator

zonyitoo commented Sep 28, 2021

  1. Raising the capacity can help, but it is not an ultimate solution.
  2. OK, then we can disable UDP replay protection, but QUIC is becoming more and more popular on the Internet. So in the end, we would have effectively disabled replay detection for most cases anyway.
  3. It is not a capacity issue. Different clients won't be able to generate globally unique random IVs/salts without communicating. If you used a UUID as the IV/salt, it would be a complete failure.
  4. Maybe not. TFO is not broken; you get an obvious latency improvement after enabling it.

@riobard

riobard commented Sep 28, 2021

I'm sorry, but this is not a constructive way to reason about it.

Probabilistic approaches like Bloom filters will never give you 100% reliable results. The idea is to find a probability low enough that it won't matter in practice. If all you're concerned about is the false-positive rate, just increasing the capacity at the cost of cheap RAM will solve the problem. There are no ultimate solutions, just tradeoffs.

UDP traffic has completely different behaviors. Remember, we need replay protection to resist fingerprinting. Now convince me: can QUIC be fingerprinted by replaying? If not, it's fine to leave it alone.

Once again, tell me: what's the probability of two honest clients repeatedly generating 256-bit randomness and ending up colliding? It's not a capacity issue because it's a statistics issue. Use your math! 😂

Even if TFO is not broken, how do you deal with the fact that almost no other TCP traffic adopts TFO? What is the replay protection protecting us from then? It's completely backwards.

Again, I'm not saying replay protection is alright. I'm just not seeing the correct arguments against it.
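
(For reference, two of the quantities mentioned above, worked out under stated assumptions. A Bloom filter sized for n entries at false-positive rate p needs about n·ln(1/p)/(ln 2)² bits; for n = 10^7 and p = 10^-6 that is roughly 2.9×10^8 bits, i.e. about 36 MB of RAM. And assuming honest clients draw 32-byte salts uniformly at random, the birthday bound gives a collision probability among N salts of roughly N²/2^257, which is negligible for any realistic N; even N = 2^64 gives about 2^-129.)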

@zonyitoo
Collaborator

Or let's keep it simple:

  1. The ping-pong Bloom filter is not a perfect solution because it has obvious flaws, and you have agreed with that.
  2. It cannot protect us from replay attacks because it cannot distinguish legitimate clients from malicious ones.
  3. The experiment Mygod has done shows that the current solution has no effect in protecting servers from being discovered by the GFW.

It's just that simple: ineffective, false positives, flawed.

We can engineer a new solution against replay attacks later.

@riobard

riobard commented Sep 28, 2021

Bloom filters obviously are not perfect, but a) there's no perfect solution; b) SS itself is not a perfect solution; and c) I'm not saying we should not use an imperfect solution. C'est la vie!

it cannot distinguish legitimate clients from malicious ones.

You're jumping straight to the conclusion without even arguing for it. How does it not distinguish replayed traffic? Sure, it cannot protect us from every attack, but for the intended protection against short-term replay it's very effective.

@Mygod's experiment above is very valuable, but it's only a single data point. So far there are no other useful data points to (dis)prove the effectiveness of replay protection at all. @gfw-report agreed to do more experiments, but it might take more time to reach conclusions.

My objection to your arguments so far is that they're mostly nitpicking about configuration defects (which can simply be corrected by changing some parameters), and some of them are factually wrong (e.g. the 256-bit randomness collision) or rest on conflicting objectives (e.g. TFO vs. hidden-in-plain-sight).

@Mygod
Contributor Author

Mygod commented Sep 28, 2021

I don't think 90f05a5 is necessary. As proposed by @madeye, detecting the replays and printing a warning (without acting on the packet differently in a way observable by the external environment) is still useful.

@riobard Certainly 10000 capacity is not enough. However, considering that all it takes for an attacker to distinguish is a single replay success, you would have to remember virtually every single past IV to protect against this attack. On the other hand, the attack only needs to randomly pick a constant number of IVs to remember and eventually succeed at detecting. Removing the bloom filter invalidates this attack completely without the server remembering a single IV.

As I have stressed multiple times in this thread, Bloom filters are so shit that it is better not to have one to begin with. Goodbye.

@zonyitoo
Collaborator

I am not trying to convince you, because I agree with your opinion.

You are saying that it is a rather effective solution, but we were saying that we would rather remove it completely than keep a solution with several obvious flaws. You may notice that we are talking about the same thing from different views.

Mygod is right: if you actually want to protect against replay attacks, you have to remember all IVs rather than only guard against "short-term replay". If it can only protect in the short term, then from my point of view it is ineffective.

@riobard

riobard commented Sep 28, 2021

Now we're on the right path 😄

So at the core of the issue is whether replay protection is desirable at all. The answer to that question really depends on analyzing other popular anonymous traffic to see what kind of replay protection (if any) it employs. I don't have a clear idea, and I'm hoping @gfw-report can continue researching it.

If it's deemed necessary to have perfect replay protection, I agree with both of you that the current solution is very problematic. Specifically, @Mygod said that

the attack only needs to randomly pick a constant number of IVs to remember and eventually succeed at detecting.

which I'm not 100% sure about. Yes, it's definitely a weakness, but do other less-than-perfectly-replay-protected protocols exhibit a similar weakness? In other words, how reliably can a probe fingerprint SS servers knowing this behavioral quirk, and at what storage/scalability cost? Also:

Removing the bloom filter invalidates this attack completely without the server remembering a single IV.

with which I completely agree, but then we're vulnerable to short-term replay. Are we now more vulnerable to fingerprinting, or less? I guess it also depends on whether other popular anonymous traffic is susceptible to replay, right?

Once again, I'm not pro or against replay protection. I'm just saying that we don't know enough to have a clear-cut conclusion yet.

@riobard

riobard commented Sep 28, 2021

By the way, it's relatively easy to gain long-term replay protection while keeping the simplistic Bloom filters and without changing the underlying protocol: just mix a coarse-grained timestamp (e.g. one tick every 15 minutes) into the salt randomization process. It would require both clients and servers to adapt for proper enforcement, but servers could choose to fall back to the non-timestamped behavior for backwards compatibility.

I'm hoping we're not throwing the baby out with bathwater. 😂
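
(A minimal sketch of the timestamp-mixing idea, assuming a 32-byte salt, a 15-minute tick, and a verbatim tick encoding in the last 8 bytes; all of these choices are illustrative, not a concrete proposal.)

use std::time::{SystemTime, UNIX_EPOCH};

const TICK_SECS: u64 = 15 * 60;

fn current_tick() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before Unix epoch")
        .as_secs()
        / TICK_SECS
}

// Client side: 24 random bytes plus the coarse tick in the last 8 bytes.
// Caveat: a verbatim tick makes part of the salt predictable, which is its own
// trade-off; a real design might hide it inside a keyed derivation instead.
fn timestamped_salt(random: [u8; 24]) -> [u8; 32] {
    let mut salt = [0u8; 32];
    salt[..24].copy_from_slice(&random);
    salt[24..].copy_from_slice(&current_tick().to_be_bytes());
    salt
}

// Server side: accept only the current and previous tick. Anything older is a
// long-term replay and can be rejected without remembering its salt, so the
// Bloom filter only needs to cover roughly two ticks of traffic.
fn salt_is_fresh(salt: &[u8; 32]) -> bool {
    let mut tick_bytes = [0u8; 8];
    tick_bytes.copy_from_slice(&salt[24..]);
    let tick = u64::from_be_bytes(tick_bytes);
    let now = current_tick();
    now.saturating_sub(tick) <= 1 && tick <= now + 1
}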

@zonyitoo
Collaborator

You are right, I am also thinking about this kind of solution.

@riobard

riobard commented Sep 28, 2021

Great! And if there's a new solution to address it, we should probably first discuss what compatibility cost we're gonna swallow 🤢

@dev4u

dev4u commented Sep 28, 2021

We could consider using TOTP to generate the salt.

@ditsing

ditsing commented Oct 3, 2021

I know one of the requirements for replay-attack mitigation is not to modify the original protocol. But if we swap the salts used by the client and the server, we could trivially make replay attacks impossible. We could do the following.

The client initiates a connection and sends a random salt to the server. The server responds with another random salt to the client. After the encryption handshake, the server uses the salt designated by the client's first request to encrypt traffic sent to the client, and the client uses the salt provided by the server to encrypt traffic sent to the server. The wire format stays the same. Replay attacks would be impossible, because the server sends a different salt to each client.

@zonyitoo
Collaborator

zonyitoo commented Oct 3, 2021

But then there is no forward secrecy.

BTW, that would require the server to respond immediately after receiving the first packet. The current shadowsocks protocol doesn't require that behavior.

@ditsing

ditsing commented Oct 12, 2021

But no forward security.

I spent quite some time Googling this term (and forward secrecy), but could not quite understand it in the context of Shadowsocks. Do you have an example scenario?

BTW, that will require the server to respond instantly after received the first packet. The current shadowsocks protocol doesn't require that behavior.

That is true. Not replying immediately makes the Shadowsocks server look more like an HTTP server.
