Be more tolerant towards weird data in transaction pdus #17893
base: develop
@@ -0,0 +1 @@
Fix a bug where all messages from a server could be blocked because of one bad event. Contributed by @morguldir
Is there more context for the issue you ran into specifically? Some issue that should be linked?
I don't think there's an issue for it, but basically, unless the room is v11, the `room_version` key in the create event is not protected from redaction. Unfortunately there was no cs api protection for redacting the create event in conduwuit, so a silly bot creator accidentally redacted everything in the room, making that particular room's version fall back to "1" like the spec says.
Once I finally returned 200 OK, all the PDUs and EDUs that had been failing started to trickle in again at last.
I think matrix.org was also in the room at the time, so it's similarly affected.
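To make the failure mode concrete, here is a minimal sketch of how redacting `m.room.create` in a pre-v11 room drops `room_version`, so that anyone applying the spec default ends up at "1". This is not Synapse's or conduwuit's code: both helper names are hypothetical, and the redaction rule is simplified to the create event's `content` only.

```python
from typing import Any, Dict


def redact_create_content(
    content: Dict[str, Any], keeps_full_content: bool
) -> Dict[str, Any]:
    """Hypothetical, simplified redaction of an m.room.create event's content."""
    if keeps_full_content:
        # Room version 11: the create event keeps its entire content.
        return dict(content)
    # Earlier room versions: only `creator` survives redaction.
    return {k: v for k, v in content.items() if k == "creator"}


def room_version_of(content: Dict[str, Any]) -> str:
    """A missing `room_version` defaults to "1", as the spec says."""
    return content.get("room_version", "1")


original = {"creator": "@alice:example.org", "room_version": "10"}

# Pre-v11 rules: the version is lost and the room looks like a v1 room.
redacted_old = redact_create_content(original, keeps_full_content=False)
# v11 rules: the version survives the redaction.
redacted_v11 = redact_create_content(original, keeps_full_content=True)
```

Under the pre-v11 rule, `room_version_of(redacted_old)` falls back to "1", which is the situation described above.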
I assume you're talking about the behavior described here:
> Room Version 11
> [New in this version] [...] The `m.room.create` event now keeps the entire `content` property.
And the `m.room.create` event defines the `content.room_version`.
For my own reference, this was added in MSC2176 and included in room version 11 via MSC3820.
I see how this PR allows valid events from a transaction to be received while ignoring invalid ones.
In the scenario where a room falls back to a v1 room, does the `room_version` we have stored in Synapse get updated?
If the `room_version` doesn't get updated, this just makes sure that the server can receive the final "valid" events in the room before it broke (people started sending v1 events). Is there an issue tracking whether Synapse should update the `room_version` after the `m.room.create` redaction?
If the `room_version` does get updated to v1, then events are only accepted according to the order received (before/after receiving the redaction forcing the fallback).
Perhaps we should have a test for this scenario?
> Unfortunately there was no cs api protection for redacting the create event
Does this exist in Synapse somewhere already?
@@ -56,7 +56,11 @@
SynapseError,
@morguldir Are you up for adding a test for this? Probably something in `tests/federation/test_federation_server.py`: make a request like this and then assert that the other PDUs in the transaction, besides the corrupted one, were persisted.
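The behavior such a test would pin down can be sketched in isolation. The snippet below is an illustrative stand-in, not the Synapse handler: `process_pdus` and `toy_parse` are hypothetical names, and the parser simply rejects any PDU that carries an explicit `event_id`. The point is only that each PDU is parsed independently, so one failure is recorded and skipped rather than aborting the whole transaction.

```python
from typing import Any, Callable, Dict, List, Tuple

Pdu = Dict[str, Any]


def process_pdus(
    pdus: List[Pdu], parse: Callable[[Pdu], Pdu]
) -> Tuple[List[Pdu], Dict[str, Dict[str, str]]]:
    """Parse each PDU on its own; a bad PDU must not fail the batch."""
    accepted: List[Pdu] = []
    pdu_results: Dict[str, Dict[str, str]] = {}
    for pdu in pdus:
        possible_event_id = pdu.get("event_id")  # may be absent
        try:
            event = parse(pdu)
        except Exception as e:
            # We can only report back if we know which event failed.
            if possible_event_id is not None:
                pdu_results[possible_event_id] = {"error": str(e)}
            continue
        accepted.append(event)
    return accepted, pdu_results


def toy_parse(pdu: Pdu) -> Pdu:
    """Hypothetical parser: rejects PDUs that supply their own event_id."""
    if "event_id" in pdu:
        raise ValueError("Event ID should not be supplied in non-v1/v2 rooms")
    return pdu


accepted, pdu_results = process_pdus(
    [{"body": "event1"}, {"event_id": "$bad", "body": "event2"}, {"body": "event3"}],
    toy_parse,
)
```

Here the events before and after the corrupted one are both accepted, and only `$bad` gets an error entry.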
Is b6b5d81 what you had in mind? See also matrix-org/complement#743
Ahh, I completely missed your Complement test in my initial review. The Synapse test is looking good 👍
    event = event_from_pdu_json(p, room_version)
except Exception as e:
    # We can only provide feedback to the federating server if we can determine what the event_id is
    # but since we we failed to parse the event, we can't derive the `event_id` so there is nothing
Suggested change:
- # but since we we failed to parse the event, we can't derive the `event_id` so there is nothing
+ # but since we failed to parse the event, we can't derive the `event_id` so there is nothing
event1_json = event1.get_pdu_json()
event2_json = event2.get_pdu_json()

logging.info("Purposefully adding event id that shouldn't be there")
As far as I can tell, we want to use "Purposely" here. Here is the best summary I could find:
> Purposely means on purpose, purposefully means with purpose
https://www.reddit.com/r/ENGLISH/comments/188w42b/comment/kbnibp2/
Suggested change:
- logging.info("Purposefully adding event id that shouldn't be there")
+ # Purposely adding event_id that shouldn't be there
    {"pdus": [event1_json, event2_json]},
)
body = channel.json_body
logging.info(f"Response body: {body}")
Remove logs
user = self.register_user("nex", "test")
tok = self.login("nex", "test")
room_id = self.helper.create_room_as("nex", tok=tok)
Suggested change:
- user = self.register_user("nex", "test")
- tok = self.login("nex", "test")
- room_id = self.helper.create_room_as("nex", tok=tok)
+ user = self.register_user("user1", "test")
+ tok = self.login("user1", "test")
+ room_id = self.helper.create_room_as("user1", tok=tok)
tok = self.login("nex", "test")
room_id = self.helper.create_room_as("nex", tok=tok)

builder = self.hs.get_event_builder_factory().for_room_version(
We can probably use this helper instead: `from synapse.events import make_event_from_dict`
"type": EventTypes.Message,
"sender": user,
"room_id": room_id,
"content": {"body": "hello i am nexy", "msgtype": "m.text"},
Would be good to label them for easier debugging.
Suggested change:
- "content": {"body": "hello i am nexy", "msgtype": "m.text"},
+ "content": {"body": "event1", "msgtype": "m.text"},
)

event1_json = event1.get_pdu_json()
event2_json = event2.get_pdu_json()
I think it would be good to add a third event, `event3`, to better ensure that events before and after are still persisted.
self.assertTrue(body["pdus"][event1.event_id] == {})
self.assertTrue(body["pdus"][event2.event_id]["error"] != "")
Better asserts so we can tell what happened when these fail (instead of just seeing `False`).
Suggested change:
- self.assertTrue(body["pdus"][event1.event_id] == {})
- self.assertTrue(body["pdus"][event2.event_id]["error"] != "")
+ # Ensure the response indicates an error for the corrupt event
+ self.assertEqual(body["pdus"][event1.event_id], {})
+ self.assertNotEqual(body["pdus"][event2.event_id].get("error", ""), "")
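To illustrate why the comparison asserts are easier to debug, here is a small standalone sketch using only the standard `unittest` module. The `failure_message` helper and the sample `body` are made up for the demonstration; the sketch runs a deliberately failing check in each style and returns the failure text that a developer would see.

```python
import unittest


def failure_message(use_assert_equal: bool) -> str:
    """Run one deliberately failing check and return its failure text."""
    # Hypothetical response where event1 unexpectedly has an error.
    body = {"pdus": {"$event1": {"error": "unexpected"}}}

    class Demo(unittest.TestCase):
        def runTest(self) -> None:
            if use_assert_equal:
                self.assertEqual(body["pdus"]["$event1"], {})
            else:
                self.assertTrue(body["pdus"]["$event1"] == {})

    result = unittest.TestResult()
    Demo().run(result)
    # result.failures holds (test, formatted_traceback) tuples.
    return result.failures[0][1]


with_true = failure_message(use_assert_equal=False)
with_equal = failure_message(use_assert_equal=True)
```

With `assertTrue` the report only says that `False` is not true, whereas `assertEqual` includes the actual value that was returned.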
result = self.get_success(
    self.hs.get_storage_controllers().main.get_event(event1.event_id)
)
self.assertEqual(result.event_id, event1.event_id)
Suggested change:
- result = self.get_success(
-     self.hs.get_storage_controllers().main.get_event(event1.event_id)
- )
- self.assertEqual(result.event_id, event1.event_id)
+ # Make sure other valid events from the send transaction were persisted successfully
+ self.get_success(
+     self.hs.get_storage_controllers().main.get_event(event1.event_id)
+ )
@@ -0,0 +1 @@
Fix a bug where all messages from a server could be blocked because of one bad event. Contributed by @morguldir
Suggested change:
- Fix a bug where all messages from a server could be blocked because of one bad event. Contributed by @morguldir
+ Fix a bug where all messages from a server could be blocked because of one bad event. Contributed by @morguldir.
# to use as the `pdu_results` key. Best we can do is just log for our own record and move on.
if possible_event_id != _UNKNOWN_EVENT_ID:
    pdu_results[possible_event_id] = {
        "error": f"Failed to convert json into event, {e}"
Suggested change:
- "error": f"Failed to convert json into event, {e}"
+ "error": f"Failed to convert JSON into event: {e}"
# An event should only have an event_id at this point if it's for a v1/v2 like room.
# In future room versions, the `event_id` is derived from the event canonical JSON.
#
# So if we see a `event_id` but the room version doesn't support
# v1/v2 events, then it's invalid and we should reject it.
if possible_event_id != _UNKNOWN_EVENT_ID:
    if room_version.event_format != EventFormatVersions.ROOM_V1_V2:
        logger.info(
            f"Rejecting event {possible_event_id} from {origin} "
            f"because the event was made for a v1 room, "
            f"while {room_id} is a v{room_version.identifier} room"
        )
        pdu_results[possible_event_id] = {
            "error": "Event ID should not be supplied in non-v1/v2 room"
        }
        continue
Instead of all of this logic here, we could just add an error message to the assertion in the parsing:
synapse/synapse/events/__init__.py
Line 408 in 77eafd4
assert "event_id" not in event_dict
assert (
    "event_id" not in event_dict
), "Event ID should not be supplied in non-v1/v2 rooms"
Suggested change (remove these lines):
- # An event should only have an event_id at this point if it's for a v1/v2 like room.
- # In future room versions, the `event_id` is derived from the event canonical JSON.
- #
- # So if we see a `event_id` but the room version doesn't support
- # v1/v2 events, then it's invalid and we should reject it.
- if possible_event_id != _UNKNOWN_EVENT_ID:
-     if room_version.event_format != EventFormatVersions.ROOM_V1_V2:
-         logger.info(
-             f"Rejecting event {possible_event_id} from {origin} "
-             f"because the event was made for a v1 room, "
-             f"while {room_id} is a v{room_version.identifier} room"
-         )
-         pdu_results[possible_event_id] = {
-             "error": "Event ID should not be supplied in non-v1/v2 room"
-         }
-         continue
Complement test: matrix-org/complement#743
Pull Request Checklist