Be more tolerant towards weird data in transaction pdus #17893

Open, wants to merge 8 commits into base: develop

@morguldir commented Oct 31, 2024

Complement test: matrix-org/complement#743

Pull Request Checklist

@morguldir requested a review from a team as a code owner on October 31, 2024 18:44
synapse/federation/federation_server.py (outdated, resolved)
@@ -0,0 +1 @@
Fix a bug where all messages from a server could be blocked because of one bad event. Contributed by @morguldir
Contributor

Is there more context on the specific issue you ran into? Is there an issue that should be linked?

Author

I don't think there's an issue for it, but basically, unless the room is v11, the room_version key in the create event is not protected from redaction. Unfortunately, conduwuit had no client-server API protection against redacting the create event, so a silly bot creator accidentally redacted everything in the room, which made that room's version fall back to "1", as the spec says it should.

Once I finally returned 200 OK, all the PDUs and EDUs that had been failing at last started to trickle in again.

I think matrix.org was also in the room at the time, so it's similarly affected.
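For readers following along, the fallback described above can be modelled with a small sketch. It assumes a simplified version of the spec's redaction rules for m.room.create (before v11 only creator survives in content; v11 keeps the whole content), treats only the identifier "11" as content-preserving, and uses hypothetical helper names; it is not Synapse's actual redaction code.

```python
from typing import Any, Dict

# Simplifying assumption: only room version "11" keeps the full
# m.room.create content on redaction (later versions are ignored here).
_KEEPS_FULL_CREATE_CONTENT = {"11"}


def redact_create_content(content: Dict[str, Any], room_version: str) -> Dict[str, Any]:
    """Return what an m.room.create event's content looks like after redaction."""
    if room_version in _KEEPS_FULL_CREATE_CONTENT:
        # v11: the entire content property is preserved.
        return dict(content)
    # Before v11: only "creator" survives, so "room_version" is stripped.
    return {key: value for key, value in content.items() if key == "creator"}


def effective_room_version(create_content: Dict[str, Any]) -> str:
    """A missing room_version key means room version "1", per the spec."""
    return create_content.get("room_version", "1")
```

In this model, redacting a v10 create event leaves only {"creator": ...}, and reading room_version from the result yields "1", which matches the fallback described in the comment; a v11 create event keeps its "11".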

Contributor

@MadLittleMods, Nov 7, 2024

I assume you're talking about the behavior described here:

Room Version 11

[New in this version] [...] The m.room.create event now keeps the entire content property.

-- https://spec.matrix.org/v1.12/rooms/v11/#redactions

And the m.room.create event defines the content.room_version.

For my own reference, this was added in MSC2176 and included in room version 11 via MSC3820.


I see how this PR allows valid events from a transaction to be received while ignoring invalid ones.

In the scenario where a room falls back to a v1 room, does the room_version we have stored in Synapse get updated?

If the room_version doesn't get updated, this just makes sure that the server can receive the final "valid" events in the room before it broke (people started sending v1 events). Is there an issue tracking whether Synapse should update the room_version after the m.room.create redaction?

If the room_version does get updated to v1, then events are only accepted according to the order received (before/after receiving the redaction forcing the fallback).

Perhaps we should have a test for this scenario?

Contributor

Unfortunately there was no cs api protection for redacting the create event

Does this exist in Synapse somewhere already?

synapse/federation/federation_server.py (outdated, resolved; three threads)
@@ -56,7 +56,11 @@
SynapseError,
Contributor

@morguldir Are you up for adding a test for this? Probably something in tests/federation/test_federation_server.py: make a request like this, then assert that the other PDUs in the transaction (besides the corrupted one) were persisted.

Author

Is b6b5d81 what you had in mind? See also matrix-org/complement#743

Contributor

Ahh, I completely missed your Complement test in my initial review. The Synapse test is looking good 👍

event = event_from_pdu_json(p, room_version)
except Exception as e:
# We can only provide feedback to the federating server if we can determine what the event_id is
# but since we we failed to parse the event, we can't derive the `event_id` so there is nothing
Contributor

Suggested change
# but since we we failed to parse the event, we can't derive the `event_id` so there is nothing
# but since we failed to parse the event, we can't derive the `event_id` so there is nothing

event1_json = event1.get_pdu_json()
event2_json = event2.get_pdu_json()

logging.info("Purposefully adding event id that shouldn't be there")
Contributor

As far as I can tell, we want "Purposely" here. This is the best summary I could find:

Purposely means on purpose, purposefully means with purpose

https://www.reddit.com/r/ENGLISH/comments/188w42b/comment/kbnibp2/

Suggested change
logging.info("Purposefully adding event id that shouldn't be there")
# Purposely adding event_id that shouldn't be there

{"pdus": [event1_json, event2_json]},
)
body = channel.json_body
logging.info(f"Response body: {body}")
Contributor

Remove logs

Comment on lines +90 to +92
user = self.register_user("nex", "test")
tok = self.login("nex", "test")
room_id = self.helper.create_room_as("nex", tok=tok)
Contributor

Suggested change
user = self.register_user("nex", "test")
tok = self.login("nex", "test")
room_id = self.helper.create_room_as("nex", tok=tok)
user = self.register_user("user1", "test")
tok = self.login("user1", "test")
room_id = self.helper.create_room_as("user1", tok=tok)

tok = self.login("nex", "test")
room_id = self.helper.create_room_as("nex", tok=tok)

builder = self.hs.get_event_builder_factory().for_room_version(
Contributor

We can probably use this helper instead: from synapse.events import make_event_from_dict

"type": EventTypes.Message,
"sender": user,
"room_id": room_id,
"content": {"body": "hello i am nexy", "msgtype": "m.text"},
Contributor

It would be good to label them for easier debugging.

Suggested change
"content": {"body": "hello i am nexy", "msgtype": "m.text"},
"content": {"body": "event1", "msgtype": "m.text"},

)

event1_json = event1.get_pdu_json()
event2_json = event2.get_pdu_json()
Contributor

I think it would be good to add a third event, event3, to better ensure that the events both before and after the bad one are still persisted.

Comment on lines +125 to +126
self.assertTrue(body["pdus"][event1.event_id] == {})
self.assertTrue(body["pdus"][event2.event_id]["error"] != "")
Contributor

Better asserts so we can tell what happened when these fail (instead of just seeing False).

Suggested change
self.assertTrue(body["pdus"][event1.event_id] == {})
self.assertTrue(body["pdus"][event2.event_id]["error"] != "")
# Ensure the response indicates an error for the corrupt event
self.assertEqual(body["pdus"][event1.event_id], {})
self.assertNotEqual(body["pdus"][event2.event_id].get("error", ""), "")
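To illustrate the reviewer's point, this sketch compares the failure text each assertion style produces, calling unittest's assert helpers directly on a made-up pdu_result value. It assumes Python 3.2 or later, where unittest.TestCase() can be instantiated without naming a test method; failure_message is a hypothetical helper.

```python
import unittest
from typing import Callable


def failure_message(check: Callable[[], None]) -> str:
    """Run one assertion and return its AssertionError text ("" if it passed)."""
    try:
        check()
    except AssertionError as exc:
        return str(exc)
    return ""


# Since Python 3.2, TestCase() can be created without a test method name,
# which lets us call its assert helpers directly outside a test run.
tc = unittest.TestCase()
pdu_result = {"error": "boom"}  # made-up result where {} was expected

vague = failure_message(lambda: tc.assertTrue(pdu_result == {}))
clear = failure_message(lambda: tc.assertEqual(pdu_result, {}))
missing_error = failure_message(lambda: tc.assertNotEqual({}.get("error", ""), ""))

print(vague)  # False is not true
print(clear.splitlines()[0])  # {'error': 'boom'} != {}
```

assertTrue only reports "False is not true", while assertEqual prints both values. Reading the key with .get("error", "") also means a result without an "error" key surfaces as an assertion failure ("'' == ''") rather than a KeyError.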

Comment on lines +127 to +130
result = self.get_success(
self.hs.get_storage_controllers().main.get_event(event1.event_id)
)
self.assertEqual(result.event_id, event1.event_id)
Contributor

Suggested change
result = self.get_success(
self.hs.get_storage_controllers().main.get_event(event1.event_id)
)
self.assertEqual(result.event_id, event1.event_id)
# Make sure other valid events from the send transaction were persisted successfully
self.get_success(
self.hs.get_storage_controllers().main.get_event(event1.event_id)
)

@@ -0,0 +1 @@
Fix a bug where all messages from a server could be blocked because of one bad event. Contributed by @morguldir
Contributor

Suggested change
Fix a bug where all messages from a server could be blocked because of one bad event. Contributed by @morguldir
Fix a bug where all messages from a server could be blocked because of one bad event. Contributed by @morguldir.

# to use as the `pdu_results` key. Best we can do is just log for our own record and move on.
if possible_event_id != _UNKNOWN_EVENT_ID:
pdu_results[possible_event_id] = {
"error": f"Failed to convert json into event, {e}"
Contributor

Suggested change
"error": f"Failed to convert json into event, {e}"
"error": f"Failed to convert JSON into event: {e}"

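The per-PDU error handling in this hunk can be sketched in isolation. Everything here is hypothetical: process_pdus, toy_parse, and the sentinel value stand in for the PR's loop, event_from_pdu_json, and _UNKNOWN_EVENT_ID, and the sketch assumes possible_event_id is read from the raw PDU's event_id field.

```python
from typing import Any, Callable, Dict, List

JsonDict = Dict[str, Any]

# Hypothetical sentinel standing in for the PR's _UNKNOWN_EVENT_ID.
_UNKNOWN_EVENT_ID = "<unknown event id>"


def process_pdus(pdus: List[JsonDict], parse: Callable[[JsonDict], str]) -> Dict[str, JsonDict]:
    """Parse each PDU on its own so one bad PDU cannot fail the whole transaction."""
    pdu_results: Dict[str, JsonDict] = {}
    for pdu in pdus:
        # Assumption: the only event ID we can report under is one the sender supplied.
        possible_event_id = pdu.get("event_id", _UNKNOWN_EVENT_ID)
        try:
            event_id = parse(pdu)
        except Exception as e:
            # Without an event ID there is no pdu_results key to report the error under.
            if possible_event_id != _UNKNOWN_EVENT_ID:
                pdu_results[possible_event_id] = {
                    "error": f"Failed to convert JSON into event: {e}"
                }
            continue
        pdu_results[event_id] = {}  # an empty dict signals success
    return pdu_results


def toy_parse(pdu: JsonDict) -> str:
    """Toy parser for a non-v1/v2 room: rejects a supplied event_id, derives one from the body."""
    if "event_id" in pdu:
        raise ValueError("Event ID should not be supplied in non-v1/v2 rooms")
    return "$" + pdu["content"]["body"]
```

A PDU that fails to parse and carries no event_id simply gets no entry in the results (the sketch omits the logging the code comment mentions), while the valid PDUs on either side of a bad one are still processed.
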
Comment on lines +480 to +495
# An event should only have an event_id at this point if it's for a v1/v2 like room.
# In future room versions, the `event_id` is derived from the event canonical JSON.
#
# So if we see a `event_id` but the room version doesn't support
# v1/v2 events, then it's invalid and we should reject it.
if possible_event_id != _UNKNOWN_EVENT_ID:
if room_version.event_format != EventFormatVersions.ROOM_V1_V2:
logger.info(
f"Rejecting event {possible_event_id} from {origin} "
f"because the event was made for a v1 room, "
f"while {room_id} is a v{room_version.identifier} room"
)
pdu_results[possible_event_id] = {
"error": "Event ID should not be supplied in non-v1/v2 room"
}
continue
Contributor

Instead of all of this logic here, we could just add an error message to the assertion in the parsing:

assert "event_id" not in event_dict

assert (
    "event_id" not in event_dict
), "Event ID should not be supplied in non-v1/v2 rooms"

Suggested change
# An event should only have an event_id at this point if it's for a v1/v2 like room.
# In future room versions, the `event_id` is derived from the event canonical JSON.
#
# So if we see a `event_id` but the room version doesn't support
# v1/v2 events, then it's invalid and we should reject it.
if possible_event_id != _UNKNOWN_EVENT_ID:
if room_version.event_format != EventFormatVersions.ROOM_V1_V2:
logger.info(
f"Rejecting event {possible_event_id} from {origin} "
f"because the event was made for a v1 room, "
f"while {room_id} is a v{room_version.identifier} room"
)
pdu_results[possible_event_id] = {
"error": "Event ID should not be supplied in non-v1/v2 room"
}
continue
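A minimal sketch of the reviewer's alternative, assuming the parser raises an AssertionError whose message then flows into the per-PDU error string. check_event_id_absent and pdu_error are hypothetical names, not Synapse functions.

```python
from typing import Any, Dict, Optional

JsonDict = Dict[str, Any]


def check_event_id_absent(event_dict: JsonDict, uses_v1_v2_format: bool) -> None:
    """Hypothetical parse-time check: only v1/v2-format events may carry an event_id."""
    if not uses_v1_v2_format:
        assert (
            "event_id" not in event_dict
        ), "Event ID should not be supplied in non-v1/v2 rooms"


def pdu_error(event_dict: JsonDict, uses_v1_v2_format: bool) -> Optional[str]:
    """Return the per-PDU error string, or None if the check passed."""
    try:
        check_event_id_absent(event_dict, uses_v1_v2_format)
    except Exception as e:
        # The assertion message becomes the error reported back to the sender.
        return f"Failed to convert JSON into event: {e}"
    return None
```

One design caveat: Python removes assert statements when run with -O, so under that flag this check would silently pass; an explicit raise would keep it active regardless.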


Labels: None yet
Projects: None yet
2 participants