Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sliding Sync: Avoid fetching left rooms and add back newly_left rooms #17725

Merged
merged 17 commits into from
Sep 19, 2024

Conversation

MadLittleMods
Copy link
Collaborator

@MadLittleMods MadLittleMods commented Sep 18, 2024

Performance optimization: We can avoid fetching rooms that the user has left themselves (which could be a significant amount), then only add back rooms that the user has newly_left (left in the token range of an incremental sync). It's a lot faster to fetch less rooms than fetch them all and throw them away in most cases. Since the user only leaves a room (or is state reset out) once in a blue moon, we can avoid a lot of work.

Based on @erikjohnston's branch, develop...erikj/ss_perf

Todo

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct
    (run the linters)

Comment on lines 1064 to 1084
# Expect to see room1 because it is `newly_left` thanks to being state reset out
# of it since the last time we synced. We need to let the client know that
# something happened and that they are no longer in the room.
self.assertIncludes(
set(response_body["rooms"].keys()), {space_room_id}, exact=True
)
# We set `initial=True` to indicate that the client should reset the state they
# have about the room
self.assertEqual(response_body["rooms"][space_room_id]["initial"], True)
# Empty `required_state` to indicate that the client should clear their state
# (as the user is no longer in the room)
self.assertIsNone(
response_body["rooms"][space_room_id].get("required_state"),
response_body["rooms"][space_room_id],
)
# Where the state reset happened
self.assertEqual(
response_body["rooms"][space_room_id]["bump_stamp"],
join_rule_event_pos.stream,
response_body["rooms"][space_room_id],
)
Copy link
Collaborator Author

@MadLittleMods MadLittleMods Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we actually expect a room that you are state reset out of to look when it comes down Sliding Sync?

This is just based off of my best guess in the comments before. The main thing is initial: True and required_state that doesn't include your membership to try to indicate to the client that the new state doesn't include you anymore. But there is an edge with that if you're not requesting your own membership event, then you wouldn't expect it to be missing.

This is better than before though so we could punt until the MSC figures that out more 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the best we can do for now, and we should deal with this question in the MSC. I don't see any problem with compatibility with landing this as it is

We already don't fetch self-leaves and add back `newly_left`
Comment on lines +275 to 276
# TODO: It would be nice to avoid these copies
room_membership_for_user_map = dict(room_membership_for_user_map)
Copy link
Collaborator Author

@MadLittleMods MadLittleMods Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it crazy to introduce something like a SoftDeleteChainMap? Where we can insert sentinel values on map[0] whenever we want to "delete" something from the map and avoid copying the immutable room_membership_for_user_map.

The Python docs encourage making your own ChainMap subclass variations like DeepChainMap.

Can also just tackle this in a follow-up PR.

SoftDeleteChainMap first draft
class Sentinel(Enum):
    # defining a sentinel in this way allows mypy to correctly handle the
    # type of a dictionary lookup and subsequent type narrowing.
    UNSET_SENTINEL = object()


class SoftDeleteChainMap(ChainMap):
    """
    Variant of ChainMap that allows deleting keys without affecting the underlying maps (maps[1:]).
    Only adds sentinel values to the top-most mutable map[0].

    aka `DeletableChainMap`, `NullableChainMap`
    """

    def __getitem__(self, key):
        for mapping in self.maps:
            try:
                value = mapping[key]  # can't use 'key in mapping' with defaultdict
                if value is Sentinel.UNSET_SENTINEL:
                    return self.__missing__(
                        key
                    )  # support subclasses that define __missing__

                return value
            except KeyError:
                pass
        return self.__missing__(key)  # support subclasses that define __missing__

    def get(self, key, default=None):
        if key in self:
            value = self[key]
            if value is Sentinel.UNSET_SENTINEL:
                return default

            return value
        else:
            return default

    def __delitem__(self, key):
        self.maps[0] = Sentinel.UNSET_SENTINEL

    def popitem(self):
        raise NotImplementedError("popitem is not supported")

    def pop(self, key, *args):
        """
        Remove specified key and return the corresponding value.

        If the key is not found, return the default if given; otherwise, raise a
        KeyError.
        """
        if key not in self:
            if len(args) == 0:
                raise KeyError(key)
            # Return the default if given
            return args[0]

        value = self[key]

        if value is Sentinel.UNSET_SENTINEL:
            if len(args) == 0:
                raise KeyError(key)
            # Return the default if given
            return args[0]

        del self[key]
        return value

    def __iter__(self):
        # TODO: Fix/implement this one
        d = {}
        for mapping in map(dict.fromkeys, reversed(self.maps)):
            d |= mapping  # reuses stored hash values if possible
        return iter(d)

    def __contains__(self, key):
        return any(
            (key in m and m[key] is not Sentinel.UNSET_SENTINEL) for m in self.maps
        )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that could make a lot of sense. Annoying that we have to roll our own

@@ -444,8 +488,6 @@ async def _compute_interested_rooms_new_tables(
if room_id in partial_state_rooms:
continue

all_rooms.add(room_id)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed because we already add it just above

@@ -460,7 +502,7 @@ async def _compute_interested_rooms_new_tables(

# Filtered subset of `relevant_room_map` for rooms that may have updates
# (in the event stream)
relevant_rooms_to_send_map = await self._filter_relevant_room_to_send(
relevant_rooms_to_send_map = await self._filter_relevant_rooms_to_send(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just renaming to be proper plural

response_body, _ = self.do_sync(sync_body, since=from_token, tok=user1_tok)

if self.use_new_tables:
if server_leaves_room:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if-tree is a bit messy but the test setup is so similar I think I'd rather have this than duplicating the test a bunch.

The extra self.use_new_tables layer will disappear in the future

@MadLittleMods MadLittleMods marked this pull request as ready for review September 18, 2024 06:24
@MadLittleMods MadLittleMods requested a review from a team as a code owner September 18, 2024 06:24
Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woo, thanks!

@MadLittleMods MadLittleMods merged commit c2e5e9e into develop Sep 19, 2024
39 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/sliding-sync-add-back-newly-left branch September 19, 2024 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants