Skip to content
This repository has been archived by the owner on Jul 23, 2019. It is now read-only.

Optimize cursor movement and selection manipulation #45

Merged
merged 5 commits into from
Mar 16, 2018

Conversation

as-cii
Copy link
Contributor

@as-cii as-cii commented Mar 15, 2018

This pull request contains a few changes that will optimize selection manipulation and cursor movement. I'd recommend reviewing it on a per-commit basis:

  • f1361e8 fixes a logic error that was using Anchor::offset as an offset into the fragment
  • cc7ce29 caches anchor positions until the buffer is edited
  • a829f24 wraps FragmentId's vector of identifiers into an Rc
  • 29b886e uses statically-allocated values for FragmentId::{min, max}_value
  • 2d6c1b7 caches the mapping between offsets and points until the buffer is edited

I have tested the above optimizations following these steps:

  • Edit the buffer 10000 times
  • Add 1000 selections
  • Start profiling
  • Move right
  • Stop profiling
Before After
before after

Note how the time spent in TextEditor (i.e., the call to Editor::move_right) has been reduced by a 35x factor. Performance is similar when moving cursors left, up or down.

Next Steps

We think this is a good first improvement that will allow xray to stay well under the frame budget, even when the user is manipulating thousands of selections at the same time. We are currently focusing on other areas of the system and, therefore, we are thinking to put further improvements to this on hold until more architectural concerns have been sorted out.

In the future we may revisit some areas of this code to optimize it even further, for example:

  • Accesses to the B+ tree can have more cache-locality. Right now, accessing nodes and fragment ids always requires to follow a pointer wrapped by an Arc. This reduces the chances of having the data we need ready to consume in the cache during a tree traversal. As such, it might be interesting to explore using different representations for Node and FragmentId so that they are more cache-friendly.
  • Reuse the same cursor across different tree traversals. Cursors use a Vec to represent a stack of seen nodes during a traversal. While a single vector allocation is innocuous, doing one of them for each selection means we will repeatedly allocate a value on the heap and free it shortly afterward. By using the same (pool of) cursor(s) over and over again, we will minimize memory allocations, which we've learned to be very expensive on hot code paths like this. One implementation thought for this could be to instantiate a cursor during the initialization of a buffer, wrapping it in a Arc. Then, we could hand out clones of that Arc and, in order to use the cursor, callers will have to invoke make_mut on it. If the cursor is used by someone else, it will just be copied and only then we will pay for a heap allocation.
  • Speed up FragmentId comparisons. When resolving the position of an anchor we may end up comparing many fragment identifiers. Right now the size of a fragment id isn't known at compile time, so the number of comparison operations is not constant. We can partially solve this by using a more cache-friendly representation like suggested above, but constant factors may still be relatively high. As such, another idea would be to explore using SIMD instructions to compare two arbitrary fragment ids. This StackOverflow thread gives some insights on how we could do that.
  • Overall, minimize heap accesses and use the stack as much as possible.

/cc: @nathansobo

Antonio Scandurra added 4 commits March 15, 2018 11:22
This fixes a logic error on master that uses `Anchor::offset` as an
offset into the fragment. This is wrong because fragments could be
subject to further splits, thus making the relative offset potentially
invalid.
This avoids unnecessary lookups in the buffer tree when the anchor's
position has already been retrieved since the last edit.

We are representing the cache as a RefCell to avoid the need to acquire
a mutable reference to the buffer even if we just need to query it. If
the cache is already borrowed for some reason, we will simply skip
writing to it.
This will allow to efficiently clone the FragmentId (i.e., without heap
allocations) for comparison purposes when it's used during a tree
traversal.
`lazy_static!` requires variables to be thread-safe (for good reasons),
so we have also replaced `Rc` with an `Arc`. This doesn't seem to affect
performance in a significant way.
@nathansobo
Copy link

When visiting a given node in the tree, we currently have to follow all the child pointers in order to fetch their spatial summaries. We should store a copy of the spatial summary of each child in the node that points to those children, so that we don't need to follow the pointer to a child unless we know it contains our target leaf. Combined with your other suggestions, this seems like it will greatly increase the cache friendliness of tree searches. Once we do this, we should experiment with different maximum node sizes to see what works the best, although it probably depends on the size of the summary nodes for any given tree.

It's also worth noting that #24 is relevant to some of what you have described here.

When moving cursors, we heavily rely on `Buffer::len_for_row` to
understand when such cursors should wrap. This method is implemented by
traversing the tree twice: the first traversal gets the offset of the
beginning of the requested line, whereas the second traversal gets the
offset of the line next to the requested one. Then, the two offsets are
subtracted to obtain the length of the line.

With this commit we start caching the mapping between offsets and
points. This allows to skip the two traversals and retrieve the length
of a line in constant time if the line length had been computed before
and no edit has occurred since then.
@nathansobo nathansobo merged commit dfd692b into master Mar 16, 2018
@nathansobo nathansobo deleted the optimize-selections branch March 16, 2018 22:58
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants