create cashay-server for realtime updates #149

Open
mattkrick opened this issue Dec 18, 2016 · 2 comments


@mattkrick
Owner

The problem is that not everyone uses RethinkDB for reactivity, & even those who do are more or less limited to single-table subscriptions. This makes sense, since it can get pretty expensive to simulate a join & subscribe to it. Apollo offers something that's in the experimental phase, but it's, uh, not robust. Here are the blueprints for how to make something that can match (or exceed) RethinkDB performance while allowing for cross-table subs.

Problem:

  • client calls the getTop5Posts(userId: 'user123') query. This returns documents of type Post with ids A,B,C,D,E
  • a second client calls the upvote(postId: 'F') mutation. How do we know who to send this update to? Some folks care about just that document. Other folks don't care about that document yet, but supposing that the upvote gives F more votes than E, we should replace E with F. Doing this naively means re-running the original query for every affected channel, which results in n db hits, where n is the number of channels that include at least 1 post.

Solution:

  • we have a topic lookup table keyed by query. Each query is keyed by mutation, and each mutation maps to a factory function for what I call "bump functions".
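const topicLookupTable = {
  getTop5Posts: {
    // Factory: captures the current cutoff and returns the bump function.
    upvote: (minVotes, minVoteId) => (mutatedDoc) => {
      // Cheap check first; only hit the db when the doc enters the top 5.
      if (mutatedDoc.votes > minVotes) {
        // Return the promise so the caller actually receives the patch.
        return r.table('Post').get(mutatedDoc.id).run().then(post => {
          return {
            removeDocId: minVoteId,
            newDoc: post, // the full doc, not just its id
            bumpFnVars: [post.votes, post.id]
          }
        });
      }
    }
  }
};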
  • At the end of the resolve method, before returning the array of docs, that socketId does a few things:
      • sees if the channel getTop5Posts/user123 exists. If not, it creates the bump function for it: topicLookupTable[query][mutation](5, 'E') (assuming here that E, the lowest-ranked post returned, has 5 votes). It stores this bump function on the channel getTop5Posts/user123.
      • subscribes to getTop5Posts/user123
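The steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the cashay-server design itself: `channels`, `subscribe`, and `onMutation` are hypothetical names, a `Map` stands in for SocketCluster channels, and the bump function skips the db fetch and returns the mutated doc directly. For simplicity every mutation on a topic shares the same bumpFnVars.

```javascript
// Hypothetical sketch: all names here are illustrative, not Cashay or SocketCluster APIs.
const topicLookupTable = {
  getTop5Posts: {
    // Factory: captures the current cutoff, returns the cheap bump function.
    upvote: (minVotes, minVoteId) => (mutatedDoc) => {
      if (mutatedDoc.votes > minVotes) {
        return { removeDocId: minVoteId, newDoc: mutatedDoc };
      }
      return undefined;
    }
  }
};

// channel name -> { bumpFns: { [mutation]: fn }, subscribers: Set of socketIds }
const channels = new Map();

// Called at the end of a resolve method: create the channel if needed, then subscribe.
function subscribe(socketId, topic, channelKey, bumpFnVars) {
  const name = `${topic}/${channelKey}`;
  if (!channels.has(name)) {
    const bumpFns = {};
    for (const [mutation, factory] of Object.entries(topicLookupTable[topic])) {
      bumpFns[mutation] = factory(...bumpFnVars);
    }
    channels.set(name, { bumpFns, subscribers: new Set() });
  }
  channels.get(name).subscribers.add(socketId);
  return name;
}

// Called after a mutation: run the cheap bump function on every channel that has one.
function onMutation(mutation, mutatedDoc) {
  const messages = [];
  for (const [name, channel] of channels) {
    const bumpFn = channel.bumpFns[mutation];
    if (!bumpFn) continue;
    const patch = bumpFn(mutatedDoc);
    if (patch) {
      messages.push({ channel: name, to: [...channel.subscribers], patch });
    }
  }
  return messages;
}
```

Note that this sketch, like the bump function it mirrors, does not handle the case where the upvoted post is already in the list, and it does not refresh the bumpFnVars after a bump.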

The magic of the bump function is that it contains really inexpensive logic (in this case, mutatedDoc.votes > minVotes). Without it, we'd have to re-run each original database function to determine if F replaced E. This is critical because every time upvote gets called, we're gonna have to run through every channel with the getTop5Posts topic. A single Float64 comparison should be cheap enough that JS will work at scale. SocketCluster already contains a message bus, but a function can't be saved on a channel directly, so we'll have to use a key/value store like Redis to save the bumpFnVars for each channel & rebuild the bump function from them.
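Here's a rough sketch of what persisting the bumpFnVars could look like. Everything in it is an assumption for illustration: a `Map` stands in for the key/value store (a real setup would talk to Redis), and `bumpFactories`, `saveBumpFnVars`, and `loadBumpFn` are made-up names. The point is only that the vars are plain JSON, so the bump function can be rebuilt from them on any server process.

```javascript
// A Map stands in for a key/value store such as Redis (assumption for this sketch).
const kvStore = new Map();

// Hypothetical registry of bump-function factories, keyed by "topic:mutation".
const bumpFactories = {
  'getTop5Posts:upvote': (minVotes, minVoteId) => (mutatedDoc) =>
    mutatedDoc.votes > minVotes
      ? { removeDocId: minVoteId, newDocId: mutatedDoc.id }
      : undefined
};

// Only the serializable vars are stored, never the function itself.
function saveBumpFnVars(channel, mutation, bumpFnVars) {
  kvStore.set(`${channel}:${mutation}`, JSON.stringify(bumpFnVars));
}

// Rebuild the bump function from the stored vars; undefined if nothing is stored.
function loadBumpFn(topic, channel, mutation) {
  const raw = kvStore.get(`${channel}:${mutation}`);
  if (raw === undefined) return undefined;
  const factory = bumpFactories[`${topic}:${mutation}`];
  return factory(...JSON.parse(raw));
}
```

When a bump function returns new bumpFnVars, calling saveBumpFnVars again with them would move the cutoff for the next mutation.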

For the next example, let's try a form of CmRDT. Say we have "hell world" and we want to correct it. We send updateContent(changes: {id: 'A', pos: 4, val: 'o'}) to make it "hello world". Since it's a CmRDT, we'll never have the full state, just a transform. That means our mutation will have to adjust the db with just this info. Then we forward the operational transform on to the client & trust that the client knows how to apply it. Since the updateContent mutation can never change which docs are returned by getTop5Posts, our bumpFn is easy:

(idArr) => (mutatedDoc) => {
  if (idArr.includes(mutatedDoc.id)) {
    return {
      transform: mutatedDoc
    }
  }
} 
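On the client side, applying that transform might look something like this. It's a sketch under assumptions: the change is treated as a plain string insert at `pos` (the issue doesn't spell out the semantics), and `applyInsert`, `docsById`, and `onTransform` are hypothetical names, not part of Cashay.

```javascript
// Assumption: {pos, val} means "insert val at index pos" in the doc's content.
function applyInsert(content, change) {
  const { pos, val } = change;
  if (pos < 0 || pos > content.length) {
    throw new RangeError(`pos ${pos} is outside the content`);
  }
  return content.slice(0, pos) + val + content.slice(pos);
}

// Hypothetical client-side cache of docs, keyed by id.
const docsById = { A: { id: 'A', content: 'hell world' } };

// Handler for a message shaped like the bump function's return value.
function onTransform(message) {
  const doc = docsById[message.transform.id];
  if (doc) {
    doc.content = applyInsert(doc.content, message.transform);
  }
}
```

Calling onTransform({ transform: { id: 'A', pos: 4, val: 'o' } }) would turn the cached content into "hello world" without the client ever receiving the full doc.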

For super fine-grained performance tweaking, we could consider establishing a discrete channel just for that field: content/content123, but that would be very application specific & could result in a net performance loss.

A fringe benefit of all this is that we don't necessarily need a websocket between the client and the server. For example, I can take the return values of the bump functions and store them away in a key/value store under the JWT. Then, when the client long-polls for updates, I just send the array of changes. That means in 1 network request, they get a whole bunch of fresh new info without having to request it from each individual query.
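A minimal sketch of that long-poll queue, with assumptions called out: a `Map` stands in for the key/value store, an opaque string stands in for the JWT, and `enqueueChange` and `drainChanges` are hypothetical helper names.

```javascript
// Map stand-in for the key/value store; keys are the client's token (a JWT in practice).
const pendingByToken = new Map();

// Store a bump function's return value for a client that isn't connected by websocket.
function enqueueChange(token, change) {
  if (!pendingByToken.has(token)) {
    pendingByToken.set(token, []);
  }
  pendingByToken.get(token).push(change);
}

// On long-poll: hand back everything queued so far, in order, and clear the queue.
function drainChanges(token) {
  const changes = pendingByToken.get(token) || [];
  pendingByToken.delete(token);
  return changes;
}
```

A long-poll handler would then respond with drainChanges(token), so one request carries every queued patch across all of that client's queries.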

@mattkrick
Owner Author

additional thought:
suppose each query can take in 2 additional args:

  • ids: The list of IDs that we currently have on the client
  • lastUpdatedAt: The max of all updatedAt in the list of IDs

With these 2 things, we can greatly reduce the network payload. For example, I subscribe to team members. Then I unsubscribe, then I subscribe again like: teamMembers(teamId: 'team123', ids: ['A', 'B', 'C'], lastUpdatedAt: Yesterday)
Now, I run the query. When it resolves from the DB, I get something like this:

const teamMembers = [
  {
    id: 'A',
    updatedAt: 'last week',
    name: 'matt'
  },
  {
    id: 'B',
    updatedAt: 'today',
    name: 'jordan'
  },
  {
    id: 'D',
    updatedAt: 'last week'
  }
]

First, we compare the result with the ids. Only in the result (the left side), we have D. Only in the ids (the right side), we have C. In the intersection, we have A,B. Within that intersection, we see that A hasn't been updated for a week, so we exclude it. B has been updated since lastUpdatedAt, so we need to include it. So, we return a result like:

return {
  removeDocId: 'C',
  addDoc: {
    id: 'D',
    updatedAt: 'last week'
  },
  updateDoc: {
    id: 'B',
    updatedAt: 'today',
    name: 'jordan'
  },
};
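The comparison described above could be sketched like this. It is an illustration, not a settled API: `diffAgainstClient` is a hypothetical name, timestamps are assumed to be numbers (e.g. epoch ms) so they can be compared directly, and the result uses plural array fields (removeDocIds, addDocs, updateDocs) since more than one doc can fall into each bucket.

```javascript
// serverDocs: fresh result from the db; clientIds / lastUpdatedAt: the 2 extra query args.
// Assumption: updatedAt and lastUpdatedAt are numeric timestamps.
function diffAgainstClient(serverDocs, clientIds, lastUpdatedAt) {
  const clientIdSet = new Set(clientIds);
  const serverIdSet = new Set(serverDocs.map((doc) => doc.id));
  return {
    // On the client but no longer in the result (C in the example).
    removeDocIds: clientIds.filter((id) => !serverIdSet.has(id)),
    // In the result but not yet on the client (D in the example).
    addDocs: serverDocs.filter((doc) => !clientIdSet.has(doc.id)),
    // On both sides, but changed since the client last saw it (B in the example).
    updateDocs: serverDocs.filter(
      (doc) => clientIdSet.has(doc.id) && doc.updatedAt > lastUpdatedAt
    )
  };
}
```

If the client's ids match the result and nothing has an updatedAt newer than lastUpdatedAt, all three arrays come back empty.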

Now, let's assume the client caches this locally & then refreshes the page. If nothing has changed since lastUpdatedAt, the diff is empty, so the server doesn't even need to reply!

@mattkrick mattkrick mentioned this issue Dec 18, 2016
@dustinfarris
Contributor

This looks freakin awesome. I love the diffs.
