Improve page and partial caching across the site #32

Closed
dhedlund opened this issue Aug 2, 2013 · 12 comments

dhedlund commented Aug 2, 2013

This is a fairly generic ticket and is meant to be "take a stab at some of the low hanging fruit". The extent of our caching for dynamic content is as follows:

  • The events index page caches the list of events using a cache fragment (see app/views/events/index.html.haml). If anything changes for any event or venue, for any reason, all caching is cleared for the entire site (see app/observers/cache_observer.rb).

That's it.

Also, a change to any event or venue on any site clears the cache for all sites. Not scalable.

Scope of Ticket

  1. Pick some low hanging fruit. The show pages for events and venues seem like obvious choices to start with. See the following article by DHH for inspiration (we will be using memcached on production):
    http://37signals.com/svn/posts/3113-how-key-based-cache-expiration-works (also has 67 comments, worth reading)
  2. Try to get the existing caching for the events index page to stop clearing the entire caching backend. Using the key-based cache expiration from DHH's article, you could instead rely on the maximum Event#updated_at timestamp (a sketch follows this list). You would, however, first have to tie the model objects together so they update the event's timestamp as necessary. The site has a fairly high read-to-edit ratio, so slowing down write performance is not a concern.
  3. Create new tickets as you come across things that could be optimized but need more significant code changes to do correctly.
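
For item 2 above, a minimal sketch of what a key-based fragment key could look like. The helper name and key layout are purely illustrative (not code from the repo); it assumes the existing memcached store and Rails fragment caching:

    # app/helpers/events_helper.rb (hypothetical helper)
    module EventsHelper
      # The key changes whenever any event is created, updated, or destroyed,
      # so the old fragment is simply abandoned in memcached instead of being
      # explicitly expired.
      def events_index_cache_key(events)
        ["events-index", events.maximum(:updated_at).try(:to_i), events.count]
      end
    end

    # In app/views/events/index.html.haml, the existing fragment cache would
    # then wrap the list with something like:
    #   - cache(events_index_cache_key(@events)) do
    #     ... existing event list markup ...

maximum(:updated_at) and count are cheap aggregate queries, and memcached's LRU eviction takes care of the abandoned fragments.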
@dhedlund
Contributor Author

This was closed accidentally by an upstream calagator merge with a "fixes ##" mention.

dhedlund reopened this Jan 17, 2014
@lindsaycaron
Member

Additional details / thoughts from Daniel Feb 27th:
The caching code was originally written when the software was designed to support one site, before we forked it and added multi-tenancy; nuking the entire cache from orbit "worked" for their use case. Memcached does not offer wildcard support for nuking just one site's cache, i.e. "delete all keys with prefix X"; there are some workarounds (http://stackoverflow.com/questions/1595904/memcache-and-wildcards), but the better solution is to either:

  1. Rebuild the HTML for the calendar in the background whenever a change is made to an event (create a background job as part of an after_save callback). This would require another Heroku dyno to do background processing off of a jobs queue that is idle 99.99% of the time. We might be able to work around needing a background queue if we trigger a background web request using AJAX on the page received after the form submission.
  2. Rebuild the HTML for the calendar whenever the web request detects something on the page has changed more recently than the existing generated page. Generating the calendar is actually pretty slow right now because it pipes each event through a couple of conversion layers (both markdown and an HTML scrubber); the results of the conversion for each of these events should be cached independently of the aggregate calendar view. The calendar view would then just pass the changed events through the converter and pull them together in the aggregate view. This requires the events' updated_at timestamp to get updated when related objects change (the venue title, for example), which means going through the code, making sure objects touch each other when needed, and adding russian-doll-style caching blocks around most of the views (a rough sketch follows below). A better solution, but a bit more work.
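
A rough sketch of what option 2 could look like. The callback and helper names are hypothetical; it assumes the existing Event/Venue association and Rails fragment caching:

    # app/models/venue.rb -- bump each event's updated_at when its venue
    # changes, so per-event fragments keyed on updated_at expire naturally.
    class Venue < ActiveRecord::Base
      has_many :events
      after_save :touch_events

      private

      def touch_events
        # A single UPDATE; write volume is low, so the extra cost on save is
        # acceptable given the read-heavy traffic.
        events.update_all(updated_at: Time.zone.now)
      end
    end

    # app/views/events/_event.html.haml -- cache the expensive markdown +
    # HTML-scrubbing output per event, so the aggregate calendar view can be
    # assembled from already-cached pieces:
    #   - cache(event) do
    #     .description= format_description(event)   # hypothetical helper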

As things stand right now, the calendar can sometimes take a few seconds to regenerate when it sees changes. If the server is getting a hit or two a second (we're getting less than a hit a minute) and an event changes, multiple web requests will all see that the cache is out of date, start generating a new cached page, and slow the machine down enough that it takes 10+ more seconds to generate the new cached page... at which point we'll get another 10 requests trying to generate the page and easily overload the server. So for our existing traffic, and as long as we scale slowly, we're okay for now; any sudden influx could easily become problematic.

(note: we could probably set some value in memcache that says we're currently generating a calendar before we start and just have the other requests wait instead of starting to regenerate the calendar again on their own...there are a lot of different possibilities, just nothing has been done yet)
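
That "mark that a rebuild is in progress" idea can be built on memcached's add command, which only succeeds if the key does not already exist. A rough sketch, assuming the Dalli client; rebuild_calendar_html and the key names are stand-ins:

    require 'dalli'

    LOCK_KEY = 'calendar:regenerating'
    LOCK_TTL = 30 # seconds; long enough to cover a slow regeneration

    def fetch_calendar(cache)
      cache.get('calendar:html') || begin
        if cache.add(LOCK_KEY, '1', LOCK_TTL)
          # We won the race: rebuild once and publish for everyone else.
          html = rebuild_calendar_html         # stand-in for the slow render
          cache.set('calendar:html', html)
          cache.delete(LOCK_KEY)
          html
        else
          # Another request is already rebuilding; show a lightweight
          # placeholder instead of piling on with a second rebuild.
          'Calendar is being refreshed...'
        end
      end
    end

    # fetch_calendar(Dalli::Client.new)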

@mbijon

mbijon commented Mar 5, 2014

Good chatting, Lindsey; here are some notes on creating pseudo-namespaces in memcached. The reason for "memcached" (with the "d") is that it supports pieces of the protocol that older clients like memcache-client didn't. Since you're on Heroku you probably have a client that supports this, even if the Calagator devs didn't take advantage of the better protocol support: https://github.com/mperham/dalli/wiki/Heroku-Configuration. Here's a simple explanation of how things moved on since "memcache": http://stackoverflow.com/a/1442747/982920

Here's a good strategy for setting prefixes. You can make them site/city specific and then separate by object type (event, org, etc.) so the cache isn't monolithic:
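
One common way to get that effect (a sketch, not the strategy from the link above) is a per-site generation counter baked into every key; "flushing" a site is then just incrementing its counter and letting the orphaned keys fall out via LRU eviction. Class and key names here are illustrative, and it assumes the Dalli client:

    require 'dalli'

    class SiteCache
      def initialize(client, site)
        @client = client
        @site   = site
      end

      def fetch(key, ttl = nil, &block)
        # e.g. "pdx:g3:events:index" -- prefixed by site and generation.
        @client.fetch("#{@site}:g#{generation}:#{key}", ttl, &block)
      end

      def flush!
        # Bump the generation; old keys are never referenced again.
        @client.incr("#{@site}:generation", 1, nil, 1)
      end

      private

      def generation
        @client.get("#{@site}:generation").to_i
      end
    end

    # cache = SiteCache.new(Dalli::Client.new, 'pdx')
    # cache.fetch('events:index') { render_events_index }   # hypothetical renderer
    # cache.flush!                                           # "clears" only pdx keys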

@lindsaycaron
Member

Mike also suggested:

  • Cache keys should be site-specific
  • Ideally, key extensions linked to Topics, Types, or Event ID# to minimize clearing

@lindsaycaron
Member

Tim Connor comments:
If you can get everything updated to Rails 4 you can use their fragment caching stuff to handle #2 and some of this better in general, probably.

@mbijon

mbijon commented Mar 8, 2014

As caching strategies go, the Russian Doll approach you can build with fragment caching is great, but I don't think your app is ready for it.

However:
(1) It means significant refactoring beyond just the upgrade to Rails 4. You'd still need to wrap each object in a cache block and add an expiration action for each one.

(2) Fragment caching defaults to disk storage, but it's recommended to back it with memcached or Redis to get a lot more speed from it. Since you currently only use memcached you can store things there ... but that's also where your cache-flush cascade problems happen. So you either use the slower disk cache, or you lose your fragments when memcache dumps like it does now, and then it takes even longer to catch up while the server rebuilds both the memcache and the fragment caches.

(3) You could introduce a shared Redis instance to store fragments for one or more servers, which would prevent the overlapping cache expirations. This would actually get you pretty far down the road, but you'll still have the memcache cache shared between all the sites. Essentially that's a broken cache strategy that's covered for a while by a better cache. But as you grow to new cities you'll eventually exhaust how far fragments get you and need memcache again ... and it will be a much bigger problem to solve then.

Ignoring the doom and gloom: I'd recommend fixing memcache before you add another cache. I prefer fixing something that isn't right rather than patching over it and seeing how long it stays covered.
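
For reference, the store choice in (2) and (3) is mostly a one-line config change. A hedged sketch of both options (assuming the dalli gem for memcached and redis-store/redis-rails for Redis; the environment variable names follow Heroku's MemCachier add-on):

    # Inside config/environments/production.rb

    # Memcached via Dalli (per the Heroku configuration wiki linked earlier):
    config.cache_store = :dalli_store,
                         (ENV['MEMCACHIER_SERVERS'] || '').split(','),
                         { username: ENV['MEMCACHIER_USERNAME'],
                           password: ENV['MEMCACHIER_PASSWORD'] }

    # ...or Redis via redis-store, if fragments move there instead:
    # config.cache_store = :redis_store, ENV['REDIS_URL']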

@lindsaycaron
Member

From Jesse Cooke:

Suggestions:

  1. If memcached is not doing what you want, but Redis could, switch. I'm not sure how exactly memcached is being used, but you can usually switch to Redis pretty easily. Like @mbijon said, if Redis can get you "pretty far down the road" then it could be a nice stop-gap.

  2. Daniel had mentioned rebuilding the HTML in a background job. You can do that with a library like sucker_punch and not pay the cost of running another dyno on Heroku (a rough sketch follows this list). I doubt pre-building will solve your problem, though.

  3. Russian-doll caching sounds like it could work really well for your use case, but I doubt it'd be easy or cost-effective to get it working in Rails 3.2.x.
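
The sucker_punch idea from point 2, sketched out. This assumes the sucker_punch 2.x API (older versions used Job.new.async.perform instead of perform_async), and CalendarRenderer is a hypothetical stand-in for whatever renders the calendar HTML:

    require 'sucker_punch'

    class CalendarRefreshJob
      include SuckerPunch::Job

      def perform(site_id)
        # Runs on a thread pool inside the existing web dyno -- no extra
        # worker dyno needed.
        html = CalendarRenderer.new(site_id).render   # hypothetical renderer
        Rails.cache.write("calendar:#{site_id}", html)
      end
    end

    # Enqueued from an after_commit callback on Event:
    #   CalendarRefreshJob.perform_async(event.site_id)   # site_id is illustrative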

Where exactly is the bottleneck? Is it in the rendering of the view? Are you rendering too much at once? Do you know exactly what numbers aren't good, and how you'd like to improve them? If not, you need to know the answers to those questions. Performance tuning is all about measurement. There looks to be a lot of stuff going on in the _calendar.html.haml partial.

You know, one thing you could do is pre-populate the calendar structure, and then dynamically add events to the days as needed. I mean, Feb 2014 is never going to change, and you'll know exactly what April 2014 is going to look like. If the generation of that HTML is significant, that could be a good place to cache.
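
A tiny sketch of that idea, caching the static month grid keyed only by year and month (build_month_grid is a stand-in for whatever produces the day-cell markup today):

    def month_grid_html(year, month)
      # The grid of day cells for a given month never changes, so the key
      # needs no updated_at component and the entry never has to be expired.
      Rails.cache.fetch("calendar-grid/#{year}-#{month}") do
        build_month_grid(Date.new(year, month, 1))   # purely date math
      end
    end

Events for each day would then be layered on top with their own per-event fragments.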

Reminds me of what an old Smalltalker told me once about date objects... once you create it, you can keep it around forever. There's no need to recalculate a date because it's a fact & it'll never change.

@lindsaycaron
Member

Zack Hobson:
In my opinion, the best way forward is to render the calendar asynchronously. As Daniel pointed out, it’s not strictly necessary to use a worker dyno for this.

@mbijon

mbijon commented Mar 8, 2014

That's an interesting point about the calendar populating dynamically. I can imagine an async, lazy-load-like fill-in of the calendar would scale well ... but the current app won't do this with the current controller methods.

Sucker_punch may be an interesting option if we purposefully dump the cache at 1-3am. Then we'd (almost) never have a cold cache during the day and there'd be a few hours in the morning for sucker_punch to catch up before traffic builds. Do you happen to have New Relic or a similar APM tool set up?

@lindsaycaron
Member

Yes, we have New Relic. Sort of. It was set up for our staging site pre-Beta and pre-transition to Heroku. So... I think the metrics below are for a site that isn't representative of our live version, with only a fraction of its traffic, but presumably uses the same bones? If all of that is true, then it should still provide some insight. And... we're going to need to sync our live sites.

[screenshot: New Relic metrics from the staging site]

@dhedlund
Contributor Author

Basic key-based, russian-doll-style fragment caching was added in commit 496ea84. We didn't need to upgrade to Rails 4, as Rails makes a cache_digests gem available. We already have memcached as a backend cache store, which auto-expires old data when it fills up, saving us from having to think about it. Most content is now covered and takes less than 100ms to render with a cache hit (about 60ms locally, including the DB query). It still takes about a second on a cache miss to render the events index page when there are a large number of events, but it's way better than it was, and it can take advantage of individual calendar days being cached to speed up the process.

@mbijon, if there's anything in this ticket that you think we should continue to pursue to improve performance, please feel free to open a new ticket to track it.

@lindsaycaron
Member

I believe this solution introduced a bug (#81). When an existing event is edited, upon clicking "Update" it shows up perfectly: social sharing buttons and maps are formatted properly. But as soon as you refresh, the Google Map and "Share It" buttons disappear. It seems to be related to caching.
