add caching headers for posts #786

igalic · 2020-06-19T10:46:48Z

until we've switched to using the async version of Rocket, we'll have serious performance issues.

Most installations of Plume are already behind a (caching) proxy server. we can use this to fix the current performance issues by sending caching headers for posts pages:

How?

the posts page should send an ETag with each response.
The advantage here is that we can have updates, and comments, since this means that we send new ETag header.
to spare a lookup in the database, we could store data about whether the post's cache is invalidated within rocket (that would be a 304 response to a HEAD request)

n.b.: The (caching) proxy server admin could adjust the time themselves, we wouldn't be sending any ourselves.

Alternatives

are currently a lot more effort:

and there's also: rwf2/Rocket#1343
which is similar to #777, but on rocket level

those aren't really alternatives, since this suggestion is operating at HTTP level, it's more orthogonal.

The text was updated successfully, but these errors were encountered:

elegaanz · 2020-06-19T10:55:51Z

We already have some kind of caching for static assets : https://github.com/Plume-org/Plume/blob/master/src/routes/mod.rs#L207-L270

But adding it to templates and/or federation pages could improve performances too.

igalic · 2020-06-19T11:42:37Z

does rocket derive #[head()] for these routes itself?

elegaanz · 2020-06-19T11:47:05Z

Yes

igalic · 2020-06-19T12:28:46Z

if we need a DB lookup to construct a body which might still be in a downstream cache, then we should define HEAD ourselves

elegaanz · 2020-06-19T15:03:50Z

Maybe we can have a Vec<String> storing the slugs of the posts that need an update in Rocket's state, and only show an updated version when needed.

Another solution would be to render each post to a given HTML or AP file when the post or comments change, and serve this files statically, using the modification date to generate the cache headers.

trinity-1686a · 2020-06-21T13:30:56Z

There is already caching for any templated page, with the following limitations :

reply that page is cached only for GET request (should probably be HEAD too actually)
page must not contains form (due to csrf tokens, but actually this should be removed due to next point)
it actually does db lookup and generate the page, then hash it, so it saves only bandwidth, not cpu/io

This should be removed when more intelligent caching get implemented

Implementation is in src/template_utils.rs

igalic · 2020-06-21T16:45:21Z

can you propose a patch out of this situation?

and do you believe it would help us with our current performance issues, before we switch to async rocket?

trinity-1686a · 2020-06-21T18:11:57Z

I'll look into it if I got some time.

Honestly I have no idea, I'm not operating any instances so I don't have neither logs nor a flamegraph of an instance under non-synthetic load, both of which would help with a diagnosis. If someone have a day or a week worth of logs from fediverse.blog that would be interesting

I just noticed the Subscribe button on any post is actually a form, so it is never cached by clients 😐
~~I also noticed that for some reason, caching, for static or templated resources, seems broken, so it might be a good first step to find why and how much it helps~~
So it's broken on most public servers I found, but not when running a local instance, without reverse proxy or with Caddy. It might be related to this discussion

igalic · 2020-06-21T19:01:07Z

https://mailman.nginx.org/pipermail/nginx/2013-September/040425.html

The ETag header is removed if nginx changes a response returned.
That is, if you have gzip, gunzip, sub, addition, ssi or xslt
filters applied to responses returned.

so… instead of adding it's own ETag for such filters, nginx just drops them?

n.b.: Apache HTTPd does not do that.

Subscribe button on any post is actually a form,

this is bad, probably. let's open a bug for that.

trinity-1686a · 2020-06-21T19:24:54Z

Apparently it does not really remove etags, but change them from strong to weak (see rfc for the exact meaning), which means it matching for "tag" or W/"tag" should maybe work.
See patch if someone want to test

patch

From 9916d1692ed8ef12fd1d3ee941ed5796345a2b97 Mon Sep 17 00:00:00 2001
From: Trinity Pointard <[email protected]>
Date: Sun, 21 Jun 2020 21:17:33 +0200
Subject: [PATCH] try fixing cache with nginx

---
 src/template_utils.rs | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/template_utils.rs b/src/template_utils.rs
index 26bf58b..5af9f76 100644
--- a/src/template_utils.rs
+++ b/src/template_utils.rs
@@ -63,7 +63,10 @@ impl<'r> Responder<'r> for Ructe {
         let etag = format!("{:x}", hasher.finish());
         if r.headers()
             .get("If-None-Match")
-            .any(|s| s[1..s.len() - 1] == etag)
+            .any(|s| {
+                s[1..s.len() - 1] == etag ||
+                s[3..s.len() - 1] == etag
+            })
         {
             Response::build()
                 .status(Status::NotModified)
-- 
2.27.0

elegaanz · 2020-06-22T12:19:44Z

I only have the last six hours of the access logs of fediverse.blog, but it case it helps, I can send them to you (and I can configure nginx to keep them longer).

Also, do you want me to deploy the above patch on fediverse.blog to see if it helps?

trinity-1686a · 2020-06-22T12:51:11Z

if it's only 6 hours, could you take them around 8-10pm cest, so that there is afternoon and evening? That's when I guess there is the most load.

You can deploy it, it won't break anything for sure, but I don't know if it'll help, it should only reduce bandwidth usage

elegaanz · 2020-06-22T12:57:45Z

I'll give you the logs this evening then. I'll deploy the patch after that, to see if there is a difference. Thank you for your help!

@trinity-1686a

Based on @trinity-1686a's patch: #786 (comment)

@trinity-1686a

Based on @trinity-1686a's patch: #786 (comment)

elegaanz added a commit that referenced this issue Jun 25, 2020

Better caching

45cebcf

Based on @trinity-1686a's patch: #786 (comment)

elegaanz mentioned this issue Jun 25, 2020

Better caching #794

Merged

elegaanz added a commit that referenced this issue Jun 25, 2020

Better caching

90fdade

Based on @trinity-1686a's patch: #786 (comment)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add caching headers for posts #786

add caching headers for posts #786

igalic commented Jun 19, 2020

elegaanz commented Jun 19, 2020 •

edited

Loading

igalic commented Jun 19, 2020

elegaanz commented Jun 19, 2020

igalic commented Jun 19, 2020

elegaanz commented Jun 19, 2020

trinity-1686a commented Jun 21, 2020 •

edited

Loading

igalic commented Jun 21, 2020

trinity-1686a commented Jun 21, 2020 •

edited

Loading

igalic commented Jun 21, 2020

trinity-1686a commented Jun 21, 2020

elegaanz commented Jun 22, 2020

trinity-1686a commented Jun 22, 2020

elegaanz commented Jun 22, 2020

add caching headers for posts #786

add caching headers for posts #786

Comments

igalic commented Jun 19, 2020

How?

Alternatives

elegaanz commented Jun 19, 2020 • edited Loading

igalic commented Jun 19, 2020

elegaanz commented Jun 19, 2020

igalic commented Jun 19, 2020

elegaanz commented Jun 19, 2020

trinity-1686a commented Jun 21, 2020 • edited Loading

igalic commented Jun 21, 2020

trinity-1686a commented Jun 21, 2020 • edited Loading

igalic commented Jun 21, 2020

trinity-1686a commented Jun 21, 2020

elegaanz commented Jun 22, 2020

trinity-1686a commented Jun 22, 2020

elegaanz commented Jun 22, 2020

elegaanz commented Jun 19, 2020 •

edited

Loading

trinity-1686a commented Jun 21, 2020 •

edited

Loading

trinity-1686a commented Jun 21, 2020 •

edited

Loading