
OpenFGA: Add request cache to the OpenFGA datastore #14557

Merged (6 commits) on Dec 11, 2024

Conversation

gabrielmougard (Contributor):

This PR decreases the latency of API calls that go through the OpenFGA fine-grained authorizer. Thanks to a per-request cache mechanism, we avoid calling the database when the cache key is already present.

This work has been benchmarked and is used in the following PR: #14476

openfga_benchmark.pdf

The benchmarking script can be found at: https://paste.ubuntu.com/p/WCwsk6gSSK/
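As a rough illustration of the per-request cache idea (type and key names below are hypothetical, not LXD's actual identifiers), a cache can be attached to the request context so that entries never outlive a single API request:

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// ctxKey is a hypothetical context key type; LXD uses its own
// request.CtxOpenFGARequestCache key for this purpose.
type ctxKey struct{}

// RequestCache is a minimal sketch of a per-request cache: it lives in the
// request context, so cached permissions cannot leak between requests.
type RequestCache struct {
	mu      sync.RWMutex
	entries map[string][]string // e.g. entity URL -> group names
}

func NewRequestCache() *RequestCache {
	return &RequestCache{entries: map[string][]string{}}
}

// Get returns the cached value for key, if present.
func (c *RequestCache) Get(key string) ([]string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.entries[key]
	return v, ok
}

// Set stores a value under key for the remainder of the request.
func (c *RequestCache) Set(key string, val []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = val
}

// WithRequestCache attaches a fresh cache at the start of a request.
func WithRequestCache(ctx context.Context) context.Context {
	return context.WithValue(ctx, ctxKey{}, NewRequestCache())
}

func main() {
	ctx := WithRequestCache(context.Background())
	cache := ctx.Value(ctxKey{}).(*RequestCache)
	cache.Set("server:lxd", []string{"admins"})
	if groups, ok := cache.Get("server:lxd"); ok {
		fmt.Println(groups) // served from the cache, no DB call needed
	}
}
```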

@markylaing (Contributor) left a comment:

General LGTM, just a couple of nits :)

@gabrielmougard (Contributor, Author):

Thanks! Fixing this right away.

@gabrielmougard force-pushed the feat/openfga-request-cache branch 2 times, most recently from a4814ce to 9c61530 on December 2, 2024
@gabrielmougard force-pushed the feat/openfga-request-cache branch from 9c61530 to 2f78972 on December 4, 2024
@gabrielmougard (Contributor, Author):

@tomponline @markylaing updated (thanks Mark for helping document this)

@markylaing (Contributor) left a comment:

LGTM, thanks!

@markylaing markylaing requested a review from tomponline December 4, 2024 11:35
@@ -248,10 +345,93 @@ func (o *openfgaStore) ReadUsersetTuples(ctx context.Context, store string, filt
return nil, fmt.Errorf("ReadUsersetTuples: Failed to parse entity URL %q: %w", entityURL, err)
}

var groupNames []string
// Get cache from context. If it is not present, we'll fall back to calling the database directly.
cache, err := request.GetCtxValue[*RequestCache](ctx, request.CtxOpenFGARequestCache)
A Member commented:

Why do we need this section? If the cache is not in the context, we have a bug, right? Why would we mask over that here?

A Contributor replied:

I asked @gabrielmougard to do this in the original PR, which performed caching slightly differently, but the reasoning is the same.

The reason is that for long-running requests we may not want to use the cache. For example, we currently have an open issue to improve authorization for the events API. I anticipate that this will check can_view on a per-resource basis. If we use this cache for that kind of request, we'll have a snapshot of permissions at the time of the original request, but these may change while the websocket is still connected.

So this logic is in anticipation of setting the value of the request cache to nil in the request context, in the edge cases where we don't want to use a cache.

@@ -84,10 +84,6 @@ func (e *embeddedOpenFGA) load(ctx context.Context, identityCache *identity.Cach
server.WithDatastore(opts.openfgaDatastore),
// Use our logger.
server.WithLogger(openfgaLogger{l: e.logger}),
// Set the max concurrency to 1 for both read and check requests.
A Member commented:

Why did we need this before? What changed?

The commit message would be improved with a why rather than just a what.

A Contributor replied:

The OpenFGADatastore implementation wasn't explicitly safe for concurrency when first written (i.e. I wasn't really thinking about it on the first pass; I was just trying to get something working 😂), so I added these configs initially for a bit of safety.

I think we could have removed the restrictions before this, because it is safe to have concurrent transactions (even though they will be serialised by DQLite).

With caching there is now a significant benefit to concurrency, since pre-fetching the cache on the first encountered call means that subsequent calls with the same cache may not need to hit the DB.

A Member replied:

> With caching there is now a significant benefit to concurrency. Since pre-fetching the cache on the first encountered call means that subsequent calls with the same cache may not need to hit the DB.

So let me check I understand this.

There is a single long-lived embedded OpenFGA instance, so enabling concurrency allows that OpenFGA server to check access for multiple API requests concurrently, right?

But the cache exists only for the duration of a single request, so what is happening concurrently in the OpenFGA server for a single request? Are you saying that we are pushing the locking down from the OpenFGA server level to the request level, inside the request cache itself?

The Member added:

BTW, the reason I'm asking is that I'm trying to understand whether we actually need the complexity of two RW locks plus an atomic variable, or whether we can use a single mutex and check whether the variables themselves are initialized.

A Member replied:

@markylaing explained in a call that the OpenFGA server will traverse the model and make concurrent calls to the DB driver (and thus the cache) even for a single API request.

@tomponline (Member) left a comment:

I have some questions about the concurrency aspect.

  • Why did we restrict OpenFGA concurrency before, and why don't we need to now?
  • For the per-request cache in the context, in what scenarios will OpenFGA concurrently read from or write to the cache for a single request?

@tomponline tomponline merged commit 93764d9 into canonical:main Dec 11, 2024
27 checks passed