Implement a new, experimental variant of LookupResources as LookupResources2 #1905
Conversation
41b5468 to b4b801f
Moved to draft due to steelthread discovering a pagination bug.
There is a unit test error:
…ources2 This implementation should be much faster for intersections, exclusions and caveats due to early tree shearing and check hints
Fixed
…dispatched resources when checking those already received from another dispatch Adds some parallelism back into LR2
Also adds additional testing to ensure check hints are used in LR2
Some early comments, still making my way through the PR
}

// For each entrypoint, load the necessary data and re-dispatch if a subproblem was found.
return withParallelizedStreamingIterableInCursor(ctx, ci, entrypoints, parentStream, crr.concurrencyLimit,
Not a blocker, since this warrants a bigger refactor, but I thought it was worth calling this out again given what I observed under load: concurrency limits are not very useful if we don't keep any state around about the available "capacity" of the process. E.g. in non-clustered mode this leads to large spikes in goroutines, because every local dispatch gets the same limit; in clustered mode it leads to the same situation, albeit lessened because there is more capacity. Ultimately this affects tail latencies, as the scheduler scrambles to handle all the scheduling spikes with whatever GOMAXPROCS it has defined. With sufficient load, and all the dispatching happening through wide relations, this can easily overload a cluster.
We could add it as a tracking field to the dispatch (since we're changing it here anyway) and decrease the counter on each dispatch
let's do that as a follow-up; we may want to generalize it for all APIs, and this could be the foundation for what the paper describes as "These safeguards include fine-grained cost accounting, quotas, and throttling". We could, e.g., limit clients by the number of dispatches.
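A minimal sketch of what that follow-up could look like, assuming a hypothetical per-request dispatch budget threaded through the dispatch metadata. The names and the counter-based approach are illustrative, not part of the current dispatch proto:

package dispatch

import (
	"errors"
	"sync/atomic"
)

// ErrBudgetExhausted is returned once a request has used up its dispatch allowance.
var ErrBudgetExhausted = errors.New("dispatch budget exhausted for this request")

// Budget is a hypothetical per-request counter of remaining dispatches.
// It would be created once per incoming API call and decremented on every
// dispatch (local or remote) issued on behalf of that call.
type Budget struct {
	remaining atomic.Int64
}

// NewBudget returns a budget allowing at most max dispatches.
func NewBudget(max int64) *Budget {
	b := &Budget{}
	b.remaining.Store(max)
	return b
}

// Spend consumes one unit of budget, returning an error once the allowance
// is exhausted so the caller can fail fast instead of fanning out more work.
func (b *Budget) Spend() error {
	if b.remaining.Add(-1) < 0 {
		return ErrBudgetExhausted
	}
	return nil
}

Carried in the dispatch metadata, this would let downstream servers see the remaining allowance for the overall request rather than each applying the same fixed per-dispatch limit.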
// If all the incoming edges are caveated, then the entire status has to be marked as a check
// is required. Otherwise, if there is at least *one* non-caveated incoming edge, then we can
// return the existing status as a short-circuit for those non-caveated found subjects.
if allCaveated {
	resources = append(resources, &v1.PossibleResource{
		ResourceId: resourceID,
		ForSubjectIds: subjectIDs,
		MissingContextParams: missingContextParameters.AsSlice(),
	})
} else {
	resources = append(resources, &v1.PossibleResource{
		ResourceId: resourceID,
		ForSubjectIds: nonCaveatedSubjectIDs,
	})
}
Why are we OK with returning a partial list here?
Let's say we do an LR call, but by mistake we missed some important caveat arguments. Now we are returning "this is the list of resources the subject has access to". The client has no way to know they missed arguments that could have augmented that list.
As the comment describes: if we find a path for the resource that is not caveated, then the result is by definition present, so we don't care about the caveated paths to that same resource.
Missed arguments don't matter if at least one of the paths to the resource has no caveats.
internal/caveats/run.go
Outdated
currentResult = syntheticResult{
	value: true,
	contextValues: map[string]any{},
	exprString: "",
	missingContextParams: []string{},
}
Suggested change:
-	currentResult = syntheticResult{
-		value: true,
-		contextValues: map[string]any{},
-		exprString: "",
-		missingContextParams: []string{},
-	}
+	currentResult.value = true
Can't; the type of the result is not a syntheticResult
Then:
var currentResult ExpressionResult = syntheticResult{
	value: cop.Op == core.CaveatOperation_AND,
	contextValues: map[string]any{},
	exprString: "",
	missingContextParams: []string{},
}
internal/caveats/run.go
Outdated
var contextValues map[string]any
var exprStringPieces []string

var currentResult ExpressionResult = syntheticResult{
	value: false,
	contextValues: map[string]any{},
Initializing syntheticResult.contextValues with an empty map does an unnecessary heap allocation. combineMaps does take care of a nil first argument when merging what has been found, which adds one extra unnecessary allocation.
I did some benchmarks, and combineMaps can be made faster and do one less allocation in the case where one of the arguments is nil or empty.
Given the caveat is being executed for each one of the tuples coming out of ReverseQueryRelationships via redispatchOrReportOverDatabaseQuery, I thought this could lead to N unnecessary allocations in the critical path.
func combineMaps(first map[string]any, second map[string]any) map[string]any {
if first == nil || len(first) == 0 {
return maps.Clone(second)
} else if second == nil || len(second) == 0 {
return maps.Clone(first)
}
cloned := maps.Clone(first)
maps.Copy(cloned, second)
return cloned
}
No need to clone them if they aren't changing
please write a benchmark like I did, and make sure it reduces allocations and CPU time for:
- first is nil, second is not
- first has elements, second does not
My code above moved the needle from 3 allocations to 2 and made it 50% faster
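For reference, a benchmark along these lines (the case names and map sizes are illustrative) would exercise the nil/empty fast paths and report allocations:

package caveats

import (
	"maps"
	"testing"
)

// combineMaps as proposed above: the fast paths avoid cloning when one side is nil or empty.
func combineMaps(first map[string]any, second map[string]any) map[string]any {
	if first == nil || len(first) == 0 {
		return maps.Clone(second)
	} else if second == nil || len(second) == 0 {
		return maps.Clone(first)
	}
	cloned := maps.Clone(first)
	maps.Copy(cloned, second)
	return cloned
}

func BenchmarkCombineMaps(b *testing.B) {
	filled := map[string]any{"a": 1, "b": 2, "c": 3}
	cases := []struct {
		name   string
		first  map[string]any
		second map[string]any
	}{
		{"first nil, second filled", nil, filled},
		{"first filled, second empty", filled, map[string]any{}},
		{"both filled", filled, map[string]any{"d": 4}},
	}
	for _, tc := range cases {
		b.Run(tc.name, func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				_ = combineMaps(tc.first, tc.second)
			}
		})
	}
}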
Since I now don't clone at all unless necessary, it should be one allocation fewer, even
checkHint, err := hints.HintForEntrypoint(
	rdc.entrypoint,
	resource.Resource.ResourceId,
	rdc.parentRequest.TerminalSubject,
	&v1.ResourceCheckResult{
		Membership: v1.ResourceCheckResult_MEMBER,
	})
This seems like a potentially large payload to send over dispatch. By default it would be 100 elements but we've discussed introducing a flag, and we've observed better results as that number increases.
Why does the hint need to be represented as N hint elements, instead of a more compact/normalized representation? Each one of the resources has the same entrypoint and the same terminal subject. The result is N protos with N copies of the same subject and object namespaces and relations. The only difference is the objectID, and that is also going to be duplicated in the ComputeBulkCheck dispatch operation.
Could we normalize this information, and if necessary, have the client-side derive these N hints?
Why does the hint need to be represented as N hint elements, instead of a more compact/normalized representation?
Because they are isolated hints
Each one of the resources has the same entrypoint and the same terminal subject.
Here, but that may not apply in the future. This is a generalized hint system.
Could we normalize this information, and if necessary, have the client-side derive these N hints?
Yes, I could have each hint support taking in multiple resource IDs, but that would require more complicated code, so it's a tradeoff
Because they are isolated hints
I don't understand what this means. Please elaborate.
Yes, I could have each hint support taking in multiple resource IDs, but that would require more complicated code, so it's a tradeoff
My tests indicate that under an LR2 workload, SpiceDB is spending 32% of CPU time in GC (21-23% with my optimizations in #1989). That's between a fourth and a third of CPU time spent collecting garbage (❗), and the vast majority of it is proto allocations.
That CPU time that goes to GC means less CPU time for the massive goroutine fan out LR2 tends to do with certain data shapes and sizes. In turn, it increases tail latencies.
My advice is to make the tradeoff and reduce allocations; the impact cannot be ignored and is globally an issue across all API endpoints. This is very relevant for deployments that have low CPU requests. Feel free to do it in a follow-up PR so this one does not stay open any longer.
Okay
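For that follow-up, a rough sketch of the compact form discussed above: a hypothetical hint shape that carries the shared data once plus a list of resource IDs, expanded back into per-resource hints on the receiving side. These types are stand-ins, not the actual dispatch protos:

package hints

// batchedCheckHint is a hypothetical, normalized form of a check hint:
// one copy of the shared relation/subject data plus the resource IDs it
// applies to, rather than N near-identical hint protos on the wire.
type batchedCheckHint struct {
	ResourceType    string
	Relation        string
	TerminalSubject string
	ResourceIDs     []string
	Membership      string
}

// perResourceHint mirrors today's one-hint-per-resource shape.
type perResourceHint struct {
	ResourceType    string
	Relation        string
	TerminalSubject string
	ResourceID      string
	Membership      string
}

// expand derives the N per-resource hints from the compact form on the
// receiving side, keeping the dispatched payload small.
func (b batchedCheckHint) expand() []perResourceHint {
	expanded := make([]perResourceHint, 0, len(b.ResourceIDs))
	for _, id := range b.ResourceIDs {
		expanded = append(expanded, perResourceHint{
			ResourceType:    b.ResourceType,
			Relation:        b.Relation,
			TerminalSubject: b.TerminalSubject,
			ResourceID:      id,
			Membership:      b.Membership,
		})
	}
	return expanded
}

The wire payload then grows by one string per extra resource instead of one full hint proto, which should cut most of the duplicated proto allocations noted above.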
internal/graph/check.go
Outdated
if len(req.CheckHints) > 0 {
	filteredResourcesIdsSet := mapz.NewSet(filteredResourcesIds...)
	for _, checkHint := range req.CheckHints {
		resourceID, ok := hints.AsCheckHintForComputedUserset(checkHint, req.ResourceRelation, req.Subject)
Hints can also be issued for tupleset-to-userset; why can we assume here that it's a hint for computed usersets?
checkInternal is going to loop over the CheckHints twice:
- once to filter the resources to dispatch
- once more during combination
You could return the result from the hint alongside the resourceID, build a map like the one combineWithComputedHints accepts, and call the latter instead. This means you don't have to iterate again over all the hints. This adds up, especially with wide relationships that lead to many batches of 100 elements.
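A rough sketch of that single-pass shape, using stand-in types and a hypothetical helper that also returns the hinted result; the real code would use the dispatch protos and the hints package:

package graph

// checkHint and resourceCheckResult are stand-ins for the real proto types;
// asHintWithResult stands in for AsCheckHintForComputedUserset extended to
// also return the hinted check result.
type checkHint struct{}
type resourceCheckResult struct{}

func asHintWithResult(h checkHint) (resourceID string, result resourceCheckResult, ok bool) {
	// A real implementation would match the hint against the relation/subject.
	return "", resourceCheckResult{}, false
}

// filterAndCollectHints walks the hints exactly once: it removes hinted
// resource IDs from the set still to be dispatched and, in the same pass,
// builds the map that the combination step accepts.
func filterAndCollectHints(hints []checkHint, toDispatch map[string]struct{}) map[string]resourceCheckResult {
	hinted := make(map[string]resourceCheckResult, len(hints))
	for _, h := range hints {
		resourceID, result, ok := asHintWithResult(h)
		if !ok {
			continue
		}
		delete(toDispatch, resourceID)
		hinted[resourceID] = result
	}
	return hinted
}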
why can we assume here it's a hint for computed usersets?
Because this is processing for the relation itself, and not the arrow.
You could return the result from the hint along side the resourceID...
Except when there aren't any hints; then we're doing work we don't need to do. It's also capping at maybe 1000 elements right now, which means the overhead is likely minimal. I can combine if you like, but it makes the code less readable for a very small improvement
Except when there aren't any hints, then we're doing work we don't need to do.
What work, creating an empty map?
It's also capping at maybe 1000 elements right now, which means the overhead is likely minimal. I can combine if you like, but it makes the code less readable for a very small improvement
How is it making it less readable? You already have a method that accepts a map.
Looked at as one call with 1000 elements, this wouldn't be much overhead. The issue is that a graph with many elements can lead to many, many dispatches, and I think every little bit counts.
internal/graph/check.go
Outdated
}

if req.OriginalRelationName != "" {
	resourceID, ok = hints.AsCheckHintForComputedUserset(checkHint, &core.RelationReference{
Please do not allocate a core.RelationReference here. Instead you can create a new AsCheckHintForComputedUserset method that receives the strings directly, and which the method with proto args can reuse.
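Roughly along these lines, with stand-in types since the real signatures live in the hints package and the dispatch protos:

package hints

// relationRef and checkHint are stand-ins for the proto types used by the
// real hints package; the point is the layering, not the exact fields.
type relationRef struct {
	Namespace string
	Relation  string
}

type checkHint struct {
	namespace, relation, resourceID string
}

// asCheckHintForComputedUsersetStrings does the matching on plain strings,
// so hot paths that already have the strings in hand avoid building a
// relation-reference struct (or proto) just to ask the question.
func asCheckHintForComputedUsersetStrings(hint checkHint, namespace, relation string) (string, bool) {
	if hint.namespace != namespace || hint.relation != relation {
		return "", false
	}
	return hint.resourceID, true
}

// asCheckHintForComputedUserset keeps the original, proto-shaped signature
// as a thin wrapper over the string-based variant for callers that prefer it.
func asCheckHintForComputedUserset(hint checkHint, rel *relationRef) (string, bool) {
	return asCheckHintForComputedUsersetStrings(hint, rel.Namespace, rel.Relation)
}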
I hate breaking nicer interfaces for optimization but oh well
You could have maintained the interface that used the proto, and just built it on the new signature
Yeah, but then it would have been used in exactly one location, so not worth the overhead IMO
LGTM, I've left some additional feedback, but we can address it in a follow-up
This implementation should be much faster for intersections, exclusions and caveats due to early tree shearing and check hints.
This change has three major components: