
Implement a new, experimental variant of LookupResources as LookupResources2 #1905

Merged 5 commits from lrv2 into authzed:main on Jul 24, 2024

Conversation

josephschorr (Member) commented May 20, 2024:

This implementation should be much faster for intersections, exclusions, and caveats due to early tree shearing and check hints.

This change has three major components:

  1. Change to caveat expression handling to terminate early when the result is already determined (sketched below)
  2. Ability to specify hints to checks to avoid recomputation
  3. Implementation of LR2
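
The early-termination idea in item 1 can be pictured with a minimal sketch; exprResult and evalAnd below are simplified stand-ins, not SpiceDB's actual ExpressionResult handling. For an AND expression, a single definitive false makes the remaining children irrelevant, so they never need to be evaluated or dispatched for.

package caveats

// exprResult is a simplified stand-in for an evaluated caveat child: either a
// definitive boolean, or "partial" because context parameters were missing.
type exprResult struct {
	value   bool
	partial bool
}

// evalAnd walks the children of an AND expression left to right and shears the
// remaining subtree as soon as a definitive false is seen, instead of evaluating
// (and potentially dispatching for) every child.
func evalAnd(children []func() exprResult) exprResult {
	anyPartial := false
	for _, eval := range children {
		r := eval()
		if !r.partial && !r.value {
			// Definitive false: the AND can never become true, so stop here.
			return exprResult{value: false}
		}
		anyPartial = anyPartial || r.partial
	}
	// Every child was true or partial; the result is definitive only if none were partial.
	return exprResult{value: !anyPartial, partial: anyPartial}
}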

@github-actions github-actions bot added area/CLI Affects the command line area/api v1 Affects the v1 API area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) area/dispatch Affects dispatching of requests labels May 20, 2024
@josephschorr josephschorr marked this pull request as ready for review May 21, 2024 15:26
@josephschorr josephschorr requested a review from a team as a code owner May 21, 2024 15:26
@josephschorr josephschorr marked this pull request as draft July 8, 2024 15:22
josephschorr (Member Author):

Moved to draft due to steelthread discovering a pagination bug

vroldanbet (Contributor):

There is a unit test error:

--- FAIL: TestSimpleLookupResources2 (0.24s)
    --- FAIL: TestSimpleLookupResources2/document#view->user:legal (0.06s)
        lookupresources2_test.go:122: 
            	Error Trace:	/home/runner/actions-runner/_work/spicedb/spicedb/internal/dispatch/graph/lookupresources2_test.go:122
            	Error:      	Not equal: 
            	            	expected: 0x3
            	            	actual  : 0x1
            	Test:       	TestSimpleLookupResources2/document#view->user:legal
            	Messages:   	Depth required mismatch

…ources2

This implementation should be much faster for intersections, exclusions and caveats due to early tree shearing and check hints
josephschorr (Member Author):

> There is a unit test error:
> ...

Fixed

…dispatched resources when checking those already received from another dispatch

Adds some parallelism back into LR2
Also adds additional testing to ensure check hints are used in LR2
vroldanbet (Contributor) left a comment:

Some early comments, still making my way through the PR

internal/dispatch/graph/graph.go (outdated, resolved)
internal/services/v1/permissions.go (resolved)
internal/graph/lookupresources2.go (outdated, resolved)
}

// For each entrypoint, load the necessary data and re-dispatch if a subproblem was found.
return withParallelizedStreamingIterableInCursor(ctx, ci, entrypoints, parentStream, crr.concurrencyLimit,
vroldanbet (Contributor):

Not a blocker, since this warrants a bigger refactor, but I thought it was worth calling this out again given what I observed under load: concurrency limits are not very useful if we don't keep state around about the available "capacity" of the process. E.g. in non-clustered mode, this leads to large spikes in goroutines, because every local dispatch gets the same limit; in clustered mode, it leads to the same situation, albeit lessened because there is more capacity. Ultimately this affects tail latencies, as the scheduler scrambles to handle all the scheduling spikes with whatever GOMAXPROCS it has defined. With sufficient load, and all the dispatching happening through wide relations, this can easily overload a cluster.

josephschorr (Member Author):

We could add it as a tracking field to the dispatch (since we're changing it here anyway) and decrease the counter on each dispatch

vroldanbet (Contributor):

Let's do that as a follow-up; we may want to generalize it for all APIs, and this could be the foundation for what the paper describes as "These safeguards include fine-grained cost accounting, quotas, and throttling." We could, e.g., limit clients by the number of dispatches.
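
One possible shape of that tracking field, as a rough sketch (requestBudget and childLimit are hypothetical names, not the current dispatch metadata): the remaining concurrency budget travels with the request and is subdivided at each dispatch level instead of every level receiving the full configured limit.

package dispatch

// requestBudget is a hypothetical field carried on the dispatched request; it is not
// part of the current dispatch proto.
type requestBudget struct {
	RemainingConcurrency uint16
}

// childLimit subdivides whatever budget remains across the fan-out of the next
// dispatch level, always leaving at least 1 so sequential progress stays possible.
func childLimit(budget requestBudget, fanout uint16) uint16 {
	if fanout == 0 || budget.RemainingConcurrency <= fanout {
		return 1
	}
	return budget.RemainingConcurrency / fanout
}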

internal/graph/resourcesubjectsmap2.go (resolved)
Comment on lines +141 to +155
	// If all the incoming edges are caveated, then the entire status has to be marked as a check
	// is required. Otherwise, if there is at least *one* non-caveated incoming edge, then we can
	// return the existing status as a short-circuit for those non-caveated found subjects.
	if allCaveated {
		resources = append(resources, &v1.PossibleResource{
			ResourceId:           resourceID,
			ForSubjectIds:        subjectIDs,
			MissingContextParams: missingContextParameters.AsSlice(),
		})
	} else {
		resources = append(resources, &v1.PossibleResource{
			ResourceId:    resourceID,
			ForSubjectIds: nonCaveatedSubjectIDs,
		})
	}
vroldanbet (Contributor):

Why are we OK with returning a partial list here?

Let's say we do an LR call, but we missed some important caveat arguments - by mistake. Now we are returning "this is the list of resources the subject has access to". The client has no way to know they missed arguments that could have augmented that list.

josephschorr (Member Author):

As the comment describes: if we find a path for the resource that is not caveated, then the result is by definition present, so we don't care about the caveated paths to that same resource.

Missed arguments don't matter if at least one of the paths to the resource has no caveats
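
For instance, with an illustrative schema: if user:tom can reach document:plans both through an uncaveated viewer relation and through a shared_viewer relation caveated on an IP allowlist, the uncaveated path alone proves the resource is reachable, so the missing allowlist context on the caveated path can be ignored for that resource.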

internal/testserver/server.go (outdated, resolved)
internal/caveats/run.go (outdated, resolved)
internal/caveats/run.go (outdated, resolved)
Comment on lines 230 to 235
	currentResult = syntheticResult{
		value:                true,
		contextValues:        map[string]any{},
		exprString:           "",
		missingContextParams: []string{},
	}
vroldanbet (Contributor):

Suggested change:

-	currentResult = syntheticResult{
-		value:                true,
-		contextValues:        map[string]any{},
-		exprString:           "",
-		missingContextParams: []string{},
-	}
+	currentResult.value = true

josephschorr (Member Author):

Can't; the declared type of the result is not syntheticResult, so its fields can't be set directly.

vroldanbet (Contributor):

Then:

	var currentResult ExpressionResult = syntheticResult{
		value:                cop.Op == core.CaveatOperation_AND,
		contextValues:        map[string]any{},
		exprString:           "",
		missingContextParams: []string{},
	}


	var contextValues map[string]any
	var exprStringPieces []string

	var currentResult ExpressionResult = syntheticResult{
		value:         false,
		contextValues: map[string]any{},
vroldanbet (Contributor):

Initializing syntheticResult.contextValues with an empty map causes an unnecessary heap allocation. combineMaps already takes care of a nil first argument when merging what has been found, so this adds one extra unnecessary allocation.

I did some benchmarks, and combineMaps can be made faster and do one less allocation in the case where one of the arguments is nil or empty.

Given the caveat is being evaluated for each one of the tuples coming out of ReverseQueryRelationships via redispatchOrReportOverDatabaseQuery, I thought this could lead to N unnecessary allocations in the critical path.

func combineMaps(first map[string]any, second map[string]any) map[string]any {
	if first == nil || len(first) == 0 {
		return maps.Clone(second)
	} else if second == nil || len(second) == 0 {
		return maps.Clone(first)
	}

	cloned := maps.Clone(first)
	maps.Copy(cloned, second)
	return cloned
}

josephschorr (Member Author):

No need to clone them if they aren't changing

vroldanbet (Contributor) commented Jul 24, 2024:

please write a benchmark like I did, and make sure it reduces allocations and CPU time for:

  • first is nil, second is not
  • first has elements, second does not

My code above moved the needle from 3 allocations to 2 and made it 50% faster
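
A benchmark along those lines might look roughly like this (a sketch; the package placement, benchmark names, and inputs are assumptions), run with something like go test -bench=CombineMaps -benchmem in the package that defines combineMaps:

package caveats

import "testing"

// Exercise combineMaps with one empty side and report allocations per operation.
func BenchmarkCombineMapsNilFirst(b *testing.B) {
	second := map[string]any{"ip": "10.0.0.1", "tier": "pro"}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = combineMaps(nil, second)
	}
}

func BenchmarkCombineMapsEmptySecond(b *testing.B) {
	first := map[string]any{"ip": "10.0.0.1", "tier": "pro"}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = combineMaps(first, nil)
	}
}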

josephschorr (Member Author):

Since I now don't clone at all unless necessary, it should even be one allocation fewer.
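
The clone-avoiding variant being described might look roughly like this (a sketch; it assumes callers treat the returned map as read-only, since one of the inputs can be returned directly):

package caveats

import "maps"

// combineMaps merges two context maps without cloning when one side is empty.
// Safe only as long as callers never mutate the returned map.
func combineMaps(first map[string]any, second map[string]any) map[string]any {
	if len(first) == 0 {
		// len(nil) == 0 in Go, so this also covers a nil first map: just reuse second.
		return second
	}
	if len(second) == 0 {
		return first
	}
	cloned := maps.Clone(first)
	maps.Copy(cloned, second)
	return cloned
}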

internal/caveats/run.go (resolved)
internal/caveats/run.go (outdated, resolved)
internal/graph/lr2streams.go (outdated, resolved)
Comment on lines +132 to +138
	checkHint, err := hints.HintForEntrypoint(
		rdc.entrypoint,
		resource.Resource.ResourceId,
		rdc.parentRequest.TerminalSubject,
		&v1.ResourceCheckResult{
			Membership: v1.ResourceCheckResult_MEMBER,
		})
vroldanbet (Contributor):

This seems like a potentially large payload to send over dispatch. By default it would be 100 elements, but we've discussed introducing a flag, and we've observed better results as that number increases.

Why does the hint need to be represented as N hint elements, instead of a more compact/normalized representation? Each one of the resources has the same entrypoint and the same terminal subject. The result is N protos with N copies of the same subject and object namespaces and relations. The only difference is the objectID, and that is also going to be duplicated in the ComputeBulkCheck dispatch operation.

Could we normalize this information and, if necessary, have the client side derive these N hints?

josephschorr (Member Author):

> Why does the hint need to be represented as N hint elements, instead of a more compact/normalized representation?

Because they are isolated hints.

> Each one of the resources has the same entrypoint and the same terminal subject.

Here, yes, but that may not apply in the future. This is a generalized hint system.

> Could we normalize this information, and if necessary, have the client-side derive these N hints?

Yes, I could have each hint support taking in multiple resource IDs, but that would require more complicated code, so it's a tradeoff.
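
If hints were ever batched per entrypoint, the shape could be something like the following sketch (batchedCheckHint is hypothetical and not what this PR implements); the receiving side could expand it back into per-resource hints wherever the isolated-hint model is still needed:

package hints

// batchedCheckHint is a hypothetical, more normalized hint shape: the shared
// entrypoint and terminal subject are carried once, and only the per-resource IDs
// are repeated.
type batchedCheckHint struct {
	resourceType    string   // e.g. "document#view"; shared by every resource in the batch
	terminalSubject string   // e.g. "user:tom"; also shared
	resourceIDs     []string // the only part that differs per resource
	isMember        bool     // the shared check result being hinted
}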

vroldanbet (Contributor):

> Because they are isolated hints

I don't understand what this means. Please elaborate.

> Yes, I could have each hint support taking in multiple resource IDs, but that would require more complicated code, so it's a tradeoff

My tests indicate that under an LR2 workload, SpiceDB is spending 32% of CPU time in GC (21-23% with my optimizations in #1989). That's nearly a third of CPU time collecting garbage (❗), and the vast majority of it is proto allocations.

The CPU time that goes to GC means less CPU time for the massive goroutine fan-out LR2 tends to do with certain data shapes and sizes. In turn, it increases tail latencies.

My advice is to make the tradeoff and reduce allocations; the impact cannot be ignored and is globally an issue across all API endpoints. This is very relevant for deployments that have low CPU requests. Feel free to do it in a follow-up PR so this one does not stay open any longer.

josephschorr (Member Author):

Okay

internal/graph/hints/checkhints.go (outdated, resolved)
internal/graph/hints/checkhints.go (outdated, resolved)
	if len(req.CheckHints) > 0 {
		filteredResourcesIdsSet := mapz.NewSet(filteredResourcesIds...)
		for _, checkHint := range req.CheckHints {
			resourceID, ok := hints.AsCheckHintForComputedUserset(checkHint, req.ResourceRelation, req.Subject)
vroldanbet (Contributor):

Hints can also be issued for tupleset-to-userset; why can we assume here that it's a hint for computed usersets?

vroldanbet (Contributor):

checkInternal is going to loop over the CheckHints twice:

  • once to filter the resources to dispatch
  • once more during combination

You could return the result from the hint alongside the resourceID, build a map like the one combineWithComputedHints accepts, and call the latter instead. This means you don't have to iterate over all the hints again. This adds up, especially with wide relationships that lead to many batches of 100 elements.
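
A sketch of that single-pass approach (splitByHints, checkResult, and lookupHint are hypothetical names; the real signatures in check.go differ): the filtering pass also collects the hinted results into the map handed to the combination step, so the hints are only walked once.

package graph

// checkResult stands in for the dispatch check-result proto.
type checkResult struct{ member bool }

// splitByHints decides in a single pass which resource IDs still need a dispatched
// check and which are already answered by a hint.
func splitByHints(
	resourceIDs []string,
	lookupHint func(resourceID string) (checkResult, bool),
) (toDispatch []string, hinted map[string]checkResult) {
	hinted = make(map[string]checkResult, len(resourceIDs))
	for _, id := range resourceIDs {
		if result, ok := lookupHint(id); ok {
			// Already answered by a hint: keep it for the combination step,
			// and skip dispatching a check for it.
			hinted[id] = result
			continue
		}
		toDispatch = append(toDispatch, id)
	}
	return toDispatch, hinted
}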

josephschorr (Member Author):

> why can we assume here it's a hint for computed usersets?

Because this is processing for the relation itself, and not the arrow.

> You could return the result from the hint alongside the resourceID...

Except when there aren't any hints, then we're doing work we don't need to do. It's also capping at maybe 1000 elements right now, which means the overhead is likely minimal. I can combine if you like, but it makes the code less readable for a very small improvement.

vroldanbet (Contributor):

> Except when there aren't any hints, then we're doing work we don't need to do.

What work, creating an empty map?

> It's also capping at maybe 1000 elements right now, which means the overhead is likely minimal. I can combine if you like, but it makes the code less readable for a very small improvement

How is it making it less readable? You already have a method that accepts a map.
Looking at these things as one call with 1000 elements wouldn't be much overhead. The issue is that a graph with many elements can lead to many, many dispatches. I think every little bit counts.

}

if req.OriginalRelationName != "" {
resourceID, ok = hints.AsCheckHintForComputedUserset(checkHint, &core.RelationReference{
vroldanbet (Contributor):

Please do not allocate a core.RelationReference here. Instead, you can create a new AsCheckHintForComputedUserset variant that receives the strings directly, which the method with proto args can then reuse.

josephschorr (Member Author):

I hate breaking nicer interfaces for optimization but oh well

vroldanbet (Contributor):

You could have maintained the interface that used the proto, and just built it on top of the new signature.
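
The layering being described could look roughly like this sketch (the types are hypothetical stand-ins for the protos; only the shape of the two signatures matters): the string-based variant does the matching, and the proto-style signature simply unpacks its argument.

package hints

// Hypothetical stand-ins for the proto types involved.
type relationReference struct{ Namespace, Relation string }

type checkHint struct {
	namespace  string
	relation   string
	resourceID string
}

// The string-based variant does the actual matching, so hot paths that only have raw
// strings (such as req.OriginalRelationName) never allocate a RelationReference.
func asHintForRelation(h checkHint, namespace, relation string) (string, bool) {
	if h.namespace == namespace && h.relation == relation {
		return h.resourceID, true
	}
	return "", false
}

// The proto-style signature can be kept for readability and simply delegates.
func asHintForRelationRef(h checkHint, rr *relationReference) (string, bool) {
	return asHintForRelation(h, rr.Namespace, rr.Relation)
}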

josephschorr (Member Author):

Yeah, but then it would have been used in exactly one location, so not worth the overhead IMO

internal/graph/check.go (resolved)
internal/graph/check.go (outdated, resolved)
internal/graph/check.go (resolved)
internal/graph/check.go (resolved)
internal/graph/check.go (resolved)
internal/dispatch/graph/check_test.go (outdated, resolved)
internal/dispatch/graph/check_test.go (outdated, resolved)
internal/graph/hints/checkhints.go (resolved)
internal/dispatch/graph/check_test.go (outdated, resolved)
vroldanbet (Contributor) left a comment:

LGTM, I've left some additional feedback, but we can address it in a follow-up.

@vroldanbet vroldanbet added this pull request to the merge queue Jul 24, 2024
Merged via the queue into authzed:main with commit c31f34f Jul 24, 2024
22 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 24, 2024
@josephschorr josephschorr deleted the lrv2 branch July 24, 2024 21:10