Revocation file should be read through the driver #218

Open · zone117x opened this issue Mar 25, 2019 · 5 comments

zone117x (Member) commented Mar 25, 2019

When a revocation is not in the memory cache (due to eviction, a server restart, or anything else), the hub should use the driver's cloud APIs to read the revocation file contents, just as it does when writing them out.

Right now it is fetched from the configured hub read endpoint. This presents some potential issues, all of which seem unlikely to occur often, if at all, especially on our hub. Some seem more likely when considering a future with a large, diversely-configured deployment of hubs.

  • If the read endpoint server ends up in a bad state where it only returns 404s, all revoked auth tokens that are not in memory cache could be treated as valid.

  • Flaky/intermittent error responses could cause putFile to return invalid-auth errors, requiring more error handling / retry logic in the client. (I'm aware that in this flaky read-path scenario, a client performing getFile calls would already face similar issues, but there is no need to add more failure modes.)

  • A read endpoint with delayed / misbehaving caching compared to direct cloud API reads/writes could cause a (really) unlikely but possible race condition where: 1) a revocation is performed, it's stored in the memory cache and written via the cloud API, 2) the revocation gets quickly evicted from the cache, 3) a request is performed with the revoked token, 4) the hub reads the delayed/outdated revocation and authorizes the request.

  • Routing must be successful between the Gaia hub server and the read endpoint. Somewhat contrived example: hub operators that set up strict outbound firewalls now need to perform additional configuration.

Implementing a DriverModel.performRead(...) in all the drivers, for internal hub usage only, is quick and easy (rough sketch below). To reiterate a point made after talking with @jcnelson: this would be clearly named and implemented for the purpose of necessary internal hub usage(s).
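
For illustration, a minimal sketch of what that could look like alongside the existing write path. The names (`performRead`, `PerformReadArgs`, `ReadResult`) and the `performWrite` signature shown here are assumptions for the sketch, not the hub's actual driver interface:

```ts
// Hypothetical sketch only: method and type names are assumptions,
// not the actual gaia-hub driver interface.
import { Readable } from 'stream'

interface PerformReadArgs {
  storageTopLevel: string   // the bucket address
  path: string              // e.g. the revocation file name
}

interface ReadResult {
  data: Readable            // file contents streamed back from the cloud API
  exists: boolean           // false if the backend reported "not found"
}

interface DriverModel {
  // Existing write path (roughly): writes go straight to the cloud API.
  performWrite(args: { storageTopLevel: string, path: string, stream: Readable,
                       contentLength: number, contentType: string }): Promise<string>

  // Proposed internal-only read path: read the same object back through the
  // cloud API, bypassing the configured read endpoint entirely.
  performRead(args: PerformReadArgs): Promise<ReadResult>
}
```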

@jcnelson @kantai any feedback? Otherwise I'll probably knock this out next time I'm doing driver API work.

zone117x self-assigned this Mar 25, 2019
jcnelson (Member) commented

These are all good points @zone117x. I support adding a method to the driver model for reading a revocation file. More generally, since I think this sort of thing will be useful for collections, I support adding an API method for directly reading a named piece of Gaia metadata from the underlying data store (be it a revocation file, a collections index, etc.). I would recommend naming the method something like readGaiaMetadata(metadataFileName: string) or the like.
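
To make that suggestion concrete, a hedged sketch of how such a method might serve both cases. The metadata file names and the null-on-missing return convention are illustrative assumptions, not the hub's actual code:

```ts
// Illustrative only: matches the readGaiaMetadata(metadataFileName) naming
// suggested above; file names and the return type are assumptions.
interface GaiaMetadataReader {
  // Reads a named piece of hub-internal metadata directly from the underlying
  // data store (never through the configured read endpoint).
  // Resolves to null if the backend reports the file as missing.
  readGaiaMetadata(metadataFileName: string): Promise<string | null>
}

// The same method would serve both use cases mentioned above.
async function readHubMetadata(driver: GaiaMetadataReader) {
  const revocation = await driver.readGaiaMetadata('revocation.json')         // assumed name
  const collectionsIndex = await driver.readGaiaMetadata('collections.index') // assumed name
  return { revocation, collectionsIndex }
}
```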

kantai (Member) commented Mar 25, 2019

I don't agree with this. I think that this is a particularly unlikely failure scenario -- if the readUrl is returning stale data or flaky errors, then the drivers probably would be as well (indeed, they are probably more likely to have flaky and intermittent errors). In the event of a poorly configured readUrl, nothing is really going to work on the Gaia hub, and so we shouldn't try to handle the failures differently in different codepaths -- that's just increasing the number of possible failure states, making the failure more difficult to reason about and detect, not less. We'd also now need to increase our testing surface to cover all of these new paths. This seems like a lot of new code and new testing for pretty minimal benefit.

jcnelson (Member) commented Mar 25, 2019

> If the read endpoint server ends up in a bad state where it only returns 404s, all revoked auth tokens that are not in memory cache could be treated as valid.

Thinking about this a little more, I consider this to be an unrelated problem. A server-side 404 when validating an auth token should cause all writes to be denied as a safety precaution.

EDIT: nevermind

zone117x (Member, Author) commented

> A server-side 404 when validating an auth token should cause all writes to be denied as a safety precaution.

@jcnelson If we wanted 404s to fail a write, we'd need to implement something like: "if 404, write a no-op revocation file (e.g. timestamp set to 0), then re-read, and if still 404, then fail the request." This would end up creating a ton of no-op revocation files. I'm not against the idea, but hopefully we can come up with a clean approach for this.
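
A minimal sketch of that flow, assuming the revocation state is a single timestamp and that driver-backed read/write helpers exist; all names here are hypothetical:

```ts
// Hedged sketch only: the helper shapes and the "oldest valid token timestamp"
// layout are illustrative assumptions, not the hub's actual implementation.

// Resolves to the stored timestamp, or null if the read returned 404.
type ReadRevocation = (bucketAddress: string) => Promise<number | null>
// Writes a revocation timestamp for the bucket via the driver's cloud API.
type WriteRevocation = (bucketAddress: string, timestamp: number) => Promise<void>

async function getRevocationTimestamp(
  readRevocation: ReadRevocation,
  writeRevocation: WriteRevocation,
  bucketAddress: string
): Promise<number> {
  let timestamp = await readRevocation(bucketAddress)
  if (timestamp === null) {
    // 404: write a no-op revocation (timestamp 0), then re-read.
    await writeRevocation(bucketAddress, 0)
    timestamp = await readRevocation(bucketAddress)
    if (timestamp === null) {
      // Still a 404 after a successful write: treat the backend as broken and
      // fail the request rather than accept a possibly-revoked token.
      throw new Error('revocation state unreadable; denying write as a precaution')
    }
  }
  return timestamp
}
```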

@kantai I think creating a new network dependency between the hub and the read endpoint adds more complexity than if we just used the cloud APIs.

It seems like the asymmetry within the revocation code creates more failure states: when writing, it talks to the cloud API, yet when reading, it talks to a user-configurable endpoint with unpredictable failure states. Wouldn't the error states of the cloud APIs be more familiar and consistent?

If we already need something like a readFile or fileStat for collections, then I don't think we'd be adding a bunch of new tests or paths by swapping the revocation code's fetch(...) for driver.read....

kantai (Member) commented Mar 26, 2019

> A server-side 404 when validating an auth token should cause all writes to be denied as a safety precaution.

404s causing all writes to fail is not good -- it would mean that any bucket would need to successfully perform a revocation write the first time it was used (even making this possible requires a special case: the current revocation number governs whether or not a revocation bump request is validated). In turn, this would mean clients would have to know when to do this, which introduces a new client-server flow for establishing a first connection. I don't think this increases safety -- the storage backend is ultimately in control of the data that gets written, so the fact that the storage backend can prevent a revocation from occurring by serving a 404 doesn't change any security properties of the system.
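
To make that special case concrete, a hedged sketch assuming the revocation state is a single number that can only move forward; field and function names are illustrative, not the hub's actual code:

```ts
// Hedged sketch of the special case described above; the shape of the
// revocation state and the validation rule are assumptions for illustration.
interface RevocationState {
  oldestValidTokenTimestamp: number
}

// A revocation "bump" is only accepted if it moves the timestamp forward.
// If no revocation file exists yet (404), there is no current number to
// compare against, so the very first bump needs its own rule.
function isValidBump(current: RevocationState | null, requested: number): boolean {
  if (current === null) {
    return requested >= 0   // first-ever revocation for this bucket
  }
  return requested > current.oldestValidTokenTimestamp
}
```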

> @kantai I think creating a new network dependency between the hub and the read endpoint adds more complexity than if we just used the cloud APIs.

Is it a new network dependency, though? If you cannot read data from the readUrl, the hub is broken. Using the cloud APIs for reading the revocation number just masks that brokenness.

> It seems like the asymmetry within the revocation code creates more failure states: when writing, it talks to the cloud API, yet when reading, it talks to a user-configurable endpoint with unpredictable failure states. Wouldn't the error states of the cloud APIs be more familiar and consistent?

Again, see above. If the user-configurable endpoint is broken, the hub is broken. We already live with those failure states. The Cloud APIs may have fewer failure states, but they are new failure states.

> If we already need something like a readFile or fileStat for collections, then I don't think we'd be adding a bunch of new tests or paths by swapping the revocation code's fetch(...) for driver.read....

That's true, and I think it's ultimately fine to do so, but we shouldn't trick ourselves into thinking this allows us to be resistant to errors in the readUrl configuration.
