-
Notifications
You must be signed in to change notification settings - Fork 464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
design doc for network policies #29814
Conversation
c269b08
to
f78411c
Compare
9f0db7a
to
dd0e74a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing this up @jubrad! I'll defer to others for the SQL syntax proposals, but structurally this all seems sounds to me. Feels like it tackles just the right amount of scope for v1, while leaving stepping stones to future additions as needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woohoo! Thanks for writing this @jubrad, looking forward to getting it built!
- Policy inheritance from associated roles. IE if 'bob' is a member of role 'eng' | ||
we will not apply policies from role 'eng' to 'bob'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW Role specific defaults, e.g. ALTER ROLE parker SET search_path
do not get inherited either
|
||
### Where network policies get applied. | ||
|
||
Network policies could be applied at many layers of our stack, from network firewalls or security groups that intercept traffic before it hits application subnets, to k8s or cilium network policies, balancers, or within the database itself. The above solutions choose to implement policies within the database itself. This comes with some disadvantages. For instance, this layer does not have auto-scaling and requires the database to do some work for each denied request. For this reason, it makes sense to shift the policies left. One possible shift is to the balancer layer. In this scenario, balancers would support both HTTP and pgwire load-balancing as well as network policy enforcement. Balancers have auto-scaling and are relatively stateless. A large number of out-of-policy requests to a balancer would likely not impact any ongoing connections. The biggest challenge with implementing network policies in the balancer is that they do not have access to the policies or roles, which are stored in the database. To move network policies to the balancers we would need some way of sharing all the policies and roles for all the environments a balancer is proxying. Another place we could shift these policies would be to a WAF or network firewalls. Neither one of these seems reasonable to implement for both pgwire and HTTP in a multi-tenant ingress layer, but this could be revisited for private ingress. It would still have the same issues of keeping policies up-to-date as the balancer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not be relevant but something that comes to mind here when thinking about shifting policies left is how we store SECRET
s in an external system. Point being, there is already precedent for environmentd
to store and fetch data from something other than the Catalog or Persist. I could imagine doing something similar for communicating network policies with balancers or something even further left
## Open questions | ||
|
||
#### Single default policy for all resources? | ||
Should there be different default policies for users, sources, and sinks, or should a single default policy be applied to all resources once those resources start supporting policies? It may be difficult to roll out new resources if we only have one default, but it does seem nicer in the long run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know how AWS or other cloud providers handle network policies for their various services? Seems like a similar design space maybe? How to handle Webhook sources might be a point in the "per-resource type" strategy instead of a single default
Sources and sinks are planned as a follow-up to user-based policies, but it remains an open question how we provide a user-friendly mechanism for webhook sources where it may be hard to find a list of IPs if the webhook request is coming from | ||
a third party. | ||
|
||
#### The story on lockout is a bit weak. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO explicitly pointing users to our formal support ticket process gets us pretty far
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never seen this! Agree. We do need to be cautious about how we validate users are who they say we are, but this is a great mechanism for getting our attention.
formal support ticket process gets us pretty far
```sql | ||
CREATE NETWORK POLICY OFFICE_01 ( | ||
RULE ( ACTION=ALLOW, SOURCE="10.0.0.0/32", COMMENT="OFFICE IP - 2024-9-28" ) | ||
); | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good to me, I do wonder if we want to separate out COMMENT
though and instead point folks towards something like:
COMMENT ON NETWORK POLICY office_01 is 'OFFICE IP - 2024-9-28'
Just so it's the same as other objects. Regardless we should probably pass this through the SQL council!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think comment is good—PostgreSQL (and therefore Materialize) uses "comment" to mean what AWS calls "description" pretty routinely.
Here's my recommendation: force every rule to have a name. Something like this would align with how you declare replicas for unmanaged clusters:
CREATE NETWORK POLICY OFFICE_01 (
RULES(
nikhils_desktop (ACTION=ALLOW, SOURCE="10.0.0.0/32"),
justins_laptop (ACTION=ALLOW, SOURCE="10.0.0.2/32"),
)
);
Then you could name a specific rule in the COMMENT
statement:
COMMENT ON NETWORK RULE office_01.justins_laptop is 'Laptop assigned to Justin Bradfield on 2024-01-03'
It also tees us up to support altering network policies one rule at a time, like:
ALTER NETWORK POLICY DROP RULE justins_laptop
We don't need to support that first thing, but it's nice to have the option to introduce the granular editing syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the alter policy would actually be:
ALTER NETWORK POLICY office_01 DROP RULE justins_laptop
If we go down this path we would probably want to treat rules more like cluster replicas, I was thinking about them more like brokers in a kafka connection. They'll probably want unique names and IDs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the alter policy would actually be:
ALTER NETWORK POLICY office_01 DROP RULE justins_laptop
Ah whoops, yes!
If we go down this path we would probably want to treat rules more like cluster replicas, I was thinking about them more like brokers in a kafka connection. They'll probably want unique names and IDs?
Yes, exactly! I think you could punt on IDs for now, but yeah you'd want to ensure unique names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking really good. Thanks for writing this up, @jubrad!
A new `NetworkPolicy` resource will be added to the catalog. | ||
```rust | ||
struct NetworkPolicy { | ||
id: NetworkPolicyId |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does NetworkPolicyId
look like? I'm assuming something that looks like enum RoleId
or enum ClusterId
, so roughly:
enum NetworkPolicyId {
System(u64),
User(u64),
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will just be a GlobalId
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it'll need to be its own type! Today GlobalId
s are only for things that go in the items
collection in the catalog—i.e., things in schemas, rather than things that exist in the global namespace (clusters, roles, databases). @ParkMyCar is also refactoring things to introduce a CatalogItemId
, but again that will still be just for objects that exist in schemas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I think it'll probably be similar to ClusterId
enum NetworkPolicyId {
System(u64),
User(u64),
}
I'll update the doc.
enum NetworkPolicyRule { | ||
Ingress { | ||
action: NetworkPolicyRuleAction, | ||
source: IpNet, | ||
comment: String | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will there ever be a case in Materialize where a single policy will be used in a context where there need to be both ingress and egress rules? With sources, we have push sources (webhooks) and pull sources (Kafka), but never both; it seems intuitive to me that a network policy on a webhook source would restrict inbound connections while a network policy on a Kafka source would restrict outbound connections.
If you buy this argument, I think concretely the difference would be using a generic term like ADDRESSES
rather than SOURCE
or TARGET
or DESTINATION
in network rules, and then whether the policy applied in the inbound or outbound direction would be a function of whether it was attached to a role, pull source, or push source.
RULE (ACTION = ALLOW, ADDRESSES = 'cidr')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you can intuit whether a policy/rule should be ingress or egress... two reasons:
- If we want to introduce ingress sinks, then sinks would support both ingress and egress making it impossible to define a single policy default for sinks.
- We want to be able to share policies between resource types without having to ensure some color match.
Thought for bullet 2
Let's say a user bob
has a specific network policy bob_policy
, that allows ingress access from IP 1.2.3.4/32
.
It should follow that bob
cannot create a source or sink that would expand their policy scope; i.e, any source or sink they create should inherit the policy applied to the user. This requires that the policy for bob
be applicable to all resource types bob can create.
Alternatives for enum
- have a different ingress and egress rule set for policies,
- add
direction
to NetworkPolicyRule and changesource
toaddress
I like alternative 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If we want to introduce ingress sinks, then sinks would support both ingress and egress making it impossible to define a single policy default for sinks.
Ah this is a very good point. Let's keep the distinction between egress and ingress rules then, and prepare to allow mixing and matching ingress and egress rules in a single policy. I also like the alternative you've proposed (adding a direction to each rule).
Let's say a user
bob
has a specific network policybob_policy
, that allows ingress access from IP1.2.3.4/32
.
It should follow thatbob
cannot create a source or sink that would expand their policy scope; i.e, any source or sink they create should inherit the policy applied to the user. This requires that the policy forbob
be applicable to all resource types bob can create.
Hm, I'm not sure this follows for me! I can imagine connecting as a user that's limited by a network policy to ingress from office IPs only, but wanting to create a webhook source that allows ingress from some EC2 instance somewhere that I've set up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah... I agree with you the network policies a user is able to associate with a given source/sink should be controlled by the policies on which the user has usage policies for, not the users existing policies. If one wants to lock that down to only the policy of the user, they could grant the user only usage privileges to that policy.
```sql | ||
CREATE NETWORK POLICY OFFICE_01 ( | ||
RULE ( ACTION=ALLOW, SOURCE="10.0.0.0/32", COMMENT="OFFICE IP - 2024-9-28" ) | ||
); | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think comment is good—PostgreSQL (and therefore Materialize) uses "comment" to mean what AWS calls "description" pretty routinely.
Here's my recommendation: force every rule to have a name. Something like this would align with how you declare replicas for unmanaged clusters:
CREATE NETWORK POLICY OFFICE_01 (
RULES(
nikhils_desktop (ACTION=ALLOW, SOURCE="10.0.0.0/32"),
justins_laptop (ACTION=ALLOW, SOURCE="10.0.0.2/32"),
)
);
Then you could name a specific rule in the COMMENT
statement:
COMMENT ON NETWORK RULE office_01.justins_laptop is 'Laptop assigned to Justin Bradfield on 2024-01-03'
It also tees us up to support altering network policies one rule at a time, like:
ALTER NETWORK POLICY DROP RULE justins_laptop
We don't need to support that first thing, but it's nice to have the option to introduce the granular editing syntax.
|
||
Example syntax for updating the default_network_policy | ||
```sql | ||
ALTER SYSTEM SET default_network_policy = OFFICE_01; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way our variable inheritance works I think you just want to call this network_policy
; variables that aren't overridden at the role level default to the system parameter of the same name. This is how e.g. cluster
works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I think I'm not fully understanding this.
My understanding for clusters is that there is a system var for cluster that selects which cluster will be selected by default on login. Users can update this var to select a different cluster. I think we have a different story for policies. Customers shouldn't be able to set their policy through a session var. In order to apply a different policy to a role you'd have to alter the role and set a new policy, or alter the default policy for everyone.
I'm happy to change the name to network_policy
just want to make sure we're agreeing on the flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Customers shouldn't be able to set their policy through a session var. In order to apply a different policy to a role you'd have to alter the role and set a new policy, or alter the default policy for everyone.
💯 agree with this.
The reason I think it's still ideal to use network_policy
for both the system and role variable is because you're going to need to add special cases to deny inheritance as a session variable. By default, anything settable as a role variable (e.g. cluster
) is also settable as a session variable. If you add a system variable named default_network_policy
and a role variable named network_policy
, you're going to need to add special cases in a few places to handle the inheritance properly. Whereas if you just use network_policy
, the existing inheritance should work correctly for the system/role level, and you'll only need to add one special case to ensure that users can't override the network_policy
at the session level.
To mitigate user lockouts, we will prevent users from altering their network policy in a way that will block their current `client_ip`. In the case of a lockout, we would need to modify an admin role using the `mz_system` and temporarily set a network policy that either allowed global access for that user or allowed access to a particular IP they provide. | ||
|
||
### Possible downsides | ||
This design presents a highly configurable solution that guarantees no access to data and is likely the easiest mechanism to implement, however, it does have some downsides. The largest downside is in the guarantee it provides. The best level of network restriction we could provide is that no network traffic reaches the database. The proposed solution only guarantees that no connection can be established with the data plane (coordinator). This has some implications for DOS attacks which must be handled outside the scope of these policies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a downside worth accepting! Makes the implementation much more straightforward, and we can always add on an eventually consistent L3/L4 firewall that reads these policies and can more efficiently enforce them on incoming connections.
- Moving the `SystemVar` from a `Vec<IpNet>` to an `Ident` pointing to a `NetworkPolicy` resource | ||
- Modifying roles to have an `Option<NetworkPolicy>` and adding the SQL to set this policy | ||
- Adding validations to prevent lock out | ||
- Following up with sinks/sources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love how you've scoped down each incremental steps here!
dd0e74a
to
dd7de2c
Compare
|
||
``` | ||
|
||
Users will be able to create `NetworkPolicies` directly. A user must have `CREATENETWORKPOLICY` privileges to create, modify, or destroy network policies. Network policies will be limited to 25 rules. This will be controlled by an LD flag. `NetworkPolicyRules` must be created through a policy. The policy rules implementation will initially only contain an `Allow` variant, but we should be an enum to allow for a `Deny` variant in the future. `NetworkPolicyRule` will hold a `NetworkPolicyRuleDirection` enum to allow for both ingress and egress policies. The names of `NetworkPolicyRules` must be unique within a policy. A `NetworkPolicyRules` will contain a single `IpNet`. We will allow for comments on `NetworkPolicies` as well as individual `NetworkPolicyRules`, the latter will not be implemented initially. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will there be a corresponding privilege for viewing network policies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean they will be globally readable, or will CREATENETWORKPOLICY
also be required for reading policy info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have any such restrictions on resources like clusters, so probably no restrictions here either.
7f05bac
to
e0a865e
Compare
e0a865e
to
2c98463
Compare
Motivation
Design doc for network policies.
This PR adds a known-desirable feature.
https://github.com/MaterializeInc/database-issues/issues/4637
Tips for reviewer
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.