design doc for network policies #29814

jubrad · 2024-10-01T21:10:01Z

Motivation

Design doc for network policies.

This PR adds a known-desirable feature.
https://github.com/MaterializeInc/database-issues/issues/4637

Tips for reviewer

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

pH14

Thanks for writing this up @jubrad! I'll defer to others for the SQL syntax proposals, but structurally this all seems sound to me. Feels like it tackles just the right amount of scope for v1, while leaving stepping stones to future additions as needed.

ParkMyCar

Woohoo! Thanks for writing this @jubrad, looking forward to getting it built!

ParkMyCar · 2024-10-03T15:20:43Z

doc/developer/design/20240925_network_policies.md

+- Policy inheritance from associated roles. IE if 'bob' is a member of role 'eng'
+we will not apply policies from role 'eng' to 'bob'.


FWIW, role-specific defaults (e.g. ALTER ROLE parker SET search_path) do not get inherited either.

ParkMyCar · 2024-10-03T15:33:22Z

doc/developer/design/20240925_network_policies.md

+
+### Where network policies get applied.
+
+Network policies could be applied at many layers of our stack, from network firewalls or security groups that intercept traffic before it hits application subnets, to k8s or cilium network policies, balancers, or within the database itself. The above solutions choose to implement policies within the database itself. This comes with some disadvantages. For instance, this layer does not have auto-scaling and requires the database to do some work for each denied request. For this reason, it makes sense to shift the policies left. One possible shift is to the balancer layer. In this scenario, balancers would support both HTTP and pgwire load-balancing as well as network policy enforcement. Balancers have auto-scaling and are relatively stateless. A large number of out-of-policy requests to a balancer would likely not impact any ongoing connections. The biggest challenge with implementing network policies in the balancer is that they do not have access to the policies or roles, which are stored in the database. To move network policies to the balancers we would need some way of sharing all the policies and roles for all the environments a balancer is proxying. Another place we could shift these policies would be to a WAF or network firewalls. Neither one of these seems reasonable to implement for both pgwire and HTTP in a multi-tenant ingress layer, but this could be revisited for private ingress. It would still have the same issues of keeping policies up-to-date as the balancer.


It might not be relevant, but something that comes to mind when thinking about shifting policies left is how we store SECRETs in an external system. The point being, there is already precedent for environmentd to store and fetch data from something other than the Catalog or Persist. I could imagine doing something similar for communicating network policies with balancers or something even further left.

ParkMyCar · 2024-10-03T15:35:07Z

doc/developer/design/20240925_network_policies.md

+## Open questions
+
+#### Single default policy for all resources?
+Should there be different default policies for users, sources, and sinks, or should a single default policy be applied to all resources once those resources start supporting policies? It may be difficult to roll out new resources if we only have one default, but it does seem nicer in the long run.


Do you know how AWS or other cloud providers handle network policies for their various services? Seems like a similar design space, maybe? How to handle webhook sources might be a point in favor of the "per-resource type" strategy instead of a single default.

ParkMyCar · 2024-10-03T15:36:14Z

doc/developer/design/20240925_network_policies.md

+Sources and sinks are planned as a follow-up to user-based policies, but it remains an open question how we provide a user-friendly mechanism for webhook sources where it may be hard to find a list of IPs if the webhook request is coming from
+a third party.
+
+#### The story on lockout is a bit weak.


IMO explicitly pointing users to our formal support ticket process gets us pretty far

Never seen this! Agree. We do need to be cautious about how we validate that users are who they say they are, but this is a great mechanism for getting our attention.

ParkMyCar · 2024-10-03T15:43:05Z

doc/developer/design/20240925_network_policies.md

+```sql
+CREATE NETWORK POLICY OFFICE_01 (
+ RULE ( ACTION=ALLOW, SOURCE="10.0.0.0/32", COMMENT="OFFICE IP - 2024-9-28" )
+);
+```


Overall this looks good to me, I do wonder if we want to separate out COMMENT though and instead point folks towards something like:

COMMENT ON NETWORK POLICY office_01 is 'OFFICE IP - 2024-9-28'

Just so it's the same as other objects. Regardless we should probably pass this through the SQL council!

Yes, we'd want the comment to be on each rule, which is not a resource that's stored outside of the policy.

... maybe description is a better name for this field.

FWIW my inspiration was, in part, aws security groups

I think comment is good—PostgreSQL (and therefore Materialize) uses "comment" to mean what AWS calls "description" pretty routinely.

Here's my recommendation: force every rule to have a name. Something like this would align with how you declare replicas for unmanaged clusters:

CREATE NETWORK POLICY OFFICE_01 ( RULES( nikhils_desktop (ACTION=ALLOW, SOURCE="10.0.0.0/32"), justins_laptop (ACTION=ALLOW, SOURCE="10.0.0.2/32"), ) );

Then you could name a specific rule in the COMMENT statement:

COMMENT ON NETWORK RULE office_01.justins_laptop is 'Laptop assigned to Justin Bradfield on 2024-01-03'

It also tees us up to support altering network policies one rule at a time, like:

ALTER NETWORK POLICY DROP RULE justins_laptop

We don't need to support that first thing, but it's nice to have the option to introduce the granular editing syntax.

I assume the alter policy would actually be:
ALTER NETWORK POLICY office_01 DROP RULE justins_laptop

If we go down this path we would probably want to treat rules more like cluster replicas, I was thinking about them more like brokers in a kafka connection. They'll probably want unique names and IDs?

> I assume the alter policy would actually be:
> ALTER NETWORK POLICY office_01 DROP RULE justins_laptop

Ah whoops, yes!

> If we go down this path we would probably want to treat rules more like cluster replicas, I was thinking about them more like brokers in a kafka connection. They'll probably want unique names and IDs?

Yes, exactly! I think you could punt on IDs for now, but yeah you'd want to ensure unique names.

benesch

Looking really good. Thanks for writing this up, @jubrad!

benesch · 2024-10-07T00:26:46Z

doc/developer/design/20240925_network_policies.md

+A new `NetworkPolicy` resource will be added to the catalog.
+```rust
+struct NetworkPolicy {
+    id: NetworkPolicyId


What does NetworkPolicyId look like? I'm assuming something that looks like enum RoleId or enum ClusterId, so roughly:

enum NetworkPolicyId { System(u64), User(u64), }

I think this will just be a GlobalId

No, it'll need to be its own type! Today GlobalIds are only for things that go in the items collection in the catalog—i.e., things in schemas, rather than things that exist in the global namespace (clusters, roles, databases). @ParkMyCar is also refactoring things to introduce a CatalogItemId, but again that will still be just for objects that exist in schemas.

In that case I think it'll probably be similar to ClusterId

enum NetworkPolicyId { System(u64), User(u64), }

I'll update the doc.
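
For reference, the id type this thread converges on could be sketched as follows. This is a minimal sketch, not the merged implementation: the derives, the `is_system` helper, and the `s`/`u` display prefixes are assumptions made for illustration.

```rust
use std::fmt;

/// Hypothetical sketch of a dedicated id type for network policies,
/// following the `System`/`User` split discussed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum NetworkPolicyId {
    System(u64),
    User(u64),
}

impl NetworkPolicyId {
    /// Returns true for policies owned by the system rather than a user.
    pub fn is_system(&self) -> bool {
        matches!(self, NetworkPolicyId::System(_))
    }
}

impl fmt::Display for NetworkPolicyId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The `s`/`u` prefixes are an assumption for illustration.
            NetworkPolicyId::System(id) => write!(f, "s{id}"),
            NetworkPolicyId::User(id) => write!(f, "u{id}"),
        }
    }
}
```

Keeping this separate from `GlobalId` matches the point above that network policies live in the global namespace rather than in a schema.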

benesch · 2024-10-07T01:32:43Z

doc/developer/design/20240925_network_policies.md

+enum NetworkPolicyRule {
+    Ingress {
+        action: NetworkPolicyRuleAction,
+        source: IpNet,
+        comment: String
+ }
+}


Will there ever be a case in Materialize where a single policy will be used in a context where there need to be both ingress and egress rules? With sources, we have push sources (webhooks) and pull sources (Kafka), but never both; it seems intuitive to me that a network policy on a webhook source would restrict inbound connections while a network policy on a Kafka source would restrict outbound connections.

If you buy this argument, I think concretely the difference would be using a generic term like ADDRESSES rather than SOURCE or TARGET or DESTINATION in network rules, and then whether the policy applied in the inbound or outbound direction would be a function of whether it was attached to a role, pull source, or push source.

RULE (ACTION = ALLOW, ADDRESSES = 'cidr')

I don't think you can intuit whether a policy/rule should be ingress or egress... two reasons:

1. If we want to introduce ingress sinks, then sinks would support both ingress and egress, making it impossible to define a single policy default for sinks.
2. We want to be able to share policies between resource types without having to ensure some color match.

Thought for bullet 2:
Let's say a user bob has a specific network policy bob_policy that allows ingress access from IP 1.2.3.4/32.
It should follow that bob cannot create a source or sink that would expand their policy scope; i.e., any source or sink they create should inherit the policy applied to the user. This requires that the policy for bob be applicable to all resource types bob can create.

Alternatives for enum:

1. Have a different ingress and egress rule set for policies.
2. Add direction to NetworkPolicyRule and change source to address.

I like alternative 2.

> If we want to introduce ingress sinks, then sinks would support both ingress and egress making it impossible to define a single policy default for sinks.

Ah this is a very good point. Let's keep the distinction between egress and ingress rules then, and prepare to allow mixing and matching ingress and egress rules in a single policy. I also like the alternative you've proposed (adding a direction to each rule).

> Let's say a user bob has a specific network policy bob_policy, that allows ingress access from IP 1.2.3.4/32.
> It should follow that bob cannot create a source or sink that would expand their policy scope; i.e, any source or sink they create should inherit the policy applied to the user. This requires that the policy for bob be applicable to all resource types bob can create.

Hm, I'm not sure this follows for me! I can imagine connecting as a user that's limited by a network policy to ingress from office IPs only, but wanting to create a webhook source that allows ingress from some EC2 instance somewhere that I've set up.

yeah... I agree with you. The network policies a user is able to associate with a given source/sink should be controlled by the policies on which the user has usage privileges, not the user's existing policies. If one wants to lock that down to only the policy of the user, they could grant the user usage privileges on only that policy.
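
The alternative favored in this thread (a direction on each rule, with a generic address field) could look roughly like the sketch below. Everything here is an assumption for illustration: the type and field names are hypothetical, the `Cidr` struct is an IPv4-only stand-in for `IpNet` so the example needs no external crates, and evaluation is assumed to be default-deny because only an `Allow` action exists initially.

```rust
use std::net::Ipv4Addr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Ingress,
    Egress,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Allow,
}

/// IPv4-only CIDR block; a stand-in for `IpNet` (assumption).
#[derive(Debug, Clone, Copy)]
pub struct Cidr {
    pub addr: Ipv4Addr,
    pub prefix_len: u8,
}

impl Cidr {
    /// Returns true if `ip` falls inside this block.
    pub fn contains(&self, ip: Ipv4Addr) -> bool {
        // Clamp to 32 bits; a zero-length prefix matches every address.
        let len = u32::from(self.prefix_len.min(32));
        let mask = if len == 0 { 0 } else { u32::MAX << (32 - len) };
        (u32::from(self.addr) & mask) == (u32::from(ip) & mask)
    }
}

pub struct Rule {
    pub name: String,
    pub action: Action,
    pub direction: Direction,
    pub address: Cidr,
}

pub struct Policy {
    pub name: String,
    pub rules: Vec<Rule>,
}

impl Policy {
    /// Default-deny: traffic is allowed only if some rule with the same
    /// direction explicitly allows the address.
    pub fn allows(&self, direction: Direction, ip: Ipv4Addr) -> bool {
        self.rules.iter().any(|rule| {
            rule.direction == direction
                && rule.action == Action::Allow
                && rule.address.contains(ip)
        })
    }
}
```

With this shape, a single policy can mix ingress and egress rules, and the resource the policy is attached to only decides which direction gets checked.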

benesch · 2024-10-07T01:38:52Z

doc/developer/design/20240925_network_policies.md

+
+Example syntax for updating the default_network_policy
+```sql
+ALTER SYSTEM SET default_network_policy = OFFICE_01;


The way our variable inheritance works I think you just want to call this network_policy; variables that aren't overridden at the role level default to the system parameter of the same name. This is how e.g. cluster works.

ok, I think I'm not fully understanding this.

My understanding for clusters is that there is a system var for cluster that selects which cluster will be selected by default on login. Users can update this var to select a different cluster. I think we have a different story for policies. Customers shouldn't be able to set their policy through a session var. In order to apply a different policy to a role you'd have to alter the role and set a new policy, or alter the default policy for everyone.

I'm happy to change the name to network_policy; I just want to make sure we're agreeing on the flow.

> Customers shouldn't be able to set their policy through a session var. In order to apply a different policy to a role you'd have to alter the role and set a new policy, or alter the default policy for everyone.

💯 agree with this.

The reason I think it's still ideal to use network_policy for both the system and role variable is because you're going to need to add special cases to deny inheritance as a session variable. By default, anything settable as a role variable (e.g. cluster) is also settable as a session variable. If you add a system variable named default_network_policy and a role variable named network_policy, you're going to need to add special cases in a few places to handle the inheritance properly. Whereas if you just use network_policy, the existing inheritance should work correctly for the system/role level, and you'll only need to add one special case to ensure that users can't override the network_policy at the session level.
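
The flow agreed on above can be summarized in a small sketch: the role-level `network_policy` falls back to the system-level value of the same name, and the session level is the one special case that refuses the variable. The function and error names below are hypothetical and do not correspond to the actual variable-handling code.

```rust
/// Resolves the policy that applies to a role: the role-level override if
/// one is set, otherwise the system-level value of the same variable.
pub fn effective_network_policy<'a>(
    system_value: &'a str,
    role_override: Option<&'a str>,
) -> &'a str {
    role_override.unwrap_or(system_value)
}

#[derive(Debug, PartialEq, Eq)]
pub enum SetError {
    /// The variable exists but may not be changed at the session level.
    NotSettableInSession,
}

/// Hypothetical session-level setter: every variable is accepted except
/// `network_policy`, which is the single special case discussed above.
pub fn set_session_var(name: &str, _value: &str) -> Result<(), SetError> {
    if name == "network_policy" {
        Err(SetError::NotSettableInSession)
    } else {
        Ok(())
    }
}
```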

benesch · 2024-10-07T01:40:40Z

doc/developer/design/20240925_network_policies.md

+To mitigate user lockouts, we will prevent users from altering their network policy in a way that will block their current `client_ip`. In the case of a lockout, we would need to modify an admin role using the `mz_system` and temporarily set a network policy that either allowed global access for that user or allowed access to a particular IP they provide.
+
+### Possible downsides
+This design presents a highly configurable solution that guarantees no access to data and is likely the easiest mechanism to implement, however, it does have some downsides. The largest downside is in the guarantee it provides. The best level of network restriction we could provide is that no network traffic reaches the database.  The proposed solution only guarantees that no connection can be established with the data plane (coordinator). This has some implications for DOS attacks which must be handled outside the scope of these policies.


Seems like a downside worth accepting! Makes the implementation much more straightforward, and we can always add on an eventually consistent L3/L4 firewall that reads these policies and can more efficiently enforce them on incoming connections.
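
The lockout mitigation quoted in the diff above (rejecting a change that would block the caller's current `client_ip`) amounts to a pre-check before the new policy is applied. The sketch below is an illustration under assumptions: `check_no_lockout` and `AlterError` are hypothetical names, and the policy is abstracted as a predicate so the example stays independent of how rules are stored.

```rust
use std::net::IpAddr;

#[derive(Debug, PartialEq, Eq)]
pub enum AlterError {
    /// Applying the new policy would block the connection making the change.
    WouldLockOut { client_ip: IpAddr },
}

/// Rejects a policy change if the new policy would not admit the address
/// the caller is currently connected from.
pub fn check_no_lockout(
    new_policy_allows: impl Fn(IpAddr) -> bool,
    client_ip: IpAddr,
) -> Result<(), AlterError> {
    if new_policy_allows(client_ip) {
        Ok(())
    } else {
        Err(AlterError::WouldLockOut { client_ip })
    }
}
```

A check like this only protects the session issuing the change; recovering a user who is already locked out would still go through `mz_system`, as the doc describes.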

benesch · 2024-10-07T01:41:13Z

doc/developer/design/20240925_network_policies.md

+- Moving the `SystemVar` from a `Vec<IpNet>` to an `Ident` pointing to a `NetworkPolicy` resource
+- Modifying roles to have an `Option<NetworkPolicy>` and adding the SQL to set this policy
+- Adding validations to prevent lock out
+- Following up with sinks/sources


Love how you've scoped down each incremental step here!

arusahni · 2024-10-17T16:00:35Z

doc/developer/design/20240925_network_policies.md

+
+```
+
+Users will be able to create `NetworkPolicies` directly.  A user must have `CREATENETWORKPOLICY` privileges to create, modify, or destroy network policies. Network policies will be limited to 25 rules. This will be controlled by an LD flag. `NetworkPolicyRules` must be created through a policy. The policy rules implementation will initially only contain an `Allow` variant, but we should be an enum to allow for a `Deny` variant in the future. `NetworkPolicyRule` will hold a `NetworkPolicyRuleDirection` enum to allow for both ingress and egress policies. The names of `NetworkPolicyRules` must be unique within a policy. A `NetworkPolicyRules` will contain a single `IpNet`. We will allow for comments on `NetworkPolicies` as well as individual `NetworkPolicyRules`, the latter will not be implemented initially.


Will there be a corresponding privilege for viewing network policies?

Does that mean they will be globally readable, or will CREATENETWORKPOLICY also be required for reading policy info?

I don't think we have any such restrictions on resources like clusters, so probably no restrictions here either.
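
Two of the constraints in the quoted paragraph (a cap on rules per policy and unique rule names within a policy) lend themselves to a simple validation step. The sketch below is hypothetical: the function and error names are invented for illustration, the limit is passed in as a parameter because the doc says it will be controlled by a flag, and name comparison is assumed to be case-sensitive.

```rust
use std::collections::HashSet;

/// Default cap taken from the design doc text; the real value is expected
/// to come from a flag, so callers pass the limit explicitly.
pub const MAX_RULES_PER_POLICY: usize = 25;

#[derive(Debug, PartialEq, Eq)]
pub enum PolicyError {
    TooManyRules { count: usize, max: usize },
    DuplicateRuleName(String),
}

/// Checks the rule-count limit and that rule names are unique in a policy.
pub fn validate_rule_names(names: &[&str], max_rules: usize) -> Result<(), PolicyError> {
    if names.len() > max_rules {
        return Err(PolicyError::TooManyRules {
            count: names.len(),
            max: max_rules,
        });
    }
    let mut seen = HashSet::new();
    for name in names {
        // `insert` returns false when the name was already present.
        if !seen.insert(*name) {
            return Err(PolicyError::DuplicateRuleName((*name).to_string()));
        }
    }
    Ok(())
}
```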

jubrad force-pushed the network-policies-design-doc branch 4 times, most recently from c269b08 to f78411c Compare October 2, 2024 16:59

jubrad requested a review from pH14 October 2, 2024 16:59

jubrad force-pushed the network-policies-design-doc branch 4 times, most recently from 9f0db7a to dd0e74a Compare October 3, 2024 13:55

pH14 reviewed Oct 3, 2024

jubrad requested review from pH14 and ParkMyCar October 3, 2024 14:35

jubrad marked this pull request as ready for review October 3, 2024 14:35

ParkMyCar approved these changes Oct 3, 2024

jubrad changed the title ~~WIP design doc for network policies~~ design doc for network policies Oct 3, 2024

benesch reviewed Oct 7, 2024

jubrad force-pushed the network-policies-design-doc branch from dd0e74a to dd7de2c Compare October 14, 2024 14:12

jubrad self-assigned this Oct 14, 2024

arusahni reviewed Oct 17, 2024

jubrad force-pushed the network-policies-design-doc branch from 7f05bac to e0a865e Compare October 28, 2024 18:19

jubrad mentioned this pull request Oct 30, 2024

Feature/network policy sql #30172

Merged

5 tasks

design doc for network policies

2c98463

jubrad force-pushed the network-policies-design-doc branch from e0a865e to 2c98463 Compare November 1, 2024 02:50

jubrad enabled auto-merge November 1, 2024 02:51

jubrad merged commit cfb7218 into MaterializeInc:main Nov 1, 2024
9 checks passed

jubrad mentioned this pull request Nov 1, 2024

Feature/network policy predefined #30261

Merged

5 tasks
