diff --git a/README.md b/README.md
index 9ac4a717..fbac7522 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@ This reference implementation demonstrates the _recommended starting (baseline)
| 🎓 Foundational Understanding |
|:------------------------------|
-| **If you haven't familiarized yourself with the general-purpose [AKS baseline cluster](https://github.com/mspnp/aks-secure-baseline) architecture, you should start there before continuing here.** This architecture rationalizes and is constructed from the AKS baseline, which is the foundation for this body of work. This reference implementation avoids rearticulating points that are already addressed in the AKS baseline cluster. |
+| **If you haven't familiarized yourself with the general-purpose [AKS baseline cluster](https://github.com/mspnp/aks-secure-baseline) architecture, you should start there before continuing here.** This architecture is constructed from the AKS baseline, which is the foundation for this body of work. This reference implementation avoids rearticulating points that are already addressed in the AKS baseline cluster. |
## Compliance
@@ -17,15 +17,15 @@ Even if you are not in a regulated environment, this infrastructure demonstrates
## Azure Architecture Center guidance
-This project has a companion set of articles that describe challenges, design patterns, and best practices for a AKS cluster designed to host workloads that fall in **PCI-DSS 3.2.1** scope. You can find this article on the Azure Architecture Center at [Azure Kubernetes Service (AKS) regulated cluster for PCI-DSS 3.2.1](https://aka.ms/architecture/aks-baseline-regulated). If you haven't reviewed it, we suggest you read it; as it will give added context to the considerations applied in this implementation.
+This project has a companion set of articles that describe challenges, design patterns, and best practices for an AKS cluster designed to host workloads that fall in **PCI-DSS 3.2.1** scope. You can find this article on the Azure Architecture Center at [Azure Kubernetes Service (AKS) regulated cluster for PCI-DSS 3.2.1](https://aka.ms/architecture/aks-baseline-regulated). If you haven't reviewed it, we suggest you read it, as it will give added context to the considerations applied in this implementation. This repo primarily focuses on _deployment concerns_, while _compliance concerns_ are mostly addressed in the linked article series above.
## Architecture
-**This reference implementation is _infrastructure focused, more so than workload_.** It concentrates on compliance concerns dealing with the AKS cluster itself. This implementation will touch on workload concerns, but does not contain end-to-end guidance on in-scope workload architecture, container security, or isolation. There are some good practices demonstrated and others talked about, but it is not exhaustive.
+**This reference implementation is _infrastructure focused, more so than workload_.** It concentrates on dealing with the AKS cluster itself. This implementation will touch on workload concerns, but does not contain end-to-end guidance on in-scope workload architecture, container security, or isolation. There are some good practices demonstrated and others talked about, but it is not exhaustive.
The implementation presented here is the _minimum starting point for most AKS clusters falling into a compliance scope_. This implementation integrates with Azure services that will deliver observability, provide a network topology that will support public traffic isolation, and keep the in-cluster traffic secure as well. This architecture should be considered your architectural starting point for pre-production and production stages of clusters hosting regulated workloads.
-The material here is relatively dense. We strongly encourage you to dedicate _at least four hours_ to walk through these instructions, with a mind to learning. You will not find any "one click" deployment here. However, once you've understood the components involved and identified the shared responsibilities between your team and your greater IT organization, it is encouraged that you build auditable deployment processes around your final infrastructure.
+The material here is relatively dense. We strongly encourage you to start by reading the Azure Architecture Center guidance linked above and then dedicate _at least four hours_ to walk through these instructions, with a mind to learning. You will not find any "one click" deployment here. However, once you've understood the components involved and identified the shared responsibilities between your team and your greater IT organization, it is encouraged that you build auditable deployment processes around your final infrastructure.
Finally, this implementation uses a small, custom application as an example workload. This workload is minimally interesting, as it is here exclusively to help you experience the infrastructure and illustrate network and security controls in place. The workload, and its deployment, does not represent any sort of "best practices" for regulated workloads.
@@ -44,8 +44,8 @@ Finally, this implementation uses a small, custom application as an example work
* Azure Virtual Networks (hub-spoke)
* Azure Firewall managed egress
* Hub-proxied DNS
- * BYO Private DNS Zone for AKS
-* Azure Application Gateway (WAF - OWASP 3.1)
+ * BYO Private DNS Zone for AKS with no public DNS representation
+* Azure Application Gateway (WAF - OWASP 3.2)
* AKS-managed Internal Load Balancers
* Azure Bastion for maintenance access
* Private Link enabled Key Vault and Azure Container Registry
@@ -137,11 +137,13 @@ Most of the Azure resources deployed in the prior steps will have ongoing billin
All workloads that find themselves in compliance scope usually require a documented separation of duties/concern implementation plan. Kubernetes poses an interesting challenge in that it involves a significant number of roles typically found across an IT organization. Networking, identity, SecOps, governance, workload teams, cluster operations, deployment pipelines, any many more. If you're looking for a starting point on how you might consider breaking up the roles that are adjacent to the AKS cluster, consider **reviewing our [Azure AD role guide](./docs/rbac-suggestions.md)** shipped as part of this reference implementation.
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 7, 8, and 9 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-identity).
+
## Is that all, what about … !?
Yes, there are concerns that do extend beyond what this implementation could reasonably demonstrate for a general audience. This reference implementation strived to be accessible for most people without putting undo burdens on the subscription brought to this walkthrough. This means SKU choices with relatively large default quotas, not using features that have very limited regional availability, not asking for learners to be overwhelmed with "Bring your own encryption key" options for services, and similar. All in hopes that more people can complete this walkthrough without disruption or excessive coordination with subscription or management group owners.
-For your implementation, take this starting point and please add on additional security measures talked about throughout the walkthrough that were not directly implemented. For example, enable JIT and Conditional Access Policies, leverage Encryption-at-Host features if applicable to your workload, etc.
+For your implementation, take this _starting point_ and please add on additional security measures talked about throughout the walkthrough and the Azure Architecture Center guidance that were not directly implemented. For example, enable JIT and Conditional Access Policies, leverage Encryption-at-Host features if applicable to your workload, etc.
**For a list of additional considerations for your architecture, please see our [Additional Considerations](./docs/additional-considerations.md) document.**
@@ -149,7 +151,7 @@ For your implementation, take this starting point and please add on additional s
This reference implementation runs idle around $95 (US Dollars) per day within the first 30 days; and you can expect it to increase over time as some Security Center tooling has free-trial period and logs will continue to accrue. The largest contributors to the starting cost are Azure Firewall, the AKS nodepools (VM Scale Sets), and Log Analytics. While some costs are usually cluster operator costs, such as nodepool VMSS, log analytics, incremental Azure Defender costs; others will likely be amortized across multiple business units and/or applications, such as Azure Firewall.
-While some customers will amortize cluster costs across workloads by hosting a multi-tenant cluster within their organization, maximizing density with workload diversity, doing so with regulated workloads is not advised. Regulated environments will generally prioritize compliance and security (isolation) over cost (diverse density).
+Some customers will opt to amortize cluster costs across workloads by hosting a multi-tenant cluster within their organization, maximizing density with workload diversity. Doing so with regulated workloads is not advised. Regulated environments will generally prioritize compliance and security (isolation) over cost (diverse density).
## Final thoughts
diff --git a/cluster-stamp.json b/cluster-stamp.json
index 42d272f1..7b41a1dd 100644
--- a/cluster-stamp.json
+++ b/cluster-stamp.json
@@ -611,7 +611,7 @@
"enabled": true,
"firewallMode": "Prevention",
"ruleSetType": "OWASP",
- "ruleSetVersion": "3.1",
+ "ruleSetVersion": "3.2",
"disabledRuleGroups": [],
"requestBodyCheck": true,
"maxRequestBodySizeInKb": 128,
diff --git a/cluster-stamp.v2.json b/cluster-stamp.v2.json
index 51cb9431..6700a597 100644
--- a/cluster-stamp.v2.json
+++ b/cluster-stamp.v2.json
@@ -611,7 +611,7 @@
"enabled": true,
"firewallMode": "Prevention",
"ruleSetType": "OWASP",
- "ruleSetVersion": "3.1",
+ "ruleSetVersion": "3.2",
"disabledRuleGroups": [],
"requestBodyCheck": true,
"maxRequestBodySizeInKb": 128,
diff --git a/docs/additional-considerations.md b/docs/additional-considerations.md
index a1d127f1..66dc1ec4 100644
--- a/docs/additional-considerations.md
+++ b/docs/additional-considerations.md
@@ -2,6 +2,8 @@
This reference implementation is designed to be a starting point for your eventual architecture. It does not enable "every security option possible." While some of the features that are not enabled we'd encourage their usage, practicality supporting high completion rate of this material prevents the enablement of some feature. For example, your subscription permissions, already existing security policies in the subscription, existing policies that might block deployment, simplicity in demonstration, etc. Not all going through this walkthrough have the luxury of exploring it in a situation where they are subscription owner and Azure AD administrator. Because they were not trivial to deploy in this walkthrough, we wanted to ensure you at least have a list of things we'd have liked to include out of the box or at least have introduced as a consideration. Review these and add them into your final architecture as you see fit.
+In addition to the ones mentioned below, the [Azure Architecture Center guidance for PCI-DSS 3.2.1 with AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-intro) also includes specific recommendations that might not be implemented in this solution. Some of those will be called out below.
+
## Host/disk encryption
@@ -9,17 +11,17 @@ This reference implementation is designed to be a starting point for your eventu
### Customer-managed OS and data disk encryption
-While OS and data disks (and their caches) are already encrypted at rest with Microsoft-managed keys, for additional control over encryption keys you can use customer-managed keys for encryption at rest for both the OS and the data disks in your AKS cluster. This reference implementation doesn't actually use any disks in the cluster, and the OS disk is ephemeral. But if you use non-ephemeral OS disks or add data disks, consider using this added security solution.
-
-Read more about [Bing your own keys (BYOK) with Azure disks](https://docs.microsoft.com/azure/aks/azure-disk-customer-managed-keys).
+While OS and data disks (and their caches) are already encrypted at rest with Microsoft-managed keys, for additional control over encryption keys you can use customer-managed keys for encryption at rest for both the OS and the data disks in your AKS cluster. This reference implementation doesn't actually use any disks in the cluster, and the OS disk is ephemeral.
-Consider using BYOK for any other disks that might be in your final solution, such as your Azure Bastion-fronted jumpboxes. Please note that your SKU choice for VMs will be limited to only those that support this feature, and regional availability will be restricted as well.
+> :notebook: See the [Azure Architecture Center PCI-DSS 3.2.1 Disk encryption article](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#disk-encryption).
Note, we enable an Azure Policy alert detecting clusters without this feature enabled. The reference implementation will trip this policy alert because there is no `diskEncryptionSetID` provided on the cluster resource. The policy is in place as a reminder of this security feature that you might wish to use. The policy is set to "audit" not "block."
### Host-based encryption
-You can take OS and data disk encryption one step further and also bring the encryption up to the Azure host. Using [Host-Based Encryption](https://docs.microsoft.com/azure/aks/enable-host-encryption) means that the temp disks now will be encrypted at rest using platform-managed keys. This will then cover encryption of the VMSS ephemeral OS disk and temp disks. Your SKU choice for VMs will be limited to only those that support this feature, and regional availability will be restricted as well. This feature is currently in preview. See more details about [VM support for host-based encryption](https://docs.microsoft.com/azure/virtual-machines/disk-encryption#encryption-at-host---end-to-end-encryption-for-your-vm-data).
+You can take OS and data disk encryption one step further and also bring the encryption up to the Azure host. Using [Host-Based Encryption](https://docs.microsoft.com/azure/aks/enable-host-encryption) means that the temp disks now will be encrypted at rest using platform-managed keys. This will then cover encryption of the VMSS ephemeral OS disk and temp disks.
+
+> :notebook: See the Azure Architecture Center PCI-DSS 3.2.1 for AKS [Disk encryption article](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#disk-encryption).
Note, like above, we enable an Azure Policy detecting clusters without this feature enabled. The reference implementation will trip this policy alert because this feature is not enabled on the `agentPoolProfiles`. The policy is in place as a reminder of this security feature that you might wish to use once it is GA. The policy is set to "audit" not "block."
@@ -32,21 +34,20 @@ Note, like above, we enable an Azure Policy detecting clusters without this feat
### Enable Network Watcher and Traffic Analytics
-Observability into your network is critical for compliance. [Network Watcher](https://docs.microsoft.com/azure/network-watcher/network-watcher-monitoring-overview), combined with [Traffic Analysis](https://docs.microsoft.com/azure/network-watcher/traffic-analytics) will help provide a perspective into traffic traversing your networks. This reference implementation does not deploy NSG Flow Logs or Traffic Analysis by default. These features depend on a regional Network Watcher resource being installed on your subscription. Network Watchers are singletons in a subscription, and there is no reasonable way to include them in these specific ARM templates and account for both pre-existing network watchers (which might exist in a resource group you do not have RBAC access to) and non-preexisting situations. We strongly encourage you to enable [NSG flow logs](https://docs.microsoft.com/azure/network-watcher/network-watcher-nsg-flow-logging-overview) on your AKS Cluster subnets, build agent subnets, Azure Application Gateway, and other subnets that may be a source of traffic into and out of your cluster. Ensure you're sending your NSG Flow Logs to a **V2 Storage Account** and set your retention period in the Storage Account for these logs to a value that is at least as long as your compliance needs (e.g. 90 days).
+Observability into your network is critical for compliance. [Network Watcher](https://docs.microsoft.com/azure/network-watcher/network-watcher-monitoring-overview), combined with [Traffic Analytics](https://docs.microsoft.com/azure/network-watcher/traffic-analytics), will help provide a perspective into traffic traversing your networks. This reference implementation will _attempt_ to deploy NSG Flow Logs and Traffic Analytics. These features depend on a regional Network Watcher resource being installed on your subscription. Network Watchers are singletons in a subscription; their creation is _usually_ automatic, and they might exist in a resource group you do not have RBAC access to. We strongly encourage you to enable [NSG flow logs](https://docs.microsoft.com/azure/network-watcher/network-watcher-nsg-flow-logging-overview) on your AKS Cluster subnets, build agent subnets, Azure Application Gateway, and other subnets that may be a source of traffic into and out of your cluster. Ensure you're sending your NSG Flow Logs to a **V2 Storage Account** and set your retention period in the Storage Account for these logs to a value that is at least as long as your compliance needs (e.g. 90 days).
In addition to Network Watcher aiding in compliance considerations, it's also a highly valuable network troubleshooting utility. As your network is private and heavy with flow restrictions, troubleshooting network flow issues can be time consuming. Network Watcher can help provide additional insight when other troubleshooting means are not sufficient.
-If you do not have Network Watchers and NSG Flow Logs enabled on your subscription, consider doing so via Azure Policy at the Subscription or Management Group level to provide consistent naming and region selection. See the [Deploy network watcher when virtual networks are created](https://portal.azure.com/#blade/Microsoft_Azure_Policy/PolicyDetailBlade/definitionId/%2Fproviders%2FMicrosoft.Authorization%2FpolicyDefinitions%2Fa9b99dd8-06c5-4317-8629-9d86a3c6e7d9) policy combined with the [Flow logs should be enabled for every network security group](https://portal.azure.com/#blade/Microsoft_Azure_Policy/PolicyDetailBlade/definitionId/%2Fproviders%2FMicrosoft.Authorization%2FpolicyDefinitions%2F27960feb-a23c-4577-8d36-ef8b5f35e0be) policy.
-
-The [subscription.json](../subscription.json) file does include related Network Watcher policies, and will attempt to deploy Network Watcher if you do not already have evidence of them in your subscription. Reference those policy implementations if you would like to evaluate your current Network Watcher deployment strategy.
+As an added measure, apply the [Flow logs should be enabled for every network security group](https://portal.azure.com/#blade/Microsoft_Azure_Policy/PolicyDetailBlade/definitionId/%2Fproviders%2FMicrosoft.Authorization%2FpolicyDefinitions%2F27960feb-a23c-4577-8d36-ef8b5f35e0be) Azure Policy at the Subscription or Management Group level.
### More strict Network Security Groups (NSGs)
-The NSGs that exist around the cluster node pool subnets specifically block any SSH access attempts only allow traffic from the vnet into them. As your workloads, system security agents, etc are deployed, consider adding even more NSG rules that help define the type of traffic that should and should not be traversing those subnet boundaries. Because each nodepool lives in its own subnet, you can apply more specific rules based on known/expected traffic patterns of your workload.
+> :notebook: See the Azure Architecture Center PCI-DSS 3.2.1 for AKS [Subnet security through NSGs article](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#subnet-security-through-network-security-groups-nsgs).
### Azure Key Vault network restrictions
-In this reference implementation, Azure Application Gateway (AAG) is sourcing its public-facing certificate from Azure Key Vault. This is great as it help support easier certificate rotation and certificate control. However, currently Azure Application Gateway does not support this on Azure Key Vault instances that are exclusively network restricted via Private Link. This reference implementation deploys Azure Key Vault in a hybrid model, supporting private link and public access specifically to allow AAG integration. Once [Azure Application Gateway supports private link access to Key Vault](https://docs.microsoft.com/azure/application-gateway/key-vault-certs#how-integration-works), we'll update this reference implementation. If this topology will not be suitable for your deployment, change the certificate management process in AAG to abandon the use of Key Vault for the public-facing TLS certificate and [handle the management of that certificate directly within AAG](https://docs.microsoft.com/azure/application-gateway/tutorial-ssl-cli). Doing so will allow your Key Vault instance to be fully isolated.
+> :notebook: See the Azure Architecture Center PCI-DSS 3.2.1 for AKS [Azure Key Vault network restrictions article](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#azure-key-vault-network-restrictions).
+
### Expanded NetworkPolicies
@@ -54,7 +55,8 @@ Not all user-provided namespaces in this reference implementation employ a zero-
### Enable DDoS Protection
-While not typically a feature of any specific regulated workloads, generally speaking [Azure DDoS Protection Standard](https://docs.microsoft.com/azure/ddos-protection/manage-ddos-protection) should be enabled for any virtual networks with a subnet that contains an Application Gateway with a public IP. This protects your workload from becoming overwhelmed with fraudulent requests which at best could cause a service disruption or at worst be a cover (distraction, log spam, etc) for another concurrent attack. Azure DDoS comes at a significant cost, and is typically amortized across many workloads that span many IP addresses -- work with your networking team to coordinate coverage for your workload.
+> :notebook: See the Azure Architecture Center PCI-DSS 3.2.1 for AKS [DDoS protection article](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#ddos-protection).
+
@@ -65,7 +67,9 @@ While not typically a feature of any specific regulated workloads, generally spe
### Make use of container securityContext options
-When describing your workload's security needs, leverage all relevant [`securityContext` settings](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) for your containers. This includes basic items like `fsGroup`, `runAsUser` / `runAsGroup`, and setting `allowPriviledgeEscalation` to `false` (unless required). But it also means being explicit about defining/removing Linux `capabilities` and defining your SELinux options in `seLinuxOptions`. The workloads deployed in this reference implementation do NOT represent best practices, as this reference implementation was mainly infrastructure focused.
+When describing your workload's security needs, leverage all relevant [`securityContext` settings](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) for your containers. The workloads deployed in this reference implementation do NOT represent best practices, as this reference implementation was mainly infrastructure focused.
+
+> :notebook: See the Azure Architecture Center PCI-DSS 3.2.1 for AKS [Pod security article](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#pod-security).
### Pin image versions
@@ -84,16 +88,12 @@ This guidance should also be followed when using the Dockerfile `FROM` command.
### Customized Azure Policies for AKS
-
-Generally speaking, the Azure Policies applied do not have workload-tuned settings applied. Specifically we're applying the **Kubernetes cluster pod security restricted standards for Linux-based workloads** initiative which does not allow tuning of settings. Consider exporting this initiative and customizing its values for your specific workload. You may wish to include all Gatekeeper `deny` Azure Policies under one custom Initiative and all `audit` Azure Policies under another to know strong "blocks" from "awareness only" policies.
-
-While it's common for Azure Policy to exclude `kube-system` and `gatekeeper-system` to policies, consider _including_ them in your `audit` policies for _added visibility_. Including those namespaces in `deny` policies could cause cluster failure due to an unsupported configuration. You may find some that are relatively safe, such as enforcing internal load balancers and HTTPS ingresses, but be aware if you apply these you may run into support concerns.
+> :notebook: See the Azure Architecture Center PCI-DSS 3.2.1 for AKS [Azure Policy considerations article](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#azure-policy-considerations).
### Customized Azure Policies for Azure resources
The reference implementation includes a few examples of Azure Policy that can act to help guard your environment against undesired configuration. One such example included in this reference implementation is the preventing of Network Interfaces or VM Scale Sales that have Public IPs from joining your cluster's Virtual Network. It's strongly recommended that you add prevents (deny-based policy) for resource configuration that would violate your regulatory requirements. If a built-in policy is not available, create custom policies like the ones illustrated in this reference implementation.
-
### Allow list for resource types
The reference implementation puts in place an allow list for what resource types are allowed in the various resource groups. This helps control what gets deployed, which can prevent an unexpected resource type from being deployed. If your subscription is exclusively for your regulated workload, then also consider only having the necessary [resource providers registered](https://docs.microsoft.com/azure/azure-resource-manager/management/azure-services-resource-providers#registration) to cover that service list. Don't register [resource providers for Azure services](https://docs.microsoft.com/azure/azure-resource-manager/management/azure-services-resource-providers) that are not going to be part of your environment. This will guard against a misconfiguration in Azure Policy's enforcement.
@@ -111,7 +111,7 @@ This reference implementation is expected to be deployed in a standalone subscri
### Enterprise onboarding to Security Center
-The Security Center onboarding in this reference implementation is relatively simplistic. Organizations inboard in Security Center and Azure Policy typically in a more holistic and governed fashion. Review the [Azure Security Center Enterprise Onboarding Guide](https://aka.ms/ASCOnboarding) for a complete end-to-end perspective on protecting your workloads (regulated and non) with Azure Security Center. This addresses enrollment, data exports to your SIEM or ITSM solution, Logic Apps for responding to alerts, building workflow automation, etc. All things that go beyond the base architecture of any one AKS solution, and should be addressed at the enterprise level.
+> :notebook: See the Azure Architecture Center PCI-DSS 3.2.1 for AKS [Security monitoring article](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#security-monitoring).
### Create triage process for alerts
@@ -141,9 +141,7 @@ While container images and other OCI artifacts typically do not contain sensitiv
### JIT and Conditional Access Policies
-AKS' control plane supports both [Azure AD PAM JIT](https://docs.microsoft.com/azure/aks/managed-aad#configure-just-in-time-cluster-access-with-azure-ad-and-aks) and [Conditional Access Policies](https://docs.microsoft.com/azure/aks/managed-aad#use-conditional-access-with-azure-ad-and-aks). We recommend that you minimize standing permissions and leverage JIT access when performing SRE/Ops interactions with your cluster. Likewise, Conditional Access Policies will add additional layers of required authentication validation for privileged access, based on the rules you build.
-
-For more details on using PowerShell to configure conditional access, see [Azure AD Conditional Access](./conditional-access.md)
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 7.2.1 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-identity#requirement-721) and this repo's [Azure AD Conditional Access](./conditional-access.md) page.
### Custom Cluster Roles
@@ -192,6 +190,8 @@ Inline, we talked about many ISV's security agents being able to detect relevant
Azure Sentinel was enabled in this reference implementation. No alerts were created or any sort of "usage" of it, other than enabling it. You may already be using another SIEM, likewise you may find that a SIEM is not cost effective for your solution. Evaluate if you will derive benefit from Azure Sentinel in your solution, and tune as needed.
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 10.5 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-monitor#requirement-105).
+
## Disaster Recovery
@@ -203,11 +203,9 @@ Azure Sentinel was enabled in this reference implementation. No alerts were crea
While we generally discourage any storage of state within a cluster, you may find your workload demands in-cluster storage. Regardless if that data is in compliance scope or not, you'll often require a robust and secure process for backup and recovery. You may find a solution like Azure Backup (for Azure Disks and Azure Files), [Veeam Kasten K10](https://kasten.io), or [VMware Velero](https://velero.io/) instrumental in achieving any `PersistantVolumeClaim` backup and recovery strategies.
-As a bonus, your selected backup system might also handle Kubernetes resource (Deployments, ConfigMaps, etc) snapshots/backups. While Flux may be your primary method to reconcile your cluster back to a well-known state, you may wish to supplement with a solution like this to provide alternative methods for critical system recovery techniques (when reconcile or rebuild is not an option). A tool like this can also be a key source of data for drift detection and cataloging system state changes over time; akin to how File Integrity Monitoring solves for file-system level drift detection, but at the Kubernetes resource level.
-
All backup process needs to classify the data contained within the backup. This is true of data both within and external to your cluster. If the data falls within regulatory scope, you'll need extend your compliance boundaries to the lifecycle and destination of the backup -- which will be outside of the cluster. Consider geographic restrictions, encryption at rest, access controls, roles and responsibilities, auditing, time-to-live, and tampering prevention (check-sums, etc) when designing your backup system. Backups can be a vector for malicious intent, with a bad actor compromising a backup and then forcing an event in which their backup is restored.
-Lastly, in-cluster backup systems usually depend on begin run as highly-privileged during its operations; so consider the risk vs benefit when deciding to bring an agent like this into your cluster. Some agent's might overlap with another management solution you've brought to your cluster already for security concerns; evaluate what is the minimum set of tooling you'll need to accomplish this task and not introduce additional exposure/management into your cluster.
+> :notebook: See the Azure Architecture Center PCI-DSS 3.2.1 for AKS [Cluster backups (state and resources) article](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#cluster-backups-state-and-resources).
@@ -222,7 +220,7 @@ While this reference implementation uses Tresor as its TLS certificate provider
### Ingress Controller
-The ingress controller implemented in this reference implementation is relatively simplistic in implementation. It's currently using a wild-card certificate to handle default traffic when an `Ingress` resource doesn't contain a specific certificate. This might be fine for most customers, but if you have an organizational policy against using wildcard certs (even on your internal, private network), you may need to adjust your ingress controller to not support a "default certificate" and instead require ever workload to surface their own named certificate. This will impact how Azure Application Gateway is performing backend health checks.
+The ingress controller in this reference implementation is relatively simple. It currently uses a wildcard certificate to handle default traffic when an `Ingress` resource doesn't contain a specific certificate. This might be fine for most customers, but if you have an organizational policy against using wildcard certs (even on your internal, private network), you may need to adjust your ingress controller to not support a "default certificate" and instead require every workload to surface its own named certificate. This will impact how Azure Application Gateway performs backend health checks.
@@ -235,9 +233,9 @@ The ingress controller implemented in this reference implementation is relativel
The in-cluster `omsagent` pods running in `kube-system` are the Log Analytics collection agent. They are responsible for gathering telemetry, scraping container `stdout` and `stderr` logs, and collecting Prometheus metrics. You can tune the agent's collection settings by updating the [`container-azm-ms-agentconfig.yaml`](/cluster-manifests/kube-system/container-azm-ms-agentconfig.yaml) ConfigMap file. In this reference implementation, logging is enabled across `kube-system` and all your workloads. By default, `kube-system` is excluded from logging. Ensure you adjust the log collection process to balance cost objectives, SRE efficiency when reviewing logs, and compliance needs.
-### Retention
+### Retention and continuous export
-All Log Analytics workspaces deployed as part of this solution are set to a 90-day retention period. If you wish to retain logs longer than that for organizational or compliance reasons, consider setting up [continuous export](https://docs.microsoft.com/azure/azure-monitor/logs/logs-data-export) to a long term storage solution such as Azure Storage. Ideally log data should not contain sensitive information, however in case they do (even unintentionally); ensure access to archived log data is treated with the same due diligence as recent log data.
+> :notebook: See the Azure Architecture Center PCI-DSS 3.2.1 for AKS [Security monitoring article](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#security-monitoring).
diff --git a/docs/conditional-access.md b/docs/conditional-access.md
index f50f51f7..64164623 100644
--- a/docs/conditional-access.md
+++ b/docs/conditional-access.md
@@ -8,6 +8,8 @@ Work with your Conditional Access administrator [to apply a policy](https://docs
Remember to test all conditional access policies using a safe and controlled rollout procedure before applying to all users. Paired with [Azure AD JIT access](https://docs.microsoft.com/azure/aks/managed-aad#configure-just-in-time-cluster-access-with-azure-ad-and-aks), this provides a very robust access control solution for your private cluster.
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 8.2 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-identity#requirement-82).
+
## Applying via Windows PowerShell
For many administrators, PowerShell is already an understood scripting tool. The following example shows how to use the Azure AD PowerShell module to apply a Conditional Access policy.
diff --git a/docs/deploy/02-ca-certificates.md b/docs/deploy/02-ca-certificates.md
index 3f89360c..ced09e8c 100644
--- a/docs/deploy/02-ca-certificates.md
+++ b/docs/deploy/02-ca-certificates.md
@@ -13,6 +13,8 @@ To support end-to-end TLS encryption, the following TLS certificates are procure
:warning: Do not use the certificates created by these instructions for actual deployments. The use of self-signed certificates are provided for ease of illustration purposes only. For your cluster, use your organization's requirements for procurement and lifetime management of TLS certificates, _even for development purposes_.
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 3.6 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-data#requirement-36) and [TLS encryption architecture considerations](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#tls-encryption).
+
## Steps
1. Create the certificate for Azure Application Gateway with a common name of `bicycle.contoso.com`.
diff --git a/docs/deploy/03-aad.md b/docs/deploy/03-aad.md
index bfb2d2a4..5a3820b1 100644
--- a/docs/deploy/03-aad.md
+++ b/docs/deploy/03-aad.md
@@ -69,7 +69,10 @@ Following the steps below will result in an Azure AD configuration that will be
1. Set up Azure AD conditional access policies. _Optional. Requires Azure AD Premium._
- To support an even stronger authentication model, consider [setting up Conditional Access Policies in Azure AD for your cluster](https://docs.microsoft.com/azure/aks/managed-aad#use-conditional-access-with-azure-ad-and-aks). This allows you to further apply restrictions on access to the Kubernetes control plane (e.g. management commands executed through `kubectl`). With conditional access policies in place, you can for example, _require_ multi-factor authentication, restrict authentication to devices that are managed by your Azure AD tenant, or block non-typical sign-in attempts. You will want to apply this to Azure AD groups that are assigned to your cluster with permissions you deem warrant the extra policies (most notability the cluster admin group created above). You will not be setting that up as part of this walkthrough, but [strongly consider doing so](../conditional-access.md) for your final implementation as part of your defense-in-depth strategy and to support compliance requirements.
+ To support an even stronger authentication model, consider [setting up Conditional Access Policies in Azure AD for your cluster](https://docs.microsoft.com/azure/aks/managed-aad#use-conditional-access-with-azure-ad-and-aks). This allows you to further apply restrictions on access to the Kubernetes control plane (e.g. management commands executed through `kubectl`). With conditional access policies in place, you can, for example, _require_ multi-factor authentication, restrict authentication to devices that are managed by your Azure AD tenant, or block non-typical sign-in attempts. You will want to apply this to Azure AD groups that are assigned to your cluster with permissions you deem warrant the extra policies (most notably the cluster admin group created above). You will not be setting that up as part of this walkthrough.
+
+ > :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 8.2 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-identity#requirement-82).
+
### Next step
diff --git a/docs/deploy/04-subscription.md b/docs/deploy/04-subscription.md
index 3a8a0afc..ccd4fb1d 100644
--- a/docs/deploy/04-subscription.md
+++ b/docs/deploy/04-subscription.md
@@ -4,7 +4,7 @@ In the prior step, you've set up an Azure AD tenant to fullfil your [cluster's c
## Subscription and resource group topology
-This reference implementation is split across several resource groups in a single subscription. This is to replicate the fact that many organizations will split certain responsibilities into specialized subscriptions (e.g. regional hubs/vwan in a _Connectivity_ subscription and workloads in landing zone subscriptions). We expect you to explore this reference implementation within a single subscription, but when you implement this cluster at your organization, you will need to take what you've learned here and apply it to your expected subscription and resource group topology (such as those [offered by the Cloud Adoption Framework](https://docs.microsoft.com/azure/cloud-adoption-framework/decision-guides/subscriptions/).) This single subscription, multiple resource group model is for simplicity of demonstration purposes only.
+This reference implementation is split across several resource groups in a single subscription. This is to replicate the fact that many organizations will split certain responsibilities into specialized subscriptions (e.g. regional hubs in a _Connectivity_ subscription and workloads in landing zone subscriptions). We expect you to explore this reference implementation within a single subscription, but when you implement this cluster at your organization, you will need to take what you've learned here and apply it to your expected subscription and resource group topology; such as those [offered by the Cloud Adoption Framework](https://docs.microsoft.com/azure/cloud-adoption-framework/decision-guides/subscriptions/). This single subscription, multiple resource group model is for simplicity of demonstration purposes only.
## Expected results
@@ -47,6 +47,8 @@ For this reference implementation, our Azure Policies applied to these resource
This is not an exhaustive list of Azure Policies that you can create or assign, but rather an example of the types of policies you should consider having in place. Policies like these help prevent a misconfiguration of a service that would expose you to unexpected compliance concerns. Let the Azure control plane guard against configurations that are untenable for your compliance requirements as an added safeguard. While we deploy policies at the subscription and resource group scope, your organization may also utilize management groups. We've found it's best to also ensure your target subscription and target resource groups have "scope-local" policies specific to their needs, so they don't take a dependency on a higher order policy existing or not -- even if that leads to a duplication of policy.
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 2.2.4 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-network#requirement-224) and [Azure Policy considerations](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#azure-policy-considerations).
+
Also, depending on your workload subscription scope, some of the policies applied above may be better suited at the subscription level (like no public AKS clusters). Since we don't assume you're coming to this walkthrough with a dedicated subscription, we've scoped the restrictions to only those resource groups we ask you to create. Apply your policies where it makes the most sense to do so in your final implementation.
### Security Center activated
@@ -110,6 +112,8 @@ Not only do we enable them in the steps below by default, but also set up an Azu
It is recommended that your Azure _subscription_ have the **Azure Security Benchmark** Azure Policy initiative applied. We could not deploy it in ARM above, as we don't want to overwrite anything already existing in your subscription. This policy can only be applied once for Security Center to detect it properly, and if we deployed a version above, you might inadvertently break existing policy configuration on your subscription. If you have the ability to apply it without any negative impact on other resources in your subscription, you can do so with the following steps.
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 2.2 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-network#requirement-22).
+
### Steps
1. Open the [**Regulatory Compliance** screen in Security Center](https://portal.azure.com/#blade/Microsoft_Azure_Security/SecurityMenuBlade/22)
diff --git a/docs/deploy/05-networking-hub.md b/docs/deploy/05-networking-hub.md
index 9d048d9e..1a5fe31d 100644
--- a/docs/deploy/05-networking-hub.md
+++ b/docs/deploy/05-networking-hub.md
@@ -4,7 +4,9 @@ Now that your subscription has your [Azure Policies and target resource-groups i
## Networking in this architecture
-Egressing your spoke traffic through a hub network (following the hub-spoke model), is a critical component of this AKS architecture. Your organization's networking team will likely have a specific strategy already in place for this; such as a _Connectivity_ subscription with a [Virtual WAN](https://docs.microsoft.com/azure/virtual-wan/virtual-wan-about) already configured for regional egress. In this walk through, we are going to implement this recommended strategy in an illustrative manner, however you will need to adjust based on your specific situation when you implement this cluster for production. Hubs are usually a centrally-managed and governed resource in an organization, and not typically workload specific. The steps that follow create the hub (and spokes) as a stand-in for the work that you'd coordinate with your networking team.
+Egressing your spoke traffic through a hub network (following the hub-spoke model) is a critical component of this AKS architecture. Your organization's networking team will likely have a specific strategy already in place for this, such as a _Connectivity_ subscription already configured for regional egress. In this walkthrough, we are going to implement this recommended strategy in an illustrative manner; however, you will need to adjust based on your specific situation when you implement this cluster for production. Hubs are usually a centrally managed and governed resource in an organization, and not typically workload specific. The steps that follow create the hub (and spokes) as a stand-in for the work that you'd coordinate with your networking team.
+
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 1 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-network#requirement-1install-and-maintain-a-firewall-configuration-to-protect-cardholder-data) and [Networking configuration](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#networking-configuration).
## Expected results
diff --git a/docs/deploy/06-aks-jumpboximage.md b/docs/deploy/06-aks-jumpboximage.md
index 90e4124c..d621e10a 100644
--- a/docs/deploy/06-aks-jumpboximage.md
+++ b/docs/deploy/06-aks-jumpboximage.md
@@ -108,6 +108,8 @@ Now that we have our image building network created, egressing through our hub,
This specific jump box image is considered general purpose; its creation process and supply chain have not been hardened. For example, the jump box image is built on a public base image, and is pulling OS package updates from Ubuntu and Microsoft public servers. Additionally, tooling such as Azure CLI, Helm, Flux, and Terraform is installed straight from the Internet. Ensure processes like these adhere to your organizational policies; pulling updates from your organization's patch servers, and storing well-known 3rd party dependencies in trusted locations that are available from your builder's subnet. If all necessary resources have been brought "network-local", the NSG and Azure Firewall allowances should be made even tighter. Also apply all standard OS hardening procedures your organization requires for privileged access machines such as these. Finally, ensure all desired security and logging agents are installed and configured. All jump boxes (or similar access solutions) should be _hardened and monitored_, as they span two distinct security zones. **Both the jump box and its image/container are attack vectors that need to be considered when evaluating cluster access solutions**; they must be considered as part of your compliance concerns.
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 1.4 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-network#requirement-14) and [PCI-DSS 3.2.1 Requirement 5 & 6 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-malware).
+
## Pipelines and other considerations
Image building using Azure Image Builder lends itself well to having a secured, auditable, and transient image building infrastructure. Consider building pipelines around the generation of hardened and approved images to create a repeatably compliant output. Also we recommend pushing these images to your organization's [Azure Shared Image Gallery](https://docs.microsoft.com/azure/virtual-machines/shared-image-galleries) for geo-distribution and added management capabilities. These features were skipped for this reference implementation to avoid added illustrative complexity.
diff --git a/docs/deploy/07-aks-jumpbox-users.md b/docs/deploy/07-aks-jumpbox-users.md
index 2dd2a5de..6ab2c67c 100644
--- a/docs/deploy/07-aks-jumpbox-users.md
+++ b/docs/deploy/07-aks-jumpbox-users.md
@@ -6,6 +6,8 @@ You've [built the jump box image](./06-aks-jumpboximage.md), now you need to bui
You have multiple options on how you manage your jump box users. Because jump box user management isn't the focus of the walkthrough, we'll stick with a relatively straightforward mechanism to keep you moving. However, generally you'll want to ensure you're using a solution like [Linux Active Directory sign-in](https://docs.microsoft.com/azure/virtual-machines/linux/login-using-aad) so that you can take advantage of Azure AD Conditional Access policies, JIT permissions, etc. Employ whatever user governance mechanism will help you achieve your desired compliance outcome while still being able to easily on- and off-board users as your ops teams' needs and personnel change.
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 7 & 8 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-identity).
+
## Expected results
Following the steps below, you'll end up with an SSH public-key-based solution that leverages [cloud-init](https://docs.microsoft.com/azure/virtual-machines/linux/using-cloud-init). The results will be captured in `jumpBoxCloudInit.yml` which you will later convert to Base64 for use in your cluster's ARM template.
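
The Base64 conversion mentioned above can be sketched as a small shell helper. This is a minimal illustration, not the walkthrough's own command: the function name `encode_cloud_init` is hypothetical, and it assumes the ARM template expects a single-line Base64 string.

```bash
#!/usr/bin/env bash
# Hypothetical helper: encode a cloud-init file as a single-line Base64 string.
# Assumption: the consuming ARM template parameter expects no embedded newlines.
encode_cloud_init() {
  local file="$1"
  # GNU base64 wraps its output at 76 columns by default, so strip the
  # newlines to get one continuous value.
  base64 < "$file" | tr -d '\n'
}

# Usage (file name taken from the text above):
#   CLOUD_INIT_BASE64=$(encode_cloud_init jumpBoxCloudInit.yml)
```

Stripping newlines with `tr` rather than relying on a wrap-control flag keeps the helper portable across `base64` implementations that expose different options.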
diff --git a/docs/deploy/10-pre-bootstrap.md b/docs/deploy/10-pre-bootstrap.md
index 0c490466..1a2a7428 100644
--- a/docs/deploy/10-pre-bootstrap.md
+++ b/docs/deploy/10-pre-bootstrap.md
@@ -18,6 +18,8 @@ You'll end up with the following images imported into your ACR instance, after h
Quarantining first- and third-party images is a recommended security practice. This allows you to get your images onto a dedicated container registry and subject them to any sort of security/compliance scrutiny you wish to apply. Once validated, they can then be promoted to being available to your cluster. There are many variations on this pattern, with different tradeoffs for each. For simplicity in this walkthrough we are simply going to import our images to repository names that start with `quarantine/`. We'll then show you Azure Security Center's scan of those images, and then you'll import those same images directly from `quarantine/` to `live/` repositories (retaining their sha256 digest). We've restricted our cluster to only allow pulling from `live/` repositories and we've built an alert if an image was imported to `live/` from a source other than `quarantine/`. This isn't a preventative security control; _this won't block a direct import_ request or _validate_ that the image actually passed quarantine checks. There are other solutions you can use for this pattern that are more exhaustive. [Aquasec](https://go.microsoft.com/fwlink/?linkid=2002601&clcid=0x409) and [Twistlock](https://go.microsoft.com/fwlink/?linkid=2002600&clcid=0x409) both offer integrated solutions specifically for Azure Container Registry scanning and compliance management. Azure Container Registry has an [integrated quarantine feature](https://docs.microsoft.com/azure/container-registry/container-registry-faq#how-do-i-enable-automatic-image-quarantine-for-a-registry) as well that could be considered; however, it is in preview at this time.
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 6.3.2 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-malware#requirement-632).
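
The `quarantine/` to `live/` naming convention described above can be sketched as a small shell helper that derives the promotion target and refuses anything that did not come from quarantine. This is an illustration only: the function name `live_ref_for`, the `ACR_NAME` variable, and the image name `example/app:1.0` are hypothetical and not part of this walkthrough.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a quarantine/ repository reference to its live/ counterpart.
live_ref_for() {
  local ref="$1"
  case "$ref" in
    quarantine/*)
      # Drop the quarantine/ prefix and re-root the same name and tag under live/.
      printf 'live/%s\n' "${ref#quarantine/}"
      ;;
    *)
      # Anything else has not been through quarantine, so do not promote it.
      echo "refusing: '$ref' is not under quarantine/" >&2
      return 1
      ;;
  esac
}

# Usage sketch (assumes ACR_NAME holds your registry name; image name is illustrative):
#   SRC="quarantine/example/app:1.0"
#   az acr import --name "$ACR_NAME" \
#     --source "$ACR_NAME.azurecr.io/$SRC" \
#     --image "$(live_ref_for "$SRC")"
```

Like the alert described above, a guard such as this is a convenience rather than a preventative control; it does not validate that the image actually passed any quarantine checks.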
+
## Deployment pipelines
Your deployment pipelines are one of the first lines of defense in container image security. Shifting left by introducing build steps like [GitHub Image Scanning](https://github.com/Azure/container-scan) (which leverages common tools like [dockle](https://github.com/goodwithtech/dockle) and [Aquasec trivy](https://github.com/aquasecurity/trivy)) will help ensure that, at build time, your images are linted, CIS benchmarked, and free from known vulnerabilities. You can use any tooling at this step that you trust, including paid, ISV solutions that help provide your desired level of confidence and compliance.
@@ -34,6 +36,8 @@ Using a security agent that is container-aware and can operate from within the c
**Static analysis, registry scanning, and continuous scanning should be the workflow for all of your images; both your own first party and any third party images you use.**
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 5 & 6 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-malware).
+
## Steps
1. Quarantine Flux and other public bootstrap security/utility images.
diff --git a/docs/deploy/11-gitops.md b/docs/deploy/11-gitops.md
index bbfd8bbf..b4223686 100644
--- a/docs/deploy/11-gitops.md
+++ b/docs/deploy/11-gitops.md
@@ -217,10 +217,12 @@ Your dependency on or choice of in-cluster tooling to achieve your compliance ne
This reference implementation also installs a very basic deployment of [Falco](https://falco.org/). It is not configured for alerts, nor tuned to any specific needs. It uses the default rules as they were defined when its manifests were generated. This is also being installed for illustrative purposes, and you're encouraged to evaluate if a solution like Falco is relevant to you. If so, in your final implementation, review and tune its deployment to fit your needs (e.g. add custom rules like [CVE detection](https://artifacthub.io/packages/search?ts_query_web=cve&org=falco), [sudo usage](https://artifacthub.io/packages/falco/security-hub/admin-activities), [basic FIM](https://artifacthub.io/packages/falco/security-hub/file-integrity-monitoring), [SSH Connection monitoring](https://artifacthub.io/packages/falco/security-hub/ssh-connections), and [NGINX containment](https://artifacthub.io/packages/falco/security-hub/nginx)). This tooling, _as most security tooling will be_, is **highly-privileged within your cluster**. It usually runs as DaemonSets with access to the underlying node in a manner that is well beyond any typical workload in your cluster. Remember to consider the runtime compute and networking requirements of your security tooling when sizing your cluster, as these can often be overlooked when initial cluster sizing conversations are happening.
-It's worth repeating again, **most customers with regulated workloads are bringing ISV or open source security solutions to their clusters**. Azure Kubernetes Service is a managed Kubernetes platform, it does not imply that you will exclusively be using Microsoft products/solutions to solve your requirements. For the most part, after the deployment of the infrastructure and some out-of-the-box addons (like Azure Policy, Azure Monitor, AAD Pod Identity, Open Service Mesh), you're in charge of what you choose to run in your hosted Kubernetes platform. Bring the business and compliance solving solutions you need to the cluster from the [vast and ever-growing Kubernetes and CNCF ecosystem](https://l.cncf.io/?fullscreen=yes).
+It's worth repeating: **most customers with regulated workloads are bringing ISV or open source security solutions to their clusters**. Azure Kubernetes Service is a managed Kubernetes platform; that does not imply that you will exclusively be using Microsoft products/solutions to solve your requirements. For the most part, after the deployment of the infrastructure and some out-of-the-box addons (like Azure Policy, Azure Monitor, Azure AD Pod Identity, Open Service Mesh), you're in charge of what you choose to run in your hosted Kubernetes platform. Bring the business and compliance solving solutions you need to the cluster from the [vast and ever-growing Kubernetes and CNCF ecosystem](https://l.cncf.io/?fullscreen=yes).
**You should ensure all necessary tooling and related reporting/alerting is applied as part of your _initial bootstrapping process_ to ensure coverage _immediately_ after cluster creation.**
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 5 & 6 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-malware).
+
### Next step
:arrow_forward: [Deploy your workload](./12-workload.md)
diff --git a/docs/deploy/12-workload.md b/docs/deploy/12-workload.md
index ae259cc5..0b54cf7d 100644
--- a/docs/deploy/12-workload.md
+++ b/docs/deploy/12-workload.md
@@ -55,6 +55,8 @@ While typically workload deployment happens via deployment pipelines, to keep th
In addition to namespaces, the cluster also has dedicated node pools for the "in-scope" components. This helps ensure that out-of-scope workload components (where possible) do not run on the same hardware as the in-scope components. Ideally your in-scope node pools will run just those workloads that deal with in-scope regulatory data and the security agents that support your regulatory obligations. These two node pools benefit from being on separate subnets as well, which allows finer control at the Azure network level (NSG rules and Azure Firewall rules).
+ > :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 2.2.1 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-network#requirement-221).
+
```bash
cd ../workload
@@ -82,6 +84,8 @@ The foundation of in-cluster network security is Kubernetes Network Policies. Th
For a namespace in which services will be talking to other services, **the recommended zero-trust network policy for AKS can be found in [networkpolicy-denyall.yaml](cluster-manifests/a0005-i/networkpolicy-denyall.yaml)**. This blocks ALL traffic (in and out) other than outbound to kube-dns (which is CoreDNS in AKS). If you don't need DNS resolution across all workloads in the namespace, then you can remove that from the deny all and apply it selectively to pods that do require it (if any).
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 1.1.4 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-network#requirement-114).
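
To make the deny-all description above concrete, the following shell function prints a Kubernetes `NetworkPolicy` of that general shape: all ingress and egress denied, except outbound DNS to pods labeled `k8s-app: kube-dns`. This is a sketch under stated assumptions, not a copy of the repository's `networkpolicy-denyall.yaml`; the function name `deny_all_policy` and the policy name `default-deny` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit a deny-all NetworkPolicy for a namespace, allowing
# only egress to kube-dns on port 53. The repository's actual manifest may differ.
deny_all_policy() {
  local namespace="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: ${namespace}
spec:
  # An empty podSelector applies the policy to every pod in the namespace.
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  # No ingress rules are listed, so all inbound traffic is denied.
  egress:
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
EOF
}

# Usage sketch (namespace name taken from the manifest path above):
#   deny_all_policy a0005-i | kubectl apply -f -
```

If not every workload in the namespace needs DNS, the `egress` block could be dropped here and reintroduced in a separate policy whose `podSelector` targets only the pods that require it.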
+
#### Alternatives
If you want to have more advanced network policies than what Azure NPM supports, you'll need to bring in another solution, such as [Project Calico](https://docs.microsoft.com/azure/aks/use-network-policies#network-policy-options-in-aks). Solutions like Project Calico extend Kubernetes Network Policies to be more advanced in their scope (such as introducing "cluster-wide" policies and advanced pod/namespace selectors), and may even support Layer 7 constructs (routes and FQDNs). See more choices in the [CNCF Networking landscape](https://landscape.cncf.io/card-mode?category=cloud-native-network&grouping=category).
@@ -102,6 +106,8 @@ This reference implementation is using [Open Service Mesh](https://openserviceme
**Using a service mesh is not a requirement.** The most obvious benefit is the transparent mTLS features that typically come with service mesh implementations. **Not all regulatory requirements demand TLS between components in your already private network.** Consider the management and complexity cost of any solution you bring into your cluster, and understand where your regulatory obligations fit into that.
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 1.1.4 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-network#requirement-114) and [TLS encryption architecture considerations](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-ra-code-assets#tls-encryption).
+
### Defense in depth
Network security in AKS is indeed a defense-in-depth strategy; as affordances exist at various levels, with various capabilities, both within and external to the cluster.
@@ -135,6 +141,8 @@ This reference implementation doesn't dive into security best practices of your
Likewise, this reference implementation does not get into workload architecture with regard to compliance concerns. This includes things like data regulations like encryption in transit and at rest, data access controls, selecting and configuring storage technology, data residency, etc. Your regulatory scope extends beyond the infrastructure and into your workloads. While you must have a compliant foundation to deploy those workloads into, your workloads need a similar level of compliance scrutiny applied to them. All of these topics are beyond the scope of the learning objective in this walkthrough, which is AKS cluster architecture for regulated workloads.
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 3 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-data#requirement-31).
+
### Next step
:arrow_forward: [End-to-End Validation](./13-validation.md)
diff --git a/docs/deploy/13-validation.md b/docs/deploy/13-validation.md
index 8b720e64..d97f9133 100644
--- a/docs/deploy/13-validation.md
+++ b/docs/deploy/13-validation.md
@@ -34,13 +34,15 @@ This section will help you to validate the workload is exposed correctly and res
Your workload is placed behind a Web Application Firewall (WAF), which has rules designed to stop intentionally malicious activity. You can test this by triggering one of the built-in rules with a request that looks malicious.
-> :bulb: This reference implementation enables the built-in OWASP 3.1 ruleset, in **Prevention** mode.
+> :bulb: This reference implementation enables the built-in OWASP 3.2 ruleset, in **Prevention** mode.
### Steps
1. Browse to the site with the following appended to the URL: `?sql=DELETE%20FROM` (e.g. ).
1. Observe that your request was blocked by Application Gateway's WAF rules and your workload never saw this potentially dangerous request.
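The manual check above could also be scripted. The following is a minimal sketch, not part of this reference implementation: the helper names (`build_probe_url`, `was_blocked`, `probe`) and the example host are hypothetical, and it assumes the WAF in Prevention mode answers a blocked request with HTTP 403.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit

# Query string taken from the step above; it is meant to look like SQL injection.
PROBE_QUERY = "sql=DELETE%20FROM"


def build_probe_url(site_url: str) -> str:
    """Append the suspicious query string to the site URL (hypothetical helper)."""
    parts = urlsplit(site_url)
    query = f"{parts.query}&{PROBE_QUERY}" if parts.query else PROBE_QUERY
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path or "/", query, parts.fragment)
    )


def was_blocked(status_code: int) -> bool:
    """Assumption: a request stopped by the WAF is answered with 403 Forbidden."""
    return status_code == 403


def probe(site_url: str, timeout: float = 10.0) -> int:
    """Send the probe request and return the HTTP status code that came back."""
    try:
        with urllib.request.urlopen(build_probe_url(site_url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is what we want.
        return err.code
```

To use it, call `was_blocked(probe(...))` with your own site's URL; a result of `True` suggests the request was stopped before reaching the workload.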
+For a more exhaustive WAF test and validation experience, try the [Azure Web Application Firewall Security Protection and Detection Lab](https://techcommunity.microsoft.com/t5/azure-network-security/tutorial-overview-azure-web-application-firewall-security/ba-p/2030423).
+
### Next step
:arrow_forward: [Access resource logs & Azure Security Center data](./13-validation-logs.md)
diff --git a/docs/rbac-suggestions.md b/docs/rbac-suggestions.md
index 82142d41..333147d6 100644
--- a/docs/rbac-suggestions.md
+++ b/docs/rbac-suggestions.md
@@ -6,6 +6,8 @@ This reference implementation is mostly focused on infrastructure, and minimal a
If you're looking for a list of recommended roles to delineate responsibilities across, consider the following. Obviously you'll need to build roles that are reasonable for your organization and workload.
+> :notebook: See [Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 7.1.1 in AKS](https://docs.microsoft.com/azure/architecture/reference-architectures/containers/aks-pci/aks-pci-identity#requirement-711).
+
* **Application Developers** are responsible for developing software in service to the business. All code developed by this role is subject to a set of training and quality gates upholding compliance, attestation, and release management processes. This role might be granted some read privileges in related Kubernetes namespaces and read privileges on related workload Azure resources, but this role is not responsible for deploying or modifying any transitioning state in a running system. This team may manage build pipelines, but usually not deployment pipelines.
* **Application Owners** are responsible for defining and prioritizing features; aligning with business outcomes. They need to understand how features impact the compliance scoping of the workload, and balance customer data protection and ownership with business objectives.