diff --git a/self-hosted/aws/onboard.mdx b/self-hosted/aws/onboard.mdx index 4c8a7dc4..c69f9554 100644 --- a/self-hosted/aws/onboard.mdx +++ b/self-hosted/aws/onboard.mdx @@ -12,8 +12,10 @@ sidebarTitle: Onboarding After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the -deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you -must first set up your AWS account as follows. +deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. Choose one of the following setup options: + +- [Do it all for me](#do-it-all-for-me): Have Unstructured set up the required infrastructure in your AWS account and then deploy the Unstructured UI and API into that newly created infrastructure. +- [Bring my own infrastructure](#bring-my-own-infrastructure): Set up the required infrastructure yourself in your AWS account, and then have Unstructured deploy the Unstructured UI and API into your existing infrastructure. ## Questions? Need help? @@ -22,9 +24,94 @@ email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io [contact form](https://unstructured.io/contact) on the Unstructured website, and a member of the Unstructured sales or technical enablement teams will get back to you as soon as possible. -## Onboarding checklist +## Do it all for me + +If you want Unstructured to set up the required infrastructure for you in your AWS account and then deploy the Unstructured UI and API into that newly created infrastructure, then provide your Unstructured sales representative or technical enablement contact with +the access credentials for an IAM user or service principal in your AWS account that has the following required permissions. 
+ +### Core networking permissions + +For VPC and subnet management: + +- `ec2:CreateVpc` +- `ec2:CreateSubnet` +- `ec2:CreateRouteTable` +- `ec2:CreateInternetGateway` +- `ec2:CreateNatGateway` +- `ec2:ModifyVpcAttribute` (for DNS settings) +- `ec2:AssociateRouteTable`, `ec2:CreateRoute` (for public and private route tables) +- `ec2:AllocateAddress` (for Elastic IP assignment to the NAT Gateway) + +For security group rules: + +- `ec2:AuthorizeSecurityGroupIngress` and `ec2:AuthorizeSecurityGroupEgress` (to configure cluster and node security groups to allow VPC CIDR traffic) + +### EKS permissions + +For the cluster role: + +- Attach the managed policies `AmazonEKSClusterPolicy` and `AmazonEKSVPCResourceController` to a role with `sts:AssumeRole` trust for `eks.amazonaws.com` + +For the node group role: + +Attach these managed policies: + +- `AmazonEKSWorkerNodePolicy` (for node operations) +- `AmazonEKS_CNI_Policy` (for networking) +- `AmazonEC2ContainerRegistryReadOnly` (for ECR access) + +For OIDC integration: + +- `iam:CreateOpenIDConnectProvider` (to associate the EKS cluster with IAM OIDC) +- `iam:CreateRole` + `iam:AttachRolePolicy` (for service accounts in the `recommender`, `etl-operator`, and `data-broker` namespaces) + +### Storage and database + +These permissions: + +- `s3:CreateBucket` +- `s3:PutBucketVersioning` +- `s3:PutBucketEncryption` + +For these S3 buckets: + +- `u10d-*-etl-blob-cache` +- `u10d-*-etl-job-db` +- `u10d-*-etl-job-status` +- `u10d-*-job-files` + +For RDS: + +- `rds:CreateDBInstance` +- `rds:CreateDBSubnetGroup` +- `rds:CreateDBSecurityGroup` + `ec2:AuthorizeSecurityGroupIngress` (to allow VPC CIDR access) + +### Add-ons and utilities + +For the EBS CSI Driver: + +- `eks:CreateAddon` with IAM role attachment permissions for the `ebs.csi.aws.com` service account + +For the SSH Key: + +- `ec2:CreateKeyPair` + `ec2:ExportKeyPair` (for node group remote access) + +### Cross-service requirements + +- For IAM: `iam:PassRole` (to assign roles to EKS, RDS, and S3) +- 
For KMS: `kms:CreateKey` (if using CMK for S3 and RDS encryption) +- For CloudFormation: `cloudformation:*` + +For least privilege, scope resource ARNs in policies (for example, restrict S3 bucket names with wildcards such as `u10d-*-etl*`). +The EKS Pod Identity Agent requires `eks-auth:AssumeRoleForPodIdentity` permission on node roles when used with IRSA. + +## Bring my own infrastructure + +If you want to set up the required infrastructure yourself, set things up as follows within your AWS account for Unstructured to deploy the Unstructured UI and API into. -Set up the following infrastructure within your AWS account for Unstructured to deploy the Unstructured UI and API into. +You must also provide your Unstructured sales representative or technical enablement contact with +the access credentials for an IAM user or service principal in your AWS account that has access to the target Amazon Elastic Kubernetes Service (EKS) cluster to deploy the +Unstructured UI and API into. ### VPC and networking diff --git a/self-hosted/azure/onboard.mdx b/self-hosted/azure/onboard.mdx index 97a6d1bc..ce1cd3a4 100644 --- a/self-hosted/azure/onboard.mdx +++ b/self-hosted/azure/onboard.mdx @@ -12,8 +12,10 @@ sidebarTitle: Onboarding After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the -deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you -must first set up your Azure account as follows. +deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. 
Choose one of the following setup options: + +- [Do it all for me](#do-it-all-for-me): Have Unstructured set up the required infrastructure in your Azure account and then deploy the Unstructured UI and API into that newly created infrastructure. +- [Bring my own infrastructure](#bring-my-own-infrastructure): Set up the required infrastructure yourself in your Azure account, and then have Unstructured deploy the Unstructured UI and API into your existing infrastructure. ## Questions? Need help? @@ -22,9 +24,68 @@ email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io [contact form](https://unstructured.io/contact) on the Unstructured website, and a member of the Unstructured sales or technical enablement teams will get back to you as soon as possible. -## Onboarding checklist +## Do it all for me + +If you want Unstructured to set up the required infrastructure for you in your Azure account and then deploy the Unstructured UI and API into that newly created infrastructure, then provide your Unstructured sales representative or technical enablement contact with +the access credentials for a Microsoft Entra ID user or service principal in your Azure account that has the following required permissions. 
+ +### Subscription and resource group + +- `Microsoft.Resources/subscriptions/resourceGroups/write` (to create the resource group) +- `Microsoft.Resources/subscriptions/resourceGroups/read` (to read the resource group) + +### VNet and networking + +- `Microsoft.Network/virtualNetworks/write` (to create the VNet) +- `Microsoft.Network/virtualNetworks/read` (to read the VNet) +- `Microsoft.Network/publicIPAddresses/write` (to create the public IPs) +- `Microsoft.Network/publicIPAddresses/read` (to read the public IPs) +- `Microsoft.Network/natGateways/write` (to create the NAT Gateway) +- `Microsoft.Network/natGateways/read` (to read the NAT Gateway) +- `Microsoft.Network/routeTables/write` (to create the route tables) +- `Microsoft.Network/routeTables/read` (to read the route tables) +- `Microsoft.Network/networkSecurityGroups/write` (to create the NSGs) +- `Microsoft.Network/networkSecurityGroups/read` (to read the NSGs) + +### AKS cluster + +- `Microsoft.ContainerService/managedClusters/write` (to create the AKS cluster) +- `Microsoft.ContainerService/managedClusters/read` (to read the AKS cluster) +- `Microsoft.ContainerService/agentPools/write` (to create the node pools) +- `Microsoft.ContainerService/agentPools/read` (to read the node pools) + +### Managed identities and RBAC + +- `Microsoft.ManagedIdentity/userAssignedIdentities/write` (to create the managed identities) +- `Microsoft.ManagedIdentity/userAssignedIdentities/read` (to read managed identities) +- Assign built-in roles such as: + + - **Contributor** or scoped **Network Contributor** for the AKS cluster identity + - **Monitoring Metrics Publisher**, **AcrPull**, and **Storage Blob Data Reader** for the node pool identity + - **Storage Blob Data Contributor** for workload identities + +### Kubernetes add-ons + +Permissions depend on the Helm/YAML installation, but Azure RBAC integration requires `Microsoft.ContainerService/managedClusters/accessProfiles/*/read` (to access kubeconfig) + +### Storage 
class + +- `Microsoft.Storage/storageAccounts/write` (to create the storage account for CSI driver provisioning) +- `Microsoft.Storage/storageAccounts/read` + +### PostgreSQL database + +- `Microsoft.DBforPostgreSQL/flexibleServers/write` (to create the PostgreSQL server) +- `Microsoft.DBforPostgreSQL/flexibleServers/read` +- NSG permissions for database access: allow traffic from the VNet CIDR + +## Bring my own infrastructure + +If you want to set up the required infrastructure yourself, set things up as follows within your Azure account for Unstructured to deploy the Unstructured UI and API into. -Set up the following infrastructure within your Azure account for Unstructured to deploy the Unstructured UI and API into. +You must also provide your Unstructured sales representative or technical enablement contact with +the access credentials for a Microsoft Entra ID user or service principal in your Azure account that has access to the target Azure Kubernetes Service (AKS) cluster to deploy the +Unstructured UI and API into. ### **Azure subscription and resource group** diff --git a/self-hosted/gcp/onboard.mdx b/self-hosted/gcp/onboard.mdx index 1f7069b2..bf044b75 100644 --- a/self-hosted/gcp/onboard.mdx +++ b/self-hosted/gcp/onboard.mdx @@ -12,8 +12,10 @@ sidebarTitle: Onboarding After your organization has signed the self-hosting agreement with Unstructured, a member of the Unstructured technical enablement team will reach out to you to begin the -deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. To do this, you -must first set up your GCP account as follows. +deployment onboarding process. To streamline this process, you are encouraged to begin setting up your target environment as soon as possible. 
Choose one of the following setup options: + +- [Do it all for me](#do-it-all-for-me): Have Unstructured set up the required infrastructure in your GCP account and then deploy the Unstructured UI and API into that newly created infrastructure. +- [Bring my own infrastructure](#bring-my-own-infrastructure): Set up the required infrastructure yourself in your GCP account, and then have Unstructured deploy the Unstructured UI and API into your existing infrastructure. ## Questions? Need help? @@ -22,9 +24,102 @@ email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io [contact form](https://unstructured.io/contact) on the Unstructured website, and a member of the Unstructured sales or technical enablement teams will get back to you as soon as possible. -## Onboarding checklist +## Do it all for me + +If you want Unstructured to set up the required infrastructure for you in your GCP account and then deploy the Unstructured UI and API into that newly created infrastructure, then provide your Unstructured sales representative or technical enablement contact with +the access credentials for an IAM user or service account in your GCP account that has the following required permissions: + +### Core networking permissions + +VPC/subnet management: + +- `compute.networks.create` +- `compute.subnetworks.create` +- `compute.routers.create` (for Cloud NAT) +- `compute.addresses.create` (for NAT IPs) +- `compute.firewalls.create` (for intra-cluster traffic rules) + +Shared VPC (if used): + +- `compute.organizations.admin` (for the host project) +- `compute.networks.use` (for the service project) + +### GKE cluster permissions + +Control plane: + +- `container.clusters.create` +- `container.clusters.update` (for private cluster settings) +- `compute.networks.useExternalIp` (for public endpoint access) + +Node pools: + +- `compute.instances.create` +- `compute.disks.create` (for node disks) +- `compute.instanceGroups.create` (for autoscaling) + +IAM roles: + 
+- For the GKE cluster service account: `roles/container.hostServiceAgentUser` +- For the node service account: `roles/container.nodeServiceAccount` +- For the workload identity service account: `roles/iam.workloadIdentityUser` + +### Storage and database + +GCS buckets: + +- `storage.buckets.create` +- `storage.objects.create` (for versioning) +- `storage.buckets.update` (for encryption/lifecycle rules) + +Cloud SQL: + +- `cloudsql.instances.create` +- `cloudsql.instances.connect` (for private IPs) +- `vpcaccess.connectors.use` (if using Serverless VPC Access) + +Persistent disks (CSI): + +- `compute.disks.create` (for `pd.csi.storage.gke.io`) +- `compute.subnetworks.use` (for regional disks) + +### Advanced configurations + +Workload identity: + +- `iam.serviceAccounts.getAccessToken` (for federated access) +- `iam.serviceAccounts.setIamPolicy` (to bind Kubernetes service accounts to GCP service accounts) + +Cloud NAT: + +- `compute.routers.update` (for NAT configuration) +- `compute.addresses.use` (for NAT IP allocation) + +OS login/SSH: + +- `compute.projects.setCommonInstanceMetadata` (for SSH key upload) +- `compute.instances.osAdminLogin` + +### Minimum required roles + +Project level: + +- `roles/editor` (broad access, or scope with custom roles) + +Scoped roles: + +- `roles/compute.networkAdmin` (for VPC and subnets) +- `roles/container.admin` (for GKE) +- `roles/storage.admin` (for GCS) +- `roles/cloudsql.admin` (for Postgres) + +## Bring my own infrastructure + +If you want to set up the required infrastructure yourself, set things up as follows within your GCP account for Unstructured to deploy the Unstructured UI and API into. -Set up the following infrastructure within your GCP account for Unstructured to deploy the Unstructured UI and API into. 
+You must also provide your Unstructured sales representative or technical enablement contact with +the access credentials for an IAM user or service account in your GCP account that has access to the target Google Kubernetes Engine (GKE) cluster to deploy the +Unstructured UI and API into. ### **VPC and networking (GCP equivalent)**