
(2023.05) PCUI release broken in us-gov-west-1

Charlie edited this page Jun 2, 2023 · 7 revisions

The issue

Due to an issue in ParallelCluster UI (PCUI) 2023.05, deploying PCUI in the AWS GovCloud (US-West) region (us-gov-west-1) fails with the following error message, shown in the CloudFormation events:

Partition "aws" is not valid for resource "arn:aws:ce:us-east-1::/*".
 (Service: AmazonIdentityManagement; Status Code: 400; Error Code: MalformedPolicyDocument; Request ID: 3ef760f7-3467-487a-8eb3-130712e1791c; Proxy: null)

The failure occurs because the Cost Monitoring feature is not supported in the GovCloud West region. The PCUI template still tries to enable it, and the IAM policy it creates hardcodes the aws partition in the Cost Explorer (ce) resource ARN, which IAM rejects in GovCloud.
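To illustrate why the ARN is rejected, the sketch below maps a region name to its partition. The helper name `partition_for_region` is hypothetical (it is not part of PCUI or the AWS CLI), and the mapping only covers the common region prefixes; treat it as an illustration rather than a complete lookup.

```shell
#!/bin/bash
# Hypothetical helper: map a region name to its AWS partition.
# Assumption: only the common prefixes are handled; other partitions exist.
partition_for_region() {
  case "$1" in
    us-gov-*) echo "aws-us-gov" ;;
    cn-*)     echo "aws-cn" ;;
    *)        echo "aws" ;;
  esac
}

# Default to the affected region when AWS_DEFAULT_REGION is not set.
region="${AWS_DEFAULT_REGION:-us-gov-west-1}"
partition="$(partition_for_region "$region")"

if [[ "$partition" != "aws" ]]; then
  echo "Region ${region} is in partition ${partition}: a policy resource starting with arn:aws: is not valid there."
fi
```

In us-gov-west-1 the partition is aws-us-gov, which is why the hardcoded `arn:aws:ce:us-east-1:...` resource in the template triggers the MalformedPolicyDocument error.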

Affected versions

AWS ParallelCluster UI 2023.05 is affected when deployed in the GovCloud West region, where the Cost Monitoring feature is not supported.

Mitigation

The suggested mitigation is to patch the latest available version of PCUI from the PCUI GitHub repository and deploy the PCUI CloudFormation stack from the patched template using the script provided below. With this mitigation, some errors are still expected on the PCUI console; we plan to release a version of PCUI that resolves them. Follow these steps:

  1. Clone the PCUI GitHub repository and change into it:
git clone https://github.com/aws/aws-parallelcluster-ui.git
cd aws-parallelcluster-ui
  1. Create a new file named pcui.patch in the repository root and paste the following content:
diff --git a/infrastructure/parallelcluster-ui.yaml b/infrastructure/parallelcluster-ui.yaml
index facea1fa..e9238d71 100644
--- a/infrastructure/parallelcluster-ui.yaml
+++ b/infrastructure/parallelcluster-ui.yaml
@@ -755,14 +755,6 @@ Resources:
       PolicyDocument:
         Version: '2012-10-17'
         Statement:
-          - Action:
-              - ce:ListCostAllocationTags
-              - ce:UpdateCostAllocationTagsStatus
-              - ce:GetCostAndUsage
-            Resource:
-              - !Sub 'arn:aws:ce:us-east-1:${AWS::AccountId}:/*' # CE only available in us-east-1, aws partition, see https://docs.aws.amazon.com/general/latest/gr/billing.html
-            Effect: Allow
-            Sid: CostMonitoringPolicy
           - Action:
               - pricing:GetProducts
             Resource:
  1. From the repository root, run git apply pcui.patch

Alternatively, manually remove lines 758-765 (the CostMonitoringPolicy statement) from the infrastructure/parallelcluster-ui.yaml file.
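Whichever route you take, it can help to confirm that the statement is really gone before deploying. The following is a minimal sketch, assuming the string CostMonitoringPolicy appears only in the removed statement; the function name `template_is_patched` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the template no longer contains the
# Cost Monitoring statement (identified here by its Sid).
template_is_patched() {
  local template="$1"
  # A missing file is reported as an error rather than as "patched".
  [[ -f "$template" ]] || return 2
  ! grep -q 'CostMonitoringPolicy' "$template"
}

# Run from the repository root.
template="infrastructure/parallelcluster-ui.yaml"
if [[ -f "$template" ]]; then
  if template_is_patched "$template"; then
    echo "OK: CostMonitoringPolicy statement removed from ${template}"
  else
    echo "Not patched: ${template} still contains CostMonitoringPolicy"
  fi
fi
```

If you use the patch file, `git apply --check pcui.patch` performs a dry run and reports whether the patch would apply cleanly without modifying the template.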

  1. Export the following environment variables, which the deployment script relies on:

    • Set AWS_DEFAULT_REGION to the region you plan to deploy to (e.g. us-gov-west-1)
    • Set ADMIN_EMAIL to the email address to use for the PCUI admin user
  2. Save the following script in the root of the cloned repository (it locates infrastructure/parallelcluster-ui.yaml relative to its own path) and run it to deploy the patched local template:

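#!/bin/bash
# Deploys the patched local PCUI template as a new CloudFormation stack.
# Must be located in the repository root, next to the infrastructure/ directory.

# Fail early if the admin email has not been exported.
: "${ADMIN_EMAIL:?ADMIN_EMAIL must be set}"

# Directory containing this script
dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"

# Create a temporary working directory (mktemp options differ on macOS)
if [[ $OSTYPE == 'darwin'* ]]; then
    work_dir=$(mktemp -d -t pc)
else
    work_dir=$(mktemp -d -p "$dir")
fi

# Print a short pseudo-random suffix so each run gets a unique stack name
random-string()
{
if [[ $OSTYPE == 'darwin'* ]]; then
    echo $RANDOM | md5 | head -c 5; echo;
else
    echo $RANDOM | md5sum | head -c 5; echo;
fi
}

# Remove the temporary working directory when the script exits
function cleanup {
  rm -rf "${work_dir}"
  echo "Deleted temp working directory $work_dir"
}

trap cleanup EXIT

cfn_template="${dir}/infrastructure/parallelcluster-ui.yaml"
stack_name="pcui-$(random-string)"

pushd "${work_dir}"

local_template=template.yaml
echo "Deploying: ${work_dir}/${local_template} -> ${stack_name}"

# Region of the release artifacts bucket, not the deployment region
# (the stack is deployed to AWS_DEFAULT_REGION)
region=us-east-1
bucket=parallelcluster-ui-release-artifacts-${region}
template_url=https://${bucket}.s3.${region}.amazonaws.com

# Replace PLACEHOLDER in the template with the artifacts bucket URL
sed -e "s#PLACEHOLDER#${template_url}#" "${cfn_template}" > "${local_template}"

aws cloudformation deploy \
    --stack-name "${stack_name}" \
    --parameter-overrides AdminUserEmail="${ADMIN_EMAIL}" \
    --template-file "${local_template}" \
    --capabilities CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND

aws cloudformation describe-stacks --stack-name "${stack_name}"

popd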
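A run of the script might look like the session sketched below. The file name deploy-pcui.sh, the email address, and the helper `stack_status_ok` are assumptions made for illustration; the stack name is the one the script prints in its "Deploying:" line.

```shell
#!/bin/bash
# Hypothetical helper: succeed only for statuses that indicate a finished deployment.
stack_status_ok() {
  case "$1" in
    CREATE_COMPLETE|UPDATE_COMPLETE) return 0 ;;
    *) return 1 ;;
  esac
}

# Example session (needs AWS credentials, so it is shown as comments):
#   export AWS_DEFAULT_REGION=us-gov-west-1
#   export ADMIN_EMAIL=admin@example.com     # placeholder address
#   bash deploy-pcui.sh                      # hypothetical file name for the script above
#   status=$(aws cloudformation describe-stacks --stack-name <stack-name> \
#              --query 'Stacks[0].StackStatus' --output text)
#   stack_status_ok "$status" && echo "PCUI stack deployed"
```

Checking the StackStatus field is a compact alternative to reading the full describe-stacks output that the script prints at the end.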

This launches the PCUI CloudFormation stack, which should complete successfully. Some errors are still expected on the PCUI console when performing cluster operations. The expected errors are listed below:

List of expected errors

Failed to load resource: the server responded with a status of 400 ()/cost-monitoring:1
Error: An error occurred while trying to complete your request. Please try again later. If the problem persists, please contact support for further assistance.

A screenshot of the expected errors is provided below:

[Screenshot: expected errors in the PCUI console]
