Releases: GoogleCloudPlatform/gcpdiag

gcpdiag 0.64

14 Aug 23:07

0.64 (2023-08-14)

New rules

  • gke/bp_2023_005_gateway_crd: Manually installed Gateway CRD
  • gke/err_2023_010_nodelocal_timeout: NodeLocal DNS timeout
  • gke/err_2023_009_missing_cpu_req: Missing CPU request
  • gke/err_2023_008_crashloopbackoff: GKE cluster had pods in CrashLoopBackOff error
  • gke/err_2023_006_gw_controller_annotation_error: GKE Gateway controller reporting misconfigured annotations in Gateway resource
  • gke/err_2023_007_gw_controller_http_route_misconfig: GKE Gateway controller reporting invalid HTTPRoute for Gateway
  • dataflow/bp_2023_001_dataflow_supported_sdk_version_check: Dataflow job using supported SDK version
  • cloudsql/warn_2023_003_high_mem_usage: Cloud SQL instance's memory usage does not exceed 90%
  • gke/warn_2023_003_monitoring_api_disabled: Cloud Monitoring API enabled when GKE monitoring is enabled
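The 0.64 entries above use file-style rule names (for example gke/bp_2023_005_gateway_crd), while older releases use the short form (gke/BP/2023_005). A minimal sketch for mapping one to the other when searching these notes; the helper name and the regular expression are illustrative assumptions, not part of gcpdiag:

```python
import re

# Assumed layout of a file-style rule name:
#   <product>/<class>[_ext]_<year>_<number>[_<slug>]
# This pattern is inferred from the names in these release notes only.
_FILE_STYLE = re.compile(
    r"^(?P<product>[a-z]+)/"
    r"(?P<cls>[a-z]+(?:_ext)?)_"
    r"(?P<year>\d{4})_(?P<num>\d{3})"
    r"(?:_(?P<slug>\w+))?$"
)


def to_short_rule_id(name: str) -> str:
    """Convert 'gke/bp_2023_005_gateway_crd' to 'gke/BP/2023_005'."""
    match = _FILE_STYLE.match(name)
    if match is None:
        raise ValueError(f"not a file-style rule name: {name!r}")
    return (
        f"{match['product']}/{match['cls'].upper()}/"
        f"{match['year']}_{match['num']}"
    )
```

For instance, to_short_rule_id("cloudsql/bp_ext_2023_003_auto_storage_increases") yields the same identifier style as the 0.60 entry cloudsql/BP_EXT/2023_003.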

Fixes

  • Remove references to deprecated oauth option in docs b/281956212
  • Update diagram titles to remove “gcp doctor” reference
  • Fix wrong MQL query in cloudsql/WARN/2023_003 (external submission)
  • cloudsql/bp_ext_2023_003_auto_storage_increases
  • gcs/bp_2022_001_bucket_access_uniform: skip cloud build and dataproc buckets issue/61 b/293951741
  • gce/warn_2022_001_iap_tcp_forwarding: skip check for dataproc cluster vm instances
  • gce/bp_2021_001_serial_logging_enabled: skip check for dataproc cluster vm instances
  • gke/bp_2022_003_cluster_eol: end of life version list dates updated

Many thanks to all contributors!

--Rodney

gcpdiag 0.63

25 Jul 02:19

0.63 (2023-07-10)

Fixes

  • Fix futures timeout error.

0.62 (2023-07-10)

New rules

  • cloudsql/SEC/2023_001: Cloud SQL is not publicly accessible (github #73)
  • dataproc/ERR/2023_002: Orphaned YARN application
  • dataflow/ERR/2023_007: Streaming Dataflow doesn't report being stuck because of firewall rules
  • dataflow/ERR/2023_008: Dataflow worker service account has roles/dataflow.worker role
  • dataflow/WARN/2023_004: Dataflow job doesn't get stuck in draining state for more than 3 hours
  • notebooks/BP/2023_002: Vertex AI Workbench user-managed notebook instances are up to date

Fixes

  • Fix GCE API being erroneously required to run gcpdiag
  • Fix locking issues in multi-threaded code
  • Improve caching of API exceptions
  • Remove documentation references to deprecated oauth option

0.61 (2023-06-30)

Fixes

  • Fix attribute error on dnssec API call

Full Changelog: v0.60...v0.63

gcpdiag 0.60

29 Jun 15:59

New rules

  • apigee/ERR/2023_003: Private Google Access (PGA) for subnet of Managed Instance Group is enabled
  • apigee/ERR/2023_004: Service Networking API is enabled and SA account has the required role
  • apigee/ERR/2023_005: External Load Balancer (XLB) is able to connect to the MIG
  • bigquery/ERR/2023_001: Jobs called via the API are all found
  • bigquery/ERR/2023_002: BigQuery hasn't reported any unknown datasets
  • bigquery/ERR/2023_003: BigQuery query jobs do not encounter resource exceeded error
  • bigquery/ERR/2023_004: BigQuery query jobs do not encounter DML concurrency issue
  • bigquery/ERR/2023_005: Scheduled query not failing due to outdated credentials
  • bigquery/WARN/2023_003: BigQuery query job does not fail with too many output columns error
  • bigquery/WARN/2023_004: BigQuery CMEK-related operations do not fail due to missing permissions
  • bigquery/WARN/2023_005: No errors querying wildcard tables
  • cloudsql/BP/2023_001: Cloud SQL is not assigned Public IP (github #65)
  • cloudsql/BP/2023_002: Cloud SQL is configured with automated backup
  • cloudsql/BP_EXT/2023_001: Cloud SQL is defined with Maintenance Window as any (github #67)
  • cloudsql/BP_EXT/2023_002: Cloud SQL is configured with Deletion Protection (github #68)
  • cloudsql/BP_EXT/2023_003: Cloud SQL enables automatic storage increases feature
  • cloudsql/BP_EXT/2023_004: Cloud SQL instance is covered by the SLA
  • cloudsql/ERR/2023_001: Cloud SQL instance should not be in SUSPENDED state
  • cloudsql/WARN/2023_001: Cloud SQL instance's log_output flag is not configured as TABLE
  • cloudsql/WARN/2023_002: Cloud SQL instance's avg CPU utilization is not over 98% for 6 hours
  • cloudsql/WARN/2023_003: Cloud SQL instance's memory usage does not exceed 90%
  • composer/BP/2023_001: Cloud Composer logging level is set to INFO
  • composer/BP/2023_002: Cloud Composer's worker concurrency is not limited by parallelism
  • composer/BP/2023_003: Cloud Composer does not override the StatsD configurations
  • composer/BP_EXT/2023_001: Cloud Composer has no more than 2 Airflow schedulers
  • composer/BP_EXT/2023_002: Cloud Composer has higher version than airflow-2.2.3
  • composer/ERR/2023_001: Cloud Composer is not in ERROR state
  • composer/WARN/2023_001: Cloud Composer does not override Kerberos configurations
  • composer/WARN/2023_002: Cloud Composer tasks are not interrupted by SIGKILL
  • composer/WARN/2023_003: Cloud Composer tasks do not fail due to resource pressure
  • composer/WARN/2023_004: Cloud Composer database CPU usage does not exceed 80%
  • composer/WARN/2023_005: Cloud Composer is consistently in healthy state
  • composer/WARN/2023_006: Airflow schedulers are healthy for the last hour
  • composer/WARN/2023_007: Cloud Composer Scheduler CPU limit exceeded
  • composer/WARN/2023_008: Cloud Composer Airflow database is in healthy state
  • dataflow/ERR/2023_001: Dataflow service account has dataflow.serviceAgent role
  • dataflow/ERR/2023_002: Dataflow job does not fail during execution due to IP space exhaustion
  • dataflow/ERR/2023_003: Dataflow job does not fail during execution due to incorrect subnet
  • dataflow/ERR/2023_004: Dataflow job does not fail due to organization policy constraints
  • dataflow/ERR/2023_005: Dataflow job does not fail during execution due to credential or permission issues
  • dataflow/ERR/2023_006: Dataflow job fails if Private Google Access is disabled on subnetwork
  • dataflow/WARN/2023_001: Dataflow job does not have a hot key
  • dataproc/ERR/2023_002: No orphaned YARN application found
  • dataproc/ERR/2023_003: Dataproc cluster service account permissions
  • dataproc/ERR/2023_004: Dataproc firewall rules for connectivity between master and worker nodes
  • dataproc/ERR/2023_005: Dataproc cluster has sufficient quota
  • dataproc/ERR/2023_006: Dataproc cluster user has networking permissions on host project
  • gce/WARN/2023_001: GCE snapshot policies are defined only for used disks
  • gke/ERR/2023_004: GKE ingresses are well configured
  • gke/ERR/2023_005: Workloads not reporting misconfigured CNI plugins
  • iam/BP/2023_001: Policy constraint 'AutomaticIamGrantsForDefaultServiceAccounts' enforced
  • interconnect/BP/2023_001: VLAN attachments deployed in same metro are in different EADs
  • lb/BP/2023_001: Cloud CDN is enabled on backends for global external load balancers
  • notebooks/BP/2023_001: Vertex AI Workbench instance enables system health report
  • notebooks/BP/2023_003: Vertex AI Workbench runtimes for managed notebooks are up to date
  • notebooks/ERR/2023_002: Vertex AI Workbench account has compute.subnetworks permissions
  • notebooks/ERR/2023_003: Vertex AI Workbench account has permissions to create and use notebooks
  • notebooks/ERR/2023_004: Vertex AI Workbench runtimes for managed notebooks are healthy
  • notebooks/WARN/2023_001: Vertex AI Workbench instance is not being OOMKilled
  • notebooks/WARN/2023_002: Vertex AI Workbench instance is in healthy data disk space status
  • notebooks/WARN/2023_003: Vertex AI Workbench instance is in healthy boot disk space status
  • vpc/SEC/2023_001: DNSSEC is enabled for public zones
  • vpc/WARN/2023_002: Private zone is attached to a VPC

Enhancements

  • Support for sub project resource filtering (--name, --location, --label)
  • Support fetching serial port output logs from Compute API (--enable-gce-serial-buffer)
  • New product: Cloud Dataflow
  • New product: Cloud Interconnect
  • Add kubectl query module
  • Optimizations for logging based composer rules

Fixes

  • gke/BP/2022_003: updated EOL schedule for GKE
  • Fix billing project id not set at startup (github #58)
  • Fix JSON format with --output=json (github #62)
  • Fix GCS uniform bucket access detection (github #69)
  • dataproc/WARN/2022_002: fix attribute lookup error (github #57)
  • gke/WARN/2021_003: update GKE pod cidr rule to report values per pod cidr range

gcpdiag 0.59

17 Apr 12:33

New rules

  • apigee/ERR/2023_001: Customer's network is peered to Apigee's network
  • apigee/ERR/2023_002: Network bridge managed instance group is correctly configured
  • bigquery/WARN/2022_003: BigQuery copy job does not exceed the daily copy quota
  • bigquery/WARN/2022_004: BigQuery copy job does not exceed the cross-region daily copy quota
  • bigquery/WARN/2023_001: BigQuery query job does not time out during execution
  • composer/WARN/2022_003: Composer scheduler parses all DAG files without overloading
  • datafusion/ERR/2022_008: Cloud Data Fusion SA has Service Account User permissions on the Dataproc SA
  • datafusion/ERR/2022_009: Cloud Dataproc Service Account has a Cloud Data Fusion Runner role
  • datafusion/ERR/2022_010: Cloud Dataproc Service Account has a Dataproc Worker role
  • datafusion/ERR/2022_011: The Dataproc SA for a CDF instance with version > 6.2.0 has Storage Admin role
  • dataproc/ERR/2022_004: Dataproc on GCE master VM is able to communicate with at least one worker VM
  • dataproc/ERR/2023_001: Dataproc cluster initialization completed by the end of the timeout period
  • dataproc/WARN/2022_004: Cluster should normally spend most of the time in RUNNING state
  • dataproc/WARN/2023_001: Concurrent Job limit was not exceeded
  • dataproc/WARN/2023_002: Master Node System Memory utilization under threshold
  • gae/ERR/2023_001: App Engine: VPC Connector creation failure due to Org Policy
  • gae/ERR/2023_002: App Engine: VPC Connector creation failure due to subnet overlap
  • gcb/ERR/2022_004: Cloud Build Service Agent has the cloudbuild.serviceAgent role
  • gce/BP/2023_001: GCE instances follow access scope best practice
  • gce/BP/2023_001: Instance time source is configured with Google NTP server
  • gce/ERR/2022_002: Serial logs don't contain Guest OS activation errors
  • gce/WARN/2022_010: GCE has enough resources available to fulfill requests
  • gce/WARN/2022_011: GCE VM service account is valid
  • gce/WARN/2022_012: Validate if a Microsoft Windows instance is able to activate using GCP PAYG licence
  • gke/BP/2023_001: GKE network policy minimum requirements
  • gke/BP/2023_002: Stateful workloads do not run on preemptible node
  • gke/BP/2023_004: GKE clusters are VPC-native
  • gke/BP_EXT/2023_003: GKE maintenance windows are defined
  • gke/ERR/2023_001: Container File System API quota not exceeded
  • gke/ERR/2023_002: GKE private clusters are VPC-native
  • gke/ERR/2023_003: containerd config.toml is valid
  • gke/WARN/2023_001: Container File System has the required scopes for Image Streaming
  • gke/WARN/2023_002: GKE workload timeout to Compute Engine metadata server
  • lb/BP/2022_001: LocalityLbPolicy compatible with sessionAffinity
  • notebooks/ERR/2023_001: Vertex AI Workbench user-managed notebook instances are healthy
  • vpc/BP/2022_001: Explicit routes for Google APIs if the default route is modified
  • vpc/BP/2023_001: DNS logging is enabled for public zones

Enhancements

  • New product: Cloud Load Balancing
  • New product: Vertex AI Workbench Notebooks
  • Experimental asynchronous IO execution (not enabled by default)
  • gcb/ERR/2022_002: Check access to images hosted in gcr.io repositories
  • Add support for interconnect API
  • Extract project id from email when fetching service accounts instead of using
    wildcard, making IAM service account checks more reliable.
  • --project now accepts project numbers in addition to project ids
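The service account enhancement above relies on the project ID being embedded in the account's email address. A rough sketch of that idea, assuming the standard user-managed form NAME@PROJECT_ID.iam.gserviceaccount.com; the function name is hypothetical and this is not gcpdiag's actual implementation. Other address forms (for example the Compute Engine default account, or domain-scoped project IDs) are deliberately not handled:

```python
from typing import Optional

# Domain suffix of user-managed service accounts.
_SA_SUFFIX = ".iam.gserviceaccount.com"


def project_id_from_sa_email(email: str) -> Optional[str]:
    """Return the project ID embedded in a service account email.

    Returns None when the address does not follow the
    NAME@PROJECT_ID.iam.gserviceaccount.com pattern.
    """
    local, sep, domain = email.partition("@")
    if not sep or not local or not domain.endswith(_SA_SUFFIX):
        return None
    project_id = domain[: -len(_SA_SUFFIX)]
    return project_id or None
```

Knowing the project up front allows a lookup to name the project directly rather than use a wildcard, which is the reliability gain the release note describes.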

Fixes

  • gke/BP/2022_003: updated EOL schedule for GKE
  • Fix 403 error on userinfo API call

gcpdiag 0.58

08 Nov 17:38

Deprecation

  • Python 3.9+ is required for gcpdiag. Support for Python 3.8 and older is deprecated.
  • Deprecated authentication using OAuth (--auth-oauth) has been removed.
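Wrapper scripts that launch gcpdiag from source can check the interpreter version before starting. A minimal sketch of such a guard; the constant and function names are illustrative assumptions, not taken from gcpdiag:

```python
import sys

# Minimum interpreter version stated in the deprecation note above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit("Python 3.9 or newer is required")
```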

New rules

  • apigee/ERR/2022_002: Verify whether Cloud KMS key is enabled and could be accessed by Apigee Service Agent
  • datafusion/ERR/2022_003: Private Data Fusion instance is peered to the tenant project
  • datafusion/ERR/2022_004: Cloud Data Fusion Service Account has necessary permissions
  • datafusion/ERR/2022_005: Private Data Fusion instance has networking permissions
  • datafusion/ERR/2022_006: Private Google Access enabled for private Data Fusion instance subnetwork
  • datafusion/ERR/2022_007: Cloud Data Fusion Service Account exists at a Project
  • gke/BP/2022_004: GKE clusters should have HTTP load balancing enabled to use GKE ingress

Enhancements

  • Python dependencies updated

Fixes

  • gke/ERR/2021_002: skip if there are no GKE clusters

gcpdiag 0.57

29 Sep 10:06

Deprecation

  • Default authentication using OAuth (--auth-oauth) is now deprecated, and Application Default Credentials (--auth-adc) will be used instead. Alternatively, you can use a service account private key (--auth-key FILE).

New rules

  • apigee/WARN/2022_001: Verify whether all environments have been attached to Apigee X instances
  • apigee/WARN/2022_002: Environment groups are created in the Apigee runtime plane
  • cloudrun/ERR/2022_001: Cloud Run service agent has the run.serviceAgent role
  • datafusion/ERR/2022_001: Firewall rules allow for Data Fusion to communicate to Dataproc VMs
  • datafusion/ERR/2022_002: Private Data Fusion instance has valid host VPC IP range
  • dataproc/WARN/2022_001: Dataproc VM Service Account has necessary permissions
  • dataproc/WARN/2022_002: Job rate limit was not exceeded
  • gcf/ERR/2022_002: Cloud Function deployment failure due to Resource Location Constraint
  • gcf/ERR/2022_003: Function invocation interrupted due to memory limit exceeded
  • gke/WARN/2022_007: GKE nodes need Storage API access scope to retrieve build artifacts
  • gke/WARN/2022_008: GKE connectivity: possible dns timeout in some gke versions

Enhancements

  • New product: Cloud Run
  • New product: Data Fusion

Fixes

  • gcf/WARN/2021_002: Added check for MATCH_STR
  • gcs/BP/2022_001: KeyError: 'iamConfiguration'
  • gke/ERR/2022_003: unhandled exception
  • gke/WARN/2022_005: Incorrectly report missing "nvidia-driver-installer" daemonset
  • iam/SEC/2021_001: unhandled exception

gcpdiag 0.56

18 Jul 13:22

New rules

  • bigquery/ERR/2022_001: BigQuery is not exceeding rate limits
  • bigquery/ERR/2022_001: BigQuery jobs not failing due to concurrent DML updates on the same table
  • bigquery/ERR/2022_002: BigQuery jobs are not failing due to results being larger than the maximum response size
  • bigquery/ERR/2022_003: BigQuery jobs are not failing while accessing data in Drive due to a permission issue
  • bigquery/ERR/2022_004: BigQuery jobs are not failing due to shuffle operation resources exceeded
  • bigquery/WARN/2022_002: BigQuery does not violate column level security
  • cloudsql/WARN/2022_001: Docker bridge network should be avoided
  • composer/WARN/2022_002: fluentd pods in Composer environments are not crashing
  • dataproc/ERR/2022_003: Dataproc Service Account permissions
  • dataproc/WARN/2022_001: Dataproc clusters do not fail to stop due to local SSDs
  • gae/WARN/2022_002: App Engine Flexible versions don't use deprecated runtimes
  • gcb/ERR/2022_002: Cloud Build service account registry permissions
  • gcb/ERR/2022_003: Builds don't fail because of retention policy set on logs bucket
  • gce/BP/2022_003: detect orphaned disks
  • gce/ERR/2022_001: Project limits were not exceeded
  • gce/WARN/2022_004: Cloud SQL Docker bridge network should be avoided
  • gce/WARN/2022_005: GCE CPU quota is not near the limit
  • gce/WARN/2022_006: GCE GPU quota is not near the limit
  • gce/WARN/2022_007: VM has the proper scope to connect using the Cloud SQL Admin API
  • gce/WARN/2022_008: GCE External IP addresses quota is not near the limit
  • gce/WARN/2022_009: GCE disk quota is not near the limit
  • gcf/ERR/2022_001: Cloud Functions service agent has the cloudfunctions.serviceAgent role
  • gcf/WARN/2021_002: Cloud Functions have no scale up issues
  • gke/BP_EXT/2022_001: Google Groups for RBAC enabled (github #12)
  • gke/WARN/2022_006: GKE NAP nodes use a containerd image
  • tpu/WARN/2022_001: Cloud TPU resource availability
  • vpc/WARN/2022_001: Cross Project Networking Service projects quota is not near the limit

Updated rules

  • dataproc/ERR/2022_002: fix os version detection (github #26)
  • gke/BP/2022_003: update GKE EOL schedule
  • gke/ERR/2022_001: fix KeyError exception
  • gke/BP/2022_002: skip legacy VPC

Enhancements

  • Add support for multiple output formats (--output=csv, --output=json)
  • Better handle CTRL-C signal
  • Org policy support
  • New product: CloudSQL
  • New product: VPC
  • Renamed product "GAES" to "GAE" (Google App Engine)
  • Publish internal API documentation on https://gcpdiag.dev/docs/development/api/
  • Update Python dependencies

gcpdiag 0.54

25 Apr 12:03

New rules

  • apigee/ERR/2022_001: Apigee Service Agent permissions

Enhancements

  • dynamically load gcpdiag lint rules for all products
  • support IAM policy retrieval for Artifact Registry
  • move gcpdiag release buckets to new location

Fixes

  • gke/ERR/2022_002: use correct network for shared VPC scenario (#24)
  • error out early if service accounts of inspected projects can't be retrieved
  • fix docker wrapper script for --config and --auth-key options
  • allow creating test projects in an org folder
  • ignore more system service accounts (ignore all accounts starting with gcp-sa)

Note: gcpdiag 0.55 was also released with the same code. The release was used to facilitate the transition of binaries to another location.

gcpdiag 0.53

31 Mar 10:24

New rules

  • composer/ERR/2022_001: Composer Service Agent permissions
  • composer/ERR/2022_002: Composer Environment Service Account permissions
  • composer/WARN/2022_001: Composer Service Agent permissions for Composer 2.x
  • gce/BP_EXT/2022_001: GCP project has VM Manager enabled
  • gce/WARN/2022_003: GCE VM instances quota is not near the limit
  • gke/BP/2022_002: GKE clusters are using unique subnets
  • gke/BP/2022_003: GKE cluster is not near to end of life
  • gke/WARN/2022_003: GKE service account permissions to manage project firewall rules
  • gke/WARN/2022_004: Cloud Logging API enabled when GKE logging is enabled
  • gke/WARN/2022_005: NVIDIA GPU device drivers are installed on GKE nodes with GPU

Fixes

  • Fix various unhandled exceptions

gcpdiag 0.52

11 Feb 16:25

New rules

  • dataproc/BP/2022_001: Cloud Monitoring agent is enabled.
  • dataproc/ERR/2022_002: Dataproc is not using deprecated images.
  • gce/WARN/2022_001: IAP service can connect to SSH/RDP port on instances.
  • gce/WARN/2022_002: Instance groups named ports are using unique names.
  • gke/ERR/2022_002: GKE nodes of private clusters can access Google APIs and services.
  • gke/ERR/2022_003: GKE connectivity: load balancer to node communication (ingress).

Updated rules

  • gcb/ERR/2022_001: Fix false positive when no build is configured.
  • gke/WARN/2021_008: Improve Istio deprecation message

Enhancements

  • Introduce "extended" rules (BP_EXT, ERR_EXT, etc.), disabled by default
    and which can be enabled with --include-extended.
  • Large IAM policy code refactorings in preparation for org-level IAM
    policy support.
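The extended-rule convention above can be read directly off the rule ID: the class segment ends in _EXT (BP_EXT, ERR_EXT, and so on). A small sketch of how a caller might filter a list of rule IDs the same way; the function and its include_extended parameter are hypothetical and only mirror the --include-extended flag in spirit, not gcpdiag's rule loader:

```python
from typing import Iterable, List


def select_rules(rule_ids: Iterable[str], include_extended: bool = False) -> List[str]:
    """Drop extended rules (class ending in _EXT) unless requested.

    Rule IDs are expected in the short form 'product/CLASS/YEAR_NUM'.
    """
    selected = []
    for rule_id in rule_ids:
        parts = rule_id.split("/")
        rule_class = parts[1] if len(parts) > 1 else ""
        if rule_class.endswith("_EXT") and not include_extended:
            continue
        selected.append(rule_id)
    return selected
```

With the default arguments, a list containing gke/BP/2022_002 and gce/BP_EXT/2022_001 is reduced to the first entry only; passing include_extended=True keeps both.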

Fixes

  • More API retry fixes.
  • Fix --billing-project which had no effect before.
  • Fix exception related to GCE instance scopes.