Releases: GoogleCloudPlatform/gcpdiag
gcpdiag 0.64
0.64 (2023-08-14)
New rules
- gke/bp_2023_005_gateway_crd: Manually installed Gateway CRD
- gke/err_2023_010_nodelocal_timeout: NodeLocal DNS timeout
- gke/err_2023_009_missing_cpu_req: Missing CPU request
- gke/err_2023_008_crashloopbackoff: GKE cluster had pods in CrashLoopBackOff error
- gke/err_2023_006_gw_controller_annotation_error: GKE Gateway controller reporting misconfigured annotations in Gateway resource
- gke/err_2023_007_gw_controller_http_route_misconfig: GKE Gateway controller reporting invalid HTTPRoute for Gateway
- dataflow/bp_2023_001_dataflow_supported_sdk_version_check: Dataflow job using supported SDK version
- cloudsql/warn_2023_003_high_mem_usage: Cloud SQL instance's memory usage does not exceed 90%
- gke/warn_2023_003_monitoring_api_disabled: Cloud Monitoring API enabled when GKE monitoring is enabled
Fixes
- Remove references to deprecated OAuth option in docs (b/281956212)
- Update diagram titles to remove “gcp doctor” reference
- Fix incorrect MQL query in cloudsql/WARN/2023_003 (external submission)
- cloudsql/bp_ext_2023_003_auto_storage_increases
- gcs/bp_2022_001_bucket_access_uniform: skip Cloud Build and Dataproc buckets (issue/61, b/293951741)
- gce/warn_2022_001_iap_tcp_forwarding: skip check for Dataproc cluster VM instances
- gce/bp_2021_001_serial_logging_enabled: skip check for Dataproc cluster VM instances
- gke/bp_2022_003_cluster_eol: end of life version list dates updated
Many thanks to all contributors!
--Rodney
gcpdiag 0.63
0.63 (2023-07-10)
Fixes
- Fix futures timeout error.
0.62 (2023-07-10)
New rules
- cloudsql/SEC/2023_001: Cloud SQL is not publicly accessible (github #73)
- dataproc/ERR/2023_002: Orphaned YARN application
- dataflow/ERR/2023_007: Streaming Dataflow doesn't report being stuck because of firewall rules
- dataflow/ERR/2023_008: Dataflow worker service account has roles/dataflow.worker role
- dataflow/WARN/2023_004: Dataflow job is not stuck in draining state for more than 3 hours
- notebooks/BP/2023_002: Vertex AI Workbench user-managed notebook instances are up to date
Fixes
- Fix GCE API being erroneously required to run gcpdiag
- Fix locking issues in multi-threaded code
- Improve caching of API exceptions
- Remove documentation references to deprecated oauth option
0.61 (2023-06-30)
Fixes
- Fix attribute error on DNSSEC API call
What's Changed
- CloudSQL New Rule cloudsql/bp_2023_004 by @abhigupta1207 in #73
Full Changelog: v0.60...v0.63
gcpdiag 0.60
New rules
- apigee/ERR/2023_003: Private Google Access (PGA) for subnet of Managed Instance Group is enabled
- apigee/ERR/2023_004: Service Networking API is enabled and service account has the required role
- apigee/ERR/2023_005: External Load Balancer (XLB) is able to connect to the MIG
- bigquery/ERR/2023_001: Jobs called via the API are all found
- bigquery/ERR/2023_002: BigQuery hasn't reported any unknown datasets
- bigquery/ERR/2023_003: BigQuery query jobs do not encounter resources exceeded errors
- bigquery/ERR/2023_004: BigQuery query jobs do not encounter DML concurrency issues
- bigquery/ERR/2023_005: Scheduled query not failing due to outdated credentials
- bigquery/WARN/2023_003: BigQuery query job does not fail with too many output columns error
- bigquery/WARN/2023_004: BigQuery CMEK-related operations do not fail due to missing permissions
- bigquery/WARN/2023_005: No errors querying wildcard tables
- cloudsql/BP/2023_001: Cloud SQL is not assigned Public IP (github #65)
- cloudsql/BP/2023_002: Cloud SQL is configured with automated backup
- cloudsql/BP_EXT/2023_001: Cloud SQL is defined with Maintenance Window as any (github #67)
- cloudsql/BP_EXT/2023_002: Cloud SQL is configured with Deletion Protection (github #68)
- cloudsql/BP_EXT/2023_003: Cloud SQL enables automatic storage increases feature
- cloudsql/BP_EXT/2023_004: Cloud SQL instance is covered by the SLA
- cloudsql/ERR/2023_001: Cloud SQL instance should not be in SUSPENDED state
- cloudsql/WARN/2023_001: Cloud SQL instance's log_output flag is not configured as TABLE
- cloudsql/WARN/2023_002: Cloud SQL instance's avg CPU utilization is not over 98% for 6 hours
- cloudsql/WARN/2023_003: Cloud SQL instance's memory usage does not exceed 90%
- composer/BP/2023_001: Cloud Composer logging level is set to INFO
- composer/BP/2023_002: Cloud Composer's worker concurrency is not limited by parallelism
- composer/BP/2023_003: Cloud Composer does not override the StatsD configurations
- composer/BP_EXT/2023_001: Cloud Composer has no more than 2 Airflow schedulers
- composer/BP_EXT/2023_002: Cloud Composer has higher version than airflow-2.2.3
- composer/ERR/2023_001: Cloud Composer is not in ERROR state
- composer/WARN/2023_001: Cloud Composer does not override Kerberos configurations
- composer/WARN/2023_002: Cloud Composer tasks are not interrupted by SIGKILL
- composer/WARN/2023_003: Cloud Composer tasks do not fail due to resource pressure
- composer/WARN/2023_004: Cloud Composer database CPU usage does not exceed 80%
- composer/WARN/2023_005: Cloud Composer is consistently in healthy state
- composer/WARN/2023_006: Airflow schedulers are healthy for the last hour
- composer/WARN/2023_007: Cloud Composer Scheduler CPU limit exceeded
- composer/WARN/2023_008: Cloud Composer Airflow database is in healthy state
- dataflow/ERR/2023_001: Dataflow service account has dataflow.serviceAgent role
- dataflow/ERR/2023_002: Dataflow job does not fail during execution due to IP space exhaustion
- dataflow/ERR/2023_003: Dataflow job does not fail during execution due to incorrect subnet
- dataflow/ERR/2023_004: Dataflow job does not fail due to organization policy constraints
- dataflow/ERR/2023_005: Dataflow job does not fail during execution due to credential or permission issues
- dataflow/ERR/2023_006: Dataflow job fails if Private Google Access is disabled on subnetwork
- dataflow/WARN/2023_001: Dataflow job does not have a hot key
- dataproc/ERR/2023_002: No orphaned YARN application found
- dataproc/ERR/2023_003: Dataproc cluster service account permissions
- dataproc/ERR/2023_004: Dataproc firewall rules for connectivity between master and worker nodes
- dataproc/ERR/2023_005: Dataproc cluster has sufficient quota
- dataproc/ERR/2023_006: Dataproc cluster user has networking permissions on host project
- gce/WARN/2023_001: GCE snapshot policies are defined only for used disks
- gke/ERR/2023_004: GKE ingresses are well configured
- gke/ERR/2023_005: Workloads not reporting misconfigured CNI plugins
- iam/BP/2023_001: Policy constraint 'AutomaticIamGrantsForDefaultServiceAccounts' enforced
- interconnect/BP/2023_001: VLAN attachments deployed in same metro are in different EADs
- lb/BP/2023_001: Cloud CDN is enabled on backends for global external load balancers
- notebooks/BP/2023_001: Vertex AI Workbench instance enables system health report
- notebooks/BP/2023_003: Vertex AI Workbench runtimes for managed notebooks are up to date
- notebooks/ERR/2023_002: Vertex AI Workbench account has compute.subnetworks permissions
- notebooks/ERR/2023_003: Vertex AI Workbench account has permissions to create and use notebooks
- notebooks/ERR/2023_004: Vertex AI Workbench runtimes for managed notebooks are healthy
- notebooks/WARN/2023_001: Vertex AI Workbench instance is not being OOMKilled
- notebooks/WARN/2023_002: Vertex AI Workbench instance is in healthy data disk space status
- notebooks/WARN/2023_003: Vertex AI Workbench instance is in healthy boot disk space status
- vpc/SEC/2023_001: DNSSEC is enabled for public zones
- vpc/WARN/2023_002: Private zone is attached to a VPC
Enhancements
- Support for sub-project resource filtering (--name, --location, --label)
- Support fetching serial port output logs from Compute API (--enable-gce-serial-buffer)
- New product: Cloud Dataflow
- New product: Cloud Interconnect
- Add kubectl query module
- Optimizations for logging based composer rules
Fixes
- gke/BP/2022_003: updated EOL schedule for GKE
- Fix billing project id not set at startup (github #58)
- Fix JSON format with --output=json (github #62)
- Fix GCS uniform bucket access detection (github #69)
- dataproc/WARN/2022_002: fix attribute lookup error (github #57)
- gke/WARN/2021_003: update GKE pod CIDR rule to report values per pod CIDR range
gcpdiag 0.59
New rules
- apigee/ERR/2023_001: Customer's network is peered to Apigee's network
- apigee/ERR/2023_002: Network bridge managed instance group is correctly configured
- bigquery/WARN/2022_003: BigQuery copy job does not exceed the daily copy quota
- bigquery/WARN/2022_004: BigQuery copy job does not exceed the cross-region daily copy quota
- bigquery/WARN/2023_001: BigQuery query job does not time out during execution
- composer/WARN/2022_003: Composer scheduler parses all DAG files without overloading
- datafusion/ERR/2022_008: Cloud Data Fusion SA has Service Account User permissions on the Dataproc SA
- datafusion/ERR/2022_009: Cloud Dataproc Service Account has a Cloud Data Fusion Runner role
- datafusion/ERR/2022_010: Cloud Dataproc Service Account has a Dataproc Worker role
- datafusion/ERR/2022_011: The Dataproc SA for a CDF instance with version > 6.2.0 has Storage Admin role
- dataproc/ERR/2022_004: Dataproc on GCE master VM is able to communicate with at least one worker VM
- dataproc/ERR/2023_001: Dataproc cluster initialization completed by the end of the timeout period
- dataproc/WARN/2022_004: Cluster should normally spend most of the time in RUNNING state
- dataproc/WARN/2023_001: Concurrent Job limit was not exceeded
- dataproc/WARN/2023_002: Master Node System Memory utilization under threshold
- gae/ERR/2023_001: App Engine: VPC Connector creation failure due to Org Policy
- gae/ERR/2023_002: App Engine: VPC Connector creation failure due to subnet overlap
- gcb/ERR/2022_004: Cloud Build Service Agent has the cloudbuild.serviceAgent role
- gce/BP/2023_001: GCE instances follow access scope best practice
- gce/BP/2023_001: Instance time source is configured with Google NTP server
- gce/ERR/2022_002: Serial logs don't contain Guest OS activation errors
- gce/WARN/2022_010: GCE has enough resources available to fulfill requests
- gce/WARN/2022_011: GCE VM service account is valid
- gce/WARN/2022_012: Validate if a Microsoft Windows instance is able to activate using GCP PAYG licence
- gke/BP/2023_001: GKE network policy minimum requirements
- gke/BP/2023_002: Stateful workloads do not run on preemptible nodes
- gke/BP/2023_004: GKE clusters are VPC-native
- gke/BP_EXT/2023_003: GKE maintenance windows are defined
- gke/ERR/2023_001: Container File System API quota not exceeded
- gke/ERR/2023_002: GKE private clusters are VPC-native
- gke/ERR/2023_003: containerd config.toml is valid
- gke/WARN/2023_001: Container File System has the required scopes for Image Streaming
- gke/WARN/2023_002: GKE workload timeout to Compute Engine metadata server
- lb/BP/2022_001: LocalityLbPolicy compatible with sessionAffinity
- notebooks/ERR/2023_001: Vertex AI Workbench user-managed notebook instances are healthy
- vpc/BP/2022_001: Explicit routes for Google APIs if the default route is modified
- vpc/BP/2023_001: DNS logging is enabled for public zones
Enhancements
- New product: Cloud Load Balancing
- New product: Vertex AI Workbench Notebooks
- Experimental asynchronous IO execution (not enabled by default)
- gcb/ERR/2022_002: Check access to images hosted in gcr.io repositories
- Add support for interconnect API
- Extract project id from email when fetching service accounts instead of using a wildcard, making IAM service account checks more reliable.
- --project now accepts project numbers in addition to project ids
Fixes
- gke/BP/2022_003: updated EOL schedule for GKE
- Fix 403 error on userinfo API call
gcpdiag 0.58
Deprecation
- Python 3.9+ is required for gcpdiag. Support for Python 3.8 and older versions is deprecated.
- Deprecated authentication using OAuth (--auth-oauth) has been removed.
New rules
- apigee/ERR/2022_002: Verify whether Cloud KMS key is enabled and could be accessed by Apigee Service Agent
- datafusion/ERR/2022_003: Private Data Fusion instance is peered to the tenant project
- datafusion/ERR/2022_004: Cloud Data Fusion Service Account has necessary permissions
- datafusion/ERR/2022_005: Private Data Fusion instance has networking permissions
- datafusion/ERR/2022_006: Private Google Access enabled for private Data Fusion instance subnetwork
- datafusion/ERR/2022_007: Cloud Data Fusion Service Account exists in the project
- gke/BP/2022_004: GKE clusters should have HTTP load balancing enabled to use GKE ingress
Enhancements
- Python dependencies updated
Fixes
- gke/ERR/2021_002: skip if there are no GKE clusters
gcpdiag 0.57
Deprecation
- Default authentication using OAuth (--auth-oauth) is now deprecated and Application Default Credentials (--auth-adc) will be used instead. Alternatively, you can use a service account private key (--auth-key FILE).
New rules
- apigee/WARN/2022_001: Verify whether all environments have been attached to Apigee X instances
- apigee/WARN/2022_002: Environment groups are created in the Apigee runtime plane
- cloudrun/ERR/2022_001: Cloud Run service agent has the run.serviceAgent role
- datafusion/ERR/2022_001: Firewall rules allow for Data Fusion to communicate to Dataproc VMs
- datafusion/ERR/2022_002: Private Data Fusion instance has valid host VPC IP range
- dataproc/WARN/2022_001: Dataproc VM Service Account has necessary permissions
- dataproc/WARN/2022_002: Job rate limit was not exceeded
- gcf/ERR/2022_002: Cloud Function deployment failure due to Resource Location Constraint
- gcf/ERR/2022_003: Function invocation interrupted due to memory limit exceeded
- gke/WARN/2022_007: GKE nodes need Storage API access scope to retrieve build artifacts
- gke/WARN/2022_008: GKE connectivity: possible DNS timeout in some GKE versions
Enhancements
- New product: Cloud Run
- New product: Data Fusion
Fixes
- gcf/WARN/2021_002: Added check for MATCH_STR
- gcs/BP/2022_001: KeyError: 'iamConfiguration'
- gke/ERR/2022_003: unhandled exception
- gke/WARN/2022_005: Incorrectly report missing "nvidia-driver-installer" daemonset
- iam/SEC/2021_001: unhandled exception
gcpdiag 0.56
New rules
- bigquery/ERR/2022_001: BigQuery is not exceeding rate limits
- bigquery/ERR/2022_001: BigQuery jobs not failing due to concurrent DML updates on the same table
- bigquery/ERR/2022_002: BigQuery jobs are not failing due to results being larger than the maximum response size
- bigquery/ERR/2022_003: BigQuery jobs are not failing while accessing data in Drive due to a permission issue
- bigquery/ERR/2022_004: BigQuery jobs are not failing due to shuffle operation resources exceeded
- bigquery/WARN/2022_002: BigQuery does not violate column level security
- cloudsql/WARN/2022_001: Docker bridge network should be avoided
- composer/WARN/2022_002: fluentd pods in Composer environments are not crashing
- dataproc/ERR/2022_003: Dataproc Service Account permissions
- dataproc/WARN/2022_001: Dataproc clusters do not fail to stop due to local SSDs
- gae/WARN/2022_002: App Engine Flexible versions don't use deprecated runtimes
- gcb/ERR/2022_002: Cloud Build service account registry permissions
- gcb/ERR/2022_003: Builds don't fail because of retention policy set on logs bucket
- gce/BP/2022_003: detect orphaned disks
- gce/ERR/2022_001: Project limits were not exceeded
- gce/WARN/2022_004: Cloud SQL Docker bridge network should be avoided
- gce/WARN/2022_005: GCE CPU quota is not near the limit
- gce/WARN/2022_006: GCE GPU quota is not near the limit
- gce/WARN/2022_007: VM has the proper scope to connect using the Cloud SQL Admin API
- gce/WARN/2022_008: GCE External IP addresses quota is not near the limit
- gce/WARN/2022_009: GCE disk quota is not near the limit
- gcf/ERR/2022_001: Cloud Functions service agent has the cloudfunctions.serviceAgent role
- gcf/WARN/2021_002: Cloud Functions have no scale up issues
- gke/BP_EXT/2022_001: Google Groups for RBAC enabled (github #12)
- gke/WARN/2022_006: GKE NAP nodes use a containerd image
- tpu/WARN/2022_001: Cloud TPU resource availability
- vpc/WARN/2022_001: Cross Project Networking Service projects quota is not near the limit
Updated rules
- dataproc/ERR/2022_002: fix os version detection (github #26)
- gke/BP/2022_003: update GKE EOL schedule
- gke/ERR/2022_001: fix KeyError exception
- gke/BP/2022_002: skip legacy VPC
Enhancements
- Add support for multiple output formats (--output=csv, --output=json)
- Better handle CTRL-C signal
- Org policy support
- New product: CloudSQL
- New product: VPC
- Renamed product "GAES" to "GAE" (Google App Engine)
- Publish internal API documentation on https://gcpdiag.dev/docs/development/api/
- Update Python dependencies
gcpdiag 0.54
New rules
- apigee/ERR/2022_001: Apigee Service Agent permissions
Enhancements
- dynamically load gcpdiag lint rules for all products
- support IAM policy retrieval for Artifact Registry
- move gcpdiag release buckets to new location
Fixes
- gke/ERR/2022_002: use correct network for shared VPC scenario (#24)
- error out early if service accounts of inspected projects can't be retrieved
- fix docker wrapper script for --config and --auth-key options
- allow to create test projects in an org folder
- ignore more system service accounts (ignore all accounts starting with gcp-sa)
Note: gcpdiag 0.55 was also released with the same code. The release was used to facilitate the transition of binaries to another location.
gcpdiag 0.53
New rules
- composer/ERR/2022_001: Composer Service Agent permissions
- composer/ERR/2022_002: Composer Environment Service Account permissions
- composer/WARN/2022_001: Composer Service Agent permissions for Composer 2.x
- gce/BP_EXT/2022_001: GCP project has VM Manager enabled
- gce/WARN/2022_003: GCE VM instances quota is not near the limit
- gke/BP/2022_002: GKE clusters are using unique subnets
- gke/BP/2022_003: GKE cluster is not near end of life
- gke/WARN/2022_003: GKE service account permissions to manage project firewall rules
- gke/WARN/2022_004: Cloud Logging API enabled when GKE logging is enabled
- gke/WARN/2022_005: NVIDIA GPU device drivers are installed on GKE nodes with GPU
Enhancements
- Support IAM policies for service accounts and subnetworks
- Skip rules using logs if Cloud Logging API is disabled
- New option: --logs-query-timeout
- Add support for configuration files (see https://gcpdiag.dev/docs/usage/#configuration-file)
Fixes
- Fix various unhandled exceptions
gcpdiag 0.52
New rules
- dataproc/BP/2022_001: Cloud Monitoring agent is enabled.
- dataproc/ERR/2022_002: Dataproc is not using deprecated images.
- gce/WARN/2022_001: IAP service can connect to SSH/RDP port on instances.
- gce/WARN/2022_002: Instance groups named ports are using unique names.
- gke/ERR/2022_002: GKE nodes of private clusters can access Google APIs and services.
- gke/ERR/2022_003: GKE connectivity: load balancer to node communication (ingress).
Updated rules
- gcb/ERR/2022_001: Fix false positive when no build is configured.
- gke/WARN/2021_008: Improve Istio deprecation message
Enhancements
- Introduce "extended" rules (BP_EXT, ERR_EXT, etc.), which are disabled by default and can be enabled with --include-extended.
- Large IAM policy code refactorings in preparation for org-level IAM policy support.
Fixes
- More API retry fixes.
- Fix --billing-project which had no effect before.
- Fix exception related to GCE instance scopes.