Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CBR] Provider produced inconsistent result after apply #5002

Labels
bug service/CBR Issues related to Context Based Restrictions

Comments

@Ak-sky
Copy link

Ak-sky commented Dec 19, 2023

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform IBM Provider Version

Terraform Version v1.5.7
Terraform IBM Provider Version v1.60.1

Affected Resource(s)

  • ibm_cbr_zone

Terraform Configuration Files

We have noticed an issue during one of the test for TF Event Notification module example which is

Provider produced inconsistent result after apply

When applying changes to module.cbr_zone.ibm_cbr_zone.cbr_zone, provider
"provider[\"registry.terraform.io/ibm-cloud/ibm\"]" produced an unexpected
new value: Root resource was present, but now absent.
This is a bug in the provider, which should be reported in the provider's
own issue tracker.

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

# Copy-paste your Terraform configurations here - for large Terraform configs,
# please share a link to the ZIP file.

Debug Output

Adding Logs
TF_debug.log
TF_stdout.txt

Panic Output

Expected Behavior

All the zones should get created/destroyed during the TF operation.

Actual Behavior

Provider produced inconsistent result after apply.

Steps to Reproduce

  1. terraform apply

Important Factoids

References

  • #0000
@github-actions github-actions bot added bug service/CBR Issues related to Context Based Restrictions labels Dec 19, 2023
@shemau
Copy link

shemau commented Jan 25, 2024

The same error is reported

│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to module.cbr_zone_schematics.ibm_cbr_zone.cbr_zone,
│ provider "provider[\"registry.terraform.io/ibm-cloud/ibm\"]" produced an
│ unexpected new value: Root resource was present, but now absent.
│ 
│ This is a bug in the provider, which should be reported in the provider's
│ own issue tracker.

This is an intermittent problem, a re-run in the same environment was successful. So it presents as a timing issue on the backend or a network issue to the backend that the provider might resolve with retry logic.

Ref: terraform-ibm-cos 518

@zhenwan
Copy link
Contributor

zhenwan commented Jan 25, 2024

From TF_stdout.txt, I see the error was caused by creating a new zone with a zone name which already exists:

{
   "errors":[
      {
         "code":"zone_already_exists_in_store",
         "message":"A network zone with the same name already exists. Assign each network zone a unique name.",
         "target":{
            "name":"name",
            "type":"field",
            "value":"event-notification-complete-1jbr01-VPC-network-zone"
         }
      }
   ],
   "status_code":409,
   "trace":"edbf12a0-5a4d-4353-9a35-bfc6fcf707f9"
}

@shemau
Copy link

shemau commented Jan 26, 2024

@zhenwan my take is that causality is the other way around.

At 06:12:08 during an apply the error reported here is observed.
At 06:12:13 a retry occurs (a new/second terraform apply is run).
At 06:12:31 the terraform delta shows CBR zone and rule are missing (relative to the statefile/partial deploy) followed by a duplicate zone error.

The causality is not that the absent error reported in this issue was caused by the zone existing; The zone existing error was caused by the absent error.

IMHO we see the duplicate because Root resource was present, but now absent had in fact created the resource/zone but failed to store it in the state file because it was absent. The subsequent deploy then tried to create it again, but it already existed.

@ocofaigh
Copy link
Contributor

ocofaigh commented Jan 26, 2024

We have seen this in the past with other services. The terraform provider executes a POST api to create something, and then it does a GET to check if it is there. If the GET is done against a database which may not yet have the entry for the new resources (e.g. slow replication), the GET can return nothing, and terraform can retry the POST to create.
That might be what is happening here? Perhaps a retry, or sleep needs to be added to the GET before retrying the POST?

@Ak-sky
Copy link
Author

Ak-sky commented Feb 5, 2024

Again just got hit with the same issue-
Attached TF Trace Log File: cbr-fscloud_Trace_TFA_05.02.2024-12.47.50.log.
TF Apply Std Output: cbr-fscloud_TFA_05.02.2024-12.47.50.log

ibm_is_subnet.testacc_subnet: Creating...
ibm_is_subnet.testacc_subnet: Still creating... [10s elapsed]
ibm_is_subnet.testacc_subnet: Creation complete after 13s [id=0717-e319bdc9-e139-493a-bb80-b687fbaded17]

│ Error: Provider produced inconsistent result after apply

│ When applying changes to
│ module.cbr_account_level.module.cbr_zone["is"].ibm_cbr_zone.cbr_zone,
│ provider "provider["registry.terraform.io/ibm-cloud/ibm"]" produced an
│ unexpected new value: Root resource was present, but now absent.

│ This is a bug in the provider, which should be reported in the provider's
│ own issue tracker.


│ Error: Provider produced inconsistent result after apply

│ When applying changes to
│ module.cbr_account_level.module.cbr_zone["databases-for-enterprisedb"].ibm_cbr_zone.cbr_zone,
│ provider "provider["registry.terraform.io/ibm-cloud/ibm"]" produced an
│ unexpected new value: Root resource was present, but now absent.

│ This is a bug in the provider, which should be reported in the provider's
│ own issue tracker.

                                                                                                                                                                             [1m-3.0s]

@shemau
Copy link

shemau commented Feb 5, 2024

2024-02-05T12:47:59.312+0530 [INFO] provider.terraform-provider-ibm_v1.56.1: 2024/02/05 12:47:59 [Debug] Request:

appears to show a retry loop following a rate limit (429) with 7 retries in 0.05 of a second, ignoring the retry-in field in the response? Showing X-Ratelimit-Remaining: 0 just before the error. I am not sure retries could be that quick, so maybe it is not.

At 12:48:00 shows the zone created (zone id: 8cbe290919a4bd58ea821e7f420f03bd)
2024-02-05T12:48:00.837+0530 [INFO] provider.terraform-provider-ibm_v1.56.1: 2024/02/05 12:48:00 [Debug] Response:

At 12:48:01 shows the zone does not exist
2024-02-05T12:48:01.439+0530 [INFO] provider.terraform-provider-ibm_v1.56.1: 2024/02/05 12:48:01 [Debug] Response:

@ocofaigh
Copy link
Contributor

ocofaigh commented Feb 5, 2024

@Ak-sky please escalate internally if getting no attention here

@zhenwan
Copy link
Contributor

zhenwan commented Feb 6, 2024

@Ak-sky I tried TF Event Notification module example several times, I could not re-produce the issue that you saw.

I have noticed that you are using Terraform IBM Provider Version v1.60.1, the provider version I am using is the latest one, v1.62.0, which contains several fixes to the CBR provider.

Can you upgrade your version to the latest one, I just want to know if you still see the problem? Please upload the log files to this issue if the problem still exists.

@shemau
Copy link

shemau commented Feb 7, 2024

TestRunFSCloudExample 2024-02-06T20:10:26Z logger.go:66: - Installing ibm-cloud/ibm v1.62.0...
...
TestRunFSCloudExample 2024-02-06T20:11:34Z logger.go:66: ╷
TestRunFSCloudExample 2024-02-06T20:11:34Z logger.go:66: │ Error: Provider produced inconsistent result after apply
TestRunFSCloudExample 2024-02-06T20:11:34Z logger.go:66: │ 
TestRunFSCloudExample 2024-02-06T20:11:34Z logger.go:66: │ When applying changes to
TestRunFSCloudExample 2024-02-06T20:11:34Z logger.go:66: │ module.cos_fscloud.module.buckets.module.buckets["cos-fscloud-wcvtor-bucket"].module.bucket_cbr_rule[0].ibm_cbr_rule.cbr_rule,
TestRunFSCloudExample 2024-02-06T20:11:34Z logger.go:66: │ provider "provider[\"registry.terraform.io/ibm-cloud/ibm\"]" produced an
TestRunFSCloudExample 2024-02-06T20:11:34Z logger.go:66: │ unexpected new value: Root resource was present, but now absent.
TestRunFSCloudExample 2024-02-06T20:11:34Z logger.go:66: │ 
TestRunFSCloudExample 2024-02-06T20:11:34Z logger.go:66: │ This is a bug in the provider, which should be reported in the provider's
TestRunFSCloudExample 2024-02-06T20:11:34Z logger.go:66: │ own issue tracker.
TestRunFSCloudExample 2024-02-06T20:11:34Z logger.go:66:

This is encountered multiple times a day at different provider versions from much older providers to the latest.

@zhenwan there is IMHO a long standing race condition in the provider, as @ocofaigh notes without appropriate retry logic in the IBM provider. A resource is created and the state file is not updated which causes a cascading issue within a terraform environment.

I don't have log files this run, it was just a regular run I was performing.

@Ak-sky
Copy link
Author

Ak-sky commented Mar 7, 2024

Ran into this issue again-
TF Apply std output- cbr-fscloud_TFA_07.03.2024-23.01.57.log
TF Trace Log- cbr-fscloud_Trace_TFA_07.03.2024-23.01.57.log

@zhenwan
Copy link
Contributor

zhenwan commented Mar 7, 2024

@Ak-sky Thank you for the logs. After checking logDNA, it looks the issue was caused by GET request sending to a different region immediately after the POST request.

This is a known issue caused by Eventual consistency patten. We will see how we can mitigate this issue.

@zhenwan
Copy link
Contributor

zhenwan commented May 1, 2024

@Ak-sky I want to let you know if you want to test the fix to this issue, you need to use v1.65.0-beta0 version. The version v1.64.2 does NOT include the fix

@tyao117
Copy link
Contributor

tyao117 commented May 3, 2024

@Ak-sky, please use the https://github.com/IBM-Cloud/terraform-provider-ibm/releases/tag/v1.65.0-beta0 to test if the fix is correct because all beta releases can still be used to target the production environment. The CBR team needs to determine is the fix is working; that's why it is labeled beta release.

ocofaigh pushed a commit to terraform-ibm-modules/terraform-ibm-cbr that referenced this issue Jun 24, 2024
@Ak-sky
Copy link
Author

Ak-sky commented Jul 4, 2024

Ran into this issue again, running with TF version - Terraform v1.5.7, IBM provider version- v1.66.0.
Looks like this issue still persists in v1.66.0 also.

image

TF Trace Log - cbr-fscloud_Trace_TFA_04.07.2024-20.30.10.log
TF Apply std. output - cbr-fscloud_TFA_04.07.2024-20.30.10.log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment