GODRIVER-3168 Retry KMS requests on transient errors. #1887

Open
wants to merge 9 commits into
base: master

Collaborator

@qingyang-hu qingyang-hu commented Nov 11, 2024

@mongodb-drivers-pr-bot mongodb-drivers-pr-bot bot added the priority-3-low Low Priority PR for Review label Nov 11, 2024
Contributor

mongodb-drivers-pr-bot bot commented Nov 11, 2024

API Change Report

./v2/mongo

incompatible changes

(*ReplaceOneModel).SetSort: removed
(*UpdateOneModel).SetSort: removed
Connect: changed from func(...*./v2/mongo/options.ClientOptions) (*Client, error) to func(..../v2/mongo/options.Lister[./v2/mongo/options.ClientOptions]) (*Client, error)
ReplaceOneModel.Sort: removed
UpdateOneModel.Sort: removed

./v2/mongo/options

incompatible changes

(*AutoEncryptionOptions).SetBypassAutoEncryption: removed
(*AutoEncryptionOptions).SetBypassQueryAnalysis: removed
(*AutoEncryptionOptions).SetEncryptedFieldsMap: removed
(*AutoEncryptionOptions).SetExtraOptions: removed
(*AutoEncryptionOptions).SetKeyVaultClientOptions: removed
(*AutoEncryptionOptions).SetKeyVaultNamespace: removed
(*AutoEncryptionOptions).SetKmsProviders: removed
(*AutoEncryptionOptions).SetSchemaMap: removed
(*AutoEncryptionOptions).SetTLSConfig: removed
(*ClientOptions).ApplyURI: removed
(*ClientOptions).SetAppName: removed
(*ClientOptions).SetAuth: removed
(*ClientOptions).SetAutoEncryptionOptions: removed
(*ClientOptions).SetBSONOptions: removed
(*ClientOptions).SetCompressors: removed
(*ClientOptions).SetConnectTimeout: removed
(*ClientOptions).SetDialer: removed
(*ClientOptions).SetDirect: removed
(*ClientOptions).SetDisableOCSPEndpointCheck: removed
(*ClientOptions).SetDriverInfo: removed
(*ClientOptions).SetHTTPClient: removed
(*ClientOptions).SetHeartbeatInterval: removed
(*ClientOptions).SetHosts: removed
(*ClientOptions).SetLoadBalanced: removed
(*ClientOptions).SetLocalThreshold: removed
(*ClientOptions).SetLoggerOptions: removed
(*ClientOptions).SetMaxConnIdleTime: removed
(*ClientOptions).SetMaxConnecting: removed
(*ClientOptions).SetMaxPoolSize: removed
(*ClientOptions).SetMinPoolSize: removed
(*ClientOptions).SetMonitor: removed
(*ClientOptions).SetPoolMonitor: removed
(*ClientOptions).SetReadConcern: removed
(*ClientOptions).SetReadPreference: removed
(*ClientOptions).SetRegistry: removed
(*ClientOptions).SetReplicaSet: removed
(*ClientOptions).SetRetryReads: removed
(*ClientOptions).SetRetryWrites: removed
(*ClientOptions).SetSRVMaxHosts: removed
(*ClientOptions).SetSRVServiceName: removed
(*ClientOptions).SetServerAPIOptions: removed
(*ClientOptions).SetServerMonitor: removed
(*ClientOptions).SetServerMonitoringMode: removed
(*ClientOptions).SetServerSelectionTimeout: removed
(*ClientOptions).SetTLSConfig: removed
(*ClientOptions).SetTimeout: removed
(*ClientOptions).SetWriteConcern: removed
(*ClientOptions).SetZlibLevel: removed
(*ClientOptions).SetZstdLevel: removed
(*ClientOptions).Validate: removed
(*DistinctOptionsBuilder).SetHint: removed
(*FindOneOptionsBuilder).SetOplogReplay: removed
(*FindOptionsBuilder).SetOplogReplay: removed
(*LoggerOptions).SetComponentLevel: removed
(*LoggerOptions).SetMaxDocumentLength: removed
(*LoggerOptions).SetSink: removed
(*ReplaceOptionsBuilder).SetSort: removed
(*ServerAPIOptions).SetDeprecationErrors: removed
(*ServerAPIOptions).SetStrict: removed
(*UpdateOneOptionsBuilder).SetSort: removed
AutoEncryption: changed from func() *AutoEncryptionOptions to func() *AutoEncryptionOptionsBuilder
AutoEncryptionOptions.KeyVaultClientOptions: changed from *ClientOptions to Lister[ClientOptions]
Client: changed from func() *ClientOptions to func() *ClientOptionsBuilder
ClientOptions.AutoEncryptionOptions: changed from *AutoEncryptionOptions to Lister[AutoEncryptionOptions]
ClientOptions.DriverInfo: removed
ClientOptions.LoggerOptions: changed from *LoggerOptions to Lister[LoggerOptions]
ClientOptions.ServerAPIOptions: changed from *ServerAPIOptions to Lister[ServerAPIOptions]
DistinctOptions.Hint: removed
DriverInfo: removed
FindOneOptions.OplogReplay: removed
FindOptions.OplogReplay: removed
Logger: changed from func() *LoggerOptions to func() *LoggerOptionsBuilder
MergeClientOptions: removed
ReplaceOptions.Sort: removed
ServerAPI: changed from func(ServerAPIVersion) *ServerAPIOptions to func(ServerAPIVersion) *ServerAPIOptionsBuilder
UpdateOneOptions.Sort: removed

compatible changes

AutoEncryptionOptionsBuilder: added
ClientOptionsBuilder: added
LoggerOptionsBuilder: added
ServerAPIOptionsBuilder: added

./v2/x/mongo/driver

incompatible changes

WriteCommandError.Retryable: changed from func(./v2/x/mongo/driver/description.ServerKind, ./v2/x/mongo/driver/description.VersionRange) bool to func(./v2/x/mongo/driver/description.VersionRange) bool
WriteConcernError.Retryable: changed from func(./v2/x/mongo/driver/description.ServerKind, *./v2/x/mongo/driver/description.VersionRange) bool to func() bool

./v2/x/mongo/driver/auth

incompatible changes

HandshakeOptions.OuterLibraryName: removed
HandshakeOptions.OuterLibraryPlatform: removed
HandshakeOptions.OuterLibraryVersion: removed

./v2/x/mongo/driver/mongocrypt

compatible changes

(*KmsContext).RequestError: added

./v2/x/mongo/driver/operation

incompatible changes

(*Distinct).Hint: removed
(*Hello).OuterLibraryName: removed
(*Hello).OuterLibraryPlatform: removed
(*Hello).OuterLibraryVersion: removed

./v2/x/mongo/driver/topology

incompatible changes

ConvertToDriverAPIOptions: changed from func(*./v2/mongo/options.ServerAPIOptions) *./v2/x/mongo/driver.ServerAPIOptions to func(./v2/mongo/options.Lister[./v2/mongo/options.ServerAPIOptions]) ./v2/x/mongo/driver.ServerAPIOptions
NewConfig: changed from func(*./v2/mongo/options.ClientOptions, *./v2/x/mongo/driver/session.ClusterClock) (Config, error) to func(./v2/mongo/options.ClientOptionsBuilder, *./v2/x/mongo/driver/session.ClusterClock) (*Config, error)
WithOuterLibraryName: removed
WithOuterLibraryPlatform: removed
WithOuterLibraryVersion: removed

compatible changes

NewConfigFromOptions: added

@qingyang-hu qingyang-hu force-pushed the godriver3168 branch 5 times, most recently from b11c2a5 to 93ebd31 Compare November 11, 2024 19:25
@qingyang-hu qingyang-hu force-pushed the godriver3168 branch 6 times, most recently from 4d94342 to fd682b7 Compare November 13, 2024 16:38
@qingyang-hu qingyang-hu force-pushed the godriver3168 branch 4 times, most recently from 6ecb7c2 to ddc0900 Compare November 20, 2024 17:02
@qingyang-hu qingyang-hu force-pushed the godriver3168 branch 5 times, most recently from 77cf4a1 to d4a0765 Compare November 20, 2024 21:49
@qingyang-hu qingyang-hu force-pushed the godriver3168 branch 5 times, most recently from 074d596 to 50549f6 Compare November 25, 2024 23:59
@qingyang-hu qingyang-hu force-pushed the godriver3168 branch 9 times, most recently from 58a6a49 to db03cdf Compare December 2, 2024 21:43
@qingyang-hu qingyang-hu marked this pull request as ready for review December 2, 2024 22:41
@@ -399,7 +397,10 @@ func (c *crypt) decryptKey(kmsCtx *mongocrypt.KmsContext) error {

 res := make([]byte, bytesNeeded)
 bytesRead, err := conn.Read(res)
-if err != nil && !errors.Is(err, io.EOF) {
+if err != nil {
+    if kmsCtx.Fail() {
Contributor

See MONGOCRYPT-752. When mongocrypt_kms_ctx_fail returns false, consider wrapping the error message from mongocrypt_kms_ctx_status. That error may help identify that a retry occurred (e.g. "KMS request failed after 3 retries due to a network error: last attempt failed with: ")

@qingyang-hu qingyang-hu marked this pull request as draft December 4, 2024 21:17
@qingyang-hu qingyang-hu marked this pull request as ready for review December 5, 2024 21:11
Comment on lines 2994 to 3003
if tlsCAFile := os.Getenv("KMS_FAILPOINT_CA_FILE"); tlsCAFile == "" {
	require.Fail(mt, "failed to load CA file")
} else {
	var err error
	clientAndCATlsMap := map[string]interface{}{
		"tlsCAFile": tlsCAFile,
	}
	tlsCfg, err = options.BuildTLSConfig(clientAndCATlsMap)
	require.Nil(mt, err, "BuildTLSConfig error: %v", err)
}
Collaborator

Since require.* stops test execution when a test fails, an if/else here isn't required.

Suggested change
-if tlsCAFile := os.Getenv("KMS_FAILPOINT_CA_FILE"); tlsCAFile == "" {
-	require.Fail(mt, "failed to load CA file")
-} else {
-	var err error
-	clientAndCATlsMap := map[string]interface{}{
-		"tlsCAFile": tlsCAFile,
-	}
-	tlsCfg, err = options.BuildTLSConfig(clientAndCATlsMap)
-	require.Nil(mt, err, "BuildTLSConfig error: %v", err)
-}
+tlsCAFile := os.Getenv("KMS_FAILPOINT_CA_FILE")
+require.NotEmpty(mt, tlsCAFile, "failed to load CA file")
+tlsCfg, err = options.BuildTLSConfig(map[string]interface{}{"tlsCAFile": tlsCAFile})
+require.Nil(mt, err, "BuildTLSConfig error: %v", err)

Collaborator

@prestonvasquez prestonvasquez left a comment

LGTM with minor edit

internal/integration/client_side_encryption_prose_test.go Outdated
prestonvasquez previously approved these changes Jan 15, 2025
Comment on lines 565 to 569
- command: subprocess.exec
  params:
    binary: python3
    background: true
    args: ["-u", "${DRIVERS_TOOLS}/.evergreen/csfle/kms_failpoint_server.py", "--port", "9003"]
Collaborator

This step is already done by start-servers.sh.

@@ -553,6 +553,39 @@ functions:
KMS_MOCK_SERVERS_RUNNING: "true"
args: [*task-runner, evg-test-kmip]

start-kms-failpoint-server:
Collaborator

This function seems like mostly a duplicate of start-cse-servers. Can we use that existing function instead of this one?

@@ -2979,6 +2981,147 @@ func TestClientSideEncryptionProse(t *testing.T) {
assert.Greater(t, len(payload.Data), len(payloadDefaults.Data), "the returned payload size is expected to be greater than %d", len(payloadDefaults.Data))
})
})

mt.RunOpts("24. kms retry tests", noClientOpts, func(mt *mtest.T) {
Collaborator

Optional: Consider separating this into its own test function.

Comment on lines 2242 to 2246
- matrix_name: "retry-kms-requests-test"
  matrix_spec: { version: ["7.0"], os-ssl-40: ["rhel87-64"] }
  display_name: "Retry KMS Requests ${os-ssl-40}"
  tasks:
    - name: ".retry-kms-requests"
Collaborator

Instead of making a new matrix, can we roll this up into a general "KMS Test" that contains all ".kms-tls" and ".retry-kms-requests" tests?

4 participants