Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support default checksums #1191

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from
Draft

feat: support default checksums #1191

wants to merge 10 commits into from

Conversation

0marperez
Copy link
Contributor

Issue #

Description of changes

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@0marperez 0marperez added the no-changelog Indicates that a changelog entry isn't required for a pull request. Use sparingly. label Nov 27, 2024

This comment has been minimized.

1 similar comment
Copy link

Affected Artifacts

Changed in size
Artifact Pull Request (bytes) Latest Release (bytes) Delta (bytes) Delta (percentage)
smithy-client-jvm.jar 65,382 62,681 2,701 4.31%
http-client-jvm.jar 319,197 314,811 4,386 1.39%
runtime-core-jvm.jar 811,421 811,044 377 0.05%

@0marperez
Copy link
Contributor Author

Note to reviewers: I'll address the breaking API changes and failing protocol issues soon. In the meantime, I’d appreciate a review of the checksum-related changes.

Comment on lines +29 to +31
* Calculates a request's checksum.
*
* If the checksum will be sent as a header, calculate the checksum.
* If a user supplies a checksum via an HTTP header no calculation will be done. The exception is MD5, if a user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness/clarification: "Calculates a request's checksum" and "If a user supplies a checksum via an HTTP header no calculation will be done." conflict with each other

*
* @param requestChecksumRequired Model sourced flag indicating if checksum calculation is mandatory.
* @param requestChecksumCalculation Configuration option that determines when checksum calculation should be done.
* @param userSelectedChecksumAlgorithm The checksum algorithm that the user selected for the request, may be null.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: requestChecksumAlgorithm

private val forcedToCalculateChecksum = requestChecksumRequired || requestChecksumCalculation == HttpChecksumConfigOption.WHEN_SUPPORTED
private val checksumHeader = StringBuilder("x-amz-checksum-")
private val defaultChecksumAlgorithm = lazy { Crc32() }
private val defaultChecksumAlgorithmHeaderPostfix = "crc32"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postfix -> Suffix

* The header must start with "x-amz-checksum-" followed by the checksum algorithm's name.
* MD5 is not considered a valid checksum algorithm.
*/
private fun userProviderChecksumHeader(request: HttpRequest, logger: Logger): String? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: can be restructured as an extension function HttpRequest.checksumHeader(logger: Logger): String?

and naming: userProvidedChecksumHeader

* Users can check which checksum was validated by referencing the `ResponseChecksumValidated` execution context variable.
*
* @param shouldValidateResponseChecksumInitializer A function which uses the input [I] to return whether response checksum validation should occur
* @param responseValidationRequired Flag indicating if the checksum validation is mandatory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs: "Model sourced flag"

.removePrefix("x-amz-checksum-")
.toHashFunction() ?: throw ClientException("Could not parse checksum algorithm from header $checksumHeader")

if (context.protocolResponse.body is HttpBody.Bytes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this branch added? The spec says we should delay checksum calculation until the body is consumed:

Where possible, SDKs MUST defer this calculation and validation until the payload is actually consumed by the user i.e. a payload must not be read twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toHashingBody doesn't support Bytes bodies. I decided to calculate the checksum in memory instead of adding support for Bytes to toHashingBody. It should be safe if the request is not being streamed but I think you're right that we're not following the spec

Copy link
Contributor

@lauzadis lauzadis Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toHashingBody doesn't support Bytes bodies

Does it need to? The previous implementation never seemed to need it

@@ -126,23 +133,6 @@ class FlexibleChecksumsRequestInterceptorTest {
assertEquals(0, call.request.headers.getNumChecksumHeaders())
}

@Test
fun itSetsChecksumHeaderViaExecutionContext() = runTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this test removed?

@@ -163,29 +168,4 @@ class FlexibleChecksumsResponseInterceptorTest {

op.roundTrip(client, TestInput("input"))
}

@Test
fun testSkipsValidationWhenDisabled() = runTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this test removed?


public enum class HttpChecksumConfigOption {
/**
* SDK will create/validate checksum if the service marks it as required or if this is set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs: create -> calculate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correctness: missing tests for when requestChecksumRequired = false and requestChecksumCalculation = HttpChecksumConfigOption.WHEN_REQUIRED. Same for response validation

* @param checksumAlgorithmNameInitializer an optional function which parses the input [I] to return the checksum algorithm name.
* if not set, then the [HttpOperationContext.ChecksumAlgorithm] execution context attribute will be used.
* If the request will be streamed:
* - The checksum calculation is done asynchronously using a hashing & completing body.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "asynchronously" → "during transmission"

* - The checksum will be sent in a trailing header, once the request is consumed.
*
* If the request will not be streamed:
* - The checksum calculation is done synchronously
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "synchronously" → "before transmission"

Comment on lines +60 to +75
private val defaultChecksumAlgorithm = lazy { Crc32() }
private val defaultChecksumAlgorithmHeaderPostfix = "crc32"

private val checksumAlgorithm = userSelectedChecksumAlgorithm?.let {
val hashFunction = userSelectedChecksumAlgorithm.toHashFunction()
if (hashFunction == null || !hashFunction.isSupported) {
throw ClientException("Checksum algorithm '$userSelectedChecksumAlgorithm' is not supported for flexible checksums")
}
checksumHeader.append(userSelectedChecksumAlgorithm.lowercase())
hashFunction
} ?: if (forcedToCalculateChecksum) {
checksumHeader.append(defaultChecksumAlgorithmHeaderPostfix)
defaultChecksumAlgorithm.value
} else {
null
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Seems unnecessary to declare forcedToCalculateChecksum, defaultChecksumAlgorithm, and defaultChecksumAlgorithmHeaderPostfix as fields only to use them in one if branch. Could this just be:

?: if (requestChecksumRequired || requestChecksumCalculation == HttpChecksumConfigOption.WHEN_SUPPORTED) {
    checksumHeader.append("crc32")
    Crc32()
} else

Comment on lines +80 to +81
userProviderChecksumHeader(context.protocolRequest, logger)?.let {
logger.debug { "User supplied a checksum via header, skipping checksum calculation" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: These log messages which talk about the user feel awkward. Log messages are for users so it seems strange to mention the user in the third person. I'd suggest rewording these to be more about data/actions and less about the actors involved (e.g., "Found a provided checksum in the request, skipping checksum calculation").


deferredChecksum.complete(checksum)
} else {
logger.debug { "Calculating checksum asynchronously" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We've lost the log message that we're calculating a checksum asynchronously during transmission. I'd suggest moving your "Calculating checksum using '$checksumAlgorithm'" message into the if/else clauses, tweaking each to mention whether the checksum is being calculated before or during transmission.

Comment on lines +159 to +163
private fun String.isCompositeChecksum(): Boolean {
// Ends with "-#" where "#" is a number between 1-1000
val regex = Regex("-([1-9][0-9]{0,2}|1000)$")
return regex.containsMatchIn(this)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: When this logic is made S3 specific, we shouldn't assert on the part number fitting inside a certain range. Just "-(\d)+$" should suffice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please add tests for HttpChecksumConfigOption.WHEN_REQUIRED. Applies to response interceptors as well.

Comment on lines -129 to -144
@Test
fun itSetsChecksumHeaderViaExecutionContext() = runTest {
checksums.forEach { (checksumAlgorithmName, expectedChecksumValue) ->
val req = HttpRequestBuilder().apply {
body = HttpBody.fromBytes("<Foo>bar</Foo>".encodeToByteArray())
}

val op = newTestOperation<Unit, Unit>(req, Unit)
op.context[HttpOperationContext.ChecksumAlgorithm] = checksumAlgorithmName
op.interceptors.add(FlexibleChecksumsRequestInterceptor<Unit>())

op.roundTrip(client, Unit)
val call = op.context.attributes[HttpOperationContext.HttpCallList].first()
assertEquals(expectedChecksumValue, call.request.headers["x-amz-checksum-$checksumAlgorithmName"])
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do we still use HttpOperationContext.ChecksumAlgorithm or can it be deprecated?

Comment on lines -167 to -190
@Test
fun testSkipsValidationWhenDisabled() = runTest {
val req = HttpRequestBuilder()
val op = newTestOperation<TestInput>(req)

op.interceptors.add(
FlexibleChecksumsResponseInterceptor<TestInput> {
false
},
)

val responseChecksumHeaderName = "x-amz-checksum-crc32"

val responseHeaders = Headers {
append(responseChecksumHeaderName, "incorrect-checksum-would-throw-if-validated")
}

val client = getMockClient(response, responseHeaders)

val output = op.roundTrip(client, TestInput("input"))
output.body.readAll()

assertNull(op.context.getOrNull(ChecksumHeaderValidated))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why did we delete this test?

Comment on lines +31 to +34
/**
* SDK will create/validate checksum if the service marks it as required or if this is set.
*/
WHEN_SUPPORTED,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "or if this is set" → "or if the service offers optional checksums"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog Indicates that a changelog entry isn't required for a pull request. Use sparingly.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants