Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix to allow user to provide their own checksum value when checksum algorithm is specified #2644

Closed
wants to merge 11 commits into from

Conversation

yasminetalby
Copy link
Contributor

Issue #, if available: #2540

Description of changes:
Fix to allow user to provide their own checksum value when checksum algorithm is specified.

Check all that applies:

  • Did a review by yourself.
  • Added proper tests to cover this PR. (If tests are not applicable, explain.)
  • Checked if this PR is a breaking (APIs have been changed) change.
  • Checked if this PR will not introduce cross-platform inconsistent behavior.
  • Checked if this PR would require a ReadMe/Wiki update.

Check which platforms you have built SDK on to verify the correctness of this PR.

  • Linux
  • Windows
  • Android
  • MacOS
  • IOS
  • Other Platforms

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@yasminetalby yasminetalby marked this pull request as ready for review August 29, 2023 15:57
@@ -251,6 +257,7 @@ bool AWSAuthV4Signer::SignRequest(Aws::Http::HttpRequest& request, const char* r
{
payloadHash = STREAMING_UNSIGNED_PAYLOAD_TRAILER;
Aws::String trailerHeaderValue = Aws::String("x-amz-checksum-") + request.GetRequestHash().first;
request.DeleteHeader(trailerHeaderValue.c_str());
Copy link
Contributor

@sbiscigl sbiscigl Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should consider not serializing the header x-amz-checksum- to the header in the first place during codegen as we will always end up adding it one way or another if needed. this is fine for now imo opinon, but removing the serialization could perhaps be a better long term solution

return m_precalculatedHashSet;
}

const Aws::String &GetPrecalculatedHash() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could make these methods inline

httpRequest->SetRequestHash(checksumAlgorithmName, Aws::MakeShared<Crypto::CRC32>(AWS_CLIENT_LOG_TAG));
}
else if (checksumValueAndAlgorithmProvided){
httpRequest->SetHeaderValue(checksum_type, request.GetHeaders().find(checksum_type)->second);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

request.GetHeaders().find() will return request.GetHeaders().end() iterator which points beyond the container and it is UB to dereference -> such iterator.

@@ -238,7 +238,13 @@ bool AWSAuthV4Signer::SignRequest(Aws::Http::HttpRequest& request, const char* r
if (request.GetRequestHash().second != nullptr)
{
Aws::String checksumHeaderKey = Aws::String("x-amz-checksum-") + request.GetRequestHash().first;
Aws::String checksumHeaderValue = HashingUtils::Base64Encode(request.GetRequestHash().second->Calculate(*(request.GetContentBody())).GetResult());
auto checksumHeaderValue = [](const std::shared_ptr<Utils::Crypto::Hash> &hash,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know IIFE is cool, but KISS is also cool and ternary operator would work here too:

auto checksumHeaderValue = request.GetRequestHash().second ? 
    request.GetRequestHash().second : HashingUtils::Base64Encode(hash->Calculate(request.GetContentBody().get()).GetResult());

also, because we now dereference GetContentBody, it worth adding a check it is present (and trace error and return false if not)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or keep the IIFE, resulting ternary operator is also looking complicated.

@@ -48,6 +48,23 @@ namespace Aws

// when hashing streams, this is the size of our internal buffer we read the stream into
static const uint32_t INTERNAL_HASH_STREAM_BUFFER_SIZE = 8192;

void SetPrecalculatedHash(const Aws::String &precalculatedHash) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding pre calculated to the interface scares me, since now we have to be aware that hash is either in the computing state and we need to pull the computed value or its in precomputed state and we pull that value.

I know its a public interface and we cannot really change it. One thing to consider is a NoopHash implementation - basically it will take precomputed hash and all update methods will do nothing, with GetHash just returning precomputed value. That way we can hide the fact that its precomputed

@@ -238,7 +238,13 @@ bool AWSAuthV4Signer::SignRequest(Aws::Http::HttpRequest& request, const char* r
if (request.GetRequestHash().second != nullptr)
{
Aws::String checksumHeaderKey = Aws::String("x-amz-checksum-") + request.GetRequestHash().first;
Aws::String checksumHeaderValue = HashingUtils::Base64Encode(request.GetRequestHash().second->Calculate(*(request.GetContentBody())).GetResult());
auto checksumHeaderValue = [](const std::shared_ptr<Utils::Crypto::Hash> &hash,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: lets backlog this and put a comment - we should not be computing checksum over the whole body and put it in header. we should keep a running checksum and compute it as we are streaming the body

namespace Utils {
namespace Crypto {
/**
* CRC32 hash implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you copied the comment as well

using namespace Aws::Utils::Crypto;

PrecalculatedHash::PrecalculatedHash(const Aws::String &mHash): m_hash(ByteBuffer((unsigned char*)mHash.c_str(),mHash.length())){
m_hash_string = mHash;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add to the initializer list instead of assigning in the ctor and rename to m_hashString pls

*/
class AWS_CORE_API PrecalculatedHash : public Hash {
public:
PrecalculatedHash(const Aws::String &mHash);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: call it it hash instead of mHash lol. It really doesnt matter but we call member variables m_variable name to disambiguate from non-members

auto precalculatedValue = request.GetHeaders().find(checksumType)->second;
hash->SetPrecalculatedHash(precalculatedValue);
httpRequest->SetRequestHash(checksumAlgorithmName, hash); }
auto hash = Aws::MakeShared<Crypto::PrecalculatedHash>(AWS_CLIENT_LOG_TAG,request.GetHeaders().find(checksumType)->second);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could likely factor all of this out into a single if (request.IsStreaming() && checksumValueAndAlgorithmProvided) statement instead of creating one for each checksum algo

HashResult PrecalculatedHash::Calculate(Aws::IStream& stream)
{
AWS_UNREFERENCED_PARAM(stream);
return m_hash;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need m_hash at all anymore? imo whenever we want to return a byte buffer we should be calling HashingUtils::Base64Decode(m_hashString) that way we only are keeping track of one member variable and the constructor becomes a lot cleaner as we are only using one string param and not casting it to a byte array.

also it might be incorrect as we would be creating a byte buffer of the base54 encoding and not the actual hash result

@@ -774,61 +775,76 @@ void AWSClient::AppendHeaderValueToRequest(const std::shared_ptr<Aws::Http::Http
void AWSClient::AddChecksumToRequest(const std::shared_ptr<Aws::Http::HttpRequest>& httpRequest,

This comment was marked as spam.

/**
* Return the base64 encoded checksum value
*/
virtual Aws::String GetHashBase64(ByteBuffer& m_checksum){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im confused on how this is supposed to work. are the callers expected to retrieve checksum from the class before calling this method?


using namespace Aws::Utils::Crypto;

PrecalculatedHash::PrecalculatedHash(const Aws::String &hash) : m_hashString(hash){}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would recommend decoding b64 in constructor once. and then all functions that return raw value just return that precomputed

@@ -238,7 +238,7 @@ bool AWSAuthV4Signer::SignRequest(Aws::Http::HttpRequest& request, const char* r
if (request.GetRequestHash().second != nullptr)
{
Aws::String checksumHeaderKey = Aws::String("x-amz-checksum-") + request.GetRequestHash().first;
Aws::String checksumHeaderValue = HashingUtils::Base64Encode(request.GetRequestHash().second->Calculate(*(request.GetContentBody())).GetResult());
auto checksumHeaderValue = request.GetRequestHash().second->GetHashBase64(request.GetRequestHash().second->Calculate(*(request.GetContentBody())).GetResult());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetHashBase64 should not be taking result of calculate here. Just keep the old code here and let calculate return raw value for precomputed as described below

if (request->GetRequestHash().second != nullptr)
{
if (request->GetRequestHash().second != nullptr) {
auto hash = request->GetRequestHash().second->GetHash().GetResult();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this hash can be changed to GetHashBase64()

if (checksumAlgorithmName == "crc32")
if (request.IsStreaming() && checksumValueAndAlgorithmProvided)
{
auto hash = Aws::MakeShared<Crypto::PrecalculatedHash>(AWS_CLIENT_LOG_TAG,request.GetHeaders().find(checksumType)->second);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would switch a logic slightly here. Lets decode b64 that customer provided, make sure its valid b64, then construct precalculated hash and give it both b64 and raw formats. precalculated hash can then return b64/raw as specified by method.

we can probably remove the need to decode the b64, but that can be done along with the changes needed to avoid hashing the whole stream for signing purposes

@jmklix jmklix mentioned this pull request Jan 25, 2024
11 tasks
@jmklix
Copy link
Member

jmklix commented Feb 13, 2024

merged here: #2826

@jmklix jmklix closed this Feb 13, 2024
@jmklix jmklix deleted the checksum-notes branch February 13, 2024 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants