Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize UrlEncode in Utils #3307

Merged
merged 25 commits into from
Jul 10, 2024
Merged

Conversation

danielmarbach
Copy link
Contributor

@danielmarbach danielmarbach commented Apr 28, 2024

Description

Optimizes UrlEncode for better memory efficiency

Motivation and Context


BenchmarkDotNet v0.13.12, macOS Sonoma 14.4.1 (23E224) [Darwin 23.4.0]
Apple M3 Max, 1 CPU, 14 logical and 14 physical cores
.NET SDK 8.0.202
  [Host]     : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD


image

https://github.com/danielmarbach/aws-sdk-net-benchmarks/blob/main/UrlEncode.cs

Testing

Added tests to verify the code still does the same as before.

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed

License

  • I confirm that this pull request can be released under the Apache 2 license

sdk/src/Core/AWSSDK.Core.NetStandard.csproj Show resolved Hide resolved
@@ -75,12 +77,6 @@ public static partial class AWSSDKUtils
// Default value of progress update interval for streaming is 100KB.
public const long DefaultProgressUpdateInterval = 102400;

internal static Dictionary<int, string> RFCEncodingSchemes = new Dictionary<int, string>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did check the entire SDK and couldn't find another reference so I got rid of it. Is it possible to entirely get rid of the different RFC implementations? If that is possible we could potentially even get rid of the custom code and replace it with the System.Net implementation?

https://learn.microsoft.com/en-us/dotnet/api/system.net.webutility.urlencode?view=net-8.0

if (!TryGetRFCEncodingSchemes(rfcNumber, out var validUrlCharacters))
validUrlCharacters = ValidUrlCharacters;

var unreservedChars = string.Concat(validUrlCharacters, path ? ValidPathCharacters : "");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the encoding schemes are only used here the string concat could also go by already have the concatenated string available as costs and returning them by TryGetRFCEncodingSchemes

@dscpinheiro dscpinheiro added the v4 label Apr 29, 2024
@danielmarbach
Copy link
Contributor Author

@normj @dscpinheiro I know this is still draft but would you mind having a look at it at some point? I have added a bunch of inline comments that might influence this PR.

@danielmarbach
Copy link
Contributor Author

Ping @normj

@danielmarbach danielmarbach marked this pull request as ready for review May 30, 2024 16:13
@danielmarbach
Copy link
Contributor Author

I raised #3326 because the constants are a bit messy and need an overhaul. I did my best to get things into compilable state for now.

@danielmarbach
Copy link
Contributor Author

Ok I'll do a rebase

@danielmarbach
Copy link
Contributor Author

@normj rebased

Copy link
Member

@normj normj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'll get somebody else on the team to do a second pass.

@danielmarbach
Copy link
Contributor Author

Rebased to solve the conflict

@danielmarbach
Copy link
Contributor Author

Would be nice if this could be merged since I have a bunch of other PRs that introduce more tests to the test file that probably require a rebase then ;)

@boblodgett
Copy link
Contributor

boblodgett commented Jul 2, 2024

Would be nice if this could be merged since I have a bunch of other PRs that introduce more tests to the test file that probably require a rebase then ;)

Taking a look / Dry-running the changes.

@boblodgett
Copy link
Contributor

I don't visually see issues in the PR. However, this didn't build in our build system because a change is required to support the /unsafe switch. I will make the change so we can ensure this PR will build and test correctly before I approve the PR.

Copy link
Contributor

@boblodgett boblodgett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielmarbach danielmarbach requested review from boblodgett and normj July 3, 2024 20:02
Copy link
Member

@normj normj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve with assumption @boblodgett gets a successful build through the internal system.

@dscpinheiro
Copy link
Contributor

@danielmarbach I approved and merged #3363 and #3365, but we saw integration test failures when running this PR on our build system (they were all related to this CloudFormation section: https://github.com/aws/aws-sdk-net/blob/3.7.844.0/sdk/test/NetStandard/IntegrationTests/IntegrationTests/CloudFormation.cs#L20).

Essentially that value is passed to UrlEncode (via GetParametersAsString), but AWSSDKUtils.UrlEncode(TEMPLATE_TEXT, false) in your branch is not returning the same value as v4-development.

@danielmarbach
Copy link
Contributor Author

@dscpinheiro Ok I'll check

@danielmarbach
Copy link
Contributor Author

@dscpinheiro I'm approaching my leave period so it might take a bit longer for me to respond. Apologies in advance

@danielmarbach
Copy link
Contributor Author

@dscpinheiro I fixed the problem. It was between the chair and the keyboard 👯

encoded.Append(ToUpperHex(loNibble));
if (unreservedChars.IndexOf((char)symbol) != -1)
{
dataBuffer[index++] = symbol;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we could potentially remove bound checks here by doing something like

Unsafe.Add(ref MemoryMarshal.GetReference(dataBuffer), index++) = symbol;

but I decided to not go down that path since it increases the complexity of the code

@danielmarbach
Copy link
Contributor Author

Give it another run through the benchmark

image

Still very good allocation reduction and the throughput hit is still in very acceptable range

@boblodgett
Copy link
Contributor

All the new changes are being run through the backend build system now.

@boblodgett
Copy link
Contributor

boblodgett commented Jul 8, 2024

One test failure for netcoreapp3.1 target running tests.

error : Test failure in XUnitTestRunner: UnitTests.NetStandard.Core.AWSSDKUtilsTests.UrlEncodeWithoutPath(input: "{\r\n\t\"AWSTemplateFormatVersion\" : \"2010-09-09"..., expected: "%7B%0A%09%22AWSTemplateFormatVersion%22%20%3A%20%2"...)

Because of how our system runs I do not know if other targets also have problems yet.

@danielmarbach
Copy link
Contributor Author

I ran against net8 and as far as I recall net4.x. I'm not installing a runtime that is out of support for almost two years and hasn't received security updates since them. Sorry.

So if you can provide me the entire test output with some information that makes it possible for me to guess what's going on I'm happy to give it another look.

Otherwise this PR has also edit rights. Feel free to push updates.

@boblodgett
Copy link
Contributor

I ran against net8 and as far as I recall net4.x. I'm not installing a runtime that is out of support for almost two years and hasn't received security updates since them. Sorry.

So if you can provide me the entire test output with some information that makes it possible for me to guess what's going on I'm happy to give it another look.

Otherwise this PR has also edit rights. Feel free to push updates.

Thank you for working with us on this PR while we shake out the test issues. I ran the test in Visual Studio locally and it also fails in .NET 8.

The project containing the failed test is: .\aws-sdk-net\sdk\test\NetStandard\UnitTests\AWSSDK.UnitTests.Custom.NetStandard.csproj

The file containing the failed test within that project is: .\aws-sdk-net\sdk\test\NetStandard\UnitTests\Core\AWSSDKUtilsTests.cs

The test is: public void UrlEncodeWithoutPath(string input, string expected) which has three [InlineData] test cases. The case that is failing specifically is the third one which uses the TemplateText.

Here is a link to the test case: https://github.com/danielmarbach/aws-sdk-net/blob/f5fe55e95b98d07bb7cd2c2a8e8f4451f0b1b819/sdk/test/NetStandard/UnitTests/Core/AWSSDKUtilsTests.cs#L70

TemplateText can be found here: https://github.com/danielmarbach/aws-sdk-net/blob/f5fe55e95b98d07bb7cd2c2a8e8f4451f0b1b819/sdk/test/NetStandard/UnitTests/Core/AWSSDKUtilsTests.cs#L78

All other tests in this project pass.

@boblodgett
Copy link
Contributor

From the look of it \r (0x0D) characters are being injected and encoded when the test case hits a \n (0x0A):

image

[Theory]
[InlineData("value, with special chars!", "value%2C%20with%20special%20chars%21")]
[InlineData("value, with special chars and path {/+:}", "value%2C%20with%20special%20chars%20and%20path%20%7B%2F%2B%3A%7D")]
[InlineData(TemplateText, "%7B%0A%09%22AWSTemplateFormatVersion%22%20%3A%20%222010-09-09%22%2C%0A%0A%09%22Description%22%20%3A%20%22This%20is%20a%20sample%20template%22%2C%0A%0A%09%22Parameters%22%20%3A%20%7B%0A%09%09%22TopicName%22%20%3A%20%7B%0A%09%09%20%20%20%20%22Type%22%20%3A%20%22String%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Default%22%20%3A%20%22TheTopic%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Description%22%20%3A%20%22A%20topic.%22%0A%09%09%7D%2C%0A%20%20%20%20%20%20%20%20%22QueueName%22%20%3A%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Type%22%20%3A%20%22String%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Default%22%20%3A%20%22TheQueue%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Description%22%20%3A%20%22A%20queue.%22%0A%20%20%20%20%20%20%20%20%7D%0A%09%7D%2C%0A%0A%09%22Resources%22%20%3A%20%7B%0A%0A%09%09%22TheQueue%22%20%3A%20%7B%0A%09%09%20%20%20%20%22Type%22%20%3A%20%22AWS%3A%3ASQS%3A%3AQueue%22%2C%0A%09%09%20%20%20%20%22Properties%22%20%3A%20%7B%0A%09%09%09%09%22QueueName%22%20%3A%20%7B%20%22Ref%22%20%3A%20%22QueueName%22%20%7D%0A%09%09%20%20%20%20%7D%0A%09%09%7D%2C%0A%0A%20%20%20%20%20%20%20%20%22TheTopic%22%20%3A%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Type%22%20%3A%20%22AWS%3A%3ASNS%3A%3ATopic%22%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Properties%22%20%3A%20%7B%0A%09%09%09%09%22TopicName%22%20%3A%20%7B%20%22Ref%22%20%3A%20%22TopicName%22%20%7D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22Subscription%22%20%3A%20%5B%0A%09%09%09%09%09%7B%22Protocol%22%20%3A%20%22sqs%22%2C%20%22Endpoint%22%20%3A%20%7B%22Fn%3A%3AGetAtt%22%20%3A%20%5B%20%22TheQueue%22%2C%20%22Arn%22%5D%7D%7D%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%5D%0A%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%7D%0A%09%7D%2C%0A%0A%09%22Outputs%22%20%3A%20%7B%0A%09%09%22TopicARN%22%20%3A%20%7B%0A%09%09%20%20%20%20%22Value%22%20%3A%20%7B%20%22Ref%22%20%3A%20%22TheTopic%22%20%7D%0A%09%09%7D%2C%0A%20%20%20%20%20%20%20%20%22QueueURL%22%20%3A%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%22Value%22%20%3A%20%7B%20%22Ref%22%20%3A%20%22TheQueue%22%20%7D%0A%20%20%20%20%20%20%20%20%7D%0A%09%7D%0A%7D%0A")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the test case that is failing which I didn't remember was a new test case until reviewing the changes again. This likely passes on a Linux based machine but will fail on Windows because of how some of the methods automatically translate \n to \r\n in windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah the input needs to be normalized. I can only do that after my vacation. Feel free to push it if you have time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it happen on checkout on the build agent?

@danielmarbach
Copy link
Contributor Author

danielmarbach commented Jul 9, 2024

Honestly looking at this again on my phone I'd argue the only thing that would have been needed to cover the bug in 618ef3e would have been a long enough string. Fighting line endings and going back and forth on this PR for trying to fit the template text into here seems a bit pointless and unnecessarily holding this up. How about I remove this test case for now?

I can send an update PR on after vacation (pinky finger promise)

@danielmarbach
Copy link
Contributor Author

@boblodgett based on the above comment I removed the template test for now. Having some tests in place for the method where there previously have been none is already a good start and the template was only a temporary thing anyway which helped me discover the buffer length problem. Ideally, there would be another test with a longer input but I cannot do this right now since I'm not around my laptop or workstation for a while

@danielmarbach
Copy link
Contributor Author

And thanks for your support here and sorry for me being less helpful this time

@boblodgett
Copy link
Contributor

Backend build was successful. Approved and am going to merge this in.

@boblodgett boblodgett merged commit 7bd0df1 into aws:v4-development Jul 10, 2024
@danielmarbach danielmarbach deleted the UrlEncode branch July 10, 2024 20:08
peterrsongg pushed a commit that referenced this pull request Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants