Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add rust well-known-endpoint tests #4884

Merged
merged 15 commits into from
Nov 16, 2024

Conversation

jmayclin
Copy link
Contributor

@jmayclin jmayclin commented Nov 9, 2024

Description of changes:

This test deletes the python well-known-endpoints test in favor of a rust well-known-endpoints test.

internet_http_client uses s2n-tls-hyper to make HTTP requests against a number of well known endpoints. The success of this test implies

  • successful HTTP interop
  • successful TLS interop

internet_kms_pq_client uses s2n-tls-tokio to make PQ requests against the kms service endpoints. The success of this test implies

  • successful PQ interop with existing PQ deployments.

Call-outs:

Opt In

The tests are not run by default, and must be explicitly opted into. This is because they are quite a bit slower than the other unit tests, taking about 10 seconds to run.

Retries

I do plan on adding retries to these tests to handle flaky network issues, but I don't want to do that until I have observed some of the specific errors. We don't currently have a good understanding of whether the flakiness comes from. TCP? TLS? DNS? Being able to actually observe these failures will allow us to

  1. document where the flakiness actually comes from
  2. have tightly scoped retries to catch other (real) transient errors.

Testing:

A new github workflow step was added to run the internet tests.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions github-actions bot added the s2n-core team label Nov 9, 2024
@jmayclin jmayclin marked this pull request as ready for review November 9, 2024 02:05
@jmayclin jmayclin changed the title Well known endpoints test: add rust well-known-endpoint tests Nov 9, 2024
.github/workflows/ci_rust.yml Outdated Show resolved Hide resolved
bindings/rust/integration/tests/internet_http_client.rs Outdated Show resolved Hide resolved
bindings/rust/integration/tests/internet_http_client.rs Outdated Show resolved Hide resolved
bindings/rust/integration/tests/internet_http_client.rs Outdated Show resolved Hide resolved
bindings/rust/integration/tests/internet_kms_pq_client.rs Outdated Show resolved Hide resolved
Comment on lines 66 to 67
#[tokio::test]
async fn http_get_test() -> Result<(), Box<dyn std::error::Error>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing any retry mechanism-- is that handled by hyper automatically, or do you plan to add that in another PR? How flaky are these tests? Same for the PQ tests, since s2n-tls-tokio definitely doesn't handle retries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add more information about this in the callouts, but

  1. it is not handled by hyper automatically
  2. I do plan on adding it in another PR, but not until actually observing some failures.
  3. I don't have any concrete data, but I have not seen any failures locally.

We don't currently have a good understanding of whether the flakiness comes from. TCP? TLS? DNS? Being able to actually observe these failures will allow us to

  1. document where the flakiness actually comes from
  2. have tightly scoped retries to catch other (real) transient errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting. Do you suspect that the flakiness we attributed to the network was actually coming from the IO parsing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't expect that, but I certainly wouldn't be surprised by that outcome 🤷 .

I'm also wondering whether the amount of queries that we were doing (e.g. 30 failed TLS handshakes for each domain per test run) might have led some domains to start refusing connections.

jmayclin and others added 2 commits November 11, 2024 15:15
* move akamai to more general tls client test
* make logging handle concurrent tests
* make kem_name return an Option<&str>
Ok("NONE") => None,
Ok(name) => Some(name),
Err(_) => {
// Unreachable: This would indicate a non-utf-8 string literal in
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on whether to make this an actual unreachable! statement. I ended up deciding no?

* cargo fmt
jmayclin and others added 3 commits November 12, 2024 16:27
* make tls client more general
* add PQ feature
* add copyright header
@jmayclin jmayclin requested a review from lrstewart November 13, 2024 21:56
* move tests to specific network module
Comment on lines +18 to +19
fn pq_sanity_check() -> Result<(), Box<dyn std::error::Error>> {
let config = testing::build_config(&Policy::from_version("KMS-PQ-TLS-1-0-2020-07")?)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"default_pq" might be a better / more readable option.

Might also be worth opening a little follow-up issue to add "default_pq" to the pq integration tests 🤔

Copy link
Contributor Author

@jmayclin jmayclin Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_pq will require us to add the kem_group API.

I think it's gonna be neater to to that in a separate PR, so I opened an issue to track this: #4896 . The issue explicitly calls out

  • updating the integration test
  • updating the pq sanity check
    as requirements for closing the issue.

@jmayclin jmayclin enabled auto-merge (squash) November 14, 2024 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants