
Delay after bloating test image #6014

Closed
wants to merge 1 commit into from


@coryrc (Contributor) commented Nov 12, 2019

Some systems don't stop a container from allocating past its memory limit. Instead, they kill any pod that exceeds the limit some time afterwards. So add a 1-second delay after bloating memory in the autoscale test image, which gives such systems time to act.

Fixes #6007

The delay is placed here, after the bloat result is written, so that the increase in memory is still reported if possible.

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Nov 12, 2019
@knative-prow-robot knative-prow-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Nov 12, 2019
@knative-prow-robot (Contributor) left a comment

@coryrc: 0 warnings.


@knative-prow-robot knative-prow-robot added the area/test-and-release It flags unit/e2e/conformance/perf test issues for product features label Nov 12, 2019
@knative-test-reporter-robot

The following jobs failed:

Test name: pull-knative-serving-unit-tests (retries: 0/3)

Failed non-flaky tests preventing automatic retry of pull-knative-serving-unit-tests:

pkg/activator/net.TestThrottlerWithError
pkg/activator/net.TestThrottlerWithError/both_requests_time_out

@coryrc (Contributor Author) commented Nov 13, 2019

/retest

@coryrc (Contributor Author) commented Nov 13, 2019

/assign @vagababov

```diff
@@ -173,6 +173,7 @@ func handler(w http.ResponseWriter, r *http.Request) {
 	go func() {
 		defer wg.Done()
 		fmt.Fprint(w, bloat(mb))
+		time.Sleep(time.Second)
```
Contributor:

What if we already have a delay of 1s or more?

Contributor Author:

What do you mean? The sleep option? Each bloat/sleep/prime etc. operation runs in parallel, so the delay won't have any unanticipated effects unless a bloat and a sleep of less than 1s occur in the same call and the test expects the call to return right away (which does not appear to be the case, because everything passes).

Contributor:

I guess. Do you think 1s is enough for things to be killed?

@coryrc (Contributor Author), Nov 13, 2019:

The delay is for fully managed Cloud Run. It has no effect on Kubernetes-based platforms.

@vagababov (Contributor)

/lgtm
/approve
/hold
for the question

@knative-prow-robot knative-prow-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Nov 13, 2019
@knative-prow-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: coryrc, vagababov


@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 13, 2019
@coryrc (Contributor Author) commented Nov 14, 2019

/assign @dgerd

@dgerd commented Nov 14, 2019

I really don't like the idea of inserting an arbitrary sleep to get this to work. In fact, I don't like the timing aspect of the test at all, or that the only way to observe failures without relying on timing is to reach down into the Pod. I see two options:

  1. Move the test from conformance to e2e. We have some coverage of resource.limits through the cgroup runtime test, and we don't have anything in our specification on how these limits are enforced.
    In our API specification we link out to K8s, which says: "If a Container exceeds its memory limit, it might be terminated. If it is restartable, the kubelet will restart it, as with any other type of runtime failure." I believe any cgroups-based container runtime is going to see a restart. But given that K8s takes such a light stance here ("might"), I could see moving this to e2e to keep coverage and detect regressions while removing it from conformance.

  2. Update our specification: update our runtime contract to add more details on how memory limits should be enforced. If we want to go this route, we will need to take a closer look at how various container runtimes enforce this. Can a container ever get more than the limit? How long can it be over the limit?

I don't think it is worth the effort to go down the second path at this time.
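Under option 1, a test on a cgroups-based runtime can make one timing-independent assertion: it can read the limit the runtime actually applied from inside the container. The following is a minimal sketch. It assumes the usual cgroup v2 (memory.max) and cgroup v1 (memory.limit_in_bytes) file locations. parseMemoryLimit is a hypothetical helper, and this is not the existing cgroup runtime test.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMemoryLimit interprets the contents of a cgroup memory-limit file.
// cgroup v2 writes "max" to memory.max when no limit is set; cgroup v1's
// memory.limit_in_bytes holds a plain byte count. ok is false when the
// file says "max".
func parseMemoryLimit(content string) (limit int64, ok bool, err error) {
	s := strings.TrimSpace(content)
	if s == "max" {
		return 0, false, nil
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, false, fmt.Errorf("parsing cgroup memory limit %q: %w", s, err)
	}
	return n, true, nil
}

func main() {
	// Try the cgroup v2 location first, then the cgroup v1 one.
	for _, path := range []string{
		"/sys/fs/cgroup/memory.max",
		"/sys/fs/cgroup/memory/memory.limit_in_bytes",
	} {
		data, err := os.ReadFile(path)
		if err != nil {
			continue
		}
		limit, ok, err := parseMemoryLimit(string(data))
		switch {
		case err != nil:
			fmt.Println(err)
		case !ok:
			fmt.Println("no memory limit set")
		default:
			fmt.Printf("memory limit: %d bytes\n", limit)
		}
		return
	}
	fmt.Println("no cgroup memory limit file found")
}
```

On cgroup v1, an unlimited container reports a very large byte count rather than "max", so a real check would also need to treat that value as unlimited. On a runtime that does not expose these files, the sketch only reports that no limit file was found.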

@coryrc (Contributor Author) commented Nov 14, 2019

Going with Dan's option 1 and moving the test to e2e, tracked in this issue: #6006

@coryrc coryrc closed this Nov 14, 2019
@coryrc coryrc deleted the issue6007 branch November 14, 2019 22:27