Wathola Tracing for upgrade tests #6219

mgencur · 2022-03-02T11:56:47Z

Partially fixes #4481
Fixes points 1. and 2. from #6145 (comment)
The automated reporting of broken traces would be done in a separate PR.

Proposed Changes

Registers a Zipkin exporter using the trace config from the config-tracing config map in the Knative Eventing namespace.
Adds trace instrumentation to these components: wathola-sender, wathola-forwarder, wathola-receiver.
If config-tracing enables tracing and a backend storage (Zipkin or Jaeger) is available, trace information is sent to that storage.
A complete trace can be displayed in the Zipkin UI, going from wathola-sender to wathola-receiver, including the components in between (wathola-forwarder and the Eventing core components).

Pre-review Checklist

At least 80% unit test coverage
E2E tests for any new behavior
Docs PR for any user-facing impact
Spec PR for any new API feature
Conformance test for any change to the spec

Release Note

Docs

codecov · 2022-03-02T12:03:16Z

Codecov Report

Merging #6219 (9ff8b09) into main (7001b65) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #6219   +/-   ##
=======================================
  Coverage   82.18%   82.18%           
=======================================
  Files         231      231           
  Lines        7787     7787           
=======================================
  Hits         6400     6400           
  Misses        937      937           
  Partials      450      450

mgencur · 2022-03-02T13:49:58Z

/retest

mgencur · 2022-03-03T07:12:30Z

/retest

mgencur · 2022-03-03T08:20:22Z

/retest

knative-prow-robot · 2022-03-03T10:00:22Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mgencur
To complete the pull request process, please assign lberk after the PR has been reviewed.
You can assign the PR to them by writing /assign @lberk in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mgencur · 2022-03-03T11:08:57Z

/retest

mgencur · 2022-03-03T12:20:18Z

The failure in reconciler-tests is unrelated. I will re-run the tests later after getting some feedback on this PR.

knative-prow-robot · 2022-03-04T08:48:37Z

@mgencur: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-knative-eventing-reconciler-tests	`7aae60f`	link	false	`/test pull-knative-eventing-reconciler-tests`

pierDipi · 2022-03-07T08:40:22Z

/cc @cardil

cardil · 2022-03-07T11:04:07Z

Thanks, @mgencur, for doing this. I will review it shortly...

mgencur · 2022-03-21T07:36:21Z

@cardil gentle ping. Two weeks have passed...

cardil

Looks great @mgencur! A lot of ❤️ for doing this.

I found only some minor nits.

cardil · 2022-03-25T20:51:03Z

test/lib/client.go

-	tracingEnv corev1.EnvVar
-	loggingEnv *corev1.EnvVar
+	TracingCfg string
+	LoggingCfg string


What's the reason for this change?

Is it because of the usage in config.toml? If so, the existing field could have been used there just by making it public:

tracingConfig = '{{- .TracingEnv.Value -}}'

Yes. It was for that reason. I somehow find it cleaner if the client holds the config instead of an EnvVar. The function that produces the config is called getTracingConfig. It can then create the EnvVar (which is only done in a single place anyway) or it can pass the config to Wathola. This change doesn't bring any complexity. But I don't have a strong opinion about this.

I won't insist on rolling this back.
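The point about the field needing to be public can be illustrated with a minimal sketch, assuming the config.toml file is rendered with Go's text/template. The clientConfig type, the renderConfig helper, and the placeholder JSON value are hypothetical and are not taken from the PR; only the template line mirrors the one quoted above, with the field renamed to TracingCfg.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// clientConfig is a hypothetical stand-in for the test client. text/template
// can only read exported fields, which is why the field name is capitalized.
type clientConfig struct {
	TracingCfg string
}

// renderConfig fills a config.toml-style template with the tracing config.
func renderConfig(c clientConfig) (string, error) {
	const tmpl = "tracingConfig = '{{- .TracingCfg -}}'\n"
	t, err := template.New("config.toml").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, c); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// The JSON value is an illustrative placeholder, not a real tracing config.
	out, err := renderConfig(clientConfig{TracingCfg: `{"backend":"example"}`})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Holding the config as a plain string keeps the template independent of corev1.EnvVar, which is the trade-off discussed in this thread.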

test/upgrade/prober/wathola/config/tracing.go

cardil · 2022-03-25T21:14:41Z

test/upgrade/prober/wathola/sender/services.go

+		// Give time to send tracing information.
+		time.Sleep(5 * time.Second)


I don't like this static wait, as it may introduce failures related to the grace period later on.

It should be possible to know if tracing is already sent.

I don't think there's an easy way to do it, other than querying the Zipkin endpoint to check whether the data was stored there (but I wouldn't really like to do this).
There's an open issue in OpenCensus: census-instrumentation/opencensus-go#862
Anyway, the Reporter is created with a default batch interval of 1 second. So it should be enough to wait just 1 second, because the data is flushed every second.
Would that be alright? We're already thinking about not shutting down the Sender and instead using a different way to signal it to send the Finished event. Once that is implemented, the 1-second wait at the end should no longer be needed.
And I think waiting 1 second at the end is not too bad; it's a workaround for the missing feature in OpenCensus.

Right. I think Knative should move to OpenTelemetry as soon as possible. Maybe it will give us greater capabilities as well.

Yeah. There's an open issue for moving to OpenTelemetry. It's been there for a while...

I wonder if this might help: census-instrumentation/opencensus-go#862 (comment)?

I don't know. It's a different API for exporting "metrics" which we don't need now. And it looks like it can only export those metrics via ReadAndExport. Not the traces :-/ But I'm not 100% sure.

cardil · 2022-03-25T21:18:54Z

test/upgrade/prober/wathola/sender/services.go

@@ -154,14 +183,22 @@ func (h httpSender) Supports(endpoint interface{}) bool {
 }

 func (h httpSender) SendEvent(ce cloudevents.Event, endpoint interface{}) error {
+	return h.SendEventWithContext(context.Background(), ce, endpoint)


This line creates a new context object for every event sent.

We should use one context object, and it should be supporting signals (knative.dev/pkg/signals).

I don't think so. Each event requires a new context because the tracing information is stored in it (via the opencensus exporter).
The "main" branch uses context.Background() as well: https://github.com/knative/eventing/blob/main/test/upgrade/prober/wathola/sender/services.go#L162
I could possibly use return h.SendEventWithContext(signals.NewContext(), ce, endpoint) but I am not sure what it brings. The context we're using here is for setting up information that is sent with the event. The event is sent and we're done with it (we're not waiting for a shutdown signal there). When I look at where signals.NewContext() is used, it's mainly in long-running "main" functions for catching signals. But the Sender itself has its own loop for handling signals in SendContinually.
So my changes do not alter the original behaviour (we have been using context.Background()).

Okay. We could shift to using signals.NewContext() by removing manual signals handling in SendContinually, and then create subcontext for each send. But, you got it right. It's not worth it.

mgencur · 2022-03-28T09:52:58Z

Rebased, made loggingCfg private and reduced the sleep time for sending the tracing info down to 1 second.

* The ticker runs every 100ms so it could be 1100 ms until the buffer really flushes.

mgencur · 2022-03-28T12:32:13Z

The failure in the E2E tests here is unrelated:

2022-03-28T11:52:59.3323318Z === CONT  TestBrokerNamespaceDefaulting
2022-03-28T11:52:59.3324361Z     broker_defaults_webhook_test.go:171: 
2022-03-28T11:52:59.3325579Z         	Error Trace:	broker_defaults_webhook_test.go:171
2022-03-28T11:52:59.3326313Z         	            				wait.go:220
2022-03-28T11:52:59.3326863Z         	            				wait.go:233
2022-03-28T11:52:59.3327386Z         	            				wait.go:660
2022-03-28T11:52:59.3327918Z         	            				wait.go:594
2022-03-28T11:52:59.3328452Z         	            				wait.go:458
2022-03-28T11:52:59.3328986Z         	            				wait.go:443
2022-03-28T11:52:59.3329824Z         	            				broker_defaults_webhook_test.go:163
2022-03-28T11:52:59.3330224Z         	Error:      	Not equal: 
2022-03-28T11:52:59.3330716Z         	            	expected: "PT0.5S"
2022-03-28T11:52:59.3331192Z         	            	actual  : "PT0.2S"
2022-03-28T11:52:59.3331538Z         	            	
2022-03-28T11:52:59.3331912Z         	            	Diff:
2022-03-28T11:52:59.3332621Z         	            	--- Expected
2022-03-28T11:52:59.3333028Z         	            	+++ Actual
2022-03-28T11:52:59.3333508Z         	            	@@ -1 +1 @@
2022-03-28T11:52:59.3333946Z         	            	-PT0.5S
2022-03-28T11:52:59.3334330Z         	            	+PT0.2S
2022-03-28T11:52:59.3334780Z         	Test:       	TestBrokerNamespaceDefaulting
2022-03-28T11:52:59.3340279Z     broker_defaults_webhook_test.go:172: 
2022-03-28T11:52:59.3340885Z         	Error Trace:	broker_defaults_webhook_test.go:172
2022-03-28T11:52:59.3341477Z         	            				wait.go:220
2022-03-28T11:52:59.3342013Z         	            				wait.go:233
2022-03-28T11:52:59.3342534Z         	            				wait.go:660
2022-03-28T11:52:59.3343065Z         	            				wait.go:594
2022-03-28T11:52:59.3343591Z         	            				wait.go:458
2022-03-28T11:52:59.3344104Z         	            				wait.go:443
2022-03-28T11:52:59.3345119Z         	            				broker_defaults_webhook_test.go:163
2022-03-28T11:52:59.3345514Z         	Error:      	Not equal: 
2022-03-28T11:52:59.3346033Z         	            	expected: 5
2022-03-28T11:52:59.3346436Z         	            	actual  : 10
2022-03-28T11:52:59.3346858Z         	Test:       	TestBrokerNamespaceDefaulting

cardil · 2022-03-28T13:44:10Z

test/upgrade/prober/wathola/config/tracing.go

+	if err != nil {
+		Log.Warn("Tracing configuration is invalid, using the no-op default", zap.Error(err))
+	}
+	if err = tracing.SetupStaticPublishing(Log, "", config); err != nil {


I'm thinking that returning the *tracing.OpenCensusTracer here would allow calling its Finish method later.

func SetupStaticPublishing(logger *zap.SugaredLogger, serviceName string, cfg *config.Config) (*tracing.OpenCensusTracer, error) {
	oct := tracing.NewOpenCensusTracer(tracing.WithExporter(serviceName, logger))
	if err := oct.ApplyConfig(cfg); err != nil {
		return nil, fmt.Errorf("unable to set OpenCensusTracing config: %w", err)
	}
	return oct, nil
}

Finish should call the Close() method of the HTTP reporter, which should break the send loop and try to send the last batch before shutdown. I think that should be enough to send all registered spans.

Thanks. Let me try it.

So this whole approach doesn't work because on this line there's nil passed to the function: , and then on this line it throws "panic: runtime error: invalid memory address or nil pointer dereference"
It is also not possible to call tracing.SetupTracing() again (to force calling the reporter's Close method) because the tracer is already registered and cannot be registered again.
As a side note, I've verified that 1.5 seconds is enough for the spans/traces to be sent properly.
I recommend using a 1.5-second sleep and calling it done.

Filed knative/pkg#2475

cardil

/lgtm

Looks great! Thanks @mgencur for doing this and for knative/pkg#2475 as well!

pierDipi

Very minor and non-blocking comment.

/lgtm
/approve

pierDipi · 2022-03-29T12:47:20Z

test/lib/creation.go

+		if c.loggingCfg != "" {
+			pod.Containers[i].Env = append(pod.Containers[i].Env, corev1.EnvVar{Name: ti.ConfigLoggingEnv, Value: c.loggingCfg})
 		}


Perhaps we can refactor logging and tracing in [1] to be consistent when "" is provided because it looks weird that we're handling the empty string only for the logging config.

[1]:

eventing/test/test_images/utils.go

Lines 59 to 85 in eb2bfff

// ConfigureTracing can be used in test-images to configure tracing
func ConfigureTracing(logger *zap.SugaredLogger, serviceName string) error {
	tracingEnv := os.Getenv(ConfigTracingEnv)

	if tracingEnv == "" {
		return tracing.SetupStaticPublishing(logger, serviceName, config.NoopConfig())
	}

	conf, err := config.JSONToTracingConfig(tracingEnv)
	if err != nil {
		return err
	}

	return tracing.SetupStaticPublishing(logger, serviceName, conf)
}

// ConfigureTracing can be used in test-images to configure tracing
func ConfigureLogging(ctx context.Context, name string) context.Context {
	loggingEnv := os.Getenv(ConfigLoggingEnv)
	conf, err := logging.JSONToConfig(loggingEnv)
	if err != nil {
		logging.FromContext(ctx).Warn("Error while trying to read the config logging env: ", err)
		return ctx
	}
	l, _ := logging.NewLoggerFromConfig(conf, name)
	return logging.WithLogger(ctx, l)
}

@pierDipi Did you mean something like this? #6289

I meant the other way around: support an empty loggingEnv in ConfigureLogging so that we can drop the if c.loggingCfg != "" { check in this PR.
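A minimal sketch of that suggestion, assuming the logging config is a JSON document read from an environment variable. The loggingConfig type, the parseLoggingEnv helper, and the EXAMPLE_CONFIG_LOGGING variable name are hypothetical; the real code uses logging.JSONToConfig and ConfigLoggingEnv.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// loggingConfig is a hypothetical, simplified logging config.
type loggingConfig struct {
	Level string `json:"level"`
}

// defaultLoggingConfig is what an empty env value falls back to.
func defaultLoggingConfig() loggingConfig {
	return loggingConfig{Level: "info"}
}

// parseLoggingEnv treats an empty value as "use the defaults", so callers can
// set the env var unconditionally without checking for "" first.
func parseLoggingEnv(raw string) (loggingConfig, error) {
	if raw == "" {
		return defaultLoggingConfig(), nil
	}
	conf := defaultLoggingConfig()
	if err := json.Unmarshal([]byte(raw), &conf); err != nil {
		return loggingConfig{}, fmt.Errorf("invalid logging config: %w", err)
	}
	return conf, nil
}

func main() {
	// EXAMPLE_CONFIG_LOGGING is a made-up variable name for this sketch.
	conf, err := parseLoggingEnv(os.Getenv("EXAMPLE_CONFIG_LOGGING"))
	if err != nil {
		panic(err)
	}
	fmt.Println("log level:", conf.Level)
}
```

This mirrors how ConfigureTracing already handles an empty tracingEnv, so both helpers would behave the same way when the variable is unset.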

knative-prow · 2022-03-29T12:49:15Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cardil, mgencur, pierDipi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [pierDipi]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

* wathola exposing trace information
* Run update-deps.sh
* Fix license
* Fix import
* Ensure backwards compatibility
* Assert ParentID not nil in test
* Separate old and new events sender APIs
* Make loggingCfg in client private
* Wait only 1 second for flushing tracing info
  The Reporter is created with a default batch interval 1 second. So, it should be enough to wait just 1 second because the data is flushed every 1 second.
* Increase the sleep time to 1.5 seconds to be safe
* The ticker runs every 100ms so it could be 1100 ms until the buffer really flushes.
* Use Log.Fatal when tracing is not set up properly
* Increase the sleep time to 5 seconds and reference knative/pkg issue

* Wathola Tracing for upgrade tests (#6219)
* wathola exposing trace information
* Run update-deps.sh
* Fix license
* Fix import
* Ensure backwards compatibility
* Assert ParentID not nil in test
* Separate old and new events sender APIs
* Make loggingCfg in client private
* Wait only 1 second for flushing tracing info
  The Reporter is created with a default batch interval 1 second. So, it should be enough to wait just 1 second because the data is flushed every 1 second.
* Increase the sleep time to 1.5 seconds to be safe
* The ticker runs every 100ms so it could be 1100 ms until the buffer really flushes.
* Use Log.Fatal when tracing is not set up properly
* Increase the sleep time to 5 seconds and reference knative/pkg issue
* Process empty tracing config in test images (#6289)
* Print traces for missed events in upgrade tests (#6249)
* Upgrade tests reporting Trace information for missed events
* TMP: Induce missed event
* Revert "TMP: Induce missed event"
  This reverts commit 2fec7c7.
* Report trace also for Duplicated events
* TMP: Induce missed event
* TMP: Simulate duplicate events
* Fix readme
* Unify path for duplicate and missed events
* Revert "TMP: Simulate duplicate events"
  This reverts commit c126521.
* Revert "TMP: Induce missed event"
  This reverts commit fcd9185.
* Do not fail upgrade tests if tracing is not configured (#6299)
* Do not fail upgrade tests if tracing is not configured
* TMP: Do not deploy Knative Monitoring
* Revert "TMP: Do not deploy Knative Monitoring"
  This reverts commit 086a8f9.
* Limit the number of exported traces (#6329)
  Exporting traces for a large number of events can exceed the timeout of the whole test suite, leading to all upgrade tests being reported as failed.
* Cleanup Zipkin tracing only once in upgrade test suite (#6331)
* NPE fix (#6343)

Co-authored-by: Chris Suszynski <[email protected]>

knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/test-and-release Test infrastructure, tests or release labels Mar 2, 2022

mgencur changed the title ~~Wathola tracing main~~ Wathola Tracing Mar 2, 2022

mgencur changed the title ~~Wathola Tracing~~ Wathola Tracing for upgrade tests Mar 3, 2022

knative-prow-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 4, 2022

knative-prow-robot requested a review from cardil March 7, 2022 08:40

This was referenced Mar 8, 2022

[WIP] Print traces for missed events in upgrade tests #6246

Closed

Print traces for missed events in upgrade tests #6249

Merged

knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2022

cardil requested changes Mar 25, 2022

knative-prow-robot assigned cardil Mar 25, 2022

mgencur added 7 commits March 28, 2022 10:02

wathola exposing trace information

5e022a3

Run update-deps.sh

3c59d7f

Fix license

7b7b79f

Fix import

6a34efe

Ensure backwards compatibility

a40d695

Assert ParentID not nil in test

cf4ee28

Separate old and new events sender APIs

3df5133

mgencur added 2 commits March 28, 2022 11:20

Make loggingCfg in client private

e86a116

Wait only 1 second for flushing tracing info

c3798cf

The Reporter is created with a default batch interval 1 second. So, it should be enough to wait just 1 second because the data is flushed every 1 second.

mgencur force-pushed the wathola_tracing_main branch from 7aae60f to c3798cf Compare March 28, 2022 09:52

knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 28, 2022

mgencur added 2 commits March 28, 2022 12:16

Increase the sleep time to 1.5 seconds to be safe

0c61842

* The ticker runs every 100ms so it could be 1100 ms until the buffer really flushes.

Use Log.Fatal when tracing is not set up properly

4c3885f

cardil reviewed Mar 28, 2022

mgencur mentioned this pull request Mar 29, 2022

Make sure OpenCensusTracer from tracing package can be correctly shut down (flushed) knative/pkg#2475

Closed

Increase the sleep time to 5 seconds and reference knative/pkg issue

9ff8b09

cardil approved these changes Mar 29, 2022

knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Mar 29, 2022

pierDipi approved these changes Mar 29, 2022

knative-prow bot assigned pierDipi Mar 29, 2022

knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 29, 2022

knative-prow bot merged commit 3890b39 into knative:main Mar 29, 2022

mgencur mentioned this pull request Mar 29, 2022

Process empty tracing config in test images #6289

Merged

cardil mentioned this pull request Apr 1, 2022

An automated debugging tooling for the Eventing #6145

Closed

mgencur mentioned this pull request Aug 3, 2022

SetupStaticPublishing and SetupDynamicPublishing returns Tracer with Shutdown function knative/pkg#2566

Merged
