Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Heartbeat] Add functional-ish tests in golang / fix bugs #32542

Merged
merged 46 commits into from
Aug 8, 2022

Conversation

andrewvc
Copy link
Contributor

@andrewvc andrewvc commented Jul 28, 2022

Fixes #31999

This PR turned out to be a bit of a monster. The main thing it does is add testing for functional 'scenarios', that is testing full heartbeat configs and the events they output going through all heartbeat specific code using the libbeat factory interface. This lets us do something sort of close to functional testing, where instead of using the heartbeat binary we actually use this factory interface. This gets us pretty close with a much simpler testing experience that's easier to debug and faster to run. It probably won't miss any error conditions from generating the full binary.

This also cleans up a bunch of linter issues in modified files, which is a bit distracting, things like missing error return values, shadowed variable names etc.

Finally, the new scenario test this adds, which checks that all events always contain a check group, revealed two bugs:

  1. Synthetics test runs create an empty event at the start of each run (with no check group)
  2. Any synth events prior to a journey start event would be missing a check group

This PR fixes this by:

  1. Moving the check group generation code for synthexec up to the stream enricher, rather than the journey enricher.
  2. Creating the check group UUIDs after journeys end, rather than waiting for them to start. The synthetics node agent can produce output before any journey starts

Why is it important?

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

@andrewvc andrewvc added Heartbeat Team:obs-ds-hosted-services Label for the Observability Hosted Services team labels Jul 28, 2022
@andrewvc andrewvc self-assigned this Jul 28, 2022
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jul 28, 2022
@elasticmachine
Copy link
Collaborator

elasticmachine commented Jul 28, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-08-08T13:14:12.813+0000

  • Duration: 66 min 29 sec

Test stats 🧪

Test Results
Failed 0
Passed 1648
Skipped 22
Total 1670

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@andrewvc andrewvc mentioned this pull request Jul 29, 2022
6 tasks
@andrewvc andrewvc changed the title [Heartbeat] Refactor tests WIP [Heartbeat] Refactor tests Aug 2, 2022
andrewvc and others added 13 commits August 2, 2022 22:50
…check_group

This PR is WIP and not ready for review, it doesn't do everything it should yet.

In the situation where a browser job cannot start due to low memory (or anything that kills node / chrome too early) nothing will be visible in the Kibana UI due to the lack of a `check_group` field.

This is because that field is generated upon encountering the first `journey/start` event.

This PR fixes that by ensuring there's always a check group for single journey
monitors.
…ic#32536)

* update dependency elastic/go-structform from v0.0.9 to v0.0.10

Signed-off-by: Florian Lehner <[email protected]>

Co-authored-by: Craig MacKenzie <[email protected]>

Co-authored-by: Craig MacKenzie <[email protected]>
* add ulimit to debian containers

* add changelog
@andrewvc
Copy link
Contributor Author

andrewvc commented Aug 4, 2022

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Aug 4, 2022

update

✅ Branch has been successfully updated

Copy link
Contributor

@lucasfcosta lucasfcosta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I didn't have any blocking comments. I tried really hard to see how mpx could become nil, but I couldn't see any way for it to become nil.

Also, it's quite annoying Jenkins is not showing you any of the logs 😢 I think we may have to use fmt here.

I've just started a repeat 10000 for dlv with a conditional breakpoint to see if I can catch this failure locally in a bit and debug.

heartbeat/beater/heartbeat.go Outdated Show resolved Hide resolved
heartbeat/beater/heartbeat.go Outdated Show resolved Hide resolved
heartbeat/hbtestllext/isdefs.go Outdated Show resolved Hide resolved
x-pack/heartbeat/monitors/browser/synthexec/synthexec.go Outdated Show resolved Hide resolved
x-pack/heartbeat/scenarios/framework.go Outdated Show resolved Hide resolved
return mtr, err
}

func makeTestFactory() (factory *monitors.RunnerFactory, sched *scheduler.Scheduler, close func()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is a function which takes 0 arguments, does it still need to be a "factory"? Can't we simply define the returned function straightaway and use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I guess this is confusing, it isn't a factory for making tests, it's a function that makes factories. Renamed to setupFactoryAndSched

Comment on lines +158 to +159
FirstStart: time.Now(),
StartTime: time.Now(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: perhaps we should save this right before so that we can avoid them being different if there's a small drift and we want to be ultra precise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, this info isn't actually used anywhere AFAIK, usually in tests all the fields are just left undefined. I also don't think any delta here would be meaningful, so in short, I'm 99% sure we're good.

}

// make a pipeline
pipe := &monitors.MockPipeline{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the way you used the mock pipeline to append items to an array! Really elegant reuse of the monitor code infra here.

@andrewvc
Copy link
Contributor Author

andrewvc commented Aug 4, 2022

I think I've addressed all your comments. I also added a fix in synthexec for a situation where the command can't be found. I think this may have some association with the weirdness we saw with browser monitors, but I also fixed the code to not try to run browser monitors on CI. We never should have been trying since the synthetics agent isn't present on CI.

I'm fine leaving these tests effectively disabled for now until such time as we fix that, at which point we can debug further. Esp. since locally they still run.

Copy link
Contributor

@lucasfcosta lucasfcosta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Also cloned the repo and ran the functional tests locally to make sure they were running smoothly.

Although I have approved this, please do have a look at my comment about logging the os.Environ. I do think that's important to address and it would be great if you could check that comment and see if you think it's worth doing indeed.

All other comments are not blocking.

}

func (s SyncPipelineClientAdaptor) Wait() {
// intentionally blank, async pipelines should be empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wish we could have conditional types for these things like in TypeScript.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I wish so too

@@ -27,6 +28,8 @@ func create(name string, cfg *config.C) (p plugin.Plugin, err error) {
// We don't want users running synthetics in environments that don't have the required GUI libraries etc, so we check
// this flag. When we're ready to support the many possible configurations of systems outside the docker environment
// we can remove this check.
ev := os.Environ()
logp.L().Info("EV %v", ev)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to keep this log line here? It feels to me we shouldn't log all the env. vars to stdout as that could leak information to logs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, good catch! I thought I'd pulled all of those

@@ -263,7 +272,7 @@ func runCmd(
// Close mpx after the process is done and all events have been sent / consumed
go func() {
err := <-cmdDone
jsonWriter.Close()
_ = jsonWriter.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for us to do this assignment? Maybe it's my lack of familiarity with Go, but it doesn't seem it makes a difference in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes intellij's linter happier. _ = means discard this. It's the only assignment that never uses := because nothing is actually assigned. See https://www.geeksforgeeks.org/what-is-blank-identifierunderscore-in-golang/

TBH, I'm not sure I should be changing this vs. changing my intellij settings but it's certainly not harmful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be clear, the linter complains about ignored error values, since go doesn't have exceptions

Comment on lines 208 to 210
defer func() {
_ = jsonReader.Close()
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we defer as we were doing previously?

I tried to do some research to understand the difference between this and the previous defer pattern and I couldn't find anything that would explain it.

I found this response on SO interesting but it actually advocates for the opposite pattern: directly deferring (although I can see the advantage of a lambda if we're doing any error handling for the deferred function.

Maybe you're doing it to evaluate jsonReader only when the function exits? I couldn't see why that would be helpful in this case though since we only assign to it with os.Pipe() at the beginning of the function?

If this is indeed intentional perhaps it would be worth leaving a comment as to why we do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same story WRT the linter and errors

@andrewvc andrewvc merged commit 3fe6871 into elastic:main Aug 8, 2022
@andrewvc andrewvc deleted the refactor-test branch August 8, 2022 15:52
@andrewvc
Copy link
Contributor Author

andrewvc commented Aug 8, 2022

@Mergifyio backport 8.4

@andrewvc andrewvc added the v8.4.0 label Aug 8, 2022
@andrewvc andrewvc changed the title [Heartbeat] Add functional-ish tests in golang [Heartbeat] Add functional-ish tests in golang / fix bugs Aug 8, 2022
mergify bot pushed a commit that referenced this pull request Aug 8, 2022
Fixes #31999

This PR turned out to be a bit of a monster. The main thing it does is add testing for functional 'scenarios', that is testing full heartbeat configs and the events they output going through all heartbeat specific code using the libbeat factory interface. This lets us do something sort of close to functional testing, where instead of using the heartbeat binary we actually use this factory interface. This gets us pretty close with a much simpler testing experience that's easier to debug and faster to run. It probably won't miss any error conditions from generating the full binary.

This also cleans up a bunch of linter issues in modified files, which is a bit distracting, things like missing error return values, shadowed variable names etc.

Finally, the new scenario test this adds, which checks that all events always contain a check group, revealed two bugs:

Synthetics test runs create an empty event at the start of each run (with no check group)
Any synth events prior to a journey start event would be missing a check group
This PR fixes this by:

Moving the check group generation code for synthexec up to the stream enricher, rather than the journey enricher.
Creating the check group UUIDs after journeys end, rather than waiting for them to start. The synthetics node agent can produce output before any journey starts

(cherry picked from commit 3fe6871)
@mergify
Copy link
Contributor

mergify bot commented Aug 8, 2022

backport 8.4

✅ Backports have been created

@@ -206,6 +205,9 @@ func (je *journeyEnricher) createSummary(event *beat.Event) error {
return je.error
}

// create a new check group for the next journey
je.streamEnricher.checkGroup = makeUuid()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Would check-group be unique if we run zip urls which could contain multiple journeys since we moved the generation to the stream enricher?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code should still create multiple check groups for zip URLs. I don't think we want to assert this at the new functional tests though, since my hope is that zip urls will be gone soon enough.

cmacknz pushed a commit that referenced this pull request Aug 24, 2022
Fixes #31999

This PR turned out to be a bit of a monster. The main thing it does is add testing for functional 'scenarios', that is testing full heartbeat configs and the events they output going through all heartbeat specific code using the libbeat factory interface. This lets us do something sort of close to functional testing, where instead of using the heartbeat binary we actually use this factory interface. This gets us pretty close with a much simpler testing experience that's easier to debug and faster to run. It probably won't miss any error conditions from generating the full binary.

This also cleans up a bunch of linter issues in modified files, which is a bit distracting, things like missing error return values, shadowed variable names etc.

Finally, the new scenario test this adds, which checks that all events always contain a check group, revealed two bugs:

Synthetics test runs create an empty event at the start of each run (with no check group)
Any synth events prior to a journey start event would be missing a check group
This PR fixes this by:

Moving the check group generation code for synthexec up to the stream enricher, rather than the journey enricher.
Creating the check group UUIDs after journeys end, rather than waiting for them to start. The synthetics node agent can produce output before any journey starts

(cherry picked from commit 3fe6871)

Co-authored-by: Andrew Cholakian <[email protected]>
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
Fixes #31999

This PR turned out to be a bit of a monster. The main thing it does is add testing for functional 'scenarios', that is testing full heartbeat configs and the events they output going through all heartbeat specific code using the libbeat factory interface. This lets us do something sort of close to functional testing, where instead of using the heartbeat binary we actually use this factory interface. This gets us pretty close with a much simpler testing experience that's easier to debug and faster to run. It probably won't miss any error conditions from generating the full binary.

This also cleans up a bunch of linter issues in modified files, which is a bit distracting, things like missing error return values, shadowed variable names etc.

Finally, the new scenario test this adds, which checks that all events always contain a check group, revealed two bugs:

Synthetics test runs create an empty event at the start of each run (with no check group)
Any synth events prior to a journey start event would be missing a check group
This PR fixes this by:

Moving the check group generation code for synthexec up to the stream enricher, rather than the journey enricher.
Creating the check group UUIDs after journeys end, rather than waiting for them to start. The synthetics node agent can produce output before any journey starts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Heartbeat Team:obs-ds-hosted-services Label for the Observability Hosted Services team v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Heartbeat] Add integration tests for browser monitors
7 participants