ROX-24840 Docker client API for integration tests #1792

Merged
merged 7 commits into from
Sep 6, 2024

Conversation

robbycochran
Collaborator

@robbycochran robbycochran commented Aug 14, 2024

Description

Refactor the integration tests to use the Docker Engine API instead of executing docker processes.

  • Decouple k8s integration tests from container test actions (this made it easier to refactor). K8sNamespaceTestSuite no longer inherits from the BaseIntegrationTestSuite.
  • Remove all shell execution from test suite code; all system interactions now go through the Executor interface.
  • Launch all containers using a unified configuration (ContainerStartConfig) rather than command-line args.
  • Remove the Docker process executor.
  • Add -dockerized test targets to the integration-test makefile, e.g. make -C integration-tests TestProcessNetwork-dockerized
  • Add logging to provide visibility and help with debugging. Also print the last few lines of the collector logs on a non-zero exit.
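
As a rough illustration of launching containers through one configuration type behind the Executor interface, here is a minimal sketch. Only the names Executor and ContainerStartConfig and the StartContainer signature come from this PR; the struct fields and the in-memory fakeExecutor are hypothetical and exist only to make the example runnable.

```go
package main

import (
	"errors"
	"fmt"
)

// ContainerStartConfig gathers everything needed to launch a container.
// The field names are illustrative assumptions, not the PR's actual struct.
type ContainerStartConfig struct {
	Name       string
	Image      string
	Privileged bool
	Env        map[string]string
	Mounts     map[string]string // host path -> container path
}

// Executor is the single seam between test suites and the container runtime.
// Only StartContainer's signature is taken from the PR; a real interface
// would likely carry more methods.
type Executor interface {
	StartContainer(cfg ContainerStartConfig) (string, error)
}

// fakeExecutor is a hypothetical in-memory implementation used for the demo.
type fakeExecutor struct {
	started map[string]ContainerStartConfig
}

func newFakeExecutor() *fakeExecutor {
	return &fakeExecutor{started: map[string]ContainerStartConfig{}}
}

// StartContainer validates the config and returns a made-up container ID.
func (f *fakeExecutor) StartContainer(cfg ContainerStartConfig) (string, error) {
	if cfg.Name == "" || cfg.Image == "" {
		return "", errors.New("container name and image are required")
	}
	id := fmt.Sprintf("fake-%d", len(f.started)+1)
	f.started[id] = cfg
	return id, nil
}

func main() {
	var e Executor = newFakeExecutor()
	id, err := e.StartContainer(ContainerStartConfig{
		Name:       "collector",
		Image:      "quay.io/rhacs-eng/collector:3.19.x-73-g3523d378a9",
		Privileged: true,
	})
	fmt.Println(id, err)
}
```

Because suites depend only on the interface, a fake like this could stand in for the Docker-backed executor in unit tests.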

Logging example

2024/08/29 23:20:50 INFO: read credentials for quay.io from /root/.docker/config.json
2024/08/29 23:20:51 INFO: registry login success: quay.io
2024/08/29 23:20:51 INFO: quay.io/rhacs-eng/collector-performance:stats-1.1.3 already exists
2024/08/29 23:20:51 INFO: start container-stats with quay.io/rhacs-eng/collector-performance:stats-1.1.3 (19a07c711590)
2024/08/29 23:20:51 INFO: pulling quay.io quay.io/rhacs-eng/collector:3.19.x-73-g3523d378a9
2024/08/29 23:21:10 INFO: pulled quay.io/rhacs-eng/collector:3.19.x-73-g3523d378a9
2024/08/29 23:21:13 INFO: start collector with quay.io/rhacs-eng/collector:3.19.x-73-g3523d378a9 (f3a2749e296c)
2024/08/29 23:21:13 INFO: collector has healthcheck: CMD-SHELL /usr/local/bin/status-check.sh
2024/08/29 23:22:13 ERROR: Waiting for container collector to become health=healthy, elapsed time: 1m0.002452957s
2024/08/29 23:22:13 ERROR: Timed out waiting for container collector to become health=healthy, elapsed Time: 

...

    	            	collector AbortHandler 0xa6d161 + 38
    	            	/lib64/libc.so.6 (null) 0x7dabbca2a6f0 + 0
    	            	/lib64/libc.so.6 (null) 0x7dabbca7794c + 0
    	            	/lib64/libc.so.6 raise 0x7dabbca2a646 + 22
    	            	/lib64/libc.so.6 abort 0x7dabbca147f3 + 211
    	            	/lib64/libstdc++.so.6 (null) 0x7dabbcd8cb21 + 0
    	            	/lib64/libstdc++.so.6 (null) 0x7dabbcd9852c + 0
    	            	/lib64/libstdc++.so.6 (null) 0x7dabbcd98597 + 0
    	            	/lib64/libstdc++.so.6 (null) 0x7dabbcd987f9 + 0
    	            	collector collector::KernelDriverCOREEBPF::Setup(collector::CollectorConfig const&, sinsp&) 0xb1b1c9 + 303
    	            	collector collector::system_inspector::Service::InitKernel(collector::CollectorConfig const&) 0xb17d98 + 1530
    	            	collector collector::CollectorService::InitKernel() 0xa7dc62 + 38
    	            	collector collector::SetupKernelDriver(collector::CollectorService&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, collector::CollectorConfig const&) 0xa7de08 + 159
    	            	collector main 0xa69e1d + 1269
    	            	/lib64/libc.so.6 (null) 0x7dabbca15590 + 0
    	            	/lib64/libc.so.6 __libc_start_main 0x7dabbca15640 + 128
    	            	collector _start 0xa68c65 + 37
    	            	Caught signal 6 (SIGABRT): Aborted
    	            	/bootstrap.sh: line 85:     9 Aborted                 eval exec "$@"
    	            	
    	Test:       	TestProcessNetwork

Checklist

  • Investigated and inspected CI test results
  • Updated documentation accordingly

Automated testing

  • Added integration tests
  • [ ] Added unit tests
  • [ ] Added regression tests

If any of these don't apply, please comment below.

Testing Performed

checked logs

@robbycochran robbycochran requested a review from a team as a code owner August 14, 2024 00:20
@robbycochran robbycochran added all-integration-tests run-multiarch-builds Run steps for non-x86 archs. run-benchmark Ask to run benchmark on a PR and compare it with the baseline labels Aug 19, 2024

VM Method Baseline CPU median (%) Test CPU median (%) CPU P-value

VM Method Baseline Memory median (MiB) Test Memory median (MiB) Memory P-value

@robbycochran robbycochran force-pushed the rc-gotest-logging branch 2 times, most recently from 97e55a2 to 3736de1 Compare August 19, 2024 22:11
@robbycochran robbycochran force-pushed the rc-docker-client branch 2 times, most recently from 5d02290 to 233552d Compare August 26, 2024 22:35
@robbycochran robbycochran changed the title Docker client API for integration tests ROX-24840 Docker client API for integration tests Aug 29, 2024
@robbycochran robbycochran changed the base branch from rc-gotest-logging to master August 29, 2024 18:04
@robbycochran robbycochran added debug and removed debug labels Aug 29, 2024
Collaborator

@Molter73 Molter73 left a comment

Quite a few comments, but nothing major. The change is looking great!

Comment on lines 73 to 75
if l.level <= WarnLevel {
l.logger.Printf("WARN: "+format, v...)
}
Collaborator

Could we move some of this logic into the Log method?

Suggested change
if l.level <= WarnLevel {
l.logger.Printf("WARN: "+format, v...)
}
l.Log(WarnLevel, format, v...)

Then the log method can look something like this:

func (l *Logger) Log(level LogLevel, format string, v ...interface{}) {
	if level < l.level {
		return
	}

	// Maybe this next part can be done with a const array? Haven't checked.
	switch level {
	case TraceLevel:
		l.logger.Print("TRACE: ")
	case DebugLevel:
		l.logger.Print("DEBUG: ")
	...
	}

	l.logger.Printf(format, v...)
}

Collaborator

While reading other parts of the code I noticed that log messages have to add a newline manually. We might want to ensure a newline is added here so we don't have to worry about it elsewhere in the code.

Collaborator Author

@robbycochran robbycochran Sep 5, 2024

Good suggestion. I refactored the logging code and removed the extra newlines.

authConfigs := map[string]string{}
auths, err := readRegistryConfigs()
if err != nil {
log.Error("Error reading registry auth files: %s", err)
Collaborator

Should we stop execution at this point? I know that right now we have images on private registries that require authentication, so tests will just fail if we keep executing.

Suggested change
log.Error("Error reading registry auth files: %s", err)
return nil, log.Error("Error reading registry auth files: %s", err)

Collaborator Author

I changed the registry login to occur only when we pull an image. This reduces the number of registry login calls and, as a side effect, lets the tests run when all the images are local and registry credentials are not needed.
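
A minimal sketch of the login-on-pull flow described here, assuming a small client interface. The imageClient interface, ensureImage, and fakeClient are hypothetical names for illustration and do not reflect the Docker SDK's API or the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// imageClient is a hypothetical stand-in for the calls an executor would
// make against the container engine and the registry.
type imageClient interface {
	ImageExists(ref string) (bool, error)
	Login(registry string) error
	Pull(ref string) error
}

// ensureImage logs in only when the image is missing locally.
func ensureImage(c imageClient, registry, ref string) error {
	exists, err := c.ImageExists(ref)
	if err != nil {
		return fmt.Errorf("inspecting %s: %w", ref, err)
	}
	if exists {
		return nil // local image: credentials are never touched
	}
	if err := c.Login(registry); err != nil {
		return fmt.Errorf("registry login for %s: %w", registry, err)
	}
	return c.Pull(ref)
}

// fakeClient records calls so the behaviour can be observed.
type fakeClient struct {
	local  map[string]bool
	logins int
	pulls  int
	noAuth bool // simulate missing registry credentials
}

func (f *fakeClient) ImageExists(ref string) (bool, error) { return f.local[ref], nil }

func (f *fakeClient) Login(string) error {
	if f.noAuth {
		return errors.New("no credentials")
	}
	f.logins++
	return nil
}

func (f *fakeClient) Pull(ref string) error {
	f.pulls++
	f.local[ref] = true
	return nil
}

func main() {
	c := &fakeClient{local: map[string]bool{"quay.io/example/stats:1": true}, noAuth: true}
	// Image is local, so missing credentials do not matter.
	fmt.Println(ensureImage(c, "quay.io", "quay.io/example/stats:1"))
	// Image must be pulled, so the failed login surfaces as an error.
	fmt.Println(ensureImage(c, "quay.io", "quay.io/example/collector:1"))
}
```

With this ordering, a credentials problem fails only the pulls that actually need the registry.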


func (d *dockerAPIExecutor) StartContainer(startConfig config.ContainerStartConfig) (string, error) {
ctx := context.Background()
defer d.client.Close()
Collaborator

Will this close the client when this method ends or once the *dockerAPIExecutor goes out of scope? If the first is true, will other calls to StartContainer fail? If it's the second one, is it even needed?

Collaborator

This question applies to all places where this same line is found.

Collaborator Author

Yeah, it isn't necessary. I don't think it closes the client; it just closes the existing connections. I've removed the calls to it.
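
On the timing question: in Go a deferred call runs when the enclosing function returns, not when the receiver goes out of scope. The sketch below illustrates this with a hypothetical fakeClient that refuses work once closed. That is stricter than the Docker client behaviour described in this thread and is used only to make the effect visible. The executor type and its method names are also assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeClient is a hypothetical client that refuses work once closed.
type fakeClient struct{ closed bool }

func (c *fakeClient) Close() { c.closed = true }

func (c *fakeClient) Do() error {
	if c.closed {
		return errors.New("client closed")
	}
	return nil
}

type executor struct{ client *fakeClient }

// startWithDefer mirrors the reviewed pattern: the deferred Close runs when
// this method returns, so the next call sees a closed client.
func (e *executor) startWithDefer() error {
	defer e.client.Close()
	return e.client.Do()
}

// start leaves the client open; the owner calls Close once when done.
func (e *executor) start() error { return e.client.Do() }

func (e *executor) Close() { e.client.Close() }

func main() {
	a := &executor{client: &fakeClient{}}
	fmt.Println(a.startWithDefer(), a.startWithDefer())

	b := &executor{client: &fakeClient{}}
	defer b.Close()
	fmt.Println(b.start(), b.start())
}
```

Tying the client's lifetime to the executor, with a single Close in suite teardown, avoids the per-call question entirely.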

integration-tests/pkg/executor/executor_docker_api.go Outdated
integration-tests/pkg/executor/executor_docker_api.go Outdated
integration-tests/suites/base.go Outdated
integration-tests/suites/base.go Outdated
integration-tests/suites/base.go Outdated
integration-tests/suites/base.go Outdated
integration-tests/suites/k8s_namespace.go Outdated
@robbycochran robbycochran force-pushed the rc-docker-client branch 2 times, most recently from 4ea6a82 to aebefef Compare September 5, 2024 18:30
018a0f37a Add basic logging utility
4a33e95de mv
3ae29dcdd using log pkg for sleep msg
8265dbe tail collector logs on non-zero exit
422120b Remove K8sExecutor from Executor interface
9da606d [wip] docker client update
f542804 docker client file
68fb0e9 benchmark
7cee1bb config
4d0a62c no retry on inspect for pull
f97710d Include entrypoint
ed4f05a Add PullImage from Giles
786fc87 PullImage wrapper
0011133 refactor, simplify
f3e9696 fmt
11b1f8d plop entrypoint
bf82e49 use docker-api by default, control with env CONTAINER_EXECUTOR
83a321a debug and trim logs
741b98f copy multiplexed output
ff7d92e remove tmp from collector
4fb34b6 tidy, rename
2f6494d simplify
17f7232 debug
54eeb27 debug mounts
1ba6f48 revert debugging
0b6e922 bind mounts only
03e27d2 check image path for auth
5cd9384 Fix container logs, improve auth
58761f5 cleanup
9688f1f correct rebase
623ddd5 add priv
c302fa8 fixup
484e081 simplify
32f649d remove docker process executor
5d0a427 fix
c396093 map
9b9802a skip podman install
653f188 update ansible podman for rhcos
5365aa3 print collector logs on error
0666f8f update logging
5d02290 update
2ec3a15 add dockerized make test targets
233552d format
4c7d4f0 remove extra step
3636c89 warn
6b37086 invert
Collaborator

@Molter73 Molter73 left a comment

LGTM! There's just one sleep that needs to be removed before merging.

@robbycochran robbycochran merged commit cc42ce3 into master Sep 6, 2024
48 of 49 checks passed
@robbycochran robbycochran deleted the rc-docker-client branch September 6, 2024 17:59