-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add s3 reliability test #44
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Luke Steensen <[email protected]>
@@ -1,6 +1,6 @@ | |||
--- | |||
- name: Install Vector configuration | |||
template: | |||
copy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to get around an issue where template
tries to fill in the vector config's templates. We can probably get around it with escaping, but this seemed safer.
It also opens the question of, if we actually wanted to template some things in, how would we do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! We can use different variable_end_string
and variable_start_string
when resolving vector templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, @lukesteensen we use templates to insert addresses, ports, etc, so we'll need to revert this. I'll try to submit a PR today that changes this to use different variable_end_string
and variable_start_string
values.
Description=verify-logs | ||
|
||
[Service] | ||
ExecStart=/usr/local/bin/verifiable-logger verify file-to-s3-reliability-test-data us-east-1 --prefix "host=%H" --tail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty hardcoded right now, but should obviously be templated in once we figure out the right way to get the variables in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would highly recommend not adding support for environment variables in Systemd. I also ran into weird issues when I passed more than one flag. It was a mess. But if you look at the http_test_server
role I used a template for that service file.
You can run this test via: | ||
|
||
``` | ||
test -t file_to_s3_reliability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing missing here: how do we shut down the test and destroy all the resources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we kill the VMs via CloudWatch alert. We need a better way, maybe could use lambda for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should add a bin/teardown -t <test> -c <config>
. The terraform destroy
command should make this easy. It should only be called for the local test state, not global state, obviously.
Currently, we kill the VMs via CloudWatch alert. We need a better way, maybe could use lambda for it.
What's wrong with CloudWatch? I think it works quite well for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CloudWatch works!
But if we create resources per test - then if we have bin/teardown
, we have to invoke it with enough context so that it only deletes resources associated with a particular run.
So this is where we use lambda: we can assign the invocation of bin/teardown
with all required context, delayed by, let's say, three hours. We will then schedule this lambda invocation as part of bin/test
run. It'll allow us to not only clean up the VMs, but also all the associated terraform state - policies, S3 buckets, VPC, and everything else that we create per-test.
I think this solution can replace our CloudWatch VMs removal because we can clean up everything, including the VMs!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should solve it for now: #52
region = "us-east-1" | ||
bucket = "file-to-s3-reliability-test-data" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, these should be templated in somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with this stuff being hard coded since it's all contained within this test. We could use variables (this is what the configurations
folder is for in the test, but I don't think it matters too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is S3 bucket names have global scope, so we either have to use a different one each test run, or make this part of the state global. I think making it global is worse than templating the names.
@@ -0,0 +1,2 @@ | |||
--- | |||
foo: "bar" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't let me not have this file.
Presumably, we could use this for things like the bucket name but a few things weren't clear:
- How to template into the vector toml while leaving templated fields alone
- How to get this same variable into terraform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- How to get this same variable into terraform
We do the templating at part of bin/test
and our lib
facility, and pass it to both ansible and terraform!
|
||
resource "aws_s3_bucket" "logs-bucket" { | ||
# data is namespaced by host within the bucket | ||
bucket = "file-to-s3-reliability-test-data" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, should be templated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would template this so it's namespaced like our instance names. Ex:
vector-test-${var.user_id}-${var.test_name}-${var.test_configuration}-test-data
@@ -11,7 +11,7 @@ locals { | |||
} | |||
|
|||
module "vpc" { | |||
source = "../../../terraform/aws_vpc" | |||
source = "../aws_vpc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I was doing something wrong, but nothing worked at all without changing these paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is odd. Test harness seems to work on master
currently...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked, doesn't work on my end either. Created #45.
} | ||
|
||
module "topology" { | ||
source = "../../../terraform/aws_uni_topology" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to change this from the case I copied to get things working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea!
While reading, I thought - what if we use a global shared S3 bucket? But it's better this way. We definitely need to template the bucket name.
We probably should introduce a run_id
, derived from the user_id
+ a random string. This would also be useful for proper terraform state isolation - something I wanted to work on as part of the task to run multiple test cases in parallel.
@@ -1,6 +1,6 @@ | |||
--- | |||
- name: Install Vector configuration | |||
template: | |||
copy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! We can use different variable_end_string
and variable_start_string
when resolving vector templates.
You can run this test via: | ||
|
||
``` | ||
test -t file_to_s3_reliability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we kill the VMs via CloudWatch alert. We need a better way, maybe could use lambda for it.
@@ -11,7 +11,7 @@ locals { | |||
} | |||
|
|||
module "vpc" { | |||
source = "../../../terraform/aws_vpc" | |||
source = "../aws_vpc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is odd. Test harness seems to work on master
currently...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! Really happy to see this implemented. It definitely raises our confidence that Vector will work reliably over long periods. I think we should clean up these final items.
Also, didn't see anything in here about Slack notifications, etc. I assume that is hard-coded into the verifiable logger? I'm wondering if we can generalize this communication strategy somehow? I don't want to overthink this, but I know we'll need "alerts" of some kind for other tests in the future. We could even throw them on a queue and handle them "generally" out of band. Just thinking out loud a little bit.
|
||
resource "aws_s3_bucket" "logs-bucket" { | ||
# data is namespaced by host within the bucket | ||
bucket = "file-to-s3-reliability-test-data" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would template this so it's namespaced like our instance names. Ex:
vector-test-${var.user_id}-${var.test_name}-${var.test_configuration}-test-data
Signed-off-by: Luke Steensen <[email protected]>
While far from perfect, this works to spin up the equivalent reliability test to the old test env repo.
I'll comment on things of note, but I'm curious how this looks overall.