Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add s3 reliability test #44

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Add s3 reliability test #44

wants to merge 2 commits into from

Conversation

lukesteensen
Copy link
Member

While far from perfect, this works to spin up the equivalent reliability test to the old test env repo.

I'll comment on things of note, but I'm curious how this looks overall.

Signed-off-by: Luke Steensen <[email protected]>
@@ -1,6 +1,6 @@
---
- name: Install Vector configuration
template:
copy:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to get around an issue where template tries to fill in the vector config's templates. We can probably get around it with escaping, but this seemed safer.

It also opens the question of, if we actually wanted to template some things in, how would we do it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! We can use different variable_end_string and variable_start_string when resolving vector templates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, @lukesteensen we use templates to insert addresses, ports, etc, so we'll need to revert this. I'll try to submit a PR today that changes this to use different variable_end_string and variable_start_string values.

Description=verify-logs

[Service]
ExecStart=/usr/local/bin/verifiable-logger verify file-to-s3-reliability-test-data us-east-1 --prefix "host=%H" --tail
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty hardcoded right now, but should obviously be templated in once we figure out the right way to get the variables in here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would highly recommend not adding support for environment variables in Systemd. I also ran into weird issues when I passed more than one flag. It was a mess. But if you look at the http_test_server role I used a template for that service file.

bin/test Outdated Show resolved Hide resolved
You can run this test via:

```
test -t file_to_s3_reliability
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing missing here: how do we shut down the test and destroy all the resources?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we kill the VMs via CloudWatch alert. We need a better way, maybe could use lambda for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should add a bin/teardown -t <test> -c <config>. The terraform destroy command should make this easy. It should only be called for the local test state, not global state, obviously.

Currently, we kill the VMs via CloudWatch alert. We need a better way, maybe could use lambda for it.

What's wrong with CloudWatch? I think it works quite well for this.

Copy link
Contributor

@MOZGIII MOZGIII Mar 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CloudWatch works!
But if we create resources per test - then if we have bin/teardown, we have to invoke it with enough context so that it only deletes resources associated with a particular run.

So this is where we use lambda: we can assign the invocation of bin/teardown with all required context, delayed by, let's say, three hours. We will then schedule this lambda invocation as part of bin/test run. It'll allow us to not only clean up the VMs, but also all the associated terraform state - policies, S3 buckets, VPC, and everything else that we create per-test.

I think this solution can replace our CloudWatch VMs removal because we can clean up everything, including the VMs!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should solve it for now: #52

Comment on lines +11 to +12
region = "us-east-1"
bucket = "file-to-s3-reliability-test-data"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, these should be templated in somehow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this stuff being hard coded since it's all contained within this test. We could use variables (this is what the configurations folder is for in the test, but I don't think it matters too much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is S3 bucket names have global scope, so we either have to use a different one each test run, or make this part of the state global. I think making it global is worse than templating the names.

@@ -0,0 +1,2 @@
---
foo: "bar"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't let me not have this file.

Presumably, we could use this for things like the bucket name but a few things weren't clear:

  1. How to template into the vector toml while leaving templated fields alone
  2. How to get this same variable into terraform

Copy link
Contributor

@MOZGIII MOZGIII Mar 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. How to get this same variable into terraform

We do the templating at part of bin/test and our lib facility, and pass it to both ansible and terraform!


resource "aws_s3_bucket" "logs-bucket" {
# data is namespaced by host within the bucket
bucket = "file-to-s3-reliability-test-data"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, should be templated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would template this so it's namespaced like our instance names. Ex:

vector-test-${var.user_id}-${var.test_name}-${var.test_configuration}-test-data

cases/file_to_s3_reliability/terraform/variables.tf Outdated Show resolved Hide resolved
@@ -11,7 +11,7 @@ locals {
}

module "vpc" {
source = "../../../terraform/aws_vpc"
source = "../aws_vpc"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I was doing something wrong, but nothing worked at all without changing these paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd. Test harness seems to work on master currently...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked, doesn't work on my end either. Created #45.

}

module "topology" {
source = "../../../terraform/aws_uni_topology"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to change this from the case I copied to get things working.

Copy link
Contributor

@MOZGIII MOZGIII left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea!

While reading, I thought - what if we use a global shared S3 bucket? But it's better this way. We definitely need to template the bucket name.
We probably should introduce a run_id, derived from the user_id + a random string. This would also be useful for proper terraform state isolation - something I wanted to work on as part of the task to run multiple test cases in parallel.

@@ -1,6 +1,6 @@
---
- name: Install Vector configuration
template:
copy:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! We can use different variable_end_string and variable_start_string when resolving vector templates.

You can run this test via:

```
test -t file_to_s3_reliability
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we kill the VMs via CloudWatch alert. We need a better way, maybe could use lambda for it.

@@ -11,7 +11,7 @@ locals {
}

module "vpc" {
source = "../../../terraform/aws_vpc"
source = "../aws_vpc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd. Test harness seems to work on master currently...

Copy link
Contributor

@binarylogic binarylogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Really happy to see this implemented. It definitely raises our confidence that Vector will work reliably over long periods. I think we should clean up these final items.

Also, didn't see anything in here about Slack notifications, etc. I assume that is hard-coded into the verifiable logger? I'm wondering if we can generalize this communication strategy somehow? I don't want to overthink this, but I know we'll need "alerts" of some kind for other tests in the future. We could even throw them on a queue and handle them "generally" out of band. Just thinking out loud a little bit.


resource "aws_s3_bucket" "logs-bucket" {
# data is namespaced by host within the bucket
bucket = "file-to-s3-reliability-test-data"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would template this so it's namespaced like our instance names. Ex:

vector-test-${var.user_id}-${var.test_name}-${var.test_configuration}-test-data

Signed-off-by: Luke Steensen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants