Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add s3 reliability test #44

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion ansible/roles/vector/tasks/configure.yml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
- name: Install Vector configuration
template:
copy:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to get around an issue where template tries to fill in the vector config's templates. We can probably get around it with escaping, but this seemed safer.

It also opens the question of, if we actually wanted to template some things in, how would we do it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! We can use different variable_end_string and variable_start_string when resolving vector templates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, @lukesteensen we use templates to insert addresses, ports, etc, so we'll need to revert this. I'll try to submit a PR today that changes this to use different variable_end_string and variable_start_string values.

src: "{{ configuration_file }}"
dest: /etc/vector/vector.toml
mode: 0644
21 changes: 21 additions & 0 deletions ansible/roles/verifiable-logger/tasks/install.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
---
- name: Create verifiable-logger directory
file:
path: /var/lib/verifiable-logger
state: directory

- name: Copy verifiable-logger binary
get_url:
url: "https://verifiable-logger-builds.s3.us-east-2.amazonaws.com/verifiable-logger"
dest: /usr/local/bin/verifiable-logger
mode: 755

- name: Copy generate-logs unit file
template:
src: generate-logs.service
dest: /etc/systemd/system/generate-logs.service

- name: Copy verify-logs unit file
template:
src: verify-logs.service
dest: /etc/systemd/system/verify-logs.service
2 changes: 2 additions & 0 deletions ansible/roles/verifiable-logger/tasks/main.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
---
- include: "{{ action }}.yml"
6 changes: 6 additions & 0 deletions ansible/roles/verifiable-logger/tasks/start-generating.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
- name: Start generating logs
systemd:
name: generate-logs
state: started
daemon-reload: yes
6 changes: 6 additions & 0 deletions ansible/roles/verifiable-logger/tasks/start-verifying.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
---
- name: Start verifying logs
systemd:
name: verify-logs
state: started
daemon-reload: yes
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
[Unit]
Description=generate-logs

[Service]
ExecStart=/usr/local/bin/verifiable-logger generate --rate 10 --output output.log
WorkingDirectory=/var/lib/verifiable-logger/
6 changes: 6 additions & 0 deletions ansible/roles/verifiable-logger/templates/verify-logs.service
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
[Unit]
Description=verify-logs

[Service]
ExecStart=/usr/local/bin/verifiable-logger verify file-to-s3-reliability-test-data us-east-1 --prefix "host=%H" --tail
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty hardcoded right now, but should obviously be templated in once we figure out the right way to get the variables in here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would highly recommend not adding support for environment variables in Systemd. I also ran into weird issues when I passed more than one flag. It was a mess. But if you look at the http_test_server role I used a template for that service file.

WorkingDirectory=/var/lib/verifiable-logger/
25 changes: 25 additions & 0 deletions cases/file_to_s3_reliability/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# File to S3 Reliability Test

This is a long-running test of Vector tailing a file and sending the data to S3.
It uses the [`verifiable-logger`][0] project both to generate the log data and
verify that it all reaches S3.

## Try it

You can run this test via:

```
test -t file_to_s3_reliability
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing missing here: how do we shut down the test and destroy all the resources?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we kill the VMs via CloudWatch alert. We need a better way, maybe could use lambda for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should add a bin/teardown -t <test> -c <config>. The terraform destroy command should make this easy. It should only be called for the local test state, not global state, obviously.

Currently, we kill the VMs via CloudWatch alert. We need a better way, maybe could use lambda for it.

What's wrong with CloudWatch? I think it works quite well for this.

Copy link
Contributor

@MOZGIII MOZGIII Mar 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CloudWatch works!
But if we create resources per test - then if we have bin/teardown, we have to invoke it with enough context so that it only deletes resources associated with a particular run.

So this is where we use lambda: we can assign the invocation of bin/teardown with all required context, delayed by, let's say, three hours. We will then schedule this lambda invocation as part of bin/test run. It'll allow us to not only clean up the VMs, but also all the associated terraform state - policies, S3 buckets, VPC, and everything else that we create per-test.

I think this solution can replace our CloudWatch VMs removal because we can clean up everything, including the VMs!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should solve it for now: #52

```

## Resources

* [Setup][setup]
* [Development][development]
* [How it works][how_it_works]
* [Vector docs][docs]
* [Vector repo][repo]
* [Vector website][website]


[0]: https://github.com/timberio/verifiable-logger
21 changes: 21 additions & 0 deletions cases/file_to_s3_reliability/ansible/bootstrap.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
---
- hosts: '{{ test_namespace }}'
tasks:
- meta: refresh_inventory
- name: Wait 600 seconds for target connection to become reachable/usable
wait_for_connection:

- debug:
var: playbook_dir

- hosts: '{{ test_namespace }}:&tag_TestRole_subject'
become: true
roles:
- role: vector
action: install

- role: vector
action: configure

- role: verifiable-logger
action: install
16 changes: 16 additions & 0 deletions cases/file_to_s3_reliability/ansible/config_files/vector.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
data_dir = "/var/lib/vector"

[sources.file]
type = "file"
include = ["/var/lib/verifiable-logger/output.log"]
start_at_beginning = true

[sinks.s3]
inputs = ["file"]
type = "aws_s3"
region = "us-east-1"
bucket = "file-to-s3-reliability-test-data"
Comment on lines +11 to +12
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, these should be templated in somehow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this stuff being hard coded since it's all contained within this test. We could use variables (this is what the configurations folder is for in the test, but I don't think it matters too much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is S3 bucket names have global scope, so we either have to use a different one each test run, or make this part of the state global. I think making it global is worse than templating the names.

key_prefix = "host={{ host }}/date=%F/"
encoding = "text"
compression = "none"
batch.timeout_secs = 30
12 changes: 12 additions & 0 deletions cases/file_to_s3_reliability/ansible/run.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
---
- hosts: '{{ test_namespace }}:&tag_TestRole_subject'
become: true
roles:
- role: verifiable-logger
action: start-generating

- role: verifiable-logger
action: start-verifying

- role: vector
action: start
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
---
foo: "bar"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't let me not have this file.

Presumably, we could use this for things like the bucket name but a few things weren't clear:

  1. How to template into the vector toml while leaving templated fields alone
  2. How to get this same variable into terraform

Copy link
Contributor

@MOZGIII MOZGIII Mar 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. How to get this same variable into terraform

We do the templating at part of bin/test and our lib facility, and pass it to both ansible and terraform!

Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# WARNING!
#
# Do not modify the parameters of this file since historical test results are
# based on these parameters. Please create a new configuration and specify taht
# configuration instead.
subject_instance_type = "c5.large"
70 changes: 70 additions & 0 deletions cases/file_to_s3_reliability/terraform/main.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
provider "aws" {
region = "us-east-1"
version = "~> 2.53"
}

terraform {
required_version = ">= 0.12"
backend "s3" {}
}

module "topology" {
source = "../../../terraform/aws_uni_topology"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to change this from the case I copied to get things working.


providers = {
aws = aws
}

pub_key = var.pub_key
subject_instance_type = var.subject_instance_type
test_configuration = var.test_configuration
test_name = var.test_name
user_id = var.user_id
results_s3_bucket_name = var.results_s3_bucket_name
}

resource "aws_s3_bucket" "logs-bucket" {
# data is namespaced by host within the bucket
bucket = "file-to-s3-reliability-test-data"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, should be templated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would template this so it's namespaced like our instance names. Ex:

vector-test-${var.user_id}-${var.test_name}-${var.test_configuration}-test-data


lifecycle_rule {
enabled = true

expiration {
days = 14
}
}
}

data "aws_iam_policy_document" "logs-bucket-policy" {
statement {
sid = "AllowTestHarnessListBucket"

actions = [
"s3:ListBucket",
]

resources = [
aws_s3_bucket.logs-bucket.arn,
]
}

statement {
sid = "AllowTestHarnessEverythingElse"

actions = [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
]

resources = [
"${aws_s3_bucket.logs-bucket.arn}/*",
]
}
}

resource "aws_iam_role_policy" "default" {
role = module.topology.instance_profile_name
policy = data.aws_iam_policy_document.logs-bucket-policy.json
}
24 changes: 24 additions & 0 deletions cases/file_to_s3_reliability/terraform/variables.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
variable "pub_key" {
type = string
}

variable "subject_instance_type" {
type = string
}

variable "test_configuration" {
type = string
}

variable "test_name" {
type = string
}

variable "user_id" {
type = string
}

// don't actually need this, but need to accept it
variable "results_s3_bucket_name" {
type = string
}
3 changes: 3 additions & 0 deletions cases/file_to_s3_reliability/terraform/versions.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
terraform {
required_version = ">= 0.12"
}
6 changes: 3 additions & 3 deletions terraform/aws_uni_topology/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ locals {
}

module "vpc" {
source = "../../../terraform/aws_vpc"
source = "../aws_vpc"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I was doing something wrong, but nothing worked at all without changing these paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd. Test harness seems to work on master currently...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked, doesn't work on my end either. Created #45.


providers = {
aws = aws
Expand All @@ -24,7 +24,7 @@ module "vpc" {
}

module "aws_instance_profile" {
source = "../../../terraform/aws_instance_profile"
source = "../aws_instance_profile"

providers = {
aws = aws
Expand All @@ -36,7 +36,7 @@ module "aws_instance_profile" {
}

module "aws_instance_subject" {
source = "../../../terraform/aws_instance"
source = "../aws_instance"

providers = {
aws = aws
Expand Down
3 changes: 3 additions & 0 deletions terraform/aws_uni_topology/outputs.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
output "instance_profile_name" {
value = module.aws_instance_profile.name
}