[tf] COS (and Minio) terraform modules #227

Merged
merged 69 commits into main from self-monitoring
Nov 29, 2024

Conversation

Abuelodelanada
Contributor

@Abuelodelanada Abuelodelanada commented Nov 13, 2024

This PR fixes #220 and creates a new module named minio.
It is based on #219.

In tandem with:

Loki ingests its own logs:

  • a grafana agent
  • relation over loki:logging-consumer to ga
  • relation over loki:logging-provider to ga

Loki sends metrics to mimir:

  • a grafana agent
  • relation over loki:self-metrics-endpoint to ga, and
  • relation over mimir:receive-remote-write to ga

Loki sends tracing to tempo:

  • grafana-agent:tracing-provider loki:tracing

Mimir sends tracing to tempo:

  • grafana-agent:tracing-provider mimir:tracing

Mimir sends metrics back to itself:

  • mimir:self-metrics-endpoint grafana-agent:metrics-endpoint

Tempo sends metrics to mimir:

  • a grafana agent
  • relation over tempo:self-metrics-endpoint to ga, and
  • relation over mimir:receive-remote-write to ga

Tempo sends logs to loki:

  • grafana-agent:logging-provider tempo:logging

Grafana sends traces to Tempo:

  • juju relate tempo:tracing grafana:tracing
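
For readers more used to `juju relate`, the relations listed above could be written with the Juju Terraform provider roughly as follows. This is a minimal sketch, not the module's actual code: it assumes the applications are deployed under the names `tempo`, `grafana`, `mimir` and `grafana-agent`, that the provider is `juju/juju`, and that the provider version in use accepts a `model` name on `juju_integration` (newer provider releases may name this attribute differently). The resource labels are hypothetical.

```hcl
terraform {
  required_providers {
    juju = {
      source = "juju/juju"
    }
  }
}

variable "model_name" {
  description = "Model name"
  type        = string
}

# Equivalent of: juju relate tempo:tracing grafana:tracing
resource "juju_integration" "grafana_tracing" {
  model = var.model_name

  application {
    name     = "tempo" # assumed application name
    endpoint = "tracing"
  }

  application {
    name     = "grafana" # assumed application name
    endpoint = "tracing"
  }
}

# mimir:self-metrics-endpoint <-> grafana-agent:metrics-endpoint
resource "juju_integration" "mimir_self_metrics" {
  model = var.model_name

  application {
    name     = "mimir" # assumed application name
    endpoint = "self-metrics-endpoint"
  }

  application {
    name     = "grafana-agent" # assumed application name
    endpoint = "metrics-endpoint"
  }
}
```

Each `juju_integration` takes exactly two `application` blocks, one per side of the relation, so every bullet above maps to one resource.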

How to test it using tofu or terraform

Using the default number of worker units:

Create a main.tf file with the following:

# COS module that deploys the whole Canonical Observability Stack
module "cos" {
  source         = "git::https://github.com/canonical/observability//terraform/modules/cos?ref=self-monitoring"
  model_name     = var.model_name
  minio_password = var.minio_password
  minio_user     = var.minio_user
}

# S3 module that deploys the MinIO object storage required by COS
module "minio" {
  source         = "git::https://github.com/canonical/observability//terraform/modules/minio?ref=self-monitoring"
  model_name     = var.model_name
  channel        = var.channel
  minio_user     = var.minio_user
  minio_password = var.minio_password

  loki  = module.cos.loki
  mimir = module.cos.mimir
  tempo = module.cos.tempo
}

variable "channel" {
  description = "Charms channel"
  type        = string
  default     = "latest/edge"
}

variable "model_name" {
  description = "Model name"
  type        = string
}

variable "minio_user" {
  description = "User for MinIO"
  type        = string
}

variable "minio_password" {
  description = "Password for MinIO"
  type        = string
  sensitive   = true
}

And then run:

$ tofu init 

$ tofu apply -var='minio_password=Password' -var='minio_user=User' -var='model_name=test'

Juju status after deploying the cos TF module

[image: juju status output]

Using custom values for worker units:

If you have enough RAM and CPUs to deploy 3 units per worker, create a main.tf like this one:

# COS module that deploys the whole Canonical Observability Stack
module "cos" {
  source                        = "../cos"
  model_name                    = var.model_name
  channel                       = var.channel
  minio_password                = var.minio_password
  minio_user                    = var.minio_user
  loki_backend_units            = var.loki_backend_units
  loki_read_units               = var.loki_read_units
  loki_write_units              = var.loki_write_units
  mimir_backend_units           = var.mimir_backend_units
  mimir_read_units              = var.mimir_read_units
  mimir_write_units             = var.mimir_write_units
  tempo_compactor_units         = var.tempo_compactor_units
  tempo_distributor_units       = var.tempo_distributor_units
  tempo_ingester_units          = var.tempo_ingester_units
  tempo_metrics_generator_units = var.tempo_metrics_generator_units
  tempo_querier_units           = var.tempo_querier_units
  tempo_query_frontend_units    = var.tempo_query_frontend_units
}

# S3 module that deploys the MinIO object storage required by COS
module "minio" {
  source         = "../minio"
  model_name     = var.model_name
  channel        = var.channel
  minio_user     = var.minio_user
  minio_password = var.minio_password

  loki  = module.cos.loki
  mimir = module.cos.mimir
  tempo = module.cos.tempo
}
    
variable "channel" {
  description = "Charms channel"
  type        = string
  default     = "latest/edge"
}

variable "model_name" {
  description = "Model name"
  type        = string
}

variable "use_tls" {
  description = "Specify whether to use TLS or not for coordinator-worker communication. By default, TLS is enabled through self-signed-certificates"
  type        = bool
  default     = true
}

variable "minio_user" {
  description = "User for MinIO"
  type        = string
}

variable "minio_password" {
  description = "Password for MinIO"
  type        = string
  sensitive   = true
}

variable "loki_backend_units" {
  description = "Number of Loki worker units with backend role"
  type        = number
  default     = 1
}

variable "loki_read_units" {
  description = "Number of Loki worker units with read role"
  type        = number
  default     = 1
}

variable "loki_write_units" {
  description = "Number of Loki worker units with write roles"
  type        = number
  default     = 1
}

variable "mimir_backend_units" {
  description = "Number of Mimir worker units with backend role"
  type        = number
  default     = 1
}

variable "mimir_read_units" {
  description = "Number of Mimir worker units with read role"
  type        = number
  default     = 1
}

variable "mimir_write_units" {
  description = "Number of Mimir worker units with write role"
  type        = number
  default     = 1
}

variable "tempo_compactor_units" {
  description = "Number of Tempo worker units with compactor role"
  type        = number
  default     = 1
}

variable "tempo_distributor_units" {
  description = "Number of Tempo worker units with distributor role"
  type        = number
  default     = 1
}

variable "tempo_ingester_units" {
  description = "Number of Tempo worker units with ingester role"
  type        = number
  default     = 1
}

variable "tempo_metrics_generator_units" {
  description = "Number of Tempo worker units with metrics-generator role"
  type        = number
  default     = 1
}

variable "tempo_querier_units" {
  description = "Number of Tempo worker units with querier role"
  type        = number
  default     = 1
}

variable "tempo_query_frontend_units" {
  description = "Number of Tempo worker units with query-frontend role"
  type        = number
  default     = 1
}

And then run:

$ tofu init

$ tofu apply -var='minio_password=Password' -var='minio_user=User' -var='model_name=test' -var='loki_backend_units=3' -var='loki_read_units=3' -var='loki_write_units=3' -var='mimir_backend_units=3' -var='mimir_read_units=3' -var='mimir_write_units=3' -var='tempo_compactor_units=3' -var='tempo_distributor_units=3' -var='tempo_ingester_units=3' -var='tempo_metrics_generator_units=3' -var='tempo_querier_units=3' -var='tempo_query_frontend_units=3'
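
The long list of `-var` flags can also be kept in a variable definitions file. A minimal sketch, assuming a hypothetical file named `units.tfvars` placed next to main.tf; the variable names are the ones declared above and the values mirror the command line:

```hcl
# units.tfvars (hypothetical file name)
model_name     = "test"
minio_user     = "User"
minio_password = "Password" # placeholder; do not commit real credentials

# Three units per worker role, as in the tofu apply command above
loki_backend_units = 3
loki_read_units    = 3
loki_write_units   = 3

mimir_backend_units = 3
mimir_read_units    = 3
mimir_write_units   = 3

tempo_compactor_units         = 3
tempo_distributor_units       = 3
tempo_ingester_units          = 3
tempo_metrics_generator_units = 3
tempo_querier_units           = 3
tempo_query_frontend_units    = 3
```

With that file in place, the apply step becomes `tofu apply -var-file=units.tfvars`.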

Juju status after deploying the cos TF module with custom worker unit counts

[image: juju status output]

@Abuelodelanada Abuelodelanada marked this pull request as ready for review November 13, 2024 13:38
@Abuelodelanada Abuelodelanada requested a review from a team as a code owner November 13, 2024 13:38
@MichaelThamm
Contributor

MichaelThamm commented Nov 27, 2024

I tested in monolithic mode and deployment worked in one try! Here are my notes:

  • This took ~32 minutes (on a relatively resource constrained machine)

  • When TF finishes deploying a charm, that charm is not necessarily active/idle so this check may not be sufficient.

    • e.g. If I successfully deploy from the Mimir TF module, then Mimir is Blocked due to missing S3 integration (although TF says it deployed).
    • This active/idle logic could be added to the s3management.sh script since you already have the wait_for_app function.
  • It would be great if (this should not block the PR from merging) the creation of Minio + manual steps were decoupled from the COS deployment.

    • This would be helpful for integration tests using TF where we build a charm (Loki, Mimir, Tempo) and then run terraform -chdir=terraform/modules/minio apply to add consistency to deployments with Minio for testing.
    • The shell script would likely need to be re-written to handle a (named) list of modules and then execute the manual steps per module in the list.
    • If you decouple Minio from these modules then you have to consider the scenarios:
      • e.g. Loki is already deployed and we deploy minio
      • e.g. Deploy minio in a new model and then deploy Loki
      • The latter example would be much more complex to automate.
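
As an alternative to extending the shell script, the active/idle concern raised above could be handled from Terraform itself. The following is only a sketch under stated assumptions: the application is named `mimir`, the `juju` CLI is available on the machine running the apply, and the resource label `wait_for_mimir` and the 30 minute timeout are hypothetical choices. It does not reproduce the PR's `wait_for_app` function, whose contents are not shown here.

```hcl
terraform {
  # terraform_data is a built-in resource from Terraform 1.4 onwards
  required_version = ">= 1.4"
}

variable "model_name" {
  description = "Model name"
  type        = string
}

# Blocks the apply until Juju reports the application as active,
# or fails the apply when the timeout expires.
resource "terraform_data" "wait_for_mimir" {
  provisioner "local-exec" {
    # "mimir" is an assumed application name; 30m is an arbitrary timeout
    command = "juju wait-for application mimir --model ${var.model_name} --timeout 30m"
  }

  # In a real configuration this would also need something like
  # depends_on = [module.cos, module.minio] so that the wait runs
  # only after the charms and the S3 integration have been created.
}
```

Because a charm can stay `Blocked` until its S3 integration exists, a wait of this kind only makes sense after the MinIO steps have run, which is why the ordering comment matters more than the command itself.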

@Abuelodelanada
Contributor Author

Hi @MichaelThamm !

Thanks for your review!

I tested in monolithic mode and deployment worked in one try! Here are my notes:

* This took ~32 minutes (on a relatively resource constrained machine)

This is not necessarily wrong. On my VM (6 CPUs, 20 GB RAM) it took ~10 min.

* When TF finishes deploying a charm, that charm is not necessarily `active/idle` so [this check](https://github.com/canonical/observability/pull/227/files#diff-78a5f99412a7e4da0da4664d1aba64190da5f66378f7a863cb6a3ac5adcfef62R44) may not be sufficient.
  
  * e.g. If I successfully deploy from the Mimir TF module, then Mimir is `Blocked` due to missing S3 integration (although TF says it deployed).
  * This `active/idle` logic could be added to the `s3management.sh` script since you already have the `wait_for_app` function.

* It would be great if (this should not block the PR from merging) the creation of Minio + manual steps were decoupled from the COS deployment.
  
  * This would be helpful for integration tests using TF where we build a charm (Loki, Mimir, Tempo) and then run `terraform -chdir=terraform/modules/minio apply` to add consistency to deployments with Minio for testing.
  * The shell script would likely need to be re-written to handle a (named) list of modules and then execute the manual steps per module in the list.
  * If you decouple Minio from these modules then you have to consider the scenarios:
    
    * e.g. Loki is already deployed and we deploy minio
    * e.g. Deploy minio in a new model and then deploy Loki
    * The latter example would be much more complex to automate.

I've addressed these in these two commits:

@Abuelodelanada Abuelodelanada requested a review from sed-i November 28, 2024 14:26
@Abuelodelanada Abuelodelanada mentioned this pull request Nov 29, 2024
@Abuelodelanada Abuelodelanada changed the title [tf] Add self monitoring relations [tf] COS (and Minio) terraform modules Nov 29, 2024
@Abuelodelanada Abuelodelanada merged commit aec28b5 into main Nov 29, 2024
1 check passed
@Abuelodelanada Abuelodelanada deleted the self-monitoring branch November 29, 2024 15:13