Terraform provisioning for Jupiter into Azure Cloud #2733

Draft
wants to merge 10 commits into base: main
Conversation

Contributor

@murny murny commented Jan 30, 2022

This PR adds the ability to provision and set up Jupiter in Azure Cloud via Kubernetes.

How to start

  • Install terraform
  • Follow instructions here ([Spike] Investigate our Azure setup #2570 (comment)) to set up Azure CLI, create a service account and retrieve your Azure login information for Terraform
  • Rename terraform.tfvars.sample to terraform.tfvars. Add Azure login information and other information you need
  • Install all Terraform providers and libraries via terraform init (you need to be inside the terraform directory to run this and the following commands)
  • Do a test run via terraform plan and make sure there are no issues
  • Provision everything in Azure with terraform apply (warning: this takes about 30 minutes to finish)
  • Once completed, everything should be up and running
  • Get the external IP via kubectl describe ing jupiter-ingress -n jupiter --kubeconfig kubeconfig and look for the "Address" in the output; this is the external IP.
  • Visiting this external IP will give you an Nginx 404 error, as we only allow traffic for *.uat.library.ualberta.ca
  • We will need to set up an A record for this IP address (if we were doing this for real we would probably need to bug the networking team and have them point the *.uat.library.ualberta.ca domain to this IP address). But for testing we can just create some entries in our computer's /etc/hosts like so:
20.0.0.0 era.uat.library.ualberta.ca
20.0.0.0 digitalcollections.uat.library.ualberta.ca
  • You can now proceed to https://era.uat.library.ualberta.ca! However, the first time you go here, since we haven't configured SSL with real certificates, you will get the following error:
    [screenshot of the browser certificate warning]
  • Just click the button at the bottom to proceed.
  • We will need seed data next; this can be done with the following steps:
  1. Get a jupiter app pod name via
    kubectl get pods -n jupiter --kubeconfig kubeconfig
  2. Then run the following: kubectl exec --stdin --tty jupiter-app-<REPLACE WITH POD ID FROM ABOVE> -n jupiter --kubeconfig kubeconfig -- bin/rails db:seed
  • That's it, we're done! UAT is all up and running.

Sidekiq running correctly (screenshot).

Database seeds ran without issue (screenshot).

Solr is working correctly, can search records with no problems (screenshot).

Depositing and updating a new item both work. Downloading files works. Everything works.

SAML could easily be configured and set up just like we do for staging, if we wanted to.

  • Once finished using the environment, take it down via terraform destroy (rarely, but sometimes you need to run this twice because of ordering issues where Postgres is torn down before the cluster, etc. Rerunning it resolves everything). Destroying everything takes about 10 minutes.

Next steps:

  • Review, try and then finally merge this PR (fix any issues that are deemed worthy of fixing now. Convert the steps above into a README file maybe. Point to the proper jupiter image (ualberta/jupiter). Add changelog? Merge!)
  • Have GitHub Actions build a jupiter image from now on? (Can look at my demo POC for code for this)
  • Start provisioning UAT via Terraform instead of the current UAT setup. Work towards making this production ready.
  • Start improving the processes and how these scripts work (look into Terraform best practices, break these scripts into modules, and so forth). Anything that happens manually should be automated away (DNS/SSL setup, SAML configuration, find a better solution for seeding initial data (data that Jupiter can't live without, like populating our deposit drop-downs), etc.). Work towards improving logging, metrics and security, and reaching feature parity with Jupiter's current production environment.
  • ???
  • Follow Matt's plan to cut over current production to the new production environment that is being provisioned by Terraform

@github-actions

1 Warning
⚠️ This PR is too big! Consider breaking it down into smaller PRs.

Generated by 🚫 Danger

@@ -10,16 +9,13 @@ SAML_CERTIFICATE=
ROLLBAR_TOKEN=
GOOGLE_ANALYTICS_TOKEN=
RAILS_LOG_TO_STDOUT=true
# Comma delimited string of rack attack safelisted IPs
Contributor Author

The Rack Attack gem was removed a long time ago

@@ -1,7 +1,6 @@
#Rails application env variables
RAILS_ENV=uat
DATABASE_URL=postgresql://jupiter:mysecretpassword@postgres:5432/
FCREPO_URL=http://fcrepo:8080/fcrepo/rest
Contributor Author
@murny murny Jan 30, 2022

Fedora was removed a long time ago... I updated all the references I could find here

@@ -39,14 +39,5 @@
#
# preload_app!

if ENV['RAILS_ENV'] == 'uat'
Contributor Author

No need to fork puma workers

}


resource "kubernetes_config_map" "solr-config" {
Contributor Author
@murny murny Jan 30, 2022

The Solr Helm charts are mostly built around the idea of creating a cluster (multiple SolrCloud instances, ZooKeeper/operator instances, etc.), which is probably overkill for us. I created a simple Solr pod myself which mirrors the work we did in Docker Compose with the old UAT.

For production, perhaps looking into these Helm charts would be worth it.

(If Jupiter wasn't on the chopping block, I would suggest moving to Elasticsearch. The time it will take someone (maybe on the Unix team) to figure out and roll out a game plan for a Solr cluster, and then the time and effort to maintain that Solr cluster, would be a big undertaking. Never mind the fact that Jupiter requires an end-of-life version of Solr (v6.6), and upgrading this would also be quite a bit of work. For all this time and effort, you could easily rip out the Solr functionality in Jupiter and transition to Elasticsearch instead, then just use Azure's Elasticsearch service. This would be a far cheaper path to go down than sticking with an EOL Solr cluster. Jupiter itself would also be in a far better place going forward.)

}

data = {
"schema.xml" = "${file("${path.module}/../solr/config/schema.xml")}"
Contributor Author
@murny murny Jan 30, 2022

I think solrconfig.xml is required, but I'm not sure about schema.xml? Also, we have a ton of junk in the solr/config and solr directories in Jupiter that we should delete if we're not using it... it just adds confusion.

In Docker Compose we would mount this entire solr/config directory into the image, but I honestly think we just need these one or two files.

In the Kubernetes world, you create a ConfigMap with the file data, then mount it as a volume to do a similar thing.
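The mount side of that would look roughly like the sketch below in the Solr deployment; the volume name and mount path are illustrative (the exact conf directory depends on the Solr image and core layout), not copied from this PR:

volume {
  name = "solr-config"
  config_map {
    name = kubernetes_config_map.solr-config.metadata[0].name
  }
}

# and on the Solr container, mount it over the core's conf directory:
volume_mount {
  name       = "solr-config"
  mount_path = "/opt/solr/server/solr/jupiter-uat/conf"
}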

}
}
}
volume {
Contributor Author
@murny murny Jan 30, 2022

TODO: We have no persistence layer here. A good improvement would be to look into storing the Solr data in persistent storage, so that recreating this Solr pod doesn't delete all the data it currently has.
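A rough sketch of what that could look like (the claim name, namespace and size are made up for illustration, not part of this PR):

resource "kubernetes_persistent_volume_claim" "solr-data" {
  metadata {
    name      = "${var.app-name}-solr-data"
    namespace = "jupiter"
  }
  spec {
    access_modes = ["ReadWriteOnce"]
    resources {
      requests = {
        storage = "10Gi"   # uses the AKS default storage class unless we specify one
      }
    }
  }
}

The Solr deployment would then reference this claim via a volume block and a volume_mount on the container pointing at the Solr data directory.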

}
}

resource "kubernetes_service" "solr-service" {
Contributor Author

We need to expose the Solr pod to the rest of the pods, so that Rails can talk to it.

This allows traffic from Rails to Solr via "http://jupiter-solr:8983/solr/jupiter-uat" to work

RAILS_LOG_TO_STDOUT = "true"
RAILS_SERVE_STATIC_FILES = "true"
DATABASE_URL = "postgresql://${urlencode("${var.postgresql-admin-login}@${azurerm_postgresql_server.db.name}")}:${urlencode(var.postgresql-admin-password)}@${azurerm_postgresql_server.db.fqdn}:5432/${azurerm_postgresql_database.postgresql-db.name}"
SOLR_URL = "http://${var.app-name}-solr:8983/solr/jupiter-uat"
Contributor Author

Just noticed this name could be better; it should be driven off the Solr pod name instead:
"http://${kubernetes_deployment.solr.name}:8983/solr/jupiter-uat"

}
spec {
rule {
# TODO: Figure out how we will use DNS/etc
Contributor Author
@murny murny Jan 30, 2022

As the TODO states, this is left as a manual task: either someone has to manually create an A record in DNS after we've provisioned the UAT environment, or for local testing you can set your /etc/hosts to the correct IP mapping.

But this should get automated! Terraform can do this for us in a very trivial manner.

Same goes for SSL certificates and getting that configured.
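As a rough illustration of the DNS half, assuming the uat.library.ualberta.ca zone were hosted in Azure DNS, it would be something like this (the zone, resource group and public IP references below are hypothetical placeholders, not resources that exist in this PR):

resource "azurerm_dns_a_record" "era" {
  name                = "era"
  zone_name           = azurerm_dns_zone.uat.name          # hypothetical zone resource
  resource_group_name = azurerm_resource_group.rg.name     # hypothetical resource group
  ttl                 = 300
  records             = [data.azurerm_public_ip.ingress.ip_address]  # the ingress controller's public IP
}

A similar record would cover digitalcollections, and something like cert-manager (or Azure-managed certificates) could take care of the SSL side.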

name = "${var.app-name}-ingress"
annotations = {
"kubernetes.io/ingress.class" = "nginx"
"nginx.ingress.kubernetes.io/proxy-body-size" = "16m"
Contributor Author
@murny murny Jan 30, 2022

This is an example of how you can configure NGINX via annotations. For example, by default I believe NGINX only allows uploads of less than 1 MB in size; this raises the limit to 16 MB. Obviously we want to increase this, and this is how you can do it.

There are a ton of other options for configuring NGINX to your liking. We could look into adding SSL certs, having NGINX serve assets instead of Rails, etc.

More info: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/
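As a taste of what else is available, the annotations map could grow into something like this (the extra keys below are examples taken from the ingress-nginx docs, not settings this PR actually sets):

annotations = {
  "kubernetes.io/ingress.class"                    = "nginx"
  "nginx.ingress.kubernetes.io/proxy-body-size"    = "16m"
  # force HTTP to HTTPS once real certificates are in place
  "nginx.ingress.kubernetes.io/ssl-redirect"       = "true"
  # give slow requests (large deposits) more time before NGINX gives up
  "nginx.ingress.kubernetes.io/proxy-read-timeout" = "120"
}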

collation = "English_United States.1252"
}

# TODO: Fix this, Currently we allow everything for now
Contributor Author
@murny murny Jan 30, 2022

This and the firewall rules for Redis need to be played with and tightened.

At the end of the day we only want internal private traffic for our pods (flow from Rails/Sidekiq to Redis/Postgres). We ourselves have no business needing access to these resources.

Perhaps we can change this to only allow private IP addresses or a virtual network?

  start_ip_address    = "10.0.0.0"
  end_ip_address      = "10.255.255.255"

But this needs some investigation
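One possible direction, assuming we put the AKS cluster in its own subnet, is a VNet rule instead of an open IP range (the resource group and subnet names below are placeholders):

resource "azurerm_postgresql_virtual_network_rule" "postgres" {
  name                = "${var.app-name}-postgres-vnet-rule"
  resource_group_name = azurerm_resource_group.rg.name    # placeholder
  server_name         = azurerm_postgresql_server.db.name
  subnet_id           = azurerm_subnet.aks.id              # placeholder subnet for the cluster
}

This also needs the Microsoft.Sql service endpoint enabled on that subnet, so it's not free, but it keeps the database off the public internet.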

family = "C"
sku_name = "Standard"

# TODO: It possible to use SSL here?
Contributor Author
@murny murny Jan 30, 2022

Postgres and Redis both provide SSL options, but this is probably overkill, especially if we keep this traffic just inside our pods. Still, it might be worth looking into.
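If we ever decide it is worth it, it's mostly a couple of attributes on the Terraform side (sketch only; the Rails side would also need sslmode on the DATABASE_URL and a rediss:// Redis URL):

# on the azurerm_postgresql_server resource:
ssl_enforcement_enabled = true

# on the azurerm_redis_cache resource:
enable_non_ssl_port = false
minimum_tls_version = "1.2"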

account_tier = "Standard"
account_replication_type = "LRS"

# TODO: Probably not needed for Jupiter?
Contributor Author
@murny murny Jan 30, 2022

This can probably be removed. We only do file uploads from the Rails server, so we have no need for CORS. But I suppose if we ever do something with Action Text or ActiveStorage direct uploads, then this might be useful
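If we did keep it for direct uploads, a CORS rule on the storage account looks roughly like this (values are illustrative, not what this PR sets):

blob_properties {
  cors_rule {
    allowed_origins    = ["https://era.uat.library.ualberta.ca"]
    allowed_methods    = ["PUT"]
    allowed_headers    = ["*"]
    exposed_headers    = ["Origin", "Content-Type"]
    max_age_in_seconds = 3600
  }
}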

@@ -0,0 +1,24 @@
####################
Contributor Author

Sample file for terraform.tfvars. This is an easy way to set the variables in variables.tf for Terraform without having to pass them on the command line. The values in terraform.tfvars should be kept secret as they will contain sensitive information (DB password, etc.), and as a result the file is gitignored.
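For anyone who hasn't used one, the file is just HCL key/value pairs, something along these lines (the variable names other than the postgresql ones are illustrative; the real list is whatever variables.tf declares):

subscription_id           = "00000000-0000-0000-0000-000000000000"
client_id                 = "00000000-0000-0000-0000-000000000000"
client_secret             = "<service principal secret>"
tenant_id                 = "00000000-0000-0000-0000-000000000000"
postgresql-admin-login    = "jupiter"
postgresql-admin-password = "<pick something strong>"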


/public/packs
/public/packs-test
/node_modules
Contributor Author

We probably shouldn't copy all this stuff into our Docker images 😆. This saves a good 100 MB or so from the images I was building

@@ -80,6 +80,10 @@ gem 'google-api-client',
gem 'builder_deferred_tagging', github: 'ualbertalib/builder_deferred_tagging', tag: 'v0.01'
gem 'oaisys', github: 'ualbertalib/oaisys', tag: 'v1.0.3'

group :uat do
gem 'azure-storage-blob', require: false
Contributor Author

Required for ActiveStorage to use Azure Blob Storage as a bucket

data = {
RAILS_ENV = "uat"
RAILS_LOG_TO_STDOUT = "true"
RAILS_SERVE_STATIC_FILES = "true"
Contributor Author

We're having Rails serve our assets currently. NGINX could easily do this instead. Could be a nice improvement to investigate in the future.

name = "${var.app-name}-config"
}
}
# TODO: Figure healthcheck out, seems to be failing on HTTP/HTTPS issue
Contributor Author
@murny murny Jan 30, 2022

Not sure why this is failing. Needs more investigation.

I believe we need to whitelist the host via:
config.hosts << IPAddr.new('10.0.0.0/8') (maybe we need to do localhost too?)

Apparently you could also whitelist this route instead (but I couldn't get this working locally 🤔) via

# Exclude requests for the /healthcheck/ path from host checking
Rails.application.config.host_authorization = {
  exclude: ->(request) { request.path =~ /healthcheck/ }
}

More info: https://guides.rubyonrails.org/configuring.html#actiondispatch-hostauthorization

But even after this I am seeing the following error:

2022-01-30 20:01:43 +0000 HTTP parse error, malformed request: #<Puma::HttpParserError: Invalid HTTP format, parsing fails. Are you trying to open an SSL connection to a non-SSL Puma?

Apparently this can happen when the site keeps redirecting HTTP to HTTPS when attempting to hit localhost (puma/puma#1128 (comment)).

So I set scheme = "HTTPS" so the probe should happen over HTTPS instead, but still no luck.

So anyway, this needs some playing around to figure out how we can get it to pass without issues.

Everything works without the healthcheck, but healthchecks are important for Kubernetes, as they help the control plane when deploying and monitoring pods, etc.
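For context, the probe block we're talking about on the jupiter-app container looks roughly like this in the Terraform kubernetes provider (the port and timing values here are illustrative, not copied from this PR; the path and scheme are the ones discussed above):

liveness_probe {
  http_get {
    path   = "/healthcheck"
    port   = 3000        # whatever port Puma listens on in the container
    scheme = "HTTPS"     # per the experiment above; "HTTP" is the default
  }
  initial_delay_seconds = 30
  period_seconds        = 30
}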

spec {
container {
name = "${var.app-name}-workers"
image = "murny/jupiter:latest"
Contributor Author

Using my own image I built instead of overwriting the ualberta stuff. This will just need to be updated to point to the correct one.

My own image does nothing special... just docker build -f Dockerfile.deployment and docker push
