This repository has been archived by the owner on Mar 9, 2021. It is now read-only.

Summary of DevOps code issues and possible resolutions #27

Open
AlexSkrypnyk opened this issue Oct 18, 2019 · 2 comments

@AlexSkrypnyk

Since the first deployment of GovCMS 2.0, it has become evident that the DevOps code (aka "glue code") that allows the GovCMS Drupal distribution to run on Lagoon has several problems that impact the development and delivery of SaaS and PaaS projects.

Meta

This is an epic-like issue that lists identified problems and proposed resolutions.

The solutions here are described at a high level - more detailed information will be provided in separate issues that will be referenced from each solution here. Those issues will also be used for more in-depth technical discussions and to submit PRs. We will create them once the solutions here are approved (we do not want to spam the issue queue prematurely).

The goal here is to provide a publicly accessible roadmap of upcoming changes.

It does not mean that all suggestions will be accepted, but at least the problems will be acknowledged.

Please reference the number of the problem when commenting on this issue.

Important

The approaches described in the proposed solutions try to preserve the existing DX API (env variables) and only extend it for new functionality. The underlying scripting will most likely require changes, but this should not have an impact on consumers (sites that are using the GovCMS platform).

Once all approved issues are resolved, it is expected that newly created consumer projects will use these new approaches. At the same time, existing PaaS consumers may update to the latest version of the "glue code" if they want to - we will provide a command for this (see below).

The problems

  1. Ahoy commands are no longer just wrappers. They contain workflow logic.
    1.1 Ahoy commands become "magic" - not clear what happens when they are called.
    1.2 Ahoy commands cannot be run on Windows, so workflow commands are not the same on Windows.
    1.3 Ahoy as a tool is not bad; it is just used to handle too much.

  2. CI configuration uses custom logic when building a site.
    2.1 This logic is different from building locally, resulting in different build steps (we are not talking about the difference of environments here - only logic and order of workflow command executions).
    2.2 CI configuration should use the same website building logic as local and production.

  3. govcms-deploy script handles both site build and site deployment logic
    3.1 This script is currently used to control how the site is built based on the environment it is run in.
    3.2 The script is coupled with platform-specific variables.
    3.3 The script does not allow variation in deployment logic (see below)

  4. Deployment assumes fixed canonical db workflow
    4.1 Example: unable to have non-prod environment as non-canonical

  5. Environment variables (1, 2, 3) apply differently in different environments. It is not clear what each environment variable is for.
    5.1 It is not clear what the full set of environment variables is, either to GovCMS Platform developers or to consumer site developers.
    5.2 It is not clear if changing a variable in one environment will lead to the same result in all environments (not only because of the logic in environment scripts - see p.1 above - but also due to how environments treat variable loading).
    5.3 It is not clear which variables should be controlled by GovCMS platform developers and which by consumer project developers.

  6. There is no unified documentation that covers "glue code".
    6.1 There is no single place where a consumer site developer can go for platform reference (Lagoon documentation does not help much because of 5.3).
    6.2 There is no single place that platform developers can use as a "contract" between the platform and consumer sites.
    6.3 There is no place where platform changes can be documented at the same time as they are introduced (stale documentation problem).

  7. Existing projects do not have a mechanism to receive updates to the "glue code" (it is handled now partially, but listing this here)
    7.1 After a project has been scaffolded, it is not possible to receive updates to the "glue code". The only way is for consumer site developers to manually pick the latest changes from the scaffold repos.
    7.2 There is no way to "push" "glue code" updates to PaaS projects.

  8. Lint and test scripts are set up to be used with SaaS and not PaaS
    8.1 The tool's configuration file path and targets are hardcoded to SaaS file locations.
    8.2 Because of 8.1, it is required to maintain a separate version of these scripts for PaaS, even though they use exactly the same logic as SaaS.

Proposed solutions

  1. Refactor Ahoy commands so they would be "dumb" shorthand wrappers of build and deploy scripts (see below).
    The rule of thumb for Ahoy commands is: copy the content, paste it into a terminal and run it. If the command requires adjustment after being pasted into the terminal, factor it out into a separate script.

  2. Replace the CI config so that it is "dumb" and relies on calling the build script (see below) with parameters. Also, make sure not to use any Ahoy commands in the CI config file.

  3. Split govcms-deploy script into 2 scripts - build and deploy.

    • The build script can have variables that control the build process. Any environment that needs to alter the build process would do so by setting environment variables and calling the script (synthetic example):
if [[ "$LAGOON_ENVIRONMENT_TYPE" = "production" ]]; then
  GOVCMS_DEPLOY_WORKFLOW_CONFIG="retain" build
else
  # Assumed fallback (the else branch is not shown in the original): default build.
  build
fi
    • The deploy script controls how a deployment proceeds in a concrete environment. For example, .lagoon.yml would have a section for deployment separate from build (synthetic example):
tasks:
  post-rollout:
    - run:
        name: Build site
        command: build
        service: cli
    - run:
        name: Deploy
        command: deploy
        service: cli

This means that the build and deploy stages are completely separate; they can have their own namespaced variables that do not affect other scripts.
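To illustrate what "own namespaced variables" could look like, here is a minimal sketch. The variable names GOVCMS_BUILD_CONFIG_MODE and GOVCMS_DEPLOY_DB_ACTION, their allowed values, and the function names are all hypothetical and only show the idea: each stage reads its own prefix, has an explicit default, and fails on an unsupported value.

```shell
#!/usr/bin/env bash
# Sketch only: GOVCMS_BUILD_CONFIG_MODE and GOVCMS_DEPLOY_DB_ACTION are
# hypothetical variable names, not part of the existing GovCMS tooling.

# Build stage: reads only GOVCMS_BUILD_* variables.
govcms_build_plan() {
  local mode="${GOVCMS_BUILD_CONFIG_MODE:-import}"
  case "$mode" in
    import|retain) echo "build: config=${mode}" ;;
    *) echo "build: unsupported config mode '${mode}'" >&2; return 1 ;;
  esac
}

# Deploy stage: reads only GOVCMS_DEPLOY_* variables.
govcms_deploy_plan() {
  local action="${GOVCMS_DEPLOY_DB_ACTION:-keep}"
  case "$action" in
    keep|sync) echo "deploy: db=${action}" ;;
    *) echo "deploy: unsupported db action '${action}'" >&2; return 1 ;;
  esac
}
```

With this shape, setting a build variable cannot change what the deploy stage does, and an unset variable always falls back to a value that can be documented in one place.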

  4. Although work on this has started, there is still work to be done in p.3 to make this streamlined and documented.

  5. When dealing with environment variables in a Dockerised application, it is important to be clear about where variables are set (which layer of the stack - Dockerfile, docker-compose.yml, .env.defaults), what happens if a value is not provided, and what the defaults are.

A possible solution would be to reduce the number of places where variables can be specified out of the box. Currently, we have Dockerfile, docker-compose.yml, .env.defaults, .env, .lagoon.env and .lagoon.env.master, with Dockerfile, docker-compose.yml and .env.defaults being compulsory. Reducing this to Dockerfile and docker-compose.yml would help, so that docker-compose.yml becomes a single place for all environment variables. Unfortunately, Lagoon does not support this yet (there is a question about this), so reducing the number of files is not an option for now.

Looking at the current documentation, p.5.2 and p.5.3 are still not resolved. We need to add more information for each variable, including the default value, who should set it (GovCMS team or consumer site developer), and how it works in every environment (if there is a difference).
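As a rough illustration of the kind of per-variable information described above, the sketch below reports the effective value of a variable, whether it came from the environment or from a documented default, and who owns it. The helper name describe_var and the two variable names are assumptions made for this example only.

```shell
#!/usr/bin/env bash
# describe_var NAME DEFAULT OWNER
# Prints the effective value of an environment variable, whether it came from
# the environment or from the documented default, and who owns it.
# Only exported variables are visible to printenv, which matches what a
# script started in that environment would actually receive.
describe_var() {
  local name="$1" default="$2" owner="$3" value origin
  if value="$(printenv "$name")"; then
    origin="environment"
  else
    value="$default"
    origin="default"
  fi
  printf '%s=%s (source: %s, owner: %s)\n' "$name" "$value" "$origin" "$owner"
}

# Hypothetical entries; real names, defaults and owners would come from the docs.
describe_var GOVCMS_EXAMPLE_PLATFORM_VAR "on" "GovCMS team"
describe_var GOVCMS_EXAMPLE_SITE_VAR "off" "consumer site developer"
```

Running the same report in local, CI and Lagoon environments would be one way to spot the differences mentioned in p.5.2.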

  6. The proposed solution is to create a docs directory in the scaffold/tooling repository and use it as a single place for all platform documentation.
    • Why [scaffold/tooling]? - this is a central place for all scaffold files, so it makes sense to host the documentation together with the code.
    • To address p.6.3, changes to API code would have to be submitted together with changes to documentation (this is a best practice in many open-source projects).
    • Unlike a wiki, where it is not possible to comment on, diff or publish documentation, storing documentation in a docs directory helps to set up a consistent documentation management workflow.
    • Setting up a CI process that automatically publishes documentation to a public URL with every scaffold-tooling release will allow developers to access documentation for previous versions (most of the projects using readthedocs.org use this kind of versioning).
  7. The proposed solution is to take all "glue code", including .docker configuration and docker-compose (basically all non-Drupal files that https://github.com/govCMS/govcms8-scaffold-paas has), and deliver it as part of the vendor/govcms/scaffold-tooling repository. Of course, sourcing absolutely all of the files from vendor/govcms/scaffold-tooling is not possible on day 1, but having a "source of truth" for all "glue code" helps to keep all other places in sync:
    • this makes it possible to test the scaffold tooling itself - the tests can start the full stack and make sure that every scaffolding command or file works as expected, that all environment variables used in scripts control the logic correctly, etc. (so, basically, unit and integration tests).
    • also, scaffold-tooling can have a special command, vendor/govcms/scaffold-tooling/update (for example), that would provide a consistent way to get the latest versions of all scaffold files into the current project (currently developers still have to do this manually, by copying every single file from the scaffold repositories). Running such a command will still require a manual review of the updated files (just as when newer scaffolding files are copied into the repository manually), but at least it will make the developer's life easier.
  8. Refactor both scripts to support passing paths as environment variables.
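The update command proposed above for the scaffold tooling could, as a very rough sketch, copy only new or changed files and list them so that the developer knows what to review. The function name scaffold_update and its two-argument interface are assumptions for illustration; the proposal itself only names vendor/govcms/scaffold-tooling/update as an example.

```shell
#!/usr/bin/env bash
# scaffold_update SRC DEST
# Copies every file under SRC into the same relative path under DEST when it
# is missing or differs, and prints one "updated: <path>" line per copied file.
scaffold_update() {
  local src="${1%/}" dest="${2%/}" file rel
  find "$src" -type f | sort | while IFS= read -r file; do
    rel="${file#"$src"/}"
    # cmp -s exits non-zero when the files differ or the target is missing.
    if ! cmp -s "$file" "$dest/$rel" 2>/dev/null; then
      mkdir -p "$(dirname "$dest/$rel")"
      cp "$file" "$dest/$rel"
      echo "updated: $rel"
    fi
  done
}
```

A call such as `scaffold_update vendor/govcms/scaffold-tooling/scaffold .` (the `scaffold` subdirectory is hypothetical) followed by `git diff` would then show exactly what changed before anything is committed.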
@simesy
Contributor

simesy commented Oct 18, 2019

  1. Agreed

  2. Agreed. Compounded by differences between environments (e.g. whether bind mounts are used, and how environment variables are exposed). Generally I support the same workflow in different environments where possible.

  3. Agreed. I think we could incrementally improve different aspects. At any rate I'm a big fan of splitting out "build" (set up the application) and "deploy" (applying external state).

  4. Agreed

  5. Big yes

  6. Would like to try and use the current wiki as long as possible. Readthedocs/etc will be great when we have a handle on the complete knowledge domain. There are also some other docs to be sourced or considered, where the audiences are usually different but I believe they overlap. We have dupe docs in Freshdesk, and I'd like to understand how their authors could be working in the wiki instead. Also the "to-be-open-sourced" training docs are focussed on site building as far as I know, but there might be a compelling case to PR on them to cover basic devops training.

  7. Agree. I don't think we should worry about D7, and for D8 we are roadmapped to archive the SaaS scaffold. This leaves the docker files in govcms/govcms8, govcms/govcms8lagoon and here in govcms/govcms8-scaffold-paas. I've started looking at how GitLab might handle CI without any volume mounting... I think there are just some bugs that are covered up by the bind mounting of /app at the end of the docker build.

  8. Agreed.

@simesy
Contributor

simesy commented Dec 19, 2019

2,3) We split up pre/post deploy but currently without using any scripts.

  6. I'm doing the docs in the wiki. A conversation had recently suggested that the wiki should transfer to something like readthedocs soon. I'm personally not pushing hard for it, as I'm waiting for more details about the open source training documentation and I'd like to see a lot of the wiki stuff become training material.

  7. I think we made pretty good progress on 7, and once (if) saas/paas consolidation happens there can be a sweep up of a lot of logic and files in govcms8lagoon's docker images.
