
Rewrote installer shell script to pull directly from Github #3

Closed
nullbio opened this issue May 28, 2024 · 19 comments

Comments

@nullbio

nullbio commented May 28, 2024

I'm not sure what you'd like to do with this, if anything, but I rewrote the shell script to pull binaries directly from GitHub releases, using the tag names, version numbering, and file formats automated in your build system. While switching my current project to dotenvx, I needed to install the binary in my Docker containers, which led me to make some adjustments to the installation script for enhanced security.

My reasons are as follows:

  • Security Priority: Given the sensitive nature of this project, security should always be the top priority. Any breach could have severe consequences. Therefore, it's crucial to adopt a paranoid approach to security whenever possible.
  • Trusted Sources: The shell script should use transparent and trusted sources, minimizing external exposure. Since the project is primarily hosted on GitHub, it makes sense to centralize dependencies there. Removing the dependency on Heroku-hosted URLs (e.g., https://dotenvx.sh/$(uname)/$(uname -m).tgz) is a significant security improvement. If a malicious entity were to gain access to these endpoints, they could serve malicious binaries or execute harmful commands by modifying the VERSION response. Everything is publicly traceable on GitHub, including the builds, source code, and hashes. When users see they’re pulling the binary directly from a release, it provides much greater assurance.

Here are some modifications I made and the reasoning:

  • Removed Automatic Sudo Installation: The script no longer installs sudo automatically if it's missing. In production Docker containers, sudo is often not installed for security reasons. Installing sudo within the script is an anti-pattern. If running as root is required, users should be notified and make an informed decision.
  • Added Installation Path Option: Users can now specify an installation path (--path argument), allowing them to install dotenvx in a non-root user-writable directory to avoid the need for sudo altogether. This is much safer than the current approach, and we should consider making the default behavior to install in $HOME instead of /usr/local/bin with an optional --global argument.
  • Added Version Specifier Option: Users can now specify a precise version number (--version) to install. Pinning version numbers is a common practice to prevent unforeseen breakages, and is particularly useful for installing dotenvx in docker images as part of a CI/CD flow for consistent builds.
  • Curl Dependency Check: Since many systems, like stripped-down Alpine Docker containers, may not have curl installed, the script now checks for curl and exits if it's not found. Unfortunately, at least one external dependency is unavoidable, but all other dependencies are core Linux utilities that should be present on all systems. Previously there was no error handling around this and a missing curl dependency was causing strange output.
  • Direct GitHub Releases: The script now pulls directly from GitHub releases using the GitHub API to get tag names. As long as the file and tag naming conventions remain consistent, this should continue to work seamlessly.
  • Enhanced Error Handling: Added more general error handling in areas that were previously missing it to increase robustness.
  • Help Usage and New Arguments: Added usage help to the script (--help or help) that documents the new arguments.
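The suggested per-user default with an opt-in --global could look roughly like this. This is a hypothetical sketch: the --global flag, the resolve_install_dir helper, and the $HOME/.local/bin default are assumptions and are not part of the script below.

```shell
# Hypothetical: install per-user by default, /usr/local/bin only with --global.
# Both --global and the $HOME/.local/bin default are assumptions.
resolve_install_dir() {
  dir="$HOME/.local/bin"
  for arg in "$@"; do
    case $arg in
    --global) dir="/usr/local/bin" ;;
    --path=*) dir="${arg#*=}" ;;
    esac
  done
  # The last of --global/--path on the command line wins
  printf '%s\n' "$dir"
}
```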

As for the docs on the website, in maintaining the theme of transparency, they could say:

curl -fSs https://raw.githubusercontent.com/dotenvx/dotenvx.sh/main/installer.sh | sh
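Because the proposed script accepts --version and --path, a pinned install can still be piped: `sh -s --` makes sh read the script from stdin and hand the remaining words to it as arguments. A minimal sketch, assuming the installer.sh URL above; install_dotenvx_pinned is a hypothetical helper name.

```shell
# Pipe the installer into sh and pass the pinned version and path through.
# `-s` reads the script from stdin; `--` ends sh's own options.
install_dotenvx_pinned() {
  curl -fsS https://raw.githubusercontent.com/dotenvx/dotenvx.sh/main/installer.sh |
    sh -s -- --version="$1" --path="$2"
}

# Example (e.g. in a Dockerfile RUN step):
#   install_dotenvx_pinned 0.44.0 /usr/local/bin
```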

I haven't created a pull request for this because I'm not sure how you wanted to handle this, considering that if you're on board with all of this it may replace the need for the dotenvx.sh Heroku app.

Here's the code:

#!/bin/sh

set -e

# Enable debugging if VERBOSE or GITHUB_ACTIONS and RUNNER_DEBUG are set
if [ -n "$VERBOSE" ] || { [ -n "$GITHUB_ACTIONS" ] && [ -n "$RUNNER_DEBUG" ]; }; then
  set -x
fi

usage() {
  echo "Usage: $0 version=x path=y"
  echo "       $0 help|--help"
  echo "Arguments:"
  echo "  version=x      Specify the version of dotenvx to install, for example: version=0.44.0"
  echo "  path=y         Specify the installation directory, default is /usr/local/bin"
  echo "  help           Display this help message."
}

# Default values
INSTALL_DIR="/usr/local/bin"
VERSION=""

# Parse arguments
for arg in "$@"; do
  case $arg in
  version=* | --version=*)
    VERSION="${arg#*=}"
    ;;
  path=* | --path=*)
    INSTALL_DIR="${arg#*=}"
    ;;
  help | --help)
    usage
    exit 0
    ;;
  *)
    # Unknown option
    echo "Unknown option: $arg"
    usage
    exit 1
    ;;
  esac
done

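# Expand a leading "~" in the install path; the calling shell does not
# always expand it in an argument such as --path=~/bin
case "$INSTALL_DIR" in
"~")
  INSTALL_DIR="$HOME"
  ;;
"~"/*)
  # Drop the two leading characters ("~/") and prefix $HOME
  INSTALL_DIR="$HOME/${INSTALL_DIR#??}"
  ;;
esac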

if [ ! -w "$INSTALL_DIR" ] && [ "$(id -u)" -ne 0 ]; then
  echo "The install directory $INSTALL_DIR is not writable by the current user. Please run as root or choose a writable directory. See --help."
  exit 1
fi

if ! command -v curl >/dev/null 2>&1; then
  echo "This script requires curl. Please install curl and try again."
  exit 1
fi

# Global to prevent multiple curl calls
latest_version=""

# Determine system architecture and OS
map_architecture() {
  local os=$(uname -s | tr '[:upper:]' '[:lower:]')
  local arch=$(uname -m | tr '[:upper:]' '[:lower:]')

  case "$os" in
  linux) os="linux" ;;
  darwin) os="darwin" ;;
  *)
    # Print to stderr: stdout is captured by $(map_architecture)
    echo "Your OS ${os} is unsupported. Must be either Linux or Darwin." >&2
    return 1
    ;;
  esac

  case "$arch" in
  x86_64) arch="x86_64" ;;
  amd64) arch="amd64" ;;
  arm64) arch="arm64" ;;
  aarch64) arch="aarch64" ;;
  *)
    echo "Your architecture ${arch} is unsupported. Must be either x86_64, amd64, arm64, or aarch64." >&2
    return 1
    ;;
  esac

  echo "${os}-${arch}"
  return 0
}

# Check dotenvx version installed vs. provided or latest available
_dotenvx_check_installed() {
  local target_version="$1"
  local installed_version=$("$INSTALL_DIR/dotenvx" --version 2>/dev/null || echo "0")

  if [ -n "$target_version" ]; then
    if [ "$installed_version" = "$target_version" ]; then
      echo "dotenvx v$target_version is already installed at $INSTALL_DIR/dotenvx"
      return 0
    else
      return 1
    fi
  fi

  # Fetch the latest version number from GitHub tags
  latest_version=$(curl -s "https://api.github.com/repos/dotenvx/dotenvx/tags" |
    grep '"name"' |
    head -1 |
    sed -E 's/.*"v([^"]+)".*/\1/' 2>/dev/null)

  if [ -z "$latest_version" ]; then
    echo "Error: Failed to fetch the latest dotenvx version from github tags" >&2
    exit 1
  fi

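  # Pick the newer version by comparing each dot-separated field numerically;
  # a plain `sort -r` is lexical and would rank 0.9.0 above 0.10.0.
  # printf is used because `echo -e` prints a literal "-e" under dash.
  local newer=$(printf '%s\n%s\n' "$installed_version" "$latest_version" | sort -t. -k1,1nr -k2,2nr -k3,3nr | head -1)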
  if [ "$installed_version" != "$newer" ]; then
    return 1
  fi

  echo "dotenvx v$installed_version is already installed at $INSTALL_DIR/dotenvx, and is the latest version available"
  return 0
}

# Install or upgrade dotenvx
_install_dotenvx() {
  local sys_info
  # Assign separately from `local`: `local x=$(cmd)` reports the status of
  # `local` itself, so a failure in map_architecture would go unnoticed
  sys_info=$(map_architecture) || exit 1

  local version="${1:-$latest_version}"
  local filename="dotenvx-${version}-${sys_info}.tar.gz"
  local url="https://github.com/dotenvx/dotenvx/releases/download/v${version}/${filename}"
  local tmpdir=$(command mktemp -d)
  local install_message="Installing dotenvx from GitHub releases: v${version}/${filename}"

  local progress="--progress-bar"
  if [ -n "$CI" ] && [ "$CI" != "0" ]; then
    progress="--no-progress-meter"
  fi

  echo "$install_message"

  curl $progress --fail -L --proto '=https' -o "$tmpdir/dotenvx.tar.gz" "$url"
  tar -xzf "$tmpdir/dotenvx.tar.gz" -C "$INSTALL_DIR"
  rm -r "$tmpdir"

  # `which` is often absent from minimal images; `command -v` is POSIX.
  # `|| true` keeps set -e from aborting when dotenvx is not on PATH.
  local found
  found=$(command -v dotenvx || true)
  if [ "$found" != "$INSTALL_DIR/dotenvx" ]; then
    if [ -n "$found" ]; then
      echo "Warning: Conflicting dotenvx found at $found" >&2
    fi
    echo "Warning: Please update your PATH to include $INSTALL_DIR" >&2
  fi

  echo "Installed dotenvx v$("$INSTALL_DIR/dotenvx" --version) successfully at $INSTALL_DIR/dotenvx"
}

# Main logic
if [ -n "$VERSION" ]; then
  # Check if the specified version is already installed
  if _dotenvx_check_installed "$VERSION"; then
    exit 0
  else
    _install_dotenvx "$VERSION"
  fi
else
  # Default behavior when no version is specified. Attempt to use latest from GH.
  if ! _dotenvx_check_installed; then
    _install_dotenvx
  fi
fi
@motdotla
Contributor

This is pretty great. Thank you @nullbio.

I have mixed feelings on moving off a domain that would allow us to substitute where we mirror/host the binaries in the future, but generally I like the idea of removing another piece of complexity (keeping dotenvx.sh maintained, up, and secure).

any idea on rate limits for raw.github requests? part of the solution for dotenvx.sh was to boost up to the 5,000 requests per hour per IP. Requests can quickly add up for those installing essentially from the same ip inside AWS or Digital Ocean, etc.

@motdotla
Contributor

motdotla commented May 28, 2024

maybe there are 3 questions here:

  1. what's the rate limit on api.github.com (for tag listing)
  2. what's the rate limit on raw.githubusercontent.com (for .sh file)
  3. what's the rate limit on /releases/download (for actual download)

--

  1. only 60 requests per hour for unauthenticated requests (source)
  2. appears to be unlimited?? hard to find data on this.
  3. appears to be unlimited: "There is no limit on the total size of a release, nor bandwidth usage" (source)
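Since the 60-per-hour limit applies to unauthenticated api.github.com calls, one option is to send a token when one is available; requests made with a personal access token get 5,000 per hour. A sketch, assuming the token is exported as GITHUB_TOKEN; github_api_get is a hypothetical name, and most end users installing via curl would still hit the endpoint unauthenticated.

```shell
# GET an api.github.com path, authenticating only if GITHUB_TOKEN is set
github_api_get() {
  if [ -n "${GITHUB_TOKEN:-}" ]; then
    curl -fsS -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer $GITHUB_TOKEN" \
      "https://api.github.com$1"
  else
    curl -fsS -H "Accept: application/vnd.github+json" "https://api.github.com$1"
  fi
}

# Example: github_api_get /repos/dotenvx/dotenvx/tags
```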

@motdotla
Contributor

additional context:

Sentry publishes their binaries to npm and their installer script under their domain (which appears not to be open sourced):

Pkgx hosts their installer.sh on github but under their own domain code (which appears partially open sourced):

Other companies use cloudsmith.com or host static binaries at their own s3 bucket.

ideally, my solution would be to offer enough transparency and tooling around the dotenvx.sh domain, site, and hosting that it could be trusted and verified: tooling to confirm the DNS records and ownership of the domain, to verify that the deployed code is what is open sourced, and so on. a checksum for the running site serving the binaries, so to speak. this would allow us to continue hosting or redirecting to github, while keeping the flexibility to move binary host providers in the future if needed (for example, switching to s3).

in addition to that, maybe we provide a normal installer.sh and a 'paranoid' paranoid-installer.sh that is stricter about access on the machine (sudo, the /usr/local/bin path, etc). there's a balance to keep in mind here between ease of use and distribution vs production: the same sort of balance that .env files handle elegantly. maybe this is an opportunity to introduce a similar concept to binary installs and distribution.
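One way to picture the stricter variant is a set of preflight checks that fail early. A hypothetical sketch: the function name and the three rules (no root, explicit version, pre-existing writable directory) are assumptions about what 'paranoid' could mean, not a settled design.

```shell
# Hypothetical preflight for a paranoid-installer.sh; the rules are assumptions.
# Usage: paranoid_preflight <uid> <version> <install_dir>
paranoid_preflight() {
  uid="$1" version="$2" dir="$3"
  if [ "$uid" -eq 0 ]; then
    echo "Refusing to run as root; pass a user-writable --path instead." >&2
    return 1
  fi
  if [ -z "$version" ]; then
    echo "An explicit --version is required; 'latest' is not resolved." >&2
    return 1
  fi
  if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    echo "Install directory $dir must already exist and be writable." >&2
    return 1
  fi
}

# Example: paranoid_preflight "$(id -u)" "$VERSION" "$INSTALL_DIR" || exit 1
```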

@motdotla
Contributor

motdotla commented May 28, 2024

one more added note:

I think dotenvx is currently one of the most transparent when it comes to hosting and publishing its binaries. the process can be completely duplicated by:

  1. forking the dotenvx repo
  2. run the publishing steps in the .github/workflows/ci.yml file
  3. publishing to your location of choice

the only potential distribution security concern i've seen brought up is the same one you point to: the domain, because of fear of a MITM attack (which arguably github actions has as well: complexity, a junior github employee introducing a potential bug, an attacker gaining access to our ci process through credentials or some other means). the github domain just has more trust attached to it in the public's psyche, and that trust is well earned.

@motdotla
Contributor

downloading sentry's binary directly from the npm registry:

curl -O https://registry.npmjs.org/@sentry/cli-darwin/-/cli-darwin-2.32.1.tgz

this is very attractive because:

  1. no bandwidth limits
  2. no rate limits
  3. statistics - installing this way counts toward the weekly download counts https://www.npmjs.com/package/@sentry/cli-darwin?activeTab=code (right now dotenvx has no insight into where downloads are going - windows, mac, arm, intel, etc)
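The tarball path in the sentry example follows npm's usual layout, so an installer could build it from the package name and version. A sketch under that assumption; the authoritative URL is the dist.tarball field in the package's registry metadata, and npm_tarball_url is a hypothetical helper.

```shell
# Build <registry>/<package>/-/<name>-<version>.tgz, where <name> is the
# package name without its @scope/ prefix (unscoped names pass through).
npm_tarball_url() {
  package="$1"
  version="$2"
  name="${package##*/}"
  printf 'https://registry.npmjs.org/%s/-/%s-%s.tgz\n' "$package" "$name" "$version"
}

# Example:
#   curl -fsSO "$(npm_tarball_url @sentry/cli-darwin 2.32.1)"
```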

@nullbio
Author

nullbio commented May 28, 2024

All excellent points @motdotla.

The rate limits for the unauthenticated endpoint are on the low end; I can see how that might be an issue in some setups. The script doesn't perform the API call if the user specifies an exact version, but it would affect those who always fetch the latest.

Checksums are another option that could serve as an additional security feature. sha256sum is part of GNU coreutils, so it should be available on most systems out of the box. We could use something like this (it should be forked first, though): https://github.com/marketplace/actions/generate-checksum to include a checksum file with each release. GitHub and npmjs use predictable URL paths, so if we know which version the user is requesting, we can curl the checksum file and compare its hash against the downloaded archive inside the installer script.
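A minimal sketch of the comparison step, assuming a checksum file published per release whose first field is the hex SHA-256 of the archive (the .sha256 URL in the example comment is hypothetical). It falls back to `shasum -a 256` where sha256sum is not installed.

```shell
# Print the SHA-256 of a file via sha256sum (GNU coreutils, BusyBox on Alpine)
# or `shasum -a 256`; reading stdin keeps the output format "<hash>  -".
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum < "$1" | cut -d ' ' -f 1
  elif command -v shasum >/dev/null 2>&1; then
    shasum -a 256 < "$1" | cut -d ' ' -f 1
  else
    echo "Error: neither sha256sum nor shasum is available" >&2
    return 1
  fi
}

# Fail unless the file's hash equals the expected hex digest
verify_sha256() {
  actual=$(sha256_of "$1") || return 1
  if [ "$actual" != "$2" ]; then
    echo "Error: checksum mismatch for $1 (expected $2, got $actual)" >&2
    return 1
  fi
}

# Example, with a hypothetical checksum URL next to the archive:
#   expected=$(curl -fsS "$url.sha256" | cut -d ' ' -f 1)
#   verify_sha256 "$tmpdir/dotenvx.tar.gz" "$expected" || exit 1
```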

npmjs.com could work as well. The registry has a public API; for example, https://registry.npmjs.org/react returns the react package metadata, which contains a "dist-tags": {"latest": <version>} entry that could serve the same purpose as the GitHub API request.
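Reading the dist-tags entry without jq is possible with tr and sed, in line with the core-utilities-only goal. A sketch that assumes the flat "dist-tags" object of string values that the registry returns; it is not a general JSON parser, and npm_latest_version is a hypothetical name.

```shell
# Print dist-tags.latest from npm registry JSON read on stdin.
# Newlines are removed first so the match works on pretty-printed JSON too.
npm_latest_version() {
  tr -d '\n' |
    sed -n 's/.*"dist-tags"[[:space:]]*:[[:space:]]*{[^}]*"latest"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example:
#   curl -fsS https://registry.npmjs.org/react | npm_latest_version
```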

Or we could keep the heroku endpoint to return the latest version number, and use Github (or npmjs) for the downloading.

So it seems we have a few good options here:

  • Heroku or Github for hosting the sh script
  • Heroku endpoint or npmjs API for version number fetching
  • Npmjs or Github for downloading archive
  • Optionally, keep archive downloads on Heroku, and have the sh script pull the checksum from Heroku or Github and compare against the Heroku download

Whatever we choose to do here, I think keeping the --version and --path features are useful for everyone. We don't need to change the default installation path if you would rather keep things how they are, I think the error messages make it fairly clear.

Whatever scenario you choose, I'm happy to make the necessary changes to the script.

@motdotla
Contributor

agree 100% on your improvements with the version flag, etc. those are great additions.

what do you think about the installer script embedding all possible versions? too noisy?

  • it would remove another potential security or failure point
  • it would give the installer.sh script a feel like an npm page has: a list of all available versions right in your face before you install.
  • if we designed the installer script in a certain way it could also have the same sections: an embedded short README section, a list of versions and their publish dates, heck even directly embed the full url path to each version so that it is extra explicit (sometimes it is a hassle to read an installer.sh script and actually identify what fully formed url it is hitting to download a binary)

so in other words we'd programmatically publish/generate the installer.sh script. the script itself would gain git history coupled with version releases.
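A release step could emit the embedded section from the list of published versions. A hypothetical sketch: the VERSIONS variable, the comment layout, and the function name are assumptions, and the URLs reuse the GitHub release pattern from the script at the top of this issue.

```shell
# Usage: generate_embedded_versions <os-arch> <version>... (newest first)
generate_embedded_versions() {
  os_arch="$1"
  shift
  # One "#"-separated variable the installer can parse at runtime
  printf 'VERSIONS="%s"\n' "$(printf '%s\n' "$@" | paste -s -d '#' -)"
  # One comment line per version with the fully formed download URL
  for v in "$@"; do
    printf '# v%s https://github.com/dotenvx/dotenvx/releases/download/v%s/dotenvx-%s-%s.tar.gz\n' \
      "$v" "$v" "$v" "$os_arch"
  done
}

# Example: generate_embedded_versions linux-x86_64 0.44.1 0.44.0 >> install.sh
```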

@nullbio
Author

nullbio commented May 29, 2024

A few points to consider for those who want to store the installer script locally instead of curling it:

  • You've got a lot of extra noise and bytes in your file for source control.
  • If we remove the endpoints for checking for the latest version it would prevent people from using it as an update tool if an update is available, because their installer script would be missing the new versions. If you think this is a significant problem we could work around it by adding an optional flag to check externally for latest version, and document this on the website/docs.
  • Since we're trying to limit external dependencies as much as possible and rely on core utilities, the more data we add, the more difficult it'll be to parse (I'm not aware of any core utilities for parsing JSON, YAML, etc.). If it's just the version, it's likely not so bad, because we could store it as a variable like "0.0.1#0.0.2#0.0.3" and easily parse it in a loop. But it'll get complex quickly if we also store a lot of other data, along with the added file size (a larger file for those using it locally, and slower curls for the initial fetch if getting it remotely).
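The "#"-separated variable mentioned above can be checked with nothing more than IFS splitting and parameter expansion. A sketch, assuming a hypothetical EMBEDDED_VERSIONS variable stored newest first.

```shell
# Hypothetical embedded list, newest first
EMBEDDED_VERSIONS="0.0.3#0.0.2#0.0.1"

# Return 0 if $1 is one of the embedded versions. Version strings contain
# no glob characters, so the unquoted expansion only splits on "#".
version_is_available() {
  old_ifs=$IFS
  IFS='#'
  for v in $EMBEDDED_VERSIONS; do
    if [ "$v" = "$1" ]; then
      IFS=$old_ifs
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# The newest version is the first field
latest_embedded_version() {
  printf '%s\n' "${EMBEDDED_VERSIONS%%#*}"
}
```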

There are some good gains to be had:

  • Removes an extra curl check on version number.
  • Does increase transparency a little bit.
  • Prevents breakages related to the version strings, for example if there's a release with a name: v1.1.1.1rc5 that strays from the regular convention, for whatever reason, it won't explode due to failed sorts based on newest release because the list will already be sorted by us when we inject it.

Overall I think it's a good idea and this might be the way to go; I just think we should try to limit the spam as much as possible. If we add just the version numbers and a separator, for argument's sake let's say 9 bytes (v11.11.11) per version number plus the separator character, you're looking at 1-2 KB given the current number of tags. Over time, I suppose we might end up adding up to 20 KB to this file (over many years). Not really a big deal in the scheme of things. Plus you could always truncate the old version numbers later (that might actually be the way to go: only keep the last x versions. That way you could make the argument to add extra data as well without blowing up the file too significantly).
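Truncating to the last few versions and estimating the size are both one-liners with core utilities. A sketch, assuming the same newest-first, "#"-separated list; keep_newest and estimate_bytes are hypothetical names.

```shell
# Keep only the newest $1 entries of a "#"-separated, newest-first list $2
keep_newest() {
  printf '%s\n' "$2" | tr '#' '\n' | head -n "$1" | paste -s -d '#' -
}

# Rough size of the embedded list: ~9 bytes per version plus 1 separator
estimate_bytes() {
  echo $(( $1 * 10 ))
}
```

With this estimate, 100 to 200 versions come to roughly 1,000 to 2,000 bytes, and 20 KB corresponds to about 2,000 versions.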

@motdotla
Contributor

good points. my thoughts:

You've got a lot of extra noise and bytes in your file for source control

yeah, but also removes the need to visit a remote url. all the info is right there. i like this - like visiting a README

If we remove the endpoints for checking for the latest version it would prevent people from using it as an update tool if an update is available

for the security-minded this is actually a benefit. they can modify the script to only have specific versions available - and they can place that in their infrastructure. we've given them more control.

in the details of the installer.sh script we can provide a url where they can visit to update their script.

i'm going to make a first pass at this using what you've already built here. great conversation, thank you! i think we are going to end up with something that is better than 99% of installer scripts - could be an example for future ones.

@nullbio
Author

nullbio commented May 30, 2024

Sounds good, excited to see where you get with it! Happy to lend a hand if you need, just let me know.

@motdotla
Contributor

motdotla commented Jun 1, 2024

on vacation at the moment with family, but enjoying this work in the early mornings! some effort started here: #4

@motdotla
Contributor

making progress on this: https://dotenvx.sh/install.sh

includes specs: https://github.com/dotenvx/dotenvx.sh/blob/main/spec/install_spec.sh

i'm going to start moving some of my own stuff over to it to test in the real world.

then still to do:

  1. built-in README like documentation in the install.sh script
  2. some way to auto-generate this and auto-publish it when each new version is published (the script will likely move to the dotenvx/dotenvx repo and be part of the release packages).

@motdotla
Contributor

after some solid real world testing, i've updated the documentation to use dotenvx.sh/install.sh

https://dotenvx.com/docs
https://dotenvx.com/docs/platforms/docker
https://dotenvx.com/docs/cis/github-actions

@motdotla
Contributor

working on auto-generate next, but maybe before that publishing binaries to npm. then downloads can happen from there - which will give stats on each os/arch (needed)

@motdotla
Contributor

now that https://dotenvx.sh/install.sh has built-in branding and documentation, I'm thinking it should be at root. so that when you visit:

https://dotenvx.sh

you simply see the plaintext /install.sh script.

what do you think?

  1. it would make all the curl https://dotenvx.sh | sh install commands clean/short/uniform around the web
  2. it is easy to remember for dev.
  3. possibly it starts a pattern of this for other developer companies providing .sh install scripts.

downside:

  1. user visiting in the browser and then clicking 'save as' doesn't automatically get the ./install.sh file loaded in their save as prompt
  2. user might mistake it as not plain text because it does not have the filename extension

what are your thoughts @nullbio ?

@nullbio
Author

nullbio commented Jun 14, 2024

That's great @motdotla, looking very clean.

I think the url idea is cool, personally have no issue with it. I think it's obvious it is a sh script from the domain name + the contents, but perhaps I'm biased.

@motdotla
Contributor

script is now using the npm published binaries.

last major step is to automate generation of this with each new release and publish it to the heroku site.

@motdotla
Contributor

this is done. all future work on install.sh now lives in https://github.com/dotenvx/dotenvx

it is published as part of the release, then automatically committed to this repo, and automatically deployed.

we get download count going forward:

[Screenshot, 2024-06-16: download counts]

this is all living under /install.sh but on Monday I will switch the root https://dotenvx.sh to install.sh. The older installer will continue to live under https://dotenvx.sh/installer.sh for now.

@nullbio
Author

nullbio commented Jun 19, 2024

Great job @motdotla ! Thank you.
