refs platform/#2978: implement init filesystem and seed archives feature
Monska85 committed Aug 7, 2024
1 parent 7b14693 commit e15dd0f
Showing 8 changed files with 228 additions and 52 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -1,5 +1,11 @@
/initarchives/**
!/initarchives/.gitkeep
/initfiles/**
!/initfiles/.gitkeep
/initfilesystem/**
!/initfilesystem/.gitkeep

/tmp

# IDE files
.idea
5 changes: 3 additions & 2 deletions Dockerfile
@@ -1,7 +1,8 @@
FROM alpine:3.20

RUN apk add --no-cache minio minio-client bash date && \
ln -fs /usr/bin/mcli /usr/bin/mc
RUN apk add --no-cache minio minio-client \
bash date file rsync tar unzip xz \
&& ln -fs /usr/bin/mcli /usr/bin/mc

# Copy scripts folder
COPY scripts /scripts
26 changes: 16 additions & 10 deletions Makefile
@@ -8,25 +8,28 @@ MINIO_ROOT_PASSWORD ?= minioadmin
build:
docker build -t $(IMAGE_NAME):$(IMAGE_TAG) .

cli: build
@docker run --rm -it \
-e BUCKET_NAME=$(BUCKET_NAME) \
-e MINIO_ROOT_USER=$(MINIO_ROOT_USER) \
-e MINIO_ROOT_PASSWORD=$(MINIO_ROOT_PASSWORD) \
--network host \
$(IMAGE_NAME):$(IMAGE_TAG) mc ls minio/$(BUCKET_NAME)

start: build
@docker run \
@docker run --rm \
-e BUCKET_NAME=$(BUCKET_NAME) \
-e MINIO_ROOT_USER=$(MINIO_ROOT_USER) \
-e MINIO_ROOT_PASSWORD=$(MINIO_ROOT_PASSWORD) \
-e MINIO_BROWSER=on \
-e MINIO_VERSION_ENABLED=1 \
-p 9000:9000 \
-p 9001:9001 \
-v ./initarchives:/docker-entrypoint-initarchives.d \
-v ./initfiles:/docker-entrypoint-initfiles.d \
$(IMAGE_NAME):$(IMAGE_TAG)
# -v ./initfilesystem:/docker-entrypoint-initfs.d \
mc: build
@docker run --rm -it \
-e BUCKET_NAME=$(BUCKET_NAME) \
-e MINIO_ROOT_USER=$(MINIO_ROOT_USER) \
-e MINIO_ROOT_PASSWORD=$(MINIO_ROOT_PASSWORD) \
--network host \
--entrypoint bash \
$(IMAGE_NAME):$(IMAGE_TAG) -ilc 'mc config host add minio http://localhost:9000 $(MINIO_ROOT_USER) $(MINIO_ROOT_PASSWORD) && bash -il'

aws-cli:
@docker run --rm -it \
@@ -36,4 +39,7 @@ aws-cli:
-e AWS_ENDPOINT_URL=http://localhost:9000 \
--network host \
--entrypoint bash \
amazon/aws-cli -il
amazon/aws-cli -il

minio-console:
xdg-open http://localhost:9001
94 changes: 74 additions & 20 deletions README.md
@@ -1,32 +1,86 @@
# Docker MinIO

This is a simple docker image for running a minio server. It is based on alpine linux and uses the minio and minio-client packages from the alpine package repository.
This is a simple Docker image for running a MinIO server. It is based on Alpine Linux and uses the `minio` and `minio-client` packages from the Alpine package repository.

The `/scripts/entrypoint.sh` script is used to start the minio server. It is possible to configure the container to create and populate the new bucket at startup by setting the `BUCKET_NAME` environment variable and adding the seed files to the folder defined by the `INITFILES_FOLDER` environment variable.
The `/scripts/entrypoint.sh` script starts the MinIO server. The container can create and populate a new bucket at startup when the `BUCKET_NAME` environment variable is set.

## Bucket initialization

You can use two different methods to initialize the bucket: the `FileSystem` initialization and the `Seed` initialization.

## FileSystem initialization

If the `INITFILESYSTEM_FOLDER` path is not empty, its files are copied to the `BUCKET_ROOT` destination folder as the startup filesystem for the MinIO server. The copy is done with `rsync -a --delete`, so anything already in `BUCKET_ROOT` that is not in the init folder is removed. **The folder must contain a valid MinIO filesystem structure.** **ATTENTION**: when `INITFILESYSTEM_FOLDER` is used, the startup process verifies that `BUCKET_NAME` is consistent with the imported filesystem by running two checks: a filesystem check (the `BUCKET_NAME` folder is present in the `BUCKET_ROOT` folder) and a MinIO client check (the `mc ls ${MC_ALIAS}/${BUCKET_NAME}` command does not return an error).

If this initialization method is used, **no other initialization actions are performed**. The final bucket content is only the data present in the `INITFILESYSTEM_FOLDER` folder.
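
The filesystem half of that consistency check can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name `bucket_dir_exists` and the demo folder are hypothetical, and the guard against an empty bucket name is an addition of this sketch. The MinIO client half (`mc ls`) needs a running server and is not reproduced here.

```shell
# Hypothetical helper: succeeds only if the bucket root contains a folder
# named after the bucket. The non-empty guard is an addition of this sketch.
bucket_dir_exists() {
  bucket_root="$1"
  bucket_name="$2"
  [ -n "${bucket_name}" ] && [ -d "${bucket_root}/${bucket_name}" ]
}

# Demo with a throwaway directory standing in for BUCKET_ROOT.
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/my-bucket"

if bucket_dir_exists "${demo_root}" "my-bucket"; then
  echo "my-bucket: consistent with the init filesystem"
fi
if ! bucket_dir_exists "${demo_root}" "other-bucket"; then
  echo "other-bucket: not present in the init filesystem"
fi

rm -rf "${demo_root}"
```

In a real setup the first argument would be `BUCKET_ROOT` and the second `BUCKET_NAME`.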

## Seed initialization

This initialization method is used to populate the bucket with **seed archives and files**. Both the archives and the files are processed at startup, and the resulting bucket content is the union of the archives and files content.

The archives are processed first, then the files. If a file with the same name is present in one or more archives and in the `INITFILES_FOLDER` folder, the copy in `INITFILES_FOLDER` overwrites the copies from the archives. If the bucket has versioning enabled, the previous versions of the files are preserved. See the [Versioning enabled example](#versioning-enabled-example) for more details.

If the `DO_NOT_PROCESS_INITFILES` environment variable is set to `1`, the seed archives and files are not processed at startup.

### Process the seed archives

If the `INITARCHIVES_FOLDER` path is not empty, each archive in the folder is extracted into a temporary folder and the resulting files and folders are uploaded to the bucket defined by the `BUCKET_NAME` environment variable using the MinIO client (`mc cp --recursive`). Archives are extracted and uploaded one at a time, in the alphabetical order produced by the `ls -A ${INITARCHIVES_FOLDER}` command. If several archives contain the same file, the archive processed last overwrites the copies uploaded by earlier archives. If the bucket has versioning enabled, the previous versions of the files are preserved.

**The supported archive formats are `.zip`, `.tar`, `.tar.gz`, `.tar.bz2` and `.tar.xz`.**
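
The choice of extraction tool can be sketched as below. The helper name `pick_extractor` is hypothetical; the rule it encodes (zip archives go to `unzip`, everything else is handed to `tar`) follows the commit's `scripts/common.sh`, which obtains the MIME type from `file -b --mime-type`. The sample MIME strings are only illustrative inputs.

```shell
# Hypothetical helper: map a MIME type to the extraction tool.
# Anything that is not a zip archive is handed to tar; GNU tar (installed
# in the image) detects the compression format by itself when reading.
pick_extractor() {
  case "$1" in
    application/zip) echo "unzip" ;;
    *)               echo "tar" ;;
  esac
}

# Illustrative inputs only.
for mime in application/zip application/x-tar application/gzip; do
  echo "${mime} -> $(pick_extractor "${mime}")"
done
```

Detecting the type from the file content rather than the extension means a zip file with an unusual name is still handled, while a file that is neither zip nor a tar archive will make `tar` fail.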

### Process the seed files

If the `INITFILES_FOLDER` path is not empty, the files in the folder are uploaded to the bucket defined by the `BUCKET_NAME` environment variable using the MinIO client (`mc cp --recursive`). As with the archives, if `INITFILES_FOLDER` contains a file with the same name as one already in the bucket (uploaded from `INITARCHIVES_FOLDER`), the file in `INITFILES_FOLDER` overwrites it. If the bucket has versioning enabled, the previous versions of the files are preserved.

### Versioning enabled example

The following example shows the initialization process when versioning is enabled:

1. The `INITARCHIVES_FOLDER` contains the following archives:
- `archive1.zip` with the files `file1.txt` and `file2.txt`
- `archive2.tar.xz` with the files `file2.txt` and `file3.txt`
2. The `INITFILES_FOLDER` contains the following files:
- `file2.txt`
- `file3.txt`
- `file4.txt`
- `file5.txt`

The resulting bucket content will be:

- `file1.txt` with the content of the `archive1.zip` file
- `file2.txt`, in the current version, with the content of the `INITFILES_FOLDER` file
- `file2.txt`, in the first version, with the content of the `archive1.zip` file
- `file2.txt`, in the second version, with the content of the `archive2.tar.xz` file
- `file3.txt`, in the current version, with the content of the `INITFILES_FOLDER` file
- `file3.txt`, in the first version, with the content of the `archive2.tar.xz` file
- `file4.txt` with the content of the `INITFILES_FOLDER` file
- `file5.txt` with the content of the `INITFILES_FOLDER` file
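
The ordering in this example can be replayed without a MinIO server. The sketch below is only a simulation under stated assumptions: a temporary directory stands in for the bucket, each object key is a text file, and each line of that file records one uploaded version. The `upload` helper is hypothetical and does not call `mc`.

```shell
# Simulated bucket: one text file per object key, one line per version.
bucket="$(mktemp -d)"

# Hypothetical helper: record that SOURCE uploaded each of the listed keys.
upload() {
  source_name="$1"
  shift
  for key in "$@"; do
    echo "${source_name}" >> "${bucket}/${key}"
  done
}

# Archives first, in alphabetical order, then the seed files.
upload "archive1.zip"     file1.txt file2.txt
upload "archive2.tar.xz"  file2.txt file3.txt
upload "INITFILES_FOLDER" file2.txt file3.txt file4.txt file5.txt

# Print each key with its version history, oldest first.
# The last entry on each line is the current version.
for key in file1.txt file2.txt file3.txt file4.txt file5.txt; do
  echo "${key}: $(paste -s -d ' ' "${bucket}/${key}")"
done

rm -rf "${bucket}"
```

Without versioning, only the last entry of each history would remain in the bucket.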

## Environment Variables

| Variable | Description | Default |
| -------------------------- | ----------------------------------------------------------- | -------------------------------- |
| `BUCKET_NAME` | The name of the bucket to create and populate at startup. | `-` |
| `BUCKET_ROOT` | The folder used by the minio server to store the files. | `/data` |
| `INITFILES_FOLDER` | The folder where the seed files are stored. | `/docker-entrypoint-initfiles.d` |
| `DO_NOT_PROCESS_INITFILES` | If set to `1`, the seed files are not processed at startup. | `0` |
| `MINIO_VERSION_ENABLED` | If set to `1`, the minio version is enabled. | `0` |
| `MINIO_ROOT_USER` | The access key used to authenticate with the minio server. | `-` |
| `MINIO_ROOT_PASSWORD` | The secret key used to authenticate with the minio server. | `-` |
| `MINIO_BROWSER` | If set to `on`, the minio console is enabled. | `off` |
| `MINIO_CONSOLE_PORT` | The port used by the minio console. | `9001` |
| `MINIO_OPTS` | Additional options to pass to the minio server. | `-` |
| `MC_ALIAS` | The alias used by the minio client. | `minio` |
| `MINIO_PROTO` | The protocol used to connect to the minio server. | `http` |
| `MINIO_HOST` | The host used to connect to the minio server. | `localhost` |
| `MINIO_PORT` | The port used to connect to the minio server. | `9000` |
| Variable | Description | Default |
| -------------------------- | -------------------------------------------------------------------------------------------------------------------- | ----------------------------------- |
| `BUCKET_NAME` | The name of the bucket to create and populate at startup. | `-` |
| `BUCKET_ROOT` | The folder used by the MinIO server to store the files. | `/data` |
| `INITFILESYSTEM_FOLDER` | The folder where the root init filesystem is stored. If not empty, the files are copied to the `BUCKET_ROOT` folder. | `/docker-entrypoint-initfs.d` |
| `INITARCHIVES_FOLDER` | The folder where the seed archives are stored. | `/docker-entrypoint-initarchives.d` |
| `INITFILES_FOLDER` | The folder where the seed files are stored. | `/docker-entrypoint-initfiles.d` |
| `DO_NOT_PROCESS_INITFILES` | If set to `1`, the seed archives and files are not processed at startup. | `0` |
| `MINIO_ROOT_USER` | The access key used to authenticate with the MinIO server. | `-` |
| `MINIO_ROOT_PASSWORD` | The secret key used to authenticate with the MinIO server. | `-` |
| `MINIO_VERSION_ENABLED` | If set to `1`, bucket versioning is enabled. | `0` |
| `MINIO_OPTS` | Additional options to pass to the MinIO server. | `-` |
| `MINIO_BROWSER` | If set to `on`, the MinIO console is enabled. | `off` |
| `MINIO_CONSOLE_PORT` | The port used by the MinIO console. | `9001` |
| `MC_ALIAS` | The alias used by the MinIO client. | `minio` |
| `MINIO_PROTO` | The protocol used to connect to the MinIO server. | `http` |
| `MINIO_HOST` | The host used to connect to the MinIO server. | `localhost` |
| `MINIO_PORT` | The port used to connect to the MinIO server. | `9000` |
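
As an illustration of how the connection variables fit together, the sketch below applies the defaults from the table and builds the endpoint URL. The defaulting style and the `minio_endpoint` helper are assumptions of this sketch, since the entrypoint script is not shown here.

```shell
# Assumed default handling, using the defaults listed in the table above.
MC_ALIAS="${MC_ALIAS:-minio}"
MINIO_PROTO="${MINIO_PROTO:-http}"
MINIO_HOST="${MINIO_HOST:-localhost}"
MINIO_PORT="${MINIO_PORT:-9000}"

# Hypothetical helper: build the URL the MinIO client alias would point at.
minio_endpoint() {
  echo "${MINIO_PROTO}://${MINIO_HOST}:${MINIO_PORT}"
}

echo "${MC_ALIAS} -> $(minio_endpoint)"
```

With the defaults, the result matches the `http://localhost:9000` address that the Makefile's `mc` target passes to `mc config host add`.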

### Deprecated Variables

| Variable | Description |
| ------------------ | ------------------------------------------------------------------------------------------------- |
| `OSB_BUCKET` | The name of the bucket to create and populate at startup. **Use `BUCKET_NAME` instead.** |
| `MINIO_ACCESS_KEY` | The access key used to authenticate with the minio server. **Use `MINIO_ROOT_USER` instead.** |
| `MINIO_SECRET_KEY` | The secret key used to authenticate with the minio server. **Use `MINIO_ROOT_PASSWORD` instead.** |
| `MINIO_ACCESS_KEY` | The access key used to authenticate with the MinIO server. **Use `MINIO_ROOT_USER` instead.** |
| `MINIO_SECRET_KEY` | The secret key used to authenticate with the MinIO server. **Use `MINIO_ROOT_PASSWORD` instead.** |
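
One way such a fallback could work is sketched below. The `resolve_var` helper and the rule that the new name wins over the deprecated one are assumptions of this sketch; the actual handling lives in the entrypoint script, which is not part of the diff shown here.

```shell
# Hypothetical helper: print the new variable's value if it is set,
# otherwise fall back to the deprecated variable's value.
resolve_var() {
  new_value="$1"
  deprecated_value="$2"
  if [ -n "${new_value}" ]; then
    echo "${new_value}"
  else
    echo "${deprecated_value}"
  fi
}

BUCKET_NAME="$(resolve_var "${BUCKET_NAME:-}" "${OSB_BUCKET:-}")"
MINIO_ROOT_USER="$(resolve_var "${MINIO_ROOT_USER:-}" "${MINIO_ACCESS_KEY:-}")"
MINIO_ROOT_PASSWORD="$(resolve_var "${MINIO_ROOT_PASSWORD:-}" "${MINIO_SECRET_KEY:-}")"
```

New deployments should set only the non-deprecated names so that no fallback is needed.
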
Empty file added initarchives/.gitkeep
Empty file.
Empty file added initfilesystem/.gitkeep
Empty file.
101 changes: 93 additions & 8 deletions scripts/common.sh
@@ -30,6 +30,26 @@ minio_wait_for_readiness() {
minio_error "Minio server is not ready in ${TRESHOLD} seconds."
}

docker_process_init_filesystem() {
# Copy the contents of the initfs folder to the bucket root.
minio_note "Copying the contents of '${INITFILESYSTEM_FOLDER}' to '${BUCKET_ROOT}'."
minio_note "Please wait, this may take a while..."
rsync -a --delete "${INITFILESYSTEM_FOLDER}/" "${BUCKET_ROOT}/"
minio_note "The contents of '${INITFILESYSTEM_FOLDER}' have been copied to '${BUCKET_ROOT}'."
}

docker_check_init_filesystem() {
# Check if the configured BUCKET_NAME is consistent with the imported filesystem using filesystem folder name.
if [ ! -d "${BUCKET_ROOT}/${BUCKET_NAME}" ]; then
minio_error "The folder '${BUCKET_NAME}' does not exist in the filesystem. The init filesystem is not consistent with the BUCKET_NAME variable. Please configure the BUCKET_NAME variable to match the bucket present in the init filesystem."
fi

# Check if the configured BUCKET_NAME is consistent with the imported filesystem using MinIO client.
if ! mc ls "${MC_ALIAS}/${BUCKET_NAME}" &>/dev/null; then
minio_error "The bucket '${BUCKET_NAME}' does not exist in the MinIO server. The init filesystem is not consistent with the BUCKET_NAME variable. Please configure the BUCKET_NAME variable to match the bucket present in the init filesystem."
fi
}

docker_create_bucket() {
# Check if bucket exists, otherwise create it.
if mc ls "${MC_ALIAS}/${BUCKET_NAME}" &>/dev/null; then
@@ -44,25 +64,90 @@ docker_create_bucket() {
fi
}

docker_process_init_files() {
docker_process_init_archives_and_files() {
if [ "$(mc ls "${MC_ALIAS}/${BUCKET_NAME}/" | wc -l)" -ne 0 ]; then
minio_note "Bucket '${BUCKET_NAME}' is not empty. Skipping initialization files."
return
fi

minio_note "Bucket '${BUCKET_NAME}' is empty. Processing initialization files."
if [ "$(ls "${INITFILES_FOLDER}" 2>/dev/null | wc -l)" -gt 0 ]; then
local SOMETHING_UPLOADED TMP_ARCHIVE_EXTRACTION_FOLDER LOGERR_FILE
SOMETHING_UPLOADED=0
TMP_ARCHIVE_EXTRACTION_FOLDER="/tmp/archives"
LOGERR_FILE="/tmp/error.log"

minio_note "Bucket '${BUCKET_NAME}' is empty. Processing initialization of the bucket."

# Processing archives.
if [ "$(ls -A "${INITARCHIVES_FOLDER}" 2>/dev/null | wc -l)" -gt 0 ]; then
minio_note "Extracting archives and uploading files to bucket '${BUCKET_NAME}'."

# Extract all archives in the temporary folder.
local archive
for archive in $(ls -A "${INITARCHIVES_FOLDER}"); do
# Just in case the folder contains a .gitkeep file.
if [ "${archive}" = ".gitkeep" ]; then
continue
fi

# Clean up the temporary folder if it exists.
rm -rf "${TMP_ARCHIVE_EXTRACTION_FOLDER}"
# Create a temporary folder for archive extraction.
mkdir -p "${TMP_ARCHIVE_EXTRACTION_FOLDER}"

# Check if the file is a zip archive.
if [ "$(file -b --mime-type "${INITARCHIVES_FOLDER}/${archive}")" = "application/zip" ]; then
minio_note "Extracting archive '${INITARCHIVES_FOLDER}/${archive}' using unzip."
unzip -q "${INITARCHIVES_FOLDER}/${archive}" -d "${TMP_ARCHIVE_EXTRACTION_FOLDER}" 1>/dev/null
else
# Try to extract the archive using tar.
minio_note "Extracting archive '${INITARCHIVES_FOLDER}/${archive}' using tar."
tar xf "${INITARCHIVES_FOLDER}/${archive}" -C "${TMP_ARCHIVE_EXTRACTION_FOLDER}" 1>/dev/null
fi

# We want to upload the contents of each extracted archive to the bucket.
# This is useful when the bucket is configured to use versioning to keep track of the changes.

# Check if the temporary folder is empty.
if [ "$(ls -A "${TMP_ARCHIVE_EXTRACTION_FOLDER}" 2>/dev/null | wc -l)" -eq 0 ]; then
minio_warn "No files found in the extracted archive '${INITARCHIVES_FOLDER}/${archive}'. Skipping upload."
continue
fi

# Copy the contents of the temporary folder to the bucket.
minio_note "Uploading all extracted files to bucket '${BUCKET_NAME}'."
# Note the trailing slash in the source folder. This is required to copy the CONTENTS of the folder and not the folder itself.
mc cp --recursive "${TMP_ARCHIVE_EXTRACTION_FOLDER}/" "${MC_ALIAS}/${BUCKET_NAME}" 1>/dev/null 2>"${LOGERR_FILE}"
if [ -s "${LOGERR_FILE}" ]; then
minio_error "Error uploading files to bucket '${BUCKET_NAME}'."
minio_error "$(cat "${LOGERR_FILE}")"
fi

SOMETHING_UPLOADED=1
done

# Final cleanup of the temporary folder, if it exists.
rm -rf "${TMP_ARCHIVE_EXTRACTION_FOLDER}"

if [ "${SOMETHING_UPLOADED}" = "0" ]; then
minio_warn "All archives in '${INITARCHIVES_FOLDER}' have been processed, but no files were found in the extracted archives destination folder."
fi
fi

# Processing files.
if [ "$(ls -A "${INITFILES_FOLDER}" 2>/dev/null | wc -l)" -gt 0 ]; then
minio_note "Uploading files to bucket '${BUCKET_NAME}'."
# Note the trailing slash in the source folder. This is required to copy the CONTENTS of the folder and not the folder itself.
mc cp --recursive "${INITFILES_FOLDER}/" "${MC_ALIAS}/${BUCKET_NAME}" 1>/dev/null 2>/tmp/minio_error.log
if [ -s /tmp/minio_error.log ]; then
mc cp --recursive "${INITFILES_FOLDER}/" "${MC_ALIAS}/${BUCKET_NAME}" 1>/dev/null 2>"${LOGERR_FILE}"
if [ -s "${LOGERR_FILE}" ]; then
minio_error "Error uploading files to bucket '${BUCKET_NAME}'."
minio_error "$(cat /tmp/minio_error.log)"
minio_error "$(cat "${LOGERR_FILE}")"
fi
return
SOMETHING_UPLOADED=1
fi

minio_note "No files found in '${INITFILES_FOLDER}'. The bucket '${BUCKET_NAME}' will remain empty."
if [ "${SOMETHING_UPLOADED}" = "0" ]; then
minio_note "No files found after processing archives in '${INITARCHIVES_FOLDER}' and files in '${INITFILES_FOLDER}'. The bucket '${BUCKET_NAME}' will remain empty."
fi
}

# Logging functions.