rclone is a great open source tool for synchronizing data across various storage platforms. Of the many it supports, AWS S3 and GCP's GCS are just two.
This repo offers some hints and tips to get you started using rclone in GCP with GCS. For the purposes of this repo, we'll use a GCE VM, but other runtimes such as GKE can do this as well, with minimal modification.
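Create a service account for the sync; the examples in this README use one named object-sync. It can use the Storage Object Creator role (roles/storage.objectCreator), or simply Storage Admin (roles/storage.admin). You can optionally set IAM conditions that restrict the grant to a particular bucket.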
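A minimal sketch of creating the service account and granting it the Object Creator role on a single bucket. The function name create_sync_service_account is hypothetical, YOUR_PROJECT and GCS_BUCKET are placeholders, and the account ID object-sync is taken from the VM example below. It assumes gcloud and gsutil are installed and authenticated with enough rights to manage IAM.

```shell
# Sketch only: creates the "object-sync" service account and grants it
# objectCreator on one bucket. Arguments are your project ID and bucket name.
create_sync_service_account() {
  project="$1"
  bucket="$2"
  sa_email="object-sync@${project}.iam.gserviceaccount.com"

  # Create the account, then bind the role at the bucket level.
  gcloud iam service-accounts create object-sync \
    --project="$project" \
    --display-name="object-sync" &&
    gsutil iam ch "serviceAccount:${sa_email}:objectCreator" "gs://${bucket}"
}

# Usage (not run here):
#   create_sync_service_account YOUR_PROJECT GCS_BUCKET
```

Granting the role on the bucket rather than the project keeps the account's reach narrow, which is the same goal the IAM conditions mentioned above serve.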
You'll need a VM to perform the copy. It's a good idea to put this VM geographically close to your AWS bucket. This VM should use the service account you just created. Here's an example:
gcloud beta compute \
--project=YOUR_PROJECT \
instances create object-sync-worker \
--zone=us-east1-b \
--machine-type=n1-standard-8 \
--subnet=default \
--network-tier=PREMIUM \
--maintenance-policy=MIGRATE \
--service-account=object-sync@YOUR_PROJECT.iam.gserviceaccount.com \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--image=debian-10-buster-v20200413 \
--image-project=debian-cloud \
--boot-disk-size=10GB \
--boot-disk-type=pd-standard \
--boot-disk-device-name=object-sync-worker \
--no-shielded-secure-boot \
--shielded-vtpm \
--shielded-integrity-monitoring \
--reservation-affinity=any
To SSH into the VM, something like this should do it (the zone and instance name must match the VM you created):
gcloud beta compute ssh \
--zone "us-east1-b" "object-sync-worker" \
--project "YOUR_PROJECT"
You might need to open a firewall rule for this to work. Alternatively, you can connect via the GCP UI's SSH button in the VM listing.
To ensure you get the latest version, download the latest *-amd64.deb package from the rclone downloads page.
For this README, we will use version 1.51.0, downloaded with:
curl -O https://downloads.rclone.org/v1.51.0/rclone-v1.51.0-linux-amd64.deb
Install it with:
sudo dpkg -i ./rclone-v1.51.0-linux-amd64.deb
Install git:
sudo apt install -y git
Clone this repository to the VM:
git clone https://github.com/domZippilli/gcs-s3-rclone.git
Change PWD to the repo root:
cd gcs-s3-rclone
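The installation steps above can be sketched as a single function. The name setup_worker is hypothetical, and the download URL pattern for versions other than 1.51.0 is an assumption based on the URL used in this README.

```shell
# Sketch only: installs a pinned rclone release and git, then clones this repo.
# The version argument defaults to the one used in this README.
setup_worker() {
  version="${1:-1.51.0}"
  deb="rclone-v${version}-linux-amd64.deb"

  # Stop at the first failing step so a bad download is not installed.
  curl -O "https://downloads.rclone.org/v${version}/${deb}" &&
    sudo dpkg -i "./${deb}" &&
    sudo apt install -y git &&
    git clone https://github.com/domZippilli/gcs-s3-rclone.git
}

# Usage (not run here):
#   setup_worker 1.51.0 && cd gcs-s3-rclone
```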
Find out the rclone config file path by running rclone config file:
$ rclone config file
Configuration file doesn't exist, but rclone will use this path:
/home/YOUR_USERNAME/.config/rclone/rclone.conf
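Create this file by copying the rclone.conf in this directory over it (unless you already have one, in which case you should append this one to it). On a fresh VM the directory may not exist yet, so create it first:
mkdir -p /home/$USER/.config/rclone
cp rclone.conf /home/$USER/.config/rclone/rclone.conf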
Alternatively, you can create a symlink to the rclone.conf in this repo:
ln -s $PWD/rclone.conf /home/$USER/.config/rclone/rclone.conf
(Just be sure not to commit and push any secrets!!!)
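The copy-or-concatenate choice described above can be sketched as a small helper. The function name install_rclone_conf is hypothetical, and the sketch does not check for duplicate remote names, so review the result if you already had remotes called s3 or gcs.

```shell
# Sketch only: copies the source config into place, or appends it when a
# config file already exists at the destination.
install_rclone_conf() {
  src="$1"
  dest="$2"

  # Make sure the destination directory exists.
  mkdir -p "$(dirname "$dest")"

  if [ -f "$dest" ]; then
    # Keep the existing remotes and add the new ones after a blank line.
    printf '\n' >> "$dest"
    cat "$src" >> "$dest"
  else
    cp "$src" "$dest"
  fi
}

# Usage (not run here); the last line of `rclone config file` is the path:
#   install_rclone_conf rclone.conf "$(rclone config file | tail -n 1)"
```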
The [gcs] remote for rclone will use the environment metadata from your VM to configure itself, so all you need to add to the configuration file is information to authenticate with AWS.
To do so, put your Access Key and Secret Access Key in the rclone.conf file with an editor as appropriate:
access_key_id = AK_ID
secret_access_key = SK_ID
If you'd like, there are other options like environment variables you can use. See more in the rclone docs.
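As a sketch of the environment variable route: rclone reads remote options from variables named RCLONE_CONFIG_<REMOTE>_<OPTION>. The example below assumes the AWS remote is named s3, as in the sync command in this README, and AK_ID and SK_ID are the same placeholders used above.

```shell
# Sketch only: supply the AWS credentials through the environment instead of
# writing them into rclone.conf. Replace the placeholders with real values
# and keep them out of shell history and version control.
export RCLONE_CONFIG_S3_ACCESS_KEY_ID="AK_ID"
export RCLONE_CONFIG_S3_SECRET_ACCESS_KEY="SK_ID"
```

This keeps secrets out of the config file, which helps if you symlinked rclone.conf into the repo.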
Now you're all set. At the simplest level, a sync between two buckets should look like this:
rclone sync -P s3:AWS_BUCKET gcs:GCS_BUCKET
You may need/want to experiment with the following command line options:
--s3-region: Set the region where the source bucket is. You'll need this in order to access the bucket. You can set a default in the rclone.conf file.
--transfers: Set the number of simultaneous transfers. The default is 4, and in many cases (dependent on cores, connectivity, and other factors) higher numbers will result in higher throughput. For more info, see the rclone docs.
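A minimal sketch combining both options in one call. The function name sync_buckets is hypothetical, and the defaults of us-east-1 and 16 transfers are illustrative assumptions, not recommendations; measure throughput on your own VM before settling on a value.

```shell
# Sketch only: wraps the sync command with the two tuning flags.
# Arguments: source bucket, destination bucket, S3 region, transfer count.
sync_buckets() {
  src_bucket="$1"
  dst_bucket="$2"
  region="${3:-us-east-1}"   # assumed default, change to your bucket's region
  transfers="${4:-16}"       # assumed default, rclone's own default is 4

  rclone sync -P \
    --s3-region "$region" \
    --transfers "$transfers" \
    "s3:${src_bucket}" "gcs:${dst_bucket}"
}

# Usage (not run here):
#   sync_buckets AWS_BUCKET GCS_BUCKET us-east-1 32
```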
Other rclone commands, such as lsd and ls, should work with both providers as well. For a full list of commands, see the rclone documentation.
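One way to use these commands is as a quick check that both remotes are reachable before starting a long sync. This is a sketch; the function name preflight is hypothetical, and it assumes the remotes are named s3 and gcs as elsewhere in this README.

```shell
# Sketch only: lists the top-level directories of each bucket and fails if
# either remote cannot be reached with the current configuration.
preflight() {
  src_bucket="$1"
  dst_bucket="$2"

  rclone lsd "s3:${src_bucket}" &&
    rclone lsd "gcs:${dst_bucket}"
}

# Usage (not run here):
#   preflight AWS_BUCKET GCS_BUCKET && echo "both remotes reachable"
```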
Work in this repo is written by a Googler, but this project is not supported by Google in any way. As the LICENSE file says, this work is offered to you on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
Apache 2.0 - See the LICENSE for more information.
Copyright 2020 Google, Inc.