radula is a small utility that adds some friendliness to RadosGW for our team working with Ceph for S3-like storage. Little more than a wrapper for boto, radula saves us time and headaches by using nice defaults.
The primary functions for the current version are:
- Inspect radosgw bucket/key ACLs
- Spot differences between bucket and key ACLs
- Allow or disallow user read/write to buckets and keys
- When modifying a bucket ACL, modify the ACLs for keys as well
- Verify uploads using checksums
- Upload using multiple threads
The name, "radula", is a cephalopod-related term that hit close to RADOS. It's not a tongue, or really even teeth; it's more like if your tongue had teeth. Spooky.
Install radula from PyPI using pip.
pip install radula
The radula command should be available in your $PATH.
To run the tests, install the pip packages listed in testing-requirements.txt and run nosetests.
$ pip install -U -r testing-requirements.txt
$ nosetests --with-coverage --cover-package=radula
The effort to increase code coverage is ongoing.
radula uses boto, so all configuration is really boto configuration, with some extensions to support streaming copy operations; see the streaming copy discussion further down for those custom items. The notable change is replacing the Amazon AWS URL with that of one of your gateways. Where applicable, you may also have to disable SSL as a default option.
# example shared /etc/boto.cfg
[s3]
host = radosgw1.your_company.com

[Boto]
is_secure = False
To add your personal credentials, fill in the following in ~/.boto:
[Credentials]
aws_access_key_id = abcdef...
aws_secret_access_key = 0123456...

[profile other_role]
aws_access_key_id = wxyz...
aws_secret_access_key = 9765432...
The configuration file can be read from different paths. This is determined by the boto library, and the order in which the paths are checked is:
- The path specified by the BOTO_CONFIG environment variable
- The file ~/.boto
- The file ~/.aws/credentials
- The file /etc/boto.cfg
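To see which of these files exist on a given machine, the list above can be walked in the same order. This is only a sketch of the lookup order as described here, not boto's own loader; the function names `boto_config_candidates` and `existing_boto_configs` are made up for illustration.

```python
import os


def boto_config_candidates(environ=None):
    """Return candidate boto config paths in the order listed above."""
    environ = os.environ if environ is None else environ
    candidates = []
    # 1. An explicit path from the BOTO_CONFIG environment variable, if set.
    if environ.get("BOTO_CONFIG"):
        candidates.append(environ["BOTO_CONFIG"])
    # 2-4. The per-user files, then the system-wide file.
    candidates.append(os.path.expanduser("~/.boto"))
    candidates.append(os.path.expanduser("~/.aws/credentials"))
    candidates.append("/etc/boto.cfg")
    return candidates


def existing_boto_configs(environ=None):
    """Keep only the candidates that are present on disk."""
    return [p for p in boto_config_candidates(environ) if os.path.isfile(p)]


if __name__ == "__main__":
    for path in boto_config_candidates():
        print(path, "(found)" if os.path.isfile(path) else "(missing)")
```

Running the script prints each candidate path with a found/missing marker, which is a quick way to check where a surprising setting might be coming from.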
The command structure for radula is radula [flags] command subject [target]. The "subject" matter or "target" of a request could be a local resource or a remote one, depending on the command being executed. These could be read as "source" and "destination" in some cases, but the intent is simply to flow left to right.
$ usage: radula [-h] [--version] [-r] [-w] [-t THREADS] [-p PROFILE]
                [-d DESTINATION] [-f] [-y] [-c CHUNK_SIZE] [-l] [-L LOG_LEVEL]
                [-v] [-n] [-z] [-e] [-A] [-i] [-k] [--no-acl]
                [{acls,get-acl,set-acl,compare-acl,sync-acl,allow,allow-user,disallow,disallow-user,mb,make-bucket,rb,remove-bucket,lb,list-buckets,put,up,upload,get,dl,download,mpl,mp-list,multipart-list,mpc,mp-clean,multipart-clean,rm,remove,keys,ls,list,info,size,etag,remote-md5,remote-rehash,verify,sc,streaming-copy,copy,cat,url,get-url,local-md5,profiles}]
                [subject] [target] ...

RadosGW client

positional arguments:
  {acls,get-acl,set-acl,compare-acl,sync-acl,allow,allow-user,disallow,disallow-user,mb,make-bucket,rb,remove-bucket,lb,list-buckets,put,up,upload,get,dl,download,mpl,mp-list,multipart-list,mpc,mp-clean,multipart-clean,rm,remove,keys,ls,list,info,size,etag,remote-md5,remote-rehash,verify,sc,streaming-copy,copy,cat,url,get-url,local-md5,profiles}
                        command
  subject               Subject
  target                Target
  remainder             Additional targets for supporting commands. See README

optional arguments:
  -h, --help            show this help message and exit
  --version             Prints version number
  -r, --read            During a user grant, permission includes reads
  -w, --write           During a user grant, permission includes writes
  -t THREADS, --threads THREADS
                        Number of threads to use for uploads. Default=3
  -p PROFILE, --profile PROFILE
                        Boto profile. Overrides AWS_PROFILE environment var
  -d DESTINATION, --destination DESTINATION
                        Destination boto profile, required for streaming copy
  -f, --force           Overwrite local files without confirmation
  -y, --verify          Verify uploads after they complete. Uses --threads.
                        When passed a destination profile, download and hash
                        keys on both ends
  -c CHUNK_SIZE, --chunk CHUNK_SIZE
                        multipart upload chunk size in bytes.
  -l, --long-keys       prepends bucketname to key results.
  -L LOG_LEVEL, --log-level LOG_LEVEL
                        Log level, [DEBUG, 10, INFO, 20, etc]
  -v, --verbose         Verbose. Equiv to -L DEBUG
  -n, --dry-run         Print would-be deletions without deleting
  -z, --resume          Resume uploads if needed.
  -e, --encrypt         Store content encrypted at rest
  -A, --all-buckets     act upon all buckets (info only)
  -i, --ignore-existing
                        Calmly skip existing files; an opposite -f (otherwise
                        errors)
  -k, --preserve-key    When downloading, preserve paths in keys
  --no-acl              When uploading, do not sync key ACL with the bucket
                        ACL. Normally would.
This is a quick walkthrough of the features so far. In these scenarios, we are acting as the user bibby, who owns the rados bucket mybucket.
In some of the examples, we'll be manipulating the access to this bucket for a second user called fred.
Contained in the bucket are two regular files: hello and world.
See Boto docs for working with profiles.
[bibby@machine ~]$ radula profiles
here
there
* DEFAULT
[bibby@machine ~]$ radula get-acl mybucket
ACL for bucket: mybucket
[CanonicalUser:OWNER] Andrew Bibby = FULL_CONTROL
The command get-acl prints the ACL. radula assumed that the term mybucket was a bucket, since it was a lone term.
[bibby@machine ~]$ radula get-acl mybucket/hello
ACL for key: mybucket/hello
[CanonicalUser:OWNER] Andrew Bibby = FULL_CONTROL
Because the term contained a slash, the subject is correctly identified as hello within the bucket mybucket.
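The bucket-versus-key rule just described comes down to splitting on the first slash. The helper below, `split_subject`, is a hypothetical illustration of that rule and not radula's actual parsing code.

```python
def split_subject(subject):
    """Split 'bucket' or 'bucket/key' into (bucket, key).

    The key is None when the term is a lone bucket name.
    """
    # Only the first slash separates bucket from key; later slashes
    # belong to the key itself.
    bucket, sep, key = subject.partition("/")
    return bucket, (key if sep and key else None)


if __name__ == "__main__":
    print(split_subject("mybucket"))
    print(split_subject("mybucket/hello"))
```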
[bibby@machine ~]$ radula compare-acl mybucket
Bucket ACL for: mybucket
[CanonicalUser:OWNER] Andrew Bibby = FULL_CONTROL
---------
Keys with identical ACL: 2
Keys with different ACL: 0
The compare-acl command on a bucket reports how many of its keys have the same ACL as the bucket and how many differ. We'll see this again later in another example.
This can also be run against one key, which limits the comparison to that key and its bucket:
[bibby@machine ~]$ radula compare-acl mybucket/hello
Bucket ACL for: mybucket
[CanonicalUser:OWNER] Andrew Bibby = FULL_CONTROL
---------
Keys with identical ACL: 1
Keys with different ACL: 0
You can set the ACL of a bucket or key to one of the four AWS "canned" policies using set-acl. In this scenario, the subject can be a bucket or a key, with the target being a canned policy name.
[bibby@machine ~]$ radula set-acl mybucket/hello public-read
<< prints the output of get-acl after completing the operation
Changing the ACL on a bucket will also be applied to its keys, potentially overwriting any custom access given to those keys. Run compare-acl before setting the bucket ACL to discover any special differences, as they may need to be recreated after the set-acl operation completes.
Had an ACL difference appeared, we could forcefully replace all key ACLs with the bucket's ACL using sync-acl.
[bibby@machine ~]$ radula sync-acl mybucket
Bucket ACL for: mybucket
[CanonicalUser:OWNER] Andrew Bibby = FULL_CONTROL
---------
Setting bucket's ACL on hello
Setting bucket's ACL on world
This is a PUT command, so it doesn't bother to look at the current ACL for the keys; it just puts a copy of the bucket's own ACL.
sync-acl can be done on a single key as well.
[bibby@machine ~]$ radula sync-acl mybucket/world
Setting bucket's ACL on world
To grant access to another user, we'll make use of some new flags: -r and/or -w, to indicate read and write. A grant may have one or both of rw. If both are absent, read is assumed. Permissions are separate, so it is possible to have a write-only grant.
For permission grants the subject is the user (as far as the usage format in the help text goes), and the target is the key or bucket.
[bibby@machine ~]$ radula allow fred mybucket/hello
granting READ to fred on key hello
Multiple grants to the same user for the same permission are possible in rados and on s3, but radula will guard against that and ignore the duplicate entry. Here, we'll add "read-write":
[bibby@machine ~]$ radula -wr allow fred mybucket/hello
User fred already has READ for key hello, skipping
granting WRITE to fred on key hello
[bibby@machine ~]$ radula -wr allow fred mybucket
granting READ to fred on bucket mybucket
granting WRITE to fred on bucket mybucket
User fred already has READ for key <Key: mybucket,hello>, skipping
User fred already has WRITE for key <Key: mybucket,hello>, skipping
granting READ to fred on key <Key: mybucket,world>
granting WRITE to fred on key <Key: mybucket,world>
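The flag handling and the duplicate guard shown in these examples can be modelled in a few lines. This is a sketch of the described behavior only; `permissions_to_grant` and `plan_grants` are hypothetical names, and the real tool works on boto ACL objects rather than plain sets.

```python
def permissions_to_grant(read=False, write=False):
    """Translate the -r / -w flags into the permissions requested."""
    perms = []
    # READ is assumed when neither flag is given; -w alone is write-only.
    if read or not write:
        perms.append("READ")
    if write:
        perms.append("WRITE")
    return perms


def plan_grants(existing, read=False, write=False):
    """Split the requested permissions into (to_grant, skipped).

    `existing` is the set of permissions the user already holds on the
    target; anything already held is skipped instead of duplicated.
    """
    to_grant, skipped = [], []
    for perm in permissions_to_grant(read, write):
        (skipped if perm in existing else to_grant).append(perm)
    return to_grant, skipped


if __name__ == "__main__":
    # fred already has READ on mybucket/hello and we ask for -wr.
    print(plan_grants({"READ"}, read=True, write=True))
```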
With both allow and disallow, if an ACL difference exists between the bucket and a key, that difference may still exist after the modification. With these commands, we aren't syncing a modified bucket ACL down to the keys; we're applying the same singular change to each target individually.
Removing permissions works similarly to granting access, but with some differences. One concerns the omission of the read-write flags: if neither is present, both permissions are removed.
start | flags | result |
---|---|---|
RW | -r | W |
RW | -w | R |
RW | -rw | (none) |
RW | (none) | (none) |
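The table can be read as a small set operation. The function below, `permissions_after_disallow`, is a hypothetical model of the rule and is not taken from radula's source.

```python
def permissions_after_disallow(current, read=False, write=False):
    """Return the permissions left after a disallow with the given flags."""
    # With neither -r nor -w, disallow removes both permissions.
    if not read and not write:
        read = write = True
    remaining = set(current)
    if read:
        remaining.discard("READ")
    if write:
        remaining.discard("WRITE")
    return remaining


if __name__ == "__main__":
    start = {"READ", "WRITE"}
    print(permissions_after_disallow(start, read=True))
    print(permissions_after_disallow(start))
```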
ACLs for the keys are modified first. The user's access cannot be taken away from the bucket if it still exists for one of its keys, so the changes take place from bottom up.
Starting with a blank slate:
[bibby@machine ~]$ radula -wr disallow fred mybucket
No change for <Key: mybucket,hello>
No change for <Key: mybucket,world>
No change for mybucket
Give fred read on the bucket:
[bibby@machine ~]$ radula -r allow fred mybucket
granting READ to fred on bucket mybucket
granting READ to fred on key <Key: mybucket,hello>
granting READ to fred on key <Key: mybucket,world>
Give fred write on one key:
[bibby@machine ~]$ radula -w allow fred mybucket/world
granting WRITE to fred on key world
Confirm the difference:
[bibby@machine ~]$ radula compare-acl mybucket
Bucket ACL for: mybucket
[CanonicalUser:OWNER] Andrew Bibby = FULL_CONTROL
[CanonicalUser] Fred Fredricks = READ
---------
Difference in world:
[CanonicalUser:OWNER] Andrew Bibby = FULL_CONTROL
[CanonicalUser] Fred Fredricks = READ
[CanonicalUser] Fred Fredricks = WRITE
Keys with identical ACL: 1
Keys with different ACL: 1
Plow the keys with the bucket's settings.
[bibby@machine ~]$ radula sync-acl mybucket
Bucket ACL for: mybucket
[CanonicalUser:OWNER] Andrew Bibby = FULL_CONTROL
[CanonicalUser] Fred Fredricks = READ
---------
Setting bucket's ACL on hello
Setting bucket's ACL on world
[bibby@machine ~]$ radula compare-acl mybucket
Bucket ACL for: mybucket
[CanonicalUser:OWNER] Andrew Bibby = FULL_CONTROL
[CanonicalUser] Fred Fredricks = READ
---------
Keys with identical ACL: 2
Keys with different ACL: 0
The following functions move files in and out of radosgw. The intention is not to replace better tools like s3cmd, but rather to cover some very common use cases so that the installation and configuration of additional libraries might not be needed.
The commands put, up, and upload are equivalent. For these examples, I've chosen to use up.
The syntax is radula up {source} {target}, where source is a local file or a glob. The target is a path in radosgw, and its behavior depends on whether one source or several are given.
If the target path ends with a slash (/), then the key is presumed to be the basename of the object appended at that path. See the table below.
If multiple source files are given, the key will always assume it is part of a path, making an ending slash wholly optional.
When using globs, it's important to know that the argument must be quoted to avoid shell expansion. For example, to upload all files starting with the letter a from path, the command would be
radula up 'path/a*' bucket/path
source | target | result |
---|---|---|
/some/file | bucket | bucket/file |
/some/file | bucket/file | bucket/file |
/some/file | bucket/named | bucket/named |
/some/file | bucket/named/ | bucket/named/file |
/some/f* | bucket/named | bucket/named/file, bucket/named/file2 |
/some/f* | bucket/named/ | bucket/named/file, bucket/named/file2 |
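The table above can be reproduced with a short function. This is a sketch of the naming rule as documented, with the glob already expanded into a list of local paths; `resolve_upload_keys` is a hypothetical name, and the real implementation may treat edge cases differently.

```python
import os


def resolve_upload_keys(sources, target):
    """Map local source paths to 'bucket/key' names for an upload target."""
    bucket, _, key = target.partition("/")
    # The target acts as a path prefix when several sources are given,
    # when no key was named, or when the key ends with a slash.
    as_prefix = len(sources) > 1 or key == "" or key.endswith("/")
    results = []
    for source in sources:
        if as_prefix:
            prefix = key.rstrip("/")
            name = os.path.basename(source)
            remote_key = f"{prefix}/{name}" if prefix else name
        else:
            remote_key = key
        results.append(f"{bucket}/{remote_key}")
    return results


if __name__ == "__main__":
    print(resolve_upload_keys(["/some/file"], "bucket/named/"))
    print(resolve_upload_keys(["/some/file", "/some/file2"], "bucket/named"))
```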
For faster multipart uploads, the default number of threads used is 3, but this can be set during upload using the -t option.
# upload a large file using 16 threads
radula -t 16 up large_file bucket
Upload verification via checksum can be enabled by adding the -y, --verify flag.
As of radula v0.6.6, uploads to a remote key that already exists will abort if -f, --force is not also given. The reason is to guard against accidental loss of data in ceph.
Should portions of a multipart upload fail, there is a chance that it can be resumed. A reattempt at the upload should abort, citing the presence of a lingering multipart upload in progress. The multipart-list command should confirm as much. Adding the -z, --resume flag to the original upload command will inspect the uploaded parts and upload those that are absent or differ in checksum. The resume will be slower for each part, as the local parts are hashed and compared to the uploaded parts. Adding a verification step with -y, --verify is recommended.
# a resumed upload with verification
radula -t 16 -zy up large_file bucket
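The resume decision described above, re-sending only parts that are absent or whose checksum differs, can be sketched as follows. It assumes each remote part is identified by its part number and the MD5 hex digest of its bytes; `local_part_hashes` and `parts_to_upload` are hypothetical names, and the data is held in memory only to keep the example short.

```python
import hashlib


def local_part_hashes(data, chunk_size):
    """Return {part_number: md5 hex digest} for each chunk of the data."""
    return {
        number: hashlib.md5(data[offset:offset + chunk_size]).hexdigest()
        for number, offset in enumerate(range(0, len(data), chunk_size), start=1)
    }


def parts_to_upload(data, chunk_size, remote_parts):
    """List part numbers that are missing remotely or differ in checksum.

    `remote_parts` maps part numbers to the digests already stored.
    """
    local = local_part_hashes(data, chunk_size)
    return [n for n, digest in sorted(local.items()) if remote_parts.get(n) != digest]


if __name__ == "__main__":
    payload = b"abcdefghij"
    done = {1: hashlib.md5(b"abcd").hexdigest()}
    # Parts 2 and 3 were never uploaded, so only those are listed.
    print(parts_to_upload(payload, 4, done))
```

Hashing every local part is what makes a resumed upload slower per part than a fresh one.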
The commands get, dl, and download are equivalent. For these examples, I've chosen to use dl.
The syntax is radula dl {source} [{target}]. The target is optional, and will default to the basename of the remote file, stored in the current working directory.
Unlike up, the download commands do not support globs.
source | target | result |
---|---|---|
bucket/path/file | | ./file |
bucket/path/file | some_file | ./some_file |
bucket/path/file | dir | dir/file |
bucket/path/file | dir/named | dir/named |
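The download table follows a similar rule, except that the outcome depends on whether the target is an existing local directory. The function below is a sketch of that rule; `resolve_download_path` is a hypothetical name.

```python
import os


def resolve_download_path(source, target=None):
    """Pick the local path for a download of 'bucket/path/file'."""
    name = os.path.basename(source)
    if target is None:
        # No target: the remote basename, in the current directory.
        return os.path.join(".", name)
    if os.path.isdir(target):
        # An existing directory receives the remote basename.
        return os.path.join(target, name)
    # Anything else is taken as the file name to write.
    return target


if __name__ == "__main__":
    print(resolve_download_path("bucket/path/file"))
    print(resolve_download_path("bucket/path/file", "some_file"))
```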
No attempt is made to create local paths that do not exist prior to download; in the table above, dir is an existing directory.
If a file with the target name already exists, radula will ask if you wish to overwrite it unless the -f, --force flag is enabled.
As of radula v0.6.6, downloads are multi-threaded using 10 processes by default, which can be controlled with the -t, --threads flag.
This is known to have issues writing to glusterfs, so -t 1 is recommended in that instance.
In radula v0.7.1, the default number of threads was reduced to 3.
As of radula v0.7.9, uploads may include the -e, --encrypt flag to instruct Rados to store the data encrypted at rest, using its own internal mechanisms. When encrypted data is copied to another cluster, the remote copy should take on this setting without explicitly being told to.
You can upload entire directories with their structure intact. Assume there is a directory such as this:
$ tree projroot/
projroot/
├── subdir_a
│   ├── ef90d4f2
│   └── efd7f715
└── subdir_b
    ├── 10eaf5f0
    ├── 80920f14
    ├── a6fcadbf
    ├── a8dd1085
    └── third_dir
        ├── 980a978f
        └── e50f86fe
Uploading projroot will copy the directory structure to the location specified. Beware: full paths (/home/user/..) given as sources will upload to keys using that full path.
$ radula up projroot bucket/projects
<snip>
$ radula -p abibby keys abibby/projects/\*
projects/projroot/subdir_a/ef90d4f2
projects/projroot/subdir_a/efd7f715
projects/projroot/subdir_b/10eaf5f0
projects/projroot/subdir_b/80920f14
projects/projroot/subdir_b/a6fcadbf
projects/projroot/subdir_b/a8dd1085
projects/projroot/subdir_b/third_dir/980a978f
projects/projroot/subdir_b/third_dir/e50f86fe
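The key names in the listing above follow from joining the target prefix with each file's path as given on the command line. The sketch below reproduces that mapping for a relative source directory; `keys_for_directory` is a hypothetical helper, and how radula normalizes absolute source paths is not covered here.

```python
import os


def keys_for_directory(source_dir, key_prefix):
    """Return the keys a recursive upload of source_dir would create."""
    keys = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for filename in filenames:
            local_path = os.path.join(dirpath, filename)
            # Keys always use forward slashes, whatever the local separator.
            relative = local_path.replace(os.sep, "/")
            keys.append(f"{key_prefix.rstrip('/')}/{relative}")
    return sorted(keys)


if __name__ == "__main__":
    # Lists keys for ./projroot if it exists; prints nothing otherwise.
    for key in keys_for_directory("projroot", "projects"):
        print(key)
```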
Because keys are inherently flat on s3, to download recursively you'll need a combination of a glob pattern and the --preserve-key flag.
$ radula --preserve-key dl bucket/projects/projroot/
The entire key is used to create the local structure, so in this case, the projects dir will be recreated if it had gone missing.
An alternative to download is cat, which prints the contents of a remote subject to stdout.
$ echo "Hello there you" > hello
$ radula up hello mybucket/hello
INFO:radula:Finished uploading 16.00 B in 0.08s (188.82 Bps)
$ radula cat mybucket/hello
Hello there you
In radula 0.7+, cat accepts the -c, `--chunk` parameter to print part of the remote file. Unique to this command, the chunk parameter can be a range of integers or humanized units. If humanized units (e.g., 2kb) are used, they'll be converted into integers to conform with the [HTTP Range header spec](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35).
When using a range query, the end of the range may be omitted to include everything from the starting position to the end of the file.
Omitting the start of the range is not supported. Starting a range with zero (0-n) does work, but it is recommended to simply provide n by itself, because the range is inclusive. The range 0-100 would output 101 bytes, while the input 100 returns 100.
A ValueError will be raised if end of the range is before the starting position.
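The rules above map directly onto an HTTP Range header value. The function below is a sketch of that mapping for integer inputs only (humanized units such as 2kb are left out); `chunk_to_range_header` is a hypothetical name, not radula's own function.

```python
def chunk_to_range_header(chunk):
    """Turn a chunk argument such as '2', '2-' or '1-2' into a Range value."""
    text = str(chunk).strip()
    if "-" not in text:
        # A lone number n means the first n bytes: positions 0 to n-1.
        count = int(text)
        if count < 1:
            raise ValueError("chunk size must be at least 1")
        return f"bytes=0-{count - 1}"
    start_text, end_text = text.split("-", 1)
    if start_text == "":
        raise ValueError("omitting the start of the range is not supported")
    start = int(start_text)
    if end_text == "":
        # Open-ended: from the start position to the end of the file.
        return f"bytes={start}-"
    end = int(end_text)
    if end < start:
        raise ValueError("end of the range is before the starting position")
    # Both ends are inclusive, so 0-100 covers 101 bytes.
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    for arg in ("2", "2-", "1-2", "100", "0-100"):
        print(arg, "->", chunk_to_range_header(arg))
```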
# first two bytes
$ radula -c 2 cat mybucket/hello
he
# 2 bytes in until the end
$ radula -c '2-' cat mybucket/hello
llo
# first byte to second byte (inclusive)
$ radula -c '1-2' cat mybucket/hello
el
Checksums can be obtained using local-md5 and remote-md5, and easily compared with verify.
The local-md5 command expects one local file argument, and will generate the same hash that is expected to be found on the remote. Multipart upload size matters, so the output hash may differ if the file was uploaded by another mechanism.
The remote-md5 command expects one remote file URI, i.e. mybucket/path/myfile. It will return the etag attribute associated with the key, which will typically be a file md5 or a conglomeration of multipart upload hashes with a number tacked on at the end.
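The "conglomeration" form of the etag can be illustrated with the convention commonly used by S3-compatible stores: the MD5 of the concatenated binary part digests, followed by a dash and the part count. The sketch below assumes that convention applies and that data no larger than one chunk is stored with a plain MD5; `expected_etag` is a hypothetical name, and the input is kept in memory for brevity.

```python
import hashlib


def expected_etag(data, chunk_size):
    """Compute the etag expected for data uploaded in chunk_size parts."""
    if len(data) <= chunk_size:
        # Assumed single-part case: the etag is the plain MD5 of the content.
        return hashlib.md5(data).hexdigest()
    digests = [
        hashlib.md5(data[offset:offset + chunk_size]).digest()
        for offset in range(0, len(data), chunk_size)
    ]
    # Multipart case: hash the concatenated part digests, append the count.
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"


if __name__ == "__main__":
    payload = b"a" * 10
    print(expected_etag(payload, 4))
    print(expected_etag(payload, 5))
```

The two printed values differ even though the content is identical, which is why the chunk size used at upload time matters when comparing hashes.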
Calling verify [local_file] [remote_file] simply runs the operations mentioned above and tests their outputs for likeness.
To view raw metadata about a remote target, use info [remote_file]. The output will contain the etag and other data in JSON format.
For quick access to size and hash data, the commands etag and size are available to provide this data from the larger info set.
Signed URLs can permit the download of private objects for a limited time. The command get-url (alias url) can generate these for one or more objects. The first argument is the remote path, which may use globs, and the optional second argument is an integer representing the minutes until the URL expires. Omitting the second argument will produce a URL that expires in one day.
# one file with default expire
[bibby@machine ~]$ radula get-url bibby/foo
https://s3-host/bibby/foo?Signature={signature}&Expires={expire}&AWSAccessKeyId={access_key}
# many files with 15 min custom expire
[bibby@machine ~]$ radula get-url bibby/foo* 15
https://s3-host/bibby/foo?Signature={signature}&Expires={expire}&AWSAccessKeyId={access_key}
https://s3-host/bibby/foo_2?Signature={signature}&Expires={expire}&AWSAccessKeyId={access_key}
Remote objects can be deleted using the commands rm or remove. While the majority of radula commands follow the position pattern of subject, target, the deletion command operates exclusively on remote objects. Therefore, it is one of the few that accept an arbitrary number of arguments. Globs are supported if they are quoted so as not to expand in the shell.
Use the -n, `--dry-run` flag to preview deletions without making any changes.
[bibby@machine ~]$ radula --dry-run rm mybucket/x
DRY-RUN: rm mybucket/x
[bibby@machine ~]$ radula rm mybucket/x 'mybucket/y*'
x
y1
y2
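How a quoted glob selects keys can be pictured with shell-style matching from the standard library. This sketch assumes fnmatch-style semantics, which may not match radula's implementation in every detail; `select_keys` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def select_keys(keys, patterns):
    """Return the keys matching any pattern, in listing order, no repeats."""
    selected = []
    for key in keys:
        # A literal name matches itself; 'y*' matches every key starting with y.
        if any(fnmatchcase(key, p) for p in patterns) and key not in selected:
            selected.append(key)
    return selected


if __name__ == "__main__":
    bucket_keys = ["x", "y1", "y2", "z"]
    print(select_keys(bucket_keys, ["x", "y*"]))
```

Printing the selection before deleting anything is the same idea as the dry-run flag.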
If multipart uploads go awry, they can leave behind some unfinished artifacts in the form of orphaned upload parts. radula can now list these and clean them up.
The commands multipart-list, mp-list, and mpl are equivalent. For these examples, I've chosen to use mp-list.
Listing can be done by bucket or for a key:
# list multipart uploads for a bucket
$ radula mp-list mybucket
bibby ones.img 2~Q8r-pWTmMTbx_rhHa8-u3I3m-vjCF5F Andrew Bibby 2015-09-23T19:39:14.000Z
bibby zeros.img 2~MvM7KTr2sMcS_SfVzWO7T0chzJRUqvm Andrew Bibby 2015-09-23T19:35:44.000Z
# list multipart uploads for a key
$ radula mp-list mybucket/zeros.img
bibby zeros.img 2~MvM7KTr2sMcS_SfVzWO7T0chzJRUqvm Andrew Bibby 2015-09-23T19:35:44.000Z
Cleaning up a failed multipart upload is just as easy: use a clean command in place of list.
The commands multipart-clean, mp-clean, and mpc are equivalent. For these examples, I've chosen to use mp-clean.
# clean multipart uploads for a key
$ radula mp-clean mybucket/zeros.img
INFO:root:Canceling zeros.img 2~MvM7KTr2sMcS_SfVzWO7T0chzJRUqvm True
# clean multipart uploads for a bucket
$ radula mp-clean mybucket
INFO:root:Canceling ones.img 2~Q8r-pWTmMTbx_rhHa8-u3I3m-vjCF5F True
Since radula 0.5.0, users are able to copy between different ceph installations, or different buckets within the same installation, without copying to the local disk. To facilitate this in the friendliest possible manner, we've extended the boto configuration slightly to be able to specify a separate s3 host for a particular profile.
The profile sections of ~/.boto or /etc/boto.cfg can now accept the following items that are not supported by regular boto:
- host (string)
- port (int)
- is_secure (bool)
An example extended profile:
[profile second_ceph]
aws_access_key_id = wxyz...
aws_secret_access_key = 9765432...
host = second.ceph.of.mine
port = 8184
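To show how such per-profile overrides might be resolved, here is a sketch using the standard library's configparser on the example values from this README. The fallback rules (profile value first, then the shared [s3] host and [Boto] is_secure settings) are an assumption for illustration, and `connection_settings` is a hypothetical name rather than radula's API.

```python
import configparser

EXAMPLE = """
[s3]
host = radosgw1.your_company.com

[Boto]
is_secure = False

[profile second_ceph]
aws_access_key_id = wxyz...
aws_secret_access_key = 9765432...
host = second.ceph.of.mine
port = 8184
"""


def connection_settings(config, profile=None):
    """Resolve host, port and is_secure for a profile, with fallbacks."""
    section = f"profile {profile}" if profile else None

    def profile_has(option):
        return section is not None and config.has_option(section, option)

    if profile_has("host"):
        host = config.get(section, "host")
    else:
        host = config.get("s3", "host", fallback=None)
    # No port in the profile means the connection default is left alone.
    port = config.getint(section, "port") if profile_has("port") else None
    if profile_has("is_secure"):
        is_secure = config.getboolean(section, "is_secure")
    else:
        is_secure = config.getboolean("Boto", "is_secure", fallback=True)
    return {"host": host, "port": port, "is_secure": is_secure}


if __name__ == "__main__":
    # interpolation=None keeps any '%' in secret keys from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(EXAMPLE)
    print(connection_settings(parser, "second_ceph"))
    print(connection_settings(parser))
```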
The commands streaming-copy and sc are equivalent. For these examples, I've chosen to use sc.
When copying, the -p flag will apply the aws_profile for the source/subject. Omitting this flag will use the default boto credentials for the source.
The -d flag will specify the profile used for the destination/target to receive the files. Naming -d Default will use the default boto credentials for the destination.
radula -d second sc mybucket/file other_bucket/file
The above command uses the default boto profile to send file from mybucket, located on the default ceph, to the ceph defined in the profile named second.
radula -p second -d Default sc other_bucket/file mybucket/file
This is the inverse of the previous example. Using the second profile as the source/subject (as specified by -p second), we're transferring a file to mybucket/file, located on the default s3, using the default profile (as specified by -d Default).
To avoid the default profiles altogether, you can copy using both the -p and -d flags.
radula -p here -d there sc here/stuff there/stuff