Suchandra Thapa edited this page Jun 13, 2014 · 5 revisions

Setting up Chirp

Chirp is required to access your data remotely. Download the CCTools tarball and untar it in /usr/local/, then download the Skeleton Key script.

Invoking Skeleton Key

Skeleton Key is run as skeleton_key -c [config_file]. It parses config_file and generates a shell script called job_script.sh, which can be used in submit files or copied to another system and run.
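As a rough illustration, the invocation above could be wrapped in a small Python helper. This is only a sketch: the config file name my_job.ini is hypothetical, and the helper assumes that skeleton_key is on the PATH and returns a zero exit code on success, which the text above does not state.

```python
import shutil
import subprocess


def build_command(config_file):
    """Return the argument list for: skeleton_key -c config_file."""
    return ["skeleton_key", "-c", config_file]


def generate_job_script(config_file):
    """Run Skeleton Key if it is installed; return True when it exits with 0.

    Assumption: a zero exit code means job_script.sh was generated.
    """
    if shutil.which("skeleton_key") is None:
        return False  # Skeleton Key is not available on this machine
    result = subprocess.run(build_command(config_file))
    return result.returncode == 0


# my_job.ini is a hypothetical config file name used only for illustration.
print(" ".join(build_command("my_job.ini")))
```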

Application Modifications Required

Applications must be modified to work with Skeleton Key. The application must access its data through the location given by the $CHIRP_MOUNT environment variable. For example, if the application normally writes to /mnt/hadoop/app_data, it should write to $CHIRP_MOUNT/app_data instead. In addition, all CVMFS mounts must be accessed as /cvmfs/repository_name.
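One way an application might pick up the mount point is sketched below in Python. The helper name data_path and the fallback to /mnt/hadoop when $CHIRP_MOUNT is unset are assumptions made for this example, chosen so that the same code can also run outside Skeleton Key.

```python
import os
import posixpath


def data_path(relative_path, default_mount="/mnt/hadoop"):
    """Return relative_path located under the Chirp mount.

    default_mount is a hypothetical fallback used only when CHIRP_MOUNT
    is not set in the environment.
    """
    mount = os.environ.get("CHIRP_MOUNT", default_mount)
    # Strip leading slashes so the mount prefix is never discarded by join().
    return posixpath.join(mount, relative_path.lstrip("/"))


# Resolves to $CHIRP_MOUNT/app_data when run under Skeleton Key.
print(data_path("app_data"))
```

Routing every data path through one helper like this keeps the change to the application small.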

Configuration File Format

Skeleton Key uses configuration files similar to Windows INI files to determine what information to share and how to run applications remotely. A section starts with [Section]. Options within a section are specified as option_name = value. Everything after the equals sign is assigned to the option, so the value does not need to be quoted. A ; or # character at the start of a line marks the whole line as a comment to be ignored. In addition, a ; within a line marks the rest of that line as a comment. The sections and options that Skeleton Key uses are given below. The configuration file must contain at least an Application section for Skeleton Key to work.
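The format rules above can be tried out with Python's standard configparser module, as in the sketch below. This only illustrates the syntax. Skeleton Key's own parser may treat edge cases differently, and the option values shown are hypothetical.

```python
import configparser

EXAMPLE = """\
# A full-line comment introduced by '#'
; A full-line comment introduced by ';'
[Application]
script = ./app/bin/app_binary   ; a trailing comment
arguments = --input data.txt --verbose
"""

# inline_comment_prefixes enables ';' comments after a value;
# interpolation=None keeps any '%' characters in values literal.
parser = configparser.ConfigParser(
    inline_comment_prefixes=(";",), interpolation=None
)
parser.read_string(EXAMPLE)

for section in parser.sections():
    for option, value in parser[section].items():
        print(f"[{section}] {option} -> {value!r}")
```

Note that the values are read without quotes, and the trailing comment is not part of the script value.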

Directories Section

The Directories section of the config file indicates which directories exported by Chirp should be shared and with what permissions. This section has the following settings:

| Name | Description |
| --- | --- |
| export_base | This mandatory setting specifies the path to the directory that Chirp is exporting. |
| read | A comma-separated list of directories, located in the directory specified in export_base, that Skeleton Key should make available to the running application with read-only privileges. |
| write | A comma-separated list of directories, located in the directory specified in export_base, that Skeleton Key should make available to the running application with read/write privileges. |

Note: at least one of read or write must be given.
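The rules for this section can be checked before a job is submitted. The following Python sketch is a hypothetical pre-flight check, not part of Skeleton Key. The directory names are invented, and the check assumes the Directories section may be omitted entirely, since only the Application section is described as required.

```python
import configparser

EXAMPLE = """\
[Directories]
export_base = /mnt/hadoop
read = app_data, reference
write = output
"""


def split_list(value):
    """Split a comma-separated setting into a list of trimmed names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_directories(config_text):
    """Return a list of problems found in the Directories section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("Directories"):
        return []  # assumption: the section is optional
    section = parser["Directories"]
    problems = []
    if not section.get("export_base", fallback="").strip():
        problems.append("export_base is mandatory")
    readable = split_list(section.get("read", fallback=""))
    writable = split_list(section.get("write", fallback=""))
    if not readable and not writable:
        problems.append("at least one of read or write must be given")
    return problems


print(check_directories(EXAMPLE))
```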

Application Section

The Application section gives information on your application and how it should be run. The settings for this section are as follows:

| Name | Description |
| --- | --- |
| location | An optional setting giving a URL to a tarball that should be downloaded and untarred before running the script or binary given in the script setting. The file must be a tar.gz file. |
| script | This mandatory option gives the location of the binary or script to run within the Parrot environment. For example, if the application tarball untars into a directory called app, this may be set to ./app/bin/app_binary. Likewise, if parrot_run should run an application using CVMFS, this may be set to something similar to /cvmfs/repo_name/bin/my_app. |
| arguments | Any arguments that should be passed to the script or binary specified in the script setting. |
| http_proxy | An optional setting giving a server to use as an HTTP proxy. |
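To show how these settings fit together, the sketch below renders an Application section from Python. The helper application_section is hypothetical, the URL and argument values are placeholders, and the check that location ends in .tar.gz is a simplification of the tar.gz requirement stated above.

```python
import configparser
import io


def application_section(script, arguments=None, location=None, http_proxy=None):
    """Render an [Application] section as INI text."""
    if not script:
        raise ValueError("script is mandatory")
    if location is not None and not location.endswith(".tar.gz"):
        raise ValueError("location must point to a tar.gz file")
    parser = configparser.ConfigParser(interpolation=None)
    parser.add_section("Application")
    parser.set("Application", "script", script)
    optional = {"location": location, "arguments": arguments,
                "http_proxy": http_proxy}
    for name, value in optional.items():
        if value is not None:  # leave optional settings out when unset
            parser.set("Application", name, value)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Placeholder values: example.org and the argument string are not real.
print(application_section(
    "./app/bin/app_binary",
    arguments="--events 100",
    location="http://example.org/app.tar.gz",
))
```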

CVMFS Section

The CVMFS section can be used to specify CVMFS repositories that should be set up in the environment that your application will run in. All configured repositories will be available as /cvmfs/repo_name, where repo_name is the specified name of the repository.

| Name | Description |
| --- | --- |
| repoN | This setting should give the repository name. Important: this name must match the repo name used when setting up the CVMFS master; otherwise your application will segfault when trying to access this repository. |
| repoN_key | This setting should give a URL to the public key associated with the CVMFS repository. |
| repoN_options | This setting should give options for the CVMFS repository, separated by commas. At a minimum, url must be given. In addition, proxies must be given if http_proxy is not specified in the Application section. |

In the settings listed above, N should be replaced with an integer. Each repository that should be made available should have a corresponding repoN and repoN_options setting starting from 1. E.g. the first repository should be specified by repo1 and repo1_options settings; the second by repo2 and repo2_options; and so on.
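The numbering scheme can be illustrated with a short Python sketch that gathers the repoN settings in order. It assumes that collection stops at the first missing number, which is one reading of "starting from 1", so Skeleton Key itself may behave differently. The repository names and URLs are placeholders.

```python
def collect_repositories(cvmfs_settings):
    """Collect repo1, repo2, ... until the first missing number."""
    repositories = []
    n = 1
    while f"repo{n}" in cvmfs_settings:
        repositories.append({
            "name": cvmfs_settings[f"repo{n}"],
            "key": cvmfs_settings.get(f"repo{n}_key"),
            "options": cvmfs_settings.get(f"repo{n}_options"),
        })
        n += 1
    return repositories


# Placeholder settings, as they might appear in a [CVMFS] section.
settings = {
    "repo1": "first.example.org",
    "repo1_key": "http://example.org/first.pub",
    "repo1_options": "url=http://example.org/cvmfs/first.example.org",
    "repo2": "second.example.org",
    "repo2_key": "http://example.org/second.pub",
    "repo2_options": "url=http://example.org/cvmfs/second.example.org",
}

for repo in collect_repositories(settings):
    print(repo["name"], "->", "/cvmfs/" + repo["name"])
```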

CVMFS options are described below. Only url is required; proxies is needed only if http_proxy is not given or the environment does not have HTTP_PROXY set.

| Option | Description |
| --- | --- |
| url=URL | The URL of the CernVM-FS server(s): 'url1;url2;...' |
| proxies=HTTP_PROXIES | Set the HTTP proxy list, such as 'proxy1¦proxy2', where ¦ stands for the ASCII vertical bar. Proxies separated by the vertical bar are randomly chosen for load balancing. Groups of proxies separated by ';' may be specified for failover. If the first group fails, the second group is used, and so on down the chain. |
| cachedir=DIR | Where to store the disk cache. |
| timeout=SECONDS | Timeout for network operations. |
| timeout_direct=SECONDS | Timeout for network operations without proxy; default is given by the -T option (PARROT_TIMEOUT). |
| max_ttl=MINUTES | Maximum TTL for file catalogs; default: take from catalog. |
| allow_unsigned | Accept unsigned catalogs (allows man-in-the-middle attacks). |
| whitelist=URL | HTTP location of trusted catalog certificates (default is /.cvmfswhitelist). |
| rebuild_cachedb | Force rebuilding the quota cache db from the cache directory. |
| quota_limit=MB | Limit the size of the cache. -1 (the default) means unlimited. If not -1, files larger than quota_limit - quota_threshold will not be readable. |
| quota_threshold=MB | Clean up the cache until its size is <= threshold. |
| deep_mount=prefix | Path prefix if a repository is mounted on a nested catalog. |
| repo_name=NAME | Unique name of the mounted repository; default is the name used for this configuration entry. |
| mountpoint=PATH | Path to the root of the repository; default is /cvmfs/repo_name. |
| blacklist=FILE | Local blacklist for invalid certificates. Has precedence over the whitelist. |
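As an illustration of how a repoN_options value is built from these options, the Python sketch below splits a comma-separated options string and checks the two requirements stated above. The function names, host names, and port numbers are hypothetical. The sketch also assumes that no option value contains a comma, and it treats options without an equals sign, such as allow_unsigned, as flags.

```python
def parse_repo_options(options):
    """Split 'name=value,flag,...' into a dict; flags map to True."""
    parsed = {}
    for item in options.split(","):
        item = item.strip()
        if not item:
            continue
        name, separator, value = item.partition("=")
        parsed[name.strip()] = value.strip() if separator else True
    return parsed


def check_repo_options(parsed, proxy_configured_elsewhere=False):
    """Return problems; proxies may be skipped if a proxy is set elsewhere."""
    problems = []
    if "url" not in parsed:
        problems.append("url must be given")
    if "proxies" not in parsed and not proxy_configured_elsewhere:
        problems.append("proxies must be given when no HTTP proxy is set")
    return problems


# Placeholder hosts; two proxy groups separated by ';' for failover.
options = ("url=http://cvmfs.example.org/cvmfs/repo_name,"
           "proxies=squid1.example.org:3128;squid2.example.org:3128,"
           "quota_limit=4000,quota_threshold=2000")
parsed = parse_repo_options(options)
print(parsed)
print(check_repo_options(parsed))
```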

Parrot Section

This optional section can be used to specify the location of a tarball with the Parrot binaries that should be used. If this section is not given, a default set of binaries for the OSG Connect cluster will be used. The settings for this section are as follows:

| Name | Description |
| --- | --- |
| location | URL to a tar.gz file that can be downloaded. The parrot_run binary must be found at ./parrot/bin/parrot_run after untarring the file. |
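Before publishing a custom Parrot tarball, its layout can be checked locally. The Python sketch below is a hypothetical helper, not part of Skeleton Key. It inspects a tar.gz archive held in memory and reports whether parrot/bin/parrot_run is present. The small demo archive is built only so that the example is self-contained.

```python
import io
import posixpath
import tarfile


def has_parrot_run(tarball_bytes):
    """Return True if the tar.gz data contains parrot/bin/parrot_run."""
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r:gz") as archive:
        # normpath turns './parrot/bin/parrot_run' into 'parrot/bin/parrot_run'.
        names = {posixpath.normpath(name) for name in archive.getnames()}
    return "parrot/bin/parrot_run" in names


def make_demo_tarball(member_name):
    """Build a tiny tar.gz in memory with one placeholder file."""
    payload = b"#!/bin/sh\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


print(has_parrot_run(make_demo_tarball("./parrot/bin/parrot_run")))
print(has_parrot_run(make_demo_tarball("bin/parrot_run")))
```

For a real tarball, the bytes would be read from the downloaded file instead of the demo archive.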