Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Storage] Rclone to Cache Storage #3455

Closed
wants to merge 54 commits into from
Closed
Show file tree
Hide file tree
Changes from 38 commits
Commits
Show all changes
54 commits
Select commit Hold shift + click to select a range
7dfcba1
initial commit
Feb 17, 2024
212df29
Merge branch 'skypilot-org:master' into master
shethhriday29 Feb 17, 2024
a08c034
newline
Feb 17, 2024
b85cbf9
comments
Feb 17, 2024
4fbfe17
run linter
Feb 17, 2024
6fc77e1
reminder for down
Feb 18, 2024
d6cb993
tentatively done with example
Feb 18, 2024
2d5aceb
formatting
Feb 18, 2024
4e1954a
yapf
Feb 19, 2024
eb22f62
Merge branch 'skypilot-org:master' into master
shethhriday29 Feb 22, 2024
27a8905
[Storage] Storage mounting tool permissions fix (#3215)
romilbhardwaj Feb 22, 2024
41a63df
[LLM] Example for Serving Gemma (#3207)
Michaelvll Feb 23, 2024
2b17e91
[LLM] Add logo for Gemma (#3220)
Michaelvll Feb 23, 2024
b326d12
Minor fixes for release 0.5.0 (#3212)
Michaelvll Feb 23, 2024
6d77872
[Docker] Add retry for docker pull due to daemon not ready (#3218)
Michaelvll Feb 24, 2024
cb695d5
added comments
Feb 26, 2024
94221eb
Merge branch 'skypilot-org:master' into master
shethhriday29 Feb 26, 2024
888f1a8
quick fix
Feb 27, 2024
4877669
finished pip issues
Feb 29, 2024
7a208dd
fix
Feb 29, 2024
152e36a
fix storage error message, add example link to docs
romilbhardwaj Feb 29, 2024
cf556ed
Merge branch 'master' of https://github.com/skypilot-org/skypilot int…
romilbhardwaj Feb 29, 2024
8213f02
Merge branch 'skypilot-org:master' into master
shethhriday29 Mar 3, 2024
60a530d
Merge branch 'skypilot-org:master' into master
shethhriday29 Mar 22, 2024
8df52d6
Merge branch 'skypilot-org:master' into master
shethhriday29 Mar 29, 2024
31291f4
Merge branch 'skypilot-org:master' into master
shethhriday29 Apr 17, 2024
c868494
prototype for gcs
Apr 18, 2024
953b4a6
removed non git files
Apr 18, 2024
e63e943
rclone support for aws
Apr 21, 2024
a4ddd48
minor formatting fix
Apr 21, 2024
9fa8f67
Merge branch 'skypilot-org:master' into rclone-cache
shethhriday29 May 26, 2024
19e714b
Merge branch 'skypilot-org:master' into rclone-cache
shethhriday29 May 27, 2024
32bb4fa
fixed merge conflict
May 27, 2024
0481d2c
fixed merge conflict
May 27, 2024
abf4488
fixed merge conflict
May 27, 2024
00a2a9f
reset perf example
May 27, 2024
213eb26
reset perf example
May 27, 2024
1349ba0
added tests for rclone functionality
May 28, 2024
3b0923a
update rclone vfs options
landscapepainter May 30, 2024
b82cc5b
testing
landscapepainter Jun 13, 2024
bad9172
test
landscapepainter Jun 30, 2024
211f1bc
test
landscapepainter Jun 30, 2024
a312ac1
update rclone config and rclone command options
landscapepainter Jul 2, 2024
ab179ab
Merge branch 'rclone-cache' of https://github.com/shethhriday29/skypi…
landscapepainter Jul 2, 2024
66a8e11
nit refactor
landscapepainter Jul 4, 2024
e628024
rclone refactor
landscapepainter Jul 5, 2024
0976fa1
nit
landscapepainter Jul 6, 2024
3d16255
update MOUNT_CACHE mode to MOUNT_CACHED mode
landscapepainter Jul 6, 2024
3818a05
update smoke test
landscapepainter Jul 7, 2024
d393206
Merge branch 'master' into shethhriday29-rclone-cache
landscapepainter Jul 7, 2024
27e702b
additional comments for mount cached command explaining options for r…
landscapepainter Jul 7, 2024
0612532
rclone class doc-string fix
landscapepainter Jul 8, 2024
ea290b8
nit format
landscapepainter Jul 8, 2024
64d9a7d
update step 7 of maybe_translate_local_file_mounts_and_sync_up
landscapepainter Jul 8, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions sky/backends/cloud_vm_ray_backend.py
Original file line number Diff line number Diff line change
Expand Up @@ -4557,7 +4557,8 @@ def _execute_storage_mounts(
storage_mounts = {
path: storage_mount
for path, storage_mount in storage_mounts.items()
if storage_mount.mode == storage_lib.StorageMode.MOUNT
if (storage_mount.mode == storage_lib.StorageMode.MOUNT or
storage_mount.mode == storage_lib.StorageMode.RCLONE)
}

# Handle cases when there aren't any Storages with MOUNT mode.
Expand Down Expand Up @@ -4587,7 +4588,10 @@ def _execute_storage_mounts(
'successfully without mounting the bucket.')
# Get the first store and use it to mount
store = list(storage_obj.stores.values())[0]
mount_cmd = store.mount_command(dst)
if storage_obj.mode == storage_lib.StorageMode.MOUNT:
mount_cmd = store.mount_command(dst)
else:
mount_cmd = store.mount_command_rclone(dst)
src_print = (storage_obj.source
if storage_obj.source else storage_obj.name)
if isinstance(src_print, list):
Expand Down
23 changes: 21 additions & 2 deletions sky/data/data_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

from filelock import FileLock

from sky import clouds
from sky import exceptions
from sky import sky_logging
from sky.adaptors import aws
Expand Down Expand Up @@ -402,6 +403,8 @@ class Rclone():
# to their respective profile prefix
class RcloneClouds(Enum):
IBM = 'sky-ibm-'
GCP = 'sky-gcp-'
AWS = 'sky-aws-'

@staticmethod
def generate_rclone_bucket_profile_name(bucket_name: str,
Expand All @@ -422,9 +425,9 @@ def generate_rclone_bucket_profile_name(bucket_name: str,

@staticmethod
def get_rclone_config(bucket_name: str, cloud: RcloneClouds,
region: str) -> str:
region: Optional[str]) -> str:
bucket_rclone_profile = Rclone.generate_rclone_bucket_profile_name(
bucket_name, cloud)
bucket_name, cloud)
if cloud is Rclone.RcloneClouds.IBM:
access_key_id, secret_access_key = ibm.get_hmac_keys()
config_data = textwrap.dedent(f"""\
Expand All @@ -438,6 +441,22 @@ def get_rclone_config(bucket_name: str, cloud: RcloneClouds,
location_constraint = {region}-smart
acl = private
""")
elif cloud is Rclone.RcloneClouds.GCP:
config_data = textwrap.dedent(f"""\
[{bucket_rclone_profile}]
type = google cloud storage
project_number = {clouds.GCP.get_project_id()}
bucket_acl = private
""")
elif cloud is Rclone.RcloneClouds.AWS:
config_data = textwrap.dedent(f"""\
[{bucket_rclone_profile}]
type = s3
provider = AWS
access_key_id = {aws.session().get_credentials().get_frozen_credentials().access_key}
secret_access_key = {aws.session().get_credentials().get_frozen_credentials().secret_key}
acl = private
""")
else:
with ux_utils.print_exception_no_traceback():
raise NotImplementedError('No rclone configuration builder was '
Expand Down
41 changes: 32 additions & 9 deletions sky/data/mounting_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,9 @@
_RENAME_DIR_LIMIT = 10000
# https://github.com/GoogleCloudPlatform/gcsfuse/releases
GCSFUSE_VERSION = '1.3.0'

RCLONE_INSTALL_COMMAND = ('rclone version >/dev/null 2>&1 || '
'(curl https://rclone.org/install.sh | '
'sudo bash)')

def get_s3_mount_install_cmd() -> str:
"""Returns a command to install S3 mount utility goofys."""
Expand Down Expand Up @@ -68,14 +70,6 @@ def get_r2_mount_cmd(r2_credentials_path: str, r2_profile_name: str,
return mount_cmd


def get_cos_mount_install_cmd() -> str:
"""Returns a command to install IBM COS mount utility rclone."""
install_cmd = ('rclone version >/dev/null 2>&1 || '
'(curl https://rclone.org/install.sh | '
'sudo bash)')
return install_cmd


def get_cos_mount_cmd(rclone_config_data: str, rclone_config_path: str,
bucket_rclone_profile: str, bucket_name: str,
mount_path: str) -> str:
Expand All @@ -98,6 +92,35 @@ def get_cos_mount_cmd(rclone_config_data: str, rclone_config_path: str,
return mount_cmd


def get_mount_install_cmd_rclone() -> str:
"""Returns a command to install mount utility rclone."""
return RCLONE_INSTALL_COMMAND


def get_mount_cmd_rclone(rclone_config_data: str, rclone_config_path: str,
bucket_rclone_profile: str, bucket_name: str,
mount_path: str) -> str:
"""Returns a command to mount a GCP/AWS bucket using rclone."""
# creates a fusermount soft link on older (<22) Ubuntu systems for
# rclone's mount utility.
set_fuser3_soft_link = ('[ ! -f /bin/fusermount3 ] && '
'sudo ln -s /bin/fusermount /bin/fusermount3 || '
'true')
# stores bucket profile in rclone config file at the cluster's nodes.
configure_rclone_profile = (f'{set_fuser3_soft_link}; '
'mkdir -p ~/.config/rclone/ && '
f'echo "{rclone_config_data}" >> '
f'{rclone_config_path}')
# --daemon will keep the mounting process running in the background.
mount_cmd = (f'{configure_rclone_profile} && '
'rclone mount '
f'{bucket_rclone_profile}:{bucket_name} {mount_path} '
'--daemon --daemon --daemon-wait 0 \
--allow-other --rc --vfs-cache-mode full &&' #todo: figure out if this should be a semicolon or an &&
'rclone rc vfs/refresh')
return mount_cmd


def get_mounting_script(
mount_path: str,
mount_cmd: str,
Expand Down
42 changes: 41 additions & 1 deletion sky/data/storage.py
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,7 @@ def store_prefix(self) -> str:
class StorageMode(enum.Enum):
MOUNT = 'MOUNT'
COPY = 'COPY'
RCLONE = 'RCLONE'


class AbstractStore:
Expand Down Expand Up @@ -1036,6 +1037,9 @@ def __init__(self,
self.bucket: 'StorageHandle'
super().__init__(name, source, region, is_sky_managed,
sync_on_reconstruction)
self.bucket_rclone_profile = \
Rclone.generate_rclone_bucket_profile_name(
self.name, Rclone.RcloneClouds.AWS)

def _validate(self):
if self.source is not None and isinstance(self.source, str):
Expand Down Expand Up @@ -1354,6 +1358,22 @@ def mount_command(self, mount_path: str) -> str:
mount_path)
return mounting_utils.get_mounting_command(mount_path, install_cmd,
mount_cmd)

def mount_command_rclone(self, mount_path: str) -> str:
install_cmd = mounting_utils.get_mount_install_cmd_rclone()
rclone_config_data = Rclone.get_rclone_config(
self.bucket.name,
Rclone.RcloneClouds.AWS,
None
)
mount_cmd = mounting_utils.get_mount_cmd_rclone(rclone_config_data,
Rclone.RCLONE_CONFIG_PATH,
self.bucket_rclone_profile,
self.bucket.name,
mount_path)
return mounting_utils.get_mounting_command(mount_path, install_cmd,
mount_cmd)


def _create_s3_bucket(self,
bucket_name: str,
Expand Down Expand Up @@ -1445,6 +1465,9 @@ def __init__(self,
self.bucket: StorageHandle
super().__init__(name, source, region, is_sky_managed,
sync_on_reconstruction)
self.bucket_rclone_profile = \
Rclone.generate_rclone_bucket_profile_name(
self.name, Rclone.RcloneClouds.GCP)

def _validate(self):
if self.source is not None:
Expand Down Expand Up @@ -1794,6 +1817,23 @@ def mount_command(self, mount_path: str) -> str:
f'gcsfuse --version | grep -q {mounting_utils.GCSFUSE_VERSION}')
return mounting_utils.get_mounting_command(mount_path, install_cmd,
mount_cmd, version_check_cmd)


def mount_command_rclone(self, mount_path: str) -> str:
install_cmd = mounting_utils.get_mount_install_cmd_rclone()
rclone_config_data = Rclone.get_rclone_config(
self.bucket.name,
Rclone.RcloneClouds.GCP,
None
)
mount_cmd = mounting_utils.get_mount_cmd_rclone(rclone_config_data,
Rclone.RCLONE_CONFIG_PATH,
self.bucket_rclone_profile,
self.bucket.name,
mount_path)
return mounting_utils.get_mounting_command(mount_path, install_cmd,
mount_cmd)


def _download_file(self, remote_path: str, local_path: str) -> None:
"""Downloads file from remote to local on GS bucket
Expand Down Expand Up @@ -2585,7 +2625,7 @@ def mount_command(self, mount_path: str) -> str:
mount_path: str; Path to mount the bucket to.
"""
# install rclone if not installed.
install_cmd = mounting_utils.get_cos_mount_install_cmd()
install_cmd = mounting_utils.get_mount_install_cmd_rclone()
rclone_config_data = Rclone.get_rclone_config(
self.bucket.name,
Rclone.RcloneClouds.IBM,
Expand Down
3 changes: 2 additions & 1 deletion sky/task.py
Original file line number Diff line number Diff line change
Expand Up @@ -1112,7 +1112,8 @@ def get_required_cloud_features(

# Storage mounting
for _, storage_mount in self.storage_mounts.items():
if storage_mount.mode == storage_lib.StorageMode.MOUNT:
if (storage_mount.mode == storage_lib.StorageMode.MOUNT or
storage_mount.mode == storage_lib.StorageMode.RCLONE):
required_features.add(
clouds.CloudImplementationFeatures.STORAGE_MOUNTING)
break
Expand Down
60 changes: 60 additions & 0 deletions tests/test_smoke.py
Original file line number Diff line number Diff line change
Expand Up @@ -999,6 +999,36 @@ def test_aws_storage_mounts_with_stop():
run_one_test(test)


@pytest.mark.aws
def test_aws_mount_rclone():
name = _get_cluster_name()
storage_name = f'sky-test-{int(time.time())}'
bucket_rclone_profile = Rclone.generate_rclone_bucket_profile_name(
storage_name, Rclone.RcloneClouds.AWS)
template_str = pathlib.Path(
'tests/test_yamls/test_rclone_mount.yaml').read_text()
template = jinja2.Template(template_str)
content = template.render(store_type=f'{storage_lib.StoreType.S3.value}',
storage_name=storage_name,
bucket_rclone_profile=bucket_rclone_profile)
with tempfile.NamedTemporaryFile(suffix='.yaml', mode='w') as f:
f.write(content)
f.flush()
file_path = f.name
test_commands = [
*storage_setup_commands,
f'sky launch -y -c {name} --cloud aws {file_path}',
f'sky logs {name} 1 --status', # Ensure job succeeded.
]
test = Test(
'aws_mount_rclone',
test_commands,
f'sky down -y {name}; sky storage delete -y {storage_name}',
timeout=20 * 60, # 20 mins
)
run_one_test(test)


@pytest.mark.gcp
def test_gcp_storage_mounts_with_stop():
name = _get_cluster_name()
Expand Down Expand Up @@ -1031,6 +1061,36 @@ def test_gcp_storage_mounts_with_stop():
run_one_test(test)


@pytest.mark.gcp
def test_gcp_mount_rclone():
name = _get_cluster_name()
storage_name = f'sky-test-{int(time.time())}'
bucket_rclone_profile = Rclone.generate_rclone_bucket_profile_name(
storage_name, Rclone.RcloneClouds.GCP)
template_str = pathlib.Path(
'tests/test_yamls/test_rclone_mount.yaml').read_text()
template = jinja2.Template(template_str)
content = template.render(store_type=storage_lib.StoreType.GCS.value,
storage_name=storage_name,
bucket_rclone_profile=bucket_rclone_profile)
with tempfile.NamedTemporaryFile(suffix='.yaml', mode='w') as f:
f.write(content)
f.flush()
file_path = f.name
test_commands = [
*storage_setup_commands,
f'sky launch -y -c {name} --cloud gcp {file_path}',
f'sky logs {name} 1 --status', # Ensure job succeeded.
]
test = Test(
'gcp_mount_rclone',
test_commands,
f'sky down -y {name}; sky storage delete -y {storage_name}',
timeout=20 * 60, # 20 mins
)
run_one_test(test)


@pytest.mark.kubernetes
def test_kubernetes_storage_mounts():
# Tests bucket mounting on k8s, assuming S3 is configured.
Expand Down
24 changes: 24 additions & 0 deletions tests/test_yamls/test_rclone_mount.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
file_mounts:
# Mounting private buckets in RCLONE mode
/mount_private_rclone:
name: {{storage_name}}
source: ~/tmp-workdir
store: {{store_type}}
mode: RCLONE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this new mode, or should we directly change the behavior of MOUNT to rclone?

Copy link
Collaborator

@landscapepainter landscapepainter Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed online, we are referring this mode separately as MOUNT_CACHED mode, and keeping our original MOUNT mode as it is.


run: |
set -ex

# Check private bucket contents
ls -ltr /mount_private_rclone/foo
ls -ltr /mount_private_rclone/tmp\ file

# Symlinks are not copied to buckets
! ls /mount_private_rclone/circle-link

# Write to private bucket in MOUNT mode should pass
echo "hello" > /mount_private_rclone/hello.txt

# Ensure that write is reflected in bucket
rclone ls {{ bucket_rclone_profile }}:{{ storage_name }}/hello.txt

Loading