Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Storage] Azure blob storage support #3032

Merged
Merged
Show file tree
Hide file tree
Changes from 174 commits
Commits
Show all changes
183 commits
Select commit Hold shift + click to select a range
d867646
first commit
landscapepainter Jan 25, 2024
4c49ed4
nit
landscapepainter Jan 26, 2024
1f2c584
implement fetching bucket
landscapepainter Jan 26, 2024
1da2c76
update batch sync
landscapepainter Jan 27, 2024
3ab8095
support file name with empty space sync
landscapepainter Jan 27, 2024
d7a0a26
support blobfuse2 mount/container name validate
landscapepainter Jan 28, 2024
0820037
support container deletion
landscapepainter Jan 28, 2024
32ef2ad
support download from container to remote vm
landscapepainter Jan 29, 2024
9126119
complete download from container to remote vm
landscapepainter Jan 30, 2024
52787e8
update mounting tool blobfuse2 download command
landscapepainter Jan 30, 2024
a4ce470
update mounting command
landscapepainter Jan 30, 2024
1dc2be6
_CREDENTIALS_FILES list update
landscapepainter Jan 30, 2024
dd3646d
add smoke test
landscapepainter Jan 30, 2024
0ecd1ad
update storage comment
landscapepainter Jan 30, 2024
02e06b5
update download commands to use account key
landscapepainter Feb 3, 2024
9765849
add account-key for upload
landscapepainter Feb 3, 2024
8200265
nit
landscapepainter Feb 3, 2024
c7ce2d5
nit fix
landscapepainter Feb 3, 2024
a156b56
data_utils fix
landscapepainter Feb 3, 2024
212e003
nit
landscapepainter Feb 3, 2024
1a0ad06
nit
landscapepainter Feb 3, 2024
6e76b7f
add comments
landscapepainter Feb 4, 2024
9975596
nit smoke
landscapepainter Feb 4, 2024
6becf33
implement verify_az_bucket
landscapepainter Feb 6, 2024
6350fdd
smoke test update and nit mounting_utils
landscapepainter Feb 7, 2024
7a1fe65
config schema update
landscapepainter Feb 9, 2024
23977b9
support public container usage
landscapepainter Feb 9, 2024
97fc0eb
nit fix for private bucket test
landscapepainter Feb 9, 2024
c56fa32
update _get_bucket to use from_container_url
landscapepainter Feb 10, 2024
d23dbb0
add _download_file
landscapepainter Feb 10, 2024
c7ed6fb
nit
landscapepainter Feb 10, 2024
a78b0f9
fix mounting blobfuse2 issues
landscapepainter Feb 10, 2024
861b718
nit
landscapepainter Feb 10, 2024
3cba1b6
format
landscapepainter Feb 10, 2024
7ba8b88
nit
landscapepainter Feb 11, 2024
8aaed84
container client fix
landscapepainter Mar 3, 2024
5e85fdc
smoke test update private_bucket
landscapepainter Mar 7, 2024
cb1e6a8
azure get_client update to use exists()
landscapepainter Mar 7, 2024
d036cd8
nit
landscapepainter Mar 7, 2024
95736e4
udpate fetch command for public containers
landscapepainter Mar 12, 2024
69539ed
nit
landscapepainter Mar 12, 2024
8411fff
update fetching command for public containers
landscapepainter Mar 12, 2024
d0d6105
silence client logging when used with public containers
landscapepainter Mar 12, 2024
3bcb2cb
az cli and blobfuse installation update
landscapepainter Apr 2, 2024
b9c21be
update for faster container client fetch
landscapepainter Apr 3, 2024
f1a072e
Handle private container without access
landscapepainter Apr 3, 2024
bc1534d
update private container without access smoke test
landscapepainter Apr 3, 2024
524a47f
Merge branch 'master' into azure-blob-storage
landscapepainter Apr 3, 2024
1c96055
change due to merging master branch
landscapepainter Apr 3, 2024
b355d25
updates from merging master
landscapepainter Apr 3, 2024
9e52672
update mounting smoke test
landscapepainter Apr 3, 2024
46111f7
mounting smoke test update
landscapepainter Apr 3, 2024
e8bc823
remove logger restriction
landscapepainter Apr 3, 2024
47b6909
update comments
landscapepainter Apr 3, 2024
f23d806
update verify_az_bucket to use for both private and public
landscapepainter Apr 3, 2024
1570022
update comments and formatting
landscapepainter Apr 3, 2024
2aa1e1f
update delete_az_bucket
landscapepainter Apr 3, 2024
85e323a
az cli installation versioning
landscapepainter Apr 4, 2024
e84c46f
update logging silence logic for get_client
landscapepainter Apr 4, 2024
bac7319
support azcopy for fetching
landscapepainter Apr 17, 2024
6e612c9
update sas token generation with az-cli
landscapepainter Apr 18, 2024
b67e87c
propagation hold
landscapepainter Apr 20, 2024
c994cfc
Merge branch 'master' of https://github.com/skypilot-org/skypilot int…
romilbhardwaj May 1, 2024
2f727ac
merge fix
romilbhardwaj May 1, 2024
4c5938a
add support to assign role to access storage account
landscapepainter May 20, 2024
0f810e0
Merge branch 'azure-blob-storage' of https://github.com/landscapepain…
landscapepainter May 20, 2024
881b0d2
nit
landscapepainter May 20, 2024
34a684c
silence logging from httpx request to get object_id
landscapepainter May 21, 2024
777cd8a
checks existance of storage account and resource group before creation
landscapepainter May 22, 2024
a607874
create storage account for different regions
landscapepainter May 23, 2024
eaf3058
fix source name when translating local file mounts for spot sync
landscapepainter May 25, 2024
7494600
smoke test update for storage account names
landscapepainter May 26, 2024
696b4c6
removing az-cli installation from cloud_stores.py
landscapepainter May 26, 2024
8ec5c08
nit
landscapepainter May 26, 2024
054efae
update sas token generation to use python sdk
landscapepainter May 26, 2024
f856063
nit
landscapepainter May 26, 2024
1612af4
Update sky/data/storage.py
landscapepainter May 26, 2024
70ae2b3
move sas token generating functions from data_utils to adaptors.azure
landscapepainter May 27, 2024
d63fc7a
use constant string format to obtain container url
landscapepainter May 27, 2024
be1cfaa
nit
landscapepainter May 28, 2024
5a1dd87
add comment for '/' and azcopy syntax
landscapepainter May 31, 2024
eb933df
refactor AzureBlobCloudStorage methods
landscapepainter May 31, 2024
8e94496
nit
landscapepainter May 31, 2024
fa113d0
Merge branch 'master' into azure-blob-storage
landscapepainter Jun 1, 2024
5df34de
format
landscapepainter Jun 1, 2024
de3d8a8
nit
landscapepainter Jun 7, 2024
806320c
update test storage mount yaml j2
landscapepainter Jun 7, 2024
63dab29
added rich status message for storage account and resource group crea…
landscapepainter Jun 7, 2024
b9dfa47
update rich status message when creating storage account and resource…
landscapepainter Jun 8, 2024
59d30fb
nit
landscapepainter Jun 8, 2024
119e7e5
Error handle for when storage account creation did not yet propagate …
landscapepainter Jun 8, 2024
ecbab0e
comment update
landscapepainter Jun 8, 2024
6640c15
merge error output into exception message
landscapepainter Jun 8, 2024
fb61be4
error comment
landscapepainter Jun 8, 2024
144a60d
additional error handling when creating storage account
landscapepainter Jun 8, 2024
56f8953
nit
landscapepainter Jun 8, 2024
408c240
update to use raw container url endpoint instead of 'az://'
landscapepainter Jun 14, 2024
720fe38
update config.yaml interface
landscapepainter Jun 14, 2024
80e1bce
remove resource group existance check
landscapepainter Jun 15, 2024
c923575
add more comments for az mount command
landscapepainter Jun 15, 2024
e86e24d
nit
landscapepainter Jun 15, 2024
a1940a2
add more exception handling for storage account initialization
landscapepainter Jun 15, 2024
4e5442f
Remove lru cache decorator from sas token generating functions
landscapepainter Jun 15, 2024
a877d7c
nit
landscapepainter Jun 15, 2024
508b8e1
nit
landscapepainter Jun 16, 2024
d9b70d4
Revert back to check if the resource group exists before running comm…
landscapepainter Jun 16, 2024
3b771dc
refactor function to obtain resource group and storage account
landscapepainter Jun 16, 2024
113cb12
nit
landscapepainter Jun 16, 2024
401796b
add support for storage account under AzureBlobStoreMetadata
landscapepainter Jun 18, 2024
fe83261
set default file permission to be 755 for mounting
landscapepainter Jun 18, 2024
613618e
Update sky/adaptors/azure.py
landscapepainter Jun 18, 2024
f30291f
nit
landscapepainter Jun 18, 2024
dacf597
Merge branch 'azure-blob-storage' of https://github.com/landscapepain…
landscapepainter Jun 18, 2024
6dd9d31
nit fixes
landscapepainter Jun 19, 2024
e5cc383
format and update error handling
landscapepainter Jun 19, 2024
892b504
nit fixes
landscapepainter Jun 19, 2024
aea8316
set default storage account and resource group name as string constant
landscapepainter Jun 19, 2024
3e8c96e
update error handle.
landscapepainter Jun 19, 2024
d58a10b
additional error handle for else branch
landscapepainter Jun 19, 2024
078ee52
Additional error handling
landscapepainter Jun 19, 2024
9512449
nit
landscapepainter Jun 19, 2024
01b48a4
update get_az_storage_account_key to replace try-exception with if st…
landscapepainter Jun 20, 2024
2bd5ec7
nit
landscapepainter Jun 20, 2024
0b16763
nit
landscapepainter Jun 20, 2024
7ec0f68
nit
landscapepainter Jun 20, 2024
7b3010c
format
landscapepainter Jun 20, 2024
78a2533
update public container example as not accessible anymore
landscapepainter Jun 26, 2024
7f6ad76
Merge branch 'master' into azure-blob-storage
landscapepainter Jun 26, 2024
cd33c18
nit
landscapepainter Jun 26, 2024
13f90b4
file_bucket_name update
landscapepainter Jun 26, 2024
6a8db72
add StoreType method to retrieve bucket endpoint url
landscapepainter Jun 26, 2024
5c67690
format
landscapepainter Jun 26, 2024
95fa03d
add azure storage blob dependency installation for controller
landscapepainter Jun 26, 2024
b864f42
fix fetching methods
landscapepainter Jun 27, 2024
08adcde
nit
landscapepainter Jun 27, 2024
493e300
additional docstr for _get_storage_account_and_resource_group
landscapepainter Jun 27, 2024
75d0cda
nit
landscapepainter Jun 27, 2024
89645ed
update blobfuse2 cache directory
landscapepainter Jun 27, 2024
cb606ef
format
landscapepainter Jun 27, 2024
98fcd5f
refactor get_storage_account_key method
landscapepainter Jun 27, 2024
1a15411
update docker storage mounts smoke test
landscapepainter Jun 28, 2024
f494725
sleep for storage account creation to propagate
landscapepainter Jun 30, 2024
6d0e128
handle externally removed storage account being fetched
landscapepainter Jul 1, 2024
f8ecddc
format
landscapepainter Jul 1, 2024
e52f797
Merge branch 'master' into azure-blob-storage
landscapepainter Jul 1, 2024
a6691ed
nit
landscapepainter Jul 1, 2024
41d1000
add logic to retry for role assignment
landscapepainter Jul 3, 2024
0cd9d73
add comment to _create_storage_account method
landscapepainter Jul 3, 2024
33fff63
additional error handling for role assignment
landscapepainter Jul 3, 2024
78c5fd3
format
landscapepainter Jul 3, 2024
db9fa49
nit
landscapepainter Jul 3, 2024
3b89bff
Update sky/adaptors/azure.py
landscapepainter Jul 3, 2024
9f48823
additional installation check for azure blob storage dependencies
landscapepainter Jul 3, 2024
f233cb2
format
landscapepainter Jul 3, 2024
5fe2d60
Merge branch 'master' into azure-blob-storage
landscapepainter Jul 3, 2024
b5e2cc5
update step 7 from maybe_translate_local_file_mounts_and_sync_up meth…
landscapepainter Jul 3, 2024
f87de9d
additional comment on container_client.exists()
landscapepainter Jul 4, 2024
b4421d0
explicitly check None for match
landscapepainter Jul 4, 2024
6b12bff
Update sky/cloud_stores.py
landscapepainter Jul 4, 2024
5909311
[style] import module instead of class or funcion
landscapepainter Jul 4, 2024
4c9ac44
Merge branch 'azure-blob-storage' of https://github.com/landscapepain…
landscapepainter Jul 4, 2024
bfee828
nit
landscapepainter Jul 4, 2024
691582e
docstring nit updates
landscapepainter Jul 7, 2024
64c770a
nit
landscapepainter Jul 8, 2024
908acb4
error handle failure to run list blobs API from cloud_stores.py::is_d…
landscapepainter Jul 9, 2024
c320575
nit
landscapepainter Jul 9, 2024
fb3d48e
nit
landscapepainter Jul 9, 2024
076ef01
Add role assignment logic to handle edge case
landscapepainter Jul 12, 2024
3e75bdf
format
landscapepainter Jul 12, 2024
e1c56f0
Merge branch 'master' into azure-blob-storage
landscapepainter Jul 12, 2024
f711911
remove redundant get_az_resource_group method from data_utils
landscapepainter Jul 12, 2024
d5c4e1b
asyncio loop lifecycle manage
landscapepainter Jul 12, 2024
7fefe7b
update constant values
landscapepainter Jul 12, 2024
b93adb1
add logs when resource group and storage account is newly created
landscapepainter Jul 12, 2024
3f81d7c
Merge branch 'master' into azure-blob-storage
landscapepainter Jul 16, 2024
3c05ff7
Update sky/skylet/constants.py
landscapepainter Jul 16, 2024
ccecf24
add comment and move return True within the try except block
landscapepainter Jul 16, 2024
5b1689f
reverse the order of two decorators for get_client method to allow ca…
landscapepainter Jul 16, 2024
f40604b
revert error handling at _execute_file_mounts
landscapepainter Jul 16, 2024
642a258
nit
landscapepainter Jul 16, 2024
e212ea2
raise error when non existent storage account or container name is pr…
landscapepainter Jul 16, 2024
beba84a
format
landscapepainter Jul 16, 2024
55a5e72
add comment for keeping decorator order
landscapepainter Jul 17, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
353 changes: 342 additions & 11 deletions sky/adaptors/azure.py

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions sky/adaptors/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,10 @@ def wrapper(*args, **kwargs):
m.load_module()
return func(*args, **kwargs)

# Preserve lru_cache methods
landscapepainter marked this conversation as resolved.
Show resolved Hide resolved
if hasattr(func, 'cache_clear'):
wrapper.cache_clear = func.cache_clear

return wrapper

return decorator
36 changes: 24 additions & 12 deletions sky/backends/cloud_vm_ray_backend.py
Original file line number Diff line number Diff line change
Expand Up @@ -4508,23 +4508,35 @@ def _execute_file_mounts(self, handle: CloudVmRayResourceHandle,
continue

storage = cloud_stores.get_storage_from_path(src)
if storage.is_directory(src):
sync = storage.make_sync_dir_command(source=src,
destination=wrapped_dst)
# It is a directory so make sure it exists.
mkdir_for_wrapped_dst = f'mkdir -p {wrapped_dst}'
else:
sync = storage.make_sync_file_command(source=src,
destination=wrapped_dst)
# It is a file so make sure *its parent dir* exists.
mkdir_for_wrapped_dst = (
f'mkdir -p {os.path.dirname(wrapped_dst)}')
try:
if storage.is_directory(src):
sync_cmd = (storage.make_sync_dir_command(
source=src, destination=wrapped_dst))
# It is a directory so make sure it exists.
mkdir_for_wrapped_dst = f'mkdir -p {wrapped_dst}'
else:
sync_cmd = (storage.make_sync_file_command(
source=src, destination=wrapped_dst))
# It is a file so make sure *its parent dir* exists.
mkdir_for_wrapped_dst = (
f'mkdir -p {os.path.dirname(wrapped_dst)}')
except Exception as e: # pylint: disable=broad-except
logger.error(
f'Failed to fetch from the bucket {src!r} to '
f'remote instance at {dst!r}.\n'
'Error details: '
f'{common_utils.format_exception(e, use_bracket=True)}.')
# If 'cmd' was appended to 'symlink_commands' for this sync, we
# remove as it failed to sync.
if not dst.startswith('~/') and not dst.startswith('/tmp/'):
symlink_commands.pop()
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this, it seems we skip the failed buckets that do not exist, which misaligns with our previous behavior for other cloud buckets. We should probably not skip all errors here.

file_mounts:
  /dst: s3://some-non-exist-bucket-zhwu

Also, it seems for an Azure blob that does not exist, unlike the other clouds, it raises error for the the sync command execution below. Can we get it fail here when we call list_blobs

file_mounts:
    /test: https://skyeastusa37461fd.blob.core.windows.net/some-non-exist-bucket-zhwu
I 07-15 22:12:46 backend_utils.py:1336] Syncing (to 1 node): https://skyeastusa37461fd.blob.core.windows.net/some-non-exist-bucket-zhwu -> /test
E 07-15 22:12:47 subprocess_utils.py:84] INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support
E 07-15 22:12:47 subprocess_utils.py:84] 
E 07-15 22:12:47 subprocess_utils.py:84] Job 0c4734a3-9317-ad47-5ecc-0978846fbf69 has started
E 07-15 22:12:47 subprocess_utils.py:84] Log file is located at: /home/azureuser/.azcopy/0c4734a3-9317-ad47-5ecc-0978846fbf69.log
E 07-15 22:12:47 subprocess_utils.py:84] 
E 07-15 22:12:47 subprocess_utils.py:84] 
E 07-15 22:12:47 subprocess_utils.py:84] Cannot perform sync due to error: cannot list files due to reason -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /home/vsts/go/pkg/mod/github.com/!azure/[email protected]/azblob/zc_storage_error.go:42
E 07-15 22:12:47 subprocess_utils.py:84] ===== RESPONSE ERROR (ServiceCode=ContainerNotFound) =====
E 07-15 22:12:47 subprocess_utils.py:84] Description=The specified container does not exist.
E 07-15 22:12:47 subprocess_utils.py:84] RequestId:8316c808-001e-005c-0b04-d71755000000
E 07-15 22:12:47 subprocess_utils.py:84] Time:2024-07-15T22:12:47.2300060Z, Details: 
E 07-15 22:12:47 subprocess_utils.py:84]    Code: ContainerNotFound
E 07-15 22:12:47 subprocess_utils.py:84]    GET https://skyeastusa37461fd.blob.core.windows.net/some-non-exist-bucket-zhwu?comp=list&delimiter=%2F&include=metadata&restype=container&se=2024-07-15t23%3A12%3A46z&sig=-REDACTED-&sp=rcwl&sr=c&sv=2024-05-04&timeout=901
E 07-15 22:12:47 subprocess_utils.py:84]    User-Agent: [AzCopy/10.17.0 Azure-Storage/0.15 (go1.19.2; linux)]
E 07-15 22:12:47 subprocess_utils.py:84]    X-Ms-Client-Request-Id: [79bd1b04-f229-46c8-5053-44fb7d1bc8c5]
E 07-15 22:12:47 subprocess_utils.py:84]    X-Ms-Version: [2020-10-02]
E 07-15 22:12:47 subprocess_utils.py:84]    --------------------------------------------------------------------------------
E 07-15 22:12:47 subprocess_utils.py:84]    RESPONSE Status: 404 The specified container does not exist.
E 07-15 22:12:47 subprocess_utils.py:84]    Content-Length: [225]
E 07-15 22:12:47 subprocess_utils.py:84]    Content-Type: [application/xml]
E 07-15 22:12:47 subprocess_utils.py:84]    Date: [Mon, 15 Jul 2024 22:12:47 GMT]
E 07-15 22:12:47 subprocess_utils.py:84]    Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
E 07-15 22:12:47 subprocess_utils.py:84]    X-Ms-Client-Request-Id: [79bd1b04-f229-46c8-5053-44fb7d1bc8c5]
E 07-15 22:12:47 subprocess_utils.py:84]    X-Ms-Error-Code: [ContainerNotFound]
E 07-15 22:12:47 subprocess_utils.py:84]    X-Ms-Request-Id: [8316c808-001e-005c-0b04-d71755000000]
E 07-15 22:12:47 subprocess_utils.py:84]    X-Ms-Version: [2020-10-02]
E 07-15 22:12:47 subprocess_utils.py:84] 
E 07-15 22:12:47 subprocess_utils.py:84] 
E 07-15 22:12:47 subprocess_utils.py:84] 
E 07-15 22:12:47 subprocess_utils.py:84] 

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first issue you mentioned is resolved at f40604b. And the second issue is resolved at e212ea2.


download_target_commands = [
# Ensure sync can write to wrapped_dst (e.g., '/data/').
mkdir_for_wrapped_dst,
# Both the wrapped and the symlink dir exist; sync.
sync,
sync_cmd,
]
command = ' && '.join(download_target_commands)
# dst is only used for message printing.
Expand Down
191 changes: 181 additions & 10 deletions sky/cloud_stores.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,15 +7,22 @@
* Better interface.
* Better implementation (e.g., fsspec, smart_open, using each cloud's SDK).
"""
import shlex
import subprocess
import time
import urllib.parse

from sky import sky_logging
from sky.adaptors import aws
from sky.adaptors import azure
from sky.adaptors import cloudflare
from sky.adaptors import ibm
from sky.clouds import gcp
from sky.data import data_utils
from sky.data.data_utils import Rclone
from sky.skylet import constants

logger = sky_logging.init_logger(__name__)


class CloudStorage:
Expand Down Expand Up @@ -153,6 +160,167 @@ def make_sync_file_command(self, source: str, destination: str) -> str:
return ' && '.join(all_commands)


class AzureBlobCloudStorage(CloudStorage):
"""Azure Blob Storage."""
# AzCopy is utilized for downloading data from Azure Blob Storage
# containers to remote systems due to its superior performance compared to
# az-cli. While az-cli's `az storage blob sync` can synchronize data from
# local to container, it lacks support to sync from container to remote
# synchronization. Moreover, `az storage blob download-batch` in az-cli
# does not leverage AzCopy's efficient multi-threaded capabilities, leading
# to slower performance.
#
# AzCopy requires appending SAS tokens directly in commands, as it does not
# support using STORAGE_ACCOUNT_KEY, unlike az-cli, which can generate
# SAS tokens but lacks direct multi-threading support like AzCopy.
# Hence, az-cli for SAS token generation is ran on the local machine and
# AzCopy is installed at the remote machine for efficient data transfer
# from containers to remote systems.
# Note that on Azure instances, both az-cli and AzCopy are typically
# pre-installed. And installing both would be used with AZ container is
# used from non-Azure instances.
landscapepainter marked this conversation as resolved.
Show resolved Hide resolved

_GET_AZCOPY = [
'azcopy --version > /dev/null 2>&1 || '
'(mkdir -p /usr/local/bin; '
'curl -L https://aka.ms/downloadazcopy-v10-linux -o azcopy.tar.gz; '
'sudo tar -xvzf azcopy.tar.gz --strip-components=1 -C /usr/local/bin --exclude=*.txt; ' # pylint: disable=line-too-long
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: separate it into two lines?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it as a single line for readability since there are multiple commands ran.

'sudo chmod +x /usr/local/bin/azcopy; '
'rm azcopy.tar.gz)'
]

def is_directory(self, url: str) -> bool:
"""Returns whether 'url' of the AZ Container is a directory.

In cloud object stores, a "directory" refers to a regular object whose
name is a prefix of other objects.

Args:
url: Endpoint url of the container/blob.

Returns:
True if the url is an endpoint of a directory and False if it
is a blob(file).

Raises:
azure.core.exceptions.HttpResponseError: If the user's Azure
Azure account does not have sufficient IAM role for the given
storage account.
"""
storage_account_name, container_name, path = data_utils.split_az_path(
url)

# If there aren't more than just container name and storage account,
# that's a directory.
if not path:
return True

# If there are more, we need to check if it is a directory or a file.
container_url = data_utils.AZURE_CONTAINER_URL.format(
storage_account_name=storage_account_name,
container_name=container_name)
resource_group_name = azure.get_az_resource_group(storage_account_name)
role_assignment_start = time.time()
refresh_client = False
role_assigned = False

while (time.time() - role_assignment_start <
landscapepainter marked this conversation as resolved.
Show resolved Hide resolved
constants.WAIT_FOR_STORAGE_ACCOUNT_ROLE_ASSIGNMENT):
container_client = data_utils.create_az_client(
client_type='container',
container_url=container_url,
storage_account_name=storage_account_name,
resource_group_name=resource_group_name,
refresh_client=refresh_client)

num_objects = 0
try:
for blob in container_client.list_blobs(name_starts_with=path):
if blob.name == path:
return False
num_objects += 1
if num_objects > 1:
return True
except azure.exceptions().HttpResponseError as e:
# Handle case where user lacks sufficient IAM role for
# a private container in the same subscription. Attempt to
# assign appropriate role to current user.
if 'AuthorizationPermissionMismatch' in str(e):
if not role_assigned:
logger.info('Failed to list blobs in container '
f'{container_url!r}. This implies '
'insufficient IAM role for storage account'
f' {storage_account_name!r}.')
azure.assign_storage_account_iam_role(
storage_account_name=storage_account_name,
resource_group_name=resource_group_name)
role_assigned = True
refresh_client = True
else:
logger.info(
'Waiting due to the propagation delay of IAM '
'role assignment to the storage account '
f'{storage_account_name!r}.')
time.sleep(
constants.RETRY_INTERVAL_AFTER_ROLE_ASSIGNMENT)
continue
raise
# A directory with few or no items
return True
landscapepainter marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: keep the try block small

Suggested change
try:
for blob in container_client.list_blobs(name_starts_with=path):
if blob.name == path:
return False
num_objects += 1
if num_objects > 1:
return True
except azure.exceptions().HttpResponseError as e:
# Handle case where user lacks sufficient IAM role for
# a private container in the same subscription. Attempt to
# assign appropriate role to current user.
if 'AuthorizationPermissionMismatch' in str(e):
if not role_assigned:
logger.info('Failed to list blobs in container '
f'{container_url!r}. This implies '
'insufficient IAM role for storage account'
f' {storage_account_name!r}.')
azure.assign_storage_account_iam_role(
storage_account_name=storage_account_name,
resource_group_name=resource_group_name)
role_assigned = True
refresh_client = True
else:
logger.info(
'Waiting due to the propagation delay of IAM '
'role assignment to the storage account '
f'{storage_account_name!r}.')
time.sleep(
constants.RETRY_INTERVAL_AFTER_ROLE_ASSIGNMENT)
continue
raise
# A directory with few or no items
return True
try:
blobs = container_client.list_blobs(name_starts_with=path)
except azure.exceptions().HttpResponseError as e:
# Handle case where user lacks sufficient IAM role for
# a private container in the same subscription. Attempt to
# assign appropriate role to current user.
if 'AuthorizationPermissionMismatch' in str(e):
if not role_assigned:
logger.info('Failed to list blobs in container '
f'{container_url!r}. This implies '
'insufficient IAM role for storage account'
f' {storage_account_name!r}.')
azure.assign_storage_account_iam_role(
storage_account_name=storage_account_name,
resource_group_name=resource_group_name)
role_assigned = True
refresh_client = True
else:
logger.info(
'Waiting due to the propagation delay of IAM '
'role assignment to the storage account '
f'{storage_account_name!r}.')
time.sleep(
constants.RETRY_INTERVAL_AFTER_ROLE_ASSIGNMENT)
continue
raise
for blob in blobs:
if blob.name == path:
return False
num_objects += 1
if num_objects > 1:
return True
# A directory with few or no items
return True

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I tried at first, but the error is raised when we iterate through the object within the for-loop for blob in blobs: rather than when the object is obtained with blobs = container_client.list_blobs(name_starts_with=path). So the suggested code won't be able to catch the error.

else:
raise TimeoutError(
'Failed to determine the container path status within '
f'{constants.WAIT_FOR_STORAGE_ACCOUNT_ROLE_ASSIGNMENT}'
'seconds.')

def _get_azcopy_source(self, source: str, is_dir: bool) -> str:
"""Converts the source so it can be used as an argument for azcopy."""
storage_account_name, container_name, blob_path = (
data_utils.split_az_path(source))
storage_account_key = data_utils.get_az_storage_account_key(
storage_account_name)

if storage_account_key is None:
# public containers do not require SAS token for access
sas_token = ''
else:
if is_dir:
sas_token = azure.get_az_container_sas_token(
storage_account_name, storage_account_key, container_name)
else:
sas_token = azure.get_az_blob_sas_token(storage_account_name,
storage_account_key,
container_name,
blob_path)
landscapepainter marked this conversation as resolved.
Show resolved Hide resolved
# "?" is a delimiter character used when SAS token is attached to the
# container endpoint.
# Reference: https://learn.microsoft.com/en-us/azure/ai-services/translator/document-translation/how-to-guides/create-sas-tokens?tabs=Containers # pylint: disable=line-too-long
converted_source = f'{source}?{sas_token}' if sas_token else source

return shlex.quote(converted_source)

def make_sync_dir_command(self, source: str, destination: str) -> str:
"""Fetches a directory using AZCOPY from storage to remote instance."""
source = self._get_azcopy_source(source, is_dir=True)
# destination is guaranteed to not have '/' at the end of the string
# by tasks.py::set_file_mounts(). It is necessary to add from this
# method due to syntax of azcopy.
destination = f'{destination}/'
cblmemo marked this conversation as resolved.
Show resolved Hide resolved
download_command = (f'azcopy sync {source} {destination} '
'--recursive --delete-destination=false')
all_commands = list(self._GET_AZCOPY)
all_commands.append(download_command)
return ' && '.join(all_commands)

def make_sync_file_command(self, source: str, destination: str) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this two functions have a lot common codes. could we try to add a wrapper to increase code reuse

Copy link
Collaborator Author

@landscapepainter landscapepainter May 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave this at it is to keep the interface consistent with the other storage classes. Also, the interface for uploading codes from local to storage has the two function for file and dir separated as well. I think the consistency makes sense.

Copy link
Collaborator

@cblmemo cblmemo May 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this, i.e. still keep 2 interface but wrap the shared code into a helper function and call it with different args.

def _shared_code(args):
    foo()
    bar(args)
    foo2()
def make_sync_dir_command():
    foo3()
    return _shared_code(some_sync_dir_related_args)
def make_sync_file_command():
    foo4()
    return _shared_code(some_sync_file_related_args)

Copy link
Collaborator Author

@landscapepainter landscapepainter May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the elaboration! Refactored at eb933df. But I'm not sure if this implementation is the most optimal way as the number of line of codes did not reduce much and perhaps it's more messy. I'd be happy to get more feedbacks on this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Will take a look soon.

Copy link
Collaborator Author

@landscapepainter landscapepainter Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cblmemo If there isn't a way to reduce this code further, this should be converted back to how it was. Readability seems to be worse compared to how it was even though the line of codes did not change. All the other storages seems to not share code for the same reason. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it still reusing a lot of code, and if we don't need to repeat the blob_path twice, the last few lines could also be simplified from

        if is_dir:
            converted_source = f'{source}/{sas_token}'
        else:
            # "?" is a delimiter character used when SAS token is attached to
            # the blob endpoint.
            converted_source = f'{source}/{blob_path}?{sas_token}'

to

        converted_source = f'{source}/?{sas_token}'

, assuming we also remove the ? from container sas token generation.

"""Fetches a file using AZCOPY from storage to remote instance."""
source = self._get_azcopy_source(source, is_dir=False)
download_command = f'azcopy copy {source} {destination}'
all_commands = list(self._GET_AZCOPY)
cblmemo marked this conversation as resolved.
Show resolved Hide resolved
all_commands.append(download_command)
return ' && '.join(all_commands)


class R2CloudStorage(CloudStorage):
"""Cloudflare Cloud Storage."""

Expand Down Expand Up @@ -218,16 +386,6 @@ def make_sync_file_command(self, source: str, destination: str) -> str:
return ' && '.join(all_commands)


def get_storage_from_path(url: str) -> CloudStorage:
"""Returns a CloudStorage by identifying the scheme:// in a URL."""
result = urllib.parse.urlsplit(url)

if result.scheme not in _REGISTRY:
assert False, (f'Scheme {result.scheme} not found in'
f' supported storage ({_REGISTRY.keys()}); path {url}')
return _REGISTRY[result.scheme]


class IBMCosCloudStorage(CloudStorage):
"""IBM Cloud Storage."""
# install rclone if package isn't already installed
Expand Down Expand Up @@ -294,10 +452,23 @@ def make_sync_file_command(self, source: str, destination: str) -> str:
return self.make_sync_dir_command(source, destination)


def get_storage_from_path(url: str) -> CloudStorage:
"""Returns a CloudStorage by identifying the scheme:// in a URL."""
result = urllib.parse.urlsplit(url)
if result.scheme not in _REGISTRY:
assert False, (f'Scheme {result.scheme} not found in'
f' supported storage ({_REGISTRY.keys()}); path {url}')
return _REGISTRY[result.scheme]


# Maps bucket's URIs prefix(scheme) to its corresponding storage class
_REGISTRY = {
'gs': GcsCloudStorage(),
's3': S3CloudStorage(),
'r2': R2CloudStorage(),
'cos': IBMCosCloudStorage(),
# TODO: This is a hack, as Azure URL starts with https://, we should
# refactor the registry to be able to take regex, so that Azure blob can
# be identified with `https://(.*?)\.blob\.core\.windows\.net`
'https': AzureBlobCloudStorage()
landscapepainter marked this conversation as resolved.
Show resolved Hide resolved
}
13 changes: 13 additions & 0 deletions sky/clouds/azure.py
Original file line number Diff line number Diff line change
Expand Up @@ -477,6 +477,19 @@ def check_credentials(cls) -> Tuple[bool, Optional[str]]:
return False, (f'Getting user\'s Azure identity failed.{help_str}\n'
f'{cls._INDENT_PREFIX}Details: '
f'{common_utils.format_exception(e)}')

# Check if the azure blob storage dependencies are installed.
try:
# pylint: disable=redefined-outer-name, import-outside-toplevel, unused-import
from azure.storage import blob
import msgraph
except ImportError as e:
return False, (
f'Azure blob storage depdencies are not installed. '
'Run the following commands:'
f'\n{cls._INDENT_PREFIX} $ pip install skypilot[azure]'
f'\n{cls._INDENT_PREFIX}Details: '
f'{common_utils.format_exception(e)}')
return True, None

def get_credential_file_mounts(self) -> Dict[str, str]:
Expand Down
8 changes: 4 additions & 4 deletions sky/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -831,7 +831,7 @@ def storage_delete(name: str) -> None:
if handle is None:
raise ValueError(f'Storage name {name!r} not found.')
else:
store_object = data.Storage(name=handle.storage_name,
source=handle.source,
sync_on_reconstruction=False)
store_object.delete()
storage_object = data.Storage(name=handle.storage_name,
source=handle.source,
sync_on_reconstruction=False)
storage_object.delete()
Loading
Loading