diff --git a/README.md b/README.md
index 2d2e471dc7e..b803a998924 100644
--- a/README.md
+++ b/README.md
@@ -58,7 +58,7 @@ SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offer
 SkyPilot **abstracts away cloud infra burdens**:
 - Launch jobs & clusters on any cloud
 - Easy scale-out: queue and run many jobs, automatically managed
-- Easy access to object stores (S3, GCS, R2)
+- Easy access to object stores (S3, GCS, Azure, R2, IBM)
 
 SkyPilot **maximizes GPU availability for your jobs**:
 * Provision in all zones/regions/clouds you have access to ([the _Sky_](https://arxiv.org/abs/2205.07147)), with automatic failover
diff --git a/docs/source/docs/index.rst b/docs/source/docs/index.rst
index 5a648dbcda4..b4bd66fba6f 100644
--- a/docs/source/docs/index.rst
+++ b/docs/source/docs/index.rst
@@ -33,7 +33,7 @@ SkyPilot **abstracts away cloud infra burdens**:
 
 - Launch jobs & clusters on any cloud
 - Easy scale-out: queue and run many jobs, automatically managed
-- Easy access to object stores (S3, GCS, R2)
+- Easy access to object stores (S3, GCS, Azure, R2, IBM)
 
 SkyPilot **maximizes GPU availability for your jobs**:
 
diff --git a/docs/source/reference/config.rst b/docs/source/reference/config.rst
index 7f24c59063f..cb06a28cdf0 100644
--- a/docs/source/reference/config.rst
+++ b/docs/source/reference/config.rst
@@ -368,6 +368,18 @@ Available fields and semantics:
     # Default: 'LOCAL_CREDENTIALS'.
     remote_identity: LOCAL_CREDENTIALS
 
+  # Advanced Azure configurations (optional).
+  # Apply to all new instances but not existing ones.
+  azure:
+    # Specify an existing Azure storage account for SkyPilot-managed containers.
+    # If not set, SkyPilot will use its default naming convention to create and
+    # use the storage account, unless a container endpoint URI is used as the source.
+    # Note: SkyPilot cannot create new storage accounts with custom names; it
+    # can only use existing ones or create accounts with its default naming
+    # scheme.
+    # Reference: https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview
+    storage_account: user-storage-account-name
+
   # Advanced Kubernetes configurations (optional).
   kubernetes:
     # The networking mode for accessing SSH jump pod (optional).
diff --git a/docs/source/reference/storage.rst b/docs/source/reference/storage.rst
index 20d7ca4685b..3c54680e79b 100644
--- a/docs/source/reference/storage.rst
+++ b/docs/source/reference/storage.rst
@@ -28,7 +28,7 @@ Object storages are specified using the :code:`file_mounts` field in a SkyPilot
   # Mount an existing S3 bucket
   file_mounts:
     /my_data:
-      source: s3://my-bucket/  # or gs://, r2://, cos://<region>/<bucket>
+      source: s3://my-bucket/  # or gs://, https://<azure_storage_account>.blob.core.windows.net/<container>, r2://, cos://<region>/<bucket>
       mode: MOUNT  # Optional: either MOUNT or COPY. Defaults to MOUNT.
 
 This will `mount `__ the contents of the bucket at ``s3://my-bucket/`` to the remote VM at ``/my_data``.
@@ -45,7 +45,7 @@ Object storages are specified using the :code:`file_mounts` field in a SkyPilot
   file_mounts:
     /my_data:
       name: my-sky-bucket
-      store: gcs  # Optional: either of s3, gcs, r2, ibm
+      store: gcs  # Optional: either of s3, gcs, azure, r2, ibm
 
 SkyPilot will create an empty GCS bucket called ``my-sky-bucket`` and mount it at ``/my_data``.
 This bucket can be used to write checkpoints, logs or other outputs directly to the cloud.
@@ -68,7 +68,7 @@ Object storages are specified using the :code:`file_mounts` field in a SkyPilot
     /my_data:
       name: my-sky-bucket
       source: ~/dataset  # Optional: path to local data to upload to the bucket
-      store: s3  # Optional: either of s3, gcs, r2, ibm
+      store: s3  # Optional: either of s3, gcs, azure, r2, ibm
       mode: MOUNT  # Optional: either MOUNT or COPY. Defaults to MOUNT.
 
 SkyPilot will create a S3 bucket called ``my-sky-bucket`` and upload the
@@ -281,14 +281,21 @@ Storage YAML reference
 
   source: str
     The source attribute specifies the path that must be made available
-    in the storage object. It can either be a local path or a list of local
-    paths or it can be a remote path (s3://, gs://, r2://, cos://<region>/<bucket>).
+    in the storage object. It can be one of the following:
+    - A local path
+    - A list of local paths
+    - A remote path using one of the following formats:
+      - s3://
+      - gs://
+      - https://<azure_storage_account>.blob.core.windows.net/<container>
+      - r2://
+      - cos://<region>/<bucket>
 
     If the source is local, data is uploaded to the cloud to an appropriate
-    bucket (s3, gcs, r2, or ibm). If source is bucket URI,
+    bucket (s3, gcs, azure, r2, or ibm). If source is bucket URI,
     the data is copied or mounted directly (see mode flag below).
 
-  store: str; either of 's3', 'gcs', 'r2', 'ibm'
+  store: str; either of 's3', 'gcs', 'azure', 'r2', 'ibm'
     If you wish to force sky.Storage to be backed by a specific cloud object
     storage, you can specify it here. If not specified, SkyPilot chooses the
     appropriate object storage based on the source path and task's cloud provider.
diff --git a/docs/source/reference/yaml-spec.rst b/docs/source/reference/yaml-spec.rst
index 35e56726ad4..0354d3d0395 100644
--- a/docs/source/reference/yaml-spec.rst
+++ b/docs/source/reference/yaml-spec.rst
@@ -300,8 +300,8 @@ Available fields:
     # Mounts the bucket at /datasets-storage on every node of the cluster.
     /datasets-storage:
       name: sky-dataset  # Name of storage, optional when source is bucket URI
-      source: /local/path/datasets  # Source path, can be local or s3/gcs URL. Optional, do not specify to create an empty bucket.
-      store: s3  # Could be either 's3', 'gcs' or 'r2'; default: None. Optional.
+      source: /local/path/datasets  # Source path, can be local or bucket URI. Optional, do not specify to create an empty bucket.
+      store: s3  # Could be either 's3', 'gcs', 'azure', 'r2', or 'ibm'; default: None. Optional.
       persistent: True  # Defaults to True; can be set to false to delete bucket after cluster is downed. Optional.
       mode: MOUNT  # Either MOUNT or COPY. Defaults to MOUNT. Optional.
 
diff --git a/llm/vicuna-llama-2/scripts/hardcoded_questions.py b/llm/vicuna-llama-2/scripts/hardcoded_questions.py
index 9ed7490ca96..bfb8494b086 100644
--- a/llm/vicuna-llama-2/scripts/hardcoded_questions.py
+++ b/llm/vicuna-llama-2/scripts/hardcoded_questions.py
@@ -190,7 +190,7 @@ def generate_conversations(questions, answers):
     SkyPilot abstracts away cloud infra burdens:
     * Launch jobs & clusters on any cloud
    * Easy scale-out: queue and run many jobs, automatically managed
-    * Easy access to object stores (S3, GCS, R2)
+    * Easy access to object stores (S3, GCS, Azure, R2, IBM)
 
     SkyPilot maximizes GPU availability for your jobs:
     * Provision in all zones/regions/clouds you have access to (the Sky), with automatic failover
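As a quick sanity check on the documentation changes above, here is a minimal, illustrative task YAML that exercises the documented Azure support. The storage account, container, and bucket names are placeholders rather than resources referenced by this change; the fields themselves (a `https://<azure_storage_account>.blob.core.windows.net/<container>` source, `store: azure`, and the `azure.storage_account` entry in `~/.sky/config.yaml`) come directly from the hunks above.

```yaml
# Illustrative task YAML exercising the Azure storage support documented above.
# "mystorageaccount", "mycontainer", and "my-sky-azure-bucket" are placeholders.
file_mounts:
  # Mount an existing Azure container by its blob endpoint URI.
  /my_data:
    source: https://mystorageaccount.blob.core.windows.net/mycontainer
    mode: MOUNT

  # Create a SkyPilot-managed bucket backed by Azure blob storage.
  /outputs:
    name: my-sky-azure-bucket
    store: azure

run: |
  ls /my_data
  echo done > /outputs/result.txt
```

To place the managed bucket in a specific existing storage account instead of one created under SkyPilot's default naming scheme, set the new `azure.storage_account` field in `~/.sky/config.yaml`, as described in the config.rst hunk.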