[GCP] Support private IPs for GCP #2819

Merged
merged 28 commits into from
Dec 1, 2023
Changes from 3 commits
12 changes: 10 additions & 2 deletions docs/source/cloud-setup/cloud-permissions/gcp.rst
@@ -275,7 +275,7 @@ The custom VPC should contain the :ref:`required firewall rules <gcp-minimum-fir
Using Internal IPs
-----------------------
For security reasons, users may want to use only internal IPs for SkyPilot instances.
To do so, you can use SkyPilot's global config file ``~/.sky/config.yaml`` to specify the ``gcp.use_internal_ips`` and ``gcp.ssh_proxy_command`` field (to see the detailed syntax, see :ref:`config-yaml`):
To do so, you can use SkyPilot's global config file ``~/.sky/config.yaml`` to specify the ``gcp.use_internal_ips`` and ``gcp.ssh_proxy_command`` fields (to see the detailed syntax, see :ref:`config-yaml`):

.. code-block:: yaml

@@ -295,7 +295,15 @@ Instances created with internal IPs only on GCP cannot access public internet by
cloud NAT needs to be set up for the VPC (see `GCP's documentation <https://cloud.google.com/nat/docs/overview>`__ for details).


Cloud NAT is a regional resource, so it will need to be created in each region that SkyPilot will be used in. To limit SkyPilot to use some specific regions only, you can specify the ``gcp.ssh_proxy_command`` to be a dict mapping from region to the SSH proxy command for that region (see :ref:`config-yaml` for details):
Cloud NAT is a regional resource, so it will need to be created in each region that SkyPilot will be used in.


.. image:: ../../images/screenshots/gcp/cloud-nat.png
:width: 80%
:align: center
:alt: GCP Cloud NAT

To limit SkyPilot to use some specific regions only, you can specify the ``gcp.ssh_proxy_command`` to be a dict mapping from region to the SSH proxy command for that region (see :ref:`config-yaml` for details):

.. code-block:: yaml

Binary file added docs/source/images/screenshots/gcp/cloud-nat.png
6 changes: 3 additions & 3 deletions docs/source/reference/config.rst
@@ -136,14 +136,14 @@ Available fields and semantics:
# Please refer to the aws.ssh_proxy_command section above for more details.
### Format 1 ###
# A string; the same proxy command is used for all regions.
ssh_proxy_command: ssh -W %h:%p -i ~/.ssh/sky-key -o StrictHostKeyChecking=no ec2-user@<jump server public ip>
ssh_proxy_command: ssh -W %h:%p -i ~/.ssh/sky-key -o StrictHostKeyChecking=no gcpuser@<jump server public ip>
### Format 2 ###
# A dict mapping region names to region-specific proxy commands.
# NOTE: This restricts SkyPilot's search space for this cloud to only use
# the specified regions and not any other regions in this cloud.
ssh_proxy_command:
us-east-1: ssh -W %h:%p -p 1234 -o StrictHostKeyChecking=no [email protected]east-1.proxy
us-east-2: ssh -W %h:%p -i ~/.ssh/sky-key -o StrictHostKeyChecking=no ec2-user@<jump server public ip>
us-central1: ssh -W %h:%p -p 1234 -o StrictHostKeyChecking=no [email protected]central1.proxy
us-west1: ssh -W %h:%p -i ~/.ssh/sky-key -o StrictHostKeyChecking=no gcpuser@<jump server public ip>


# Reserved capacity (optional).
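Note on the two ``ssh_proxy_command`` formats above: the following is a minimal Python sketch of how a consumer of the config might resolve them. It is an illustration only, not SkyPilot's implementation. The names ``resolve_proxy_command`` and ``allowed_regions`` are hypothetical, and the sketch assumes that a region missing from the dict form simply has no proxy command.

```python
from typing import Dict, List, Optional, Union

# Either one command for all regions (Format 1) or region -> command (Format 2).
ProxyConfig = Union[str, Dict[str, str]]


def resolve_proxy_command(proxy_config: Optional[ProxyConfig],
                          region: str) -> Optional[str]:
    """Return the proxy command for `region`, or None if none applies.

    Hypothetical helper, written for illustration only.
    """
    if proxy_config is None:
        return None
    if isinstance(proxy_config, str):
        # Format 1: the same command is used for every region.
        return proxy_config
    # Format 2: only the listed regions have a command (assumption: an
    # unlisted region yields None rather than raising).
    return proxy_config.get(region)


def allowed_regions(proxy_config: Optional[ProxyConfig]) -> Optional[List[str]]:
    """Return the regions the dict form restricts to; None means unrestricted."""
    if isinstance(proxy_config, dict):
        return sorted(proxy_config)
    return None


per_region = {
    'us-central1': 'ssh -W %h:%p -o StrictHostKeyChecking=no '
                   'gcpuser@<jump server public ip>',
    'us-west1': 'ssh -W %h:%p -i ~/.ssh/sky-key -o StrictHostKeyChecking=no '
                'gcpuser@<jump server public ip>',
}
```

With the dict form, ``allowed_regions(per_region)`` lists only the two configured regions, which mirrors the NOTE in the hunk that Format 2 narrows the search space to the specified regions.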
17 changes: 10 additions & 7 deletions sky/backends/cloud_vm_ray_backend.py
@@ -2342,8 +2342,9 @@ def __init__(self,
self.cluster_name_on_cloud = cluster_name_on_cloud
self._cluster_yaml = cluster_yaml.replace(os.path.expanduser('~'), '~',
1)
# List of (internal_ip, external_ip) tuples for all the nodes
# in the cluster, sorted by the external ips.
# List of (internal_ip, feasible_ip) tuples for all the nodes in the
# cluster, sorted by the feasible ips. The feasible ips can be either
# internal or external ips, depending on the use_internal_ips flag.
self.stable_internal_external_ips = stable_internal_external_ips
self.stable_ssh_ports = stable_ssh_ports
self.launched_nodes = launched_nodes
@@ -2511,7 +2512,7 @@ def is_provided_ips_valid(ips: Optional[List[Optional[str]]]) -> bool:
logger.debug('Skipping the fetching of internal IPs as the cached '
'external IPs matches the newly fetched ones.')
# Optimization: If the cached external IPs are the same as the
# retrieved external IPs, then we can skip retrieving internal
# retrieved feasible IPs, then we can skip retrieving internal
# IPs since the cached IPs are up-to-date.
return
logger.debug(
@@ -2521,10 +2522,9 @@ def is_provided_ips_valid(ips: Optional[List[Optional[str]]]) -> bool:

if use_internal_ips:
# Optimization: if we know use_internal_ips is True (currently
# only exposed for AWS), then our AWS NodeProvider is
# guaranteed to pick subnets that will not assign public IPs,
# thus the first list of IPs returned above are already private
# IPs. So skip the second query.
# only exposed for AWS and GCP), then our provisioner is guaranteed
# to not assign public IPs, thus the first list of IPs returned
# above are already private IPs. So skip the second query.
cluster_internal_ips = list(cluster_feasible_ips)
elif is_provided_ips_valid(internal_ips):
logger.debug(f'Using provided internal IPs: {internal_ips}')
@@ -2543,6 +2543,9 @@ def is_provided_ips_valid(ips: Optional[List[Optional[str]]]) -> bool:
f'Expected same number of internal IPs {cluster_internal_ips}'
f' and external IPs {cluster_feasible_ips}.')

# List of (internal_ip, feasible_ip) tuples for all the nodes in the
# cluster, sorted by the feasible ips. The feasible ips can be either
# internal or external ips, depending on the use_internal_ips flag.
internal_external_ips: List[Tuple[str, str]] = list(
zip(cluster_internal_ips, cluster_feasible_ips))

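Note on the (internal_ip, feasible_ip) pairing in the hunks above: the sketch below restates the logic as a standalone function, under stated assumptions. ``pair_ips`` is a hypothetical name, the sort is a plain string sort on the feasible IP, and the real method in ``cloud_vm_ray_backend.py`` also handles caching and querying, which are omitted here.

```python
from typing import List, Optional, Tuple


def pair_ips(feasible_ips: List[str],
             internal_ips: Optional[List[str]],
             use_internal_ips: bool) -> List[Tuple[str, str]]:
    """Build (internal_ip, feasible_ip) tuples, sorted by feasible IP.

    Hypothetical helper for illustration; not SkyPilot's code.
    """
    if use_internal_ips:
        # No public IPs are assigned, so the feasible IPs are already the
        # private IPs and no second lookup is needed.
        internal = list(feasible_ips)
    elif internal_ips is not None:
        internal = list(internal_ips)
    else:
        raise ValueError(
            'internal_ips is required when use_internal_ips is False')

    if len(internal) != len(feasible_ips):
        raise ValueError(
            f'Expected same number of internal IPs {internal} '
            f'and feasible IPs {feasible_ips}.')

    # Assumption: lexicographic ordering on the feasible IP string.
    return sorted(zip(internal, feasible_ips), key=lambda pair: pair[1])
```

When ``use_internal_ips`` is true, both elements of each tuple are the same private address, which is why the comment in the diff renames "external" to "feasible".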
6 changes: 3 additions & 3 deletions sky/skypilot_config.py
@@ -119,11 +119,11 @@ def set_nested(keys: Iterable[str], value: Any) -> Dict[str, Any]:
return to_return


def overwrite_config_file(config: dict) -> None:
def unsafe_overwrite_config_file_on_controller(config: dict) -> None:
"""Overwrites the config file with the current config.

This function should only be called very carefully to avoid unexpected
behavior due to the overwrite. Currently, it is only used by the spot/serve
This function should be called very carefully to avoid unexpected behavior
due to the overwrite. Currently, it is only used by the spot/serve
controllers to reconfigure the network settings before any further
operations are done.
"""
2 changes: 1 addition & 1 deletion sky/utils/controller_utils.py
@@ -300,7 +300,7 @@ def setup_proxy_command_on_controller():
config_dict = skypilot_config.set_nested(proxy_command_key,
ssh_proxy_command)

skypilot_config.overwrite_config_file(config_dict)
skypilot_config.unsafe_overwrite_config_file_on_controller(config_dict)


def maybe_translate_local_file_mounts_and_sync_up(task: 'task_lib.Task',
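Note on the ``set_nested`` then overwrite flow used by ``setup_proxy_command_on_controller``: the diff shows only the call sites, so the sketch below is a guess at the general shape of a nested-key update that leaves the original config untouched. ``set_nested_copy`` is a hypothetical name with an explicit ``config`` argument; the real ``skypilot_config.set_nested`` takes only ``keys`` and ``value`` and may behave differently.

```python
import copy
from typing import Any, Dict, Iterable


def set_nested_copy(config: Dict[str, Any], keys: Iterable[str],
                    value: Any) -> Dict[str, Any]:
    """Return a deep copy of `config` with the nested key path set to `value`.

    Hypothetical helper for illustration; not SkyPilot's implementation.
    """
    path = list(keys)
    if not path:
        raise ValueError('keys must contain at least one element')

    result = copy.deepcopy(config)
    node = result
    for key in path[:-1]:
        child = node.get(key)
        if not isinstance(child, dict):
            # Create (or replace a non-dict value with) an intermediate dict.
            child = {}
            node[key] = child
        node = child
    node[path[-1]] = value
    return result


base = {'gcp': {'use_internal_ips': True}}
updated = set_nested_copy(base, ('gcp', 'ssh_proxy_command'),
                          'ssh -W %h:%p gcpuser@<jump server public ip>')
```

Returning a copy keeps the in-memory config unchanged until the caller explicitly writes the result out, which fits the renamed function's warning that the overwrite itself is the unsafe step.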