[Core][Serve] Support TailScale VPN #3458

Draft: wants to merge 11 commits into master

cblmemo (Collaborator) commented on Apr 22, 2024

A revised version of #3276. If a VPN is enabled, SkyPilot does not provision a public IP for the cluster and uses the VPN private IP instead.

TODO:

  • Support multi-node clusters
  • [to discuss] Maybe do not use the VPN private IP to set up the SkyPilot runtime env, as it needs to wait until the instance is up. After discussion, we decided to go with the current implementation.
  • Clean up the VPN record after cluster termination
  • Skip opening ports if a VPN is used
  • Reformat to support other VPN providers
  • Smoke tests
  • Check if TailScale is installed on the local machine
  • Validate the API key before use
  • Double-check that the cluster is still accessible after sky stop + sky start
  • [to discuss] Show whether a VPN is used in sky status. After discussion, we decided to skip it for now.
  • Error out if a VPN is used when launching on an existing non-VPN cluster, and vice versa
  • When setting up the VPN, check the cloud status and set a timeout after the instance is ready on the cloud.
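
Several of the TODO items above amount to small pre-flight checks. The following is a minimal sketch of how they might look, not the code in this PR: the function names `tailscale_installed`, `check_vpn_consistency`, and `wait_until` are hypothetical, and the only assumption made about Tailscale is that its CLI binary is called `tailscale`.

```python
import shutil
import time
from typing import Callable, Optional


def tailscale_installed(binary: str = 'tailscale') -> bool:
    """Return True if the (assumed) Tailscale CLI binary is on PATH."""
    return shutil.which(binary) is not None


def check_vpn_consistency(existing_uses_vpn: Optional[bool],
                          requested_uses_vpn: bool) -> None:
    """Error out when a launch would mix VPN and non-VPN settings.

    existing_uses_vpn is None when the cluster does not exist yet.
    """
    if existing_uses_vpn is None:
        return
    if existing_uses_vpn != requested_uses_vpn:
        raise ValueError(
            f'Cluster was launched with VPN={existing_uses_vpn}, but this '
            f'launch requests VPN={requested_uses_vpn}.')


def wait_until(predicate: Callable[[], bool], timeout: float,
               interval: float = 1.0) -> None:
    """Poll predicate until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(f'Condition not met within {timeout} seconds.')
        time.sleep(interval)
```

In this sketch, `wait_until` would wrap whatever check reports that the instance has joined the VPN, so that VPN setup fails with a clear error instead of hanging.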

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)

All of the following was run with this SkyPilot config, with the client joined to the VPN:

aws:
  vpn:
    tailscale:
      auth_key: tskey-auth-xxx
      api_key: tskey-api-xxx
      tailnet: [email protected]
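
A config like the one above could be sanity-checked before any cloud resources are created. The sketch below is illustrative only: `tailscale_config_errors` is a hypothetical helper, the assumption that all three fields are required is mine, and the key prefixes are inferred solely from the placeholder values shown above. It checks the shape of the config and does not contact Tailscale to validate the keys.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ('auth_key', 'api_key', 'tailnet')
# Assumed prefixes, inferred only from the placeholder values in the config.
EXPECTED_PREFIXES = {'auth_key': 'tskey-auth-', 'api_key': 'tskey-api-'}


def tailscale_config_errors(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the aws.vpn.tailscale section."""
    section: Any = config
    for name in ('aws', 'vpn', 'tailscale'):
        if not isinstance(section, dict) or name not in section:
            return ['missing aws.vpn.tailscale section']
        section = section[name]
    if not isinstance(section, dict):
        return ['aws.vpn.tailscale must be a mapping']

    errors = []
    for key in REQUIRED_KEYS:
        value = section.get(key)
        if not isinstance(value, str) or not value:
            errors.append(f'missing or empty field: {key}')
            continue
        prefix = EXPECTED_PREFIXES.get(key)
        if prefix is not None and not value.startswith(prefix):
            errors.append(f'{key} should start with {prefix!r}')
    return errors
```

An empty list means the section has the expected shape; anything else can be reported to the user before launch.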
  1. Launch a single cluster
$ sky launch --cloud aws --cpus 2 -c t-aws-vpn 'python -m http.server 7001' # without open ports
...

$ sky status --ip t-aws-vpn
100.73.240.56 # an internal ip

$ curl $(sky status --ip t-aws-vpn):7001/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
</ul>
<hr>
</body>
</html>

# on a machine that has joined the VPN
$ curl 100.73.240.56:7001/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
</ul>
<hr>
</body>
</html>

# on a random machine outside the VPN
$ curl 100.73.240.56:7001/
# blocks and never gets a response
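
The address returned by `sky status --ip` above is a Tailscale address rather than a public one. Tailscale assigns IPv4 addresses from the 100.64.0.0/10 carrier-grade NAT range, so a test could assert this directly. A minimal sketch, where `looks_like_tailscale_ip` is a hypothetical helper name not taken from this PR:

```python
import ipaddress

# Tailscale hands out IPv4 addresses from the CGNAT range 100.64.0.0/10.
CGNAT_RANGE = ipaddress.ip_network('100.64.0.0/10')


def looks_like_tailscale_ip(ip: str) -> bool:
    """Return True if ip falls inside the 100.64.0.0/10 range."""
    return ipaddress.ip_address(ip) in CGNAT_RANGE


print(looks_like_tailscale_ip('100.73.240.56'))  # address from the run above
```

Note that this only checks the address range; it does not prove that the address belongs to a particular tailnet.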
  2. Basic SkyServe example
# aws-vpn.yaml
service:
  readiness_probe: /
  replicas: 1
resources:
  ports: 8080
  cloud: aws
  cpus: 2
run: python -m http.server 8080
$ sky serve up aws-vpn.yaml -n aws-vpn
...

$ sky serve status --endpoint aws-vpn
100.95.198.34:30001

$ curl -L $(sky serve status --endpoint aws-vpn)/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
</ul>
<hr>
</body>
</html>

# from a machine that has joined the VPN
$ curl -L 100.95.198.34:30001/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
</ul>
<hr>
</body>
</html>
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_compatibility_tests.sh
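
In the manual tests above, a client outside the VPN simply blocks when it tries to reach the cluster. A smoke test could turn that into a quick pass/fail check by probing the `host:port` endpoint with a timeout. The sketch below uses only the Python standard library; `parse_endpoint` and `is_reachable` are hypothetical names, and the 3-second default timeout is an arbitrary assumption.

```python
import socket
from typing import Tuple


def parse_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split an endpoint such as '100.95.198.34:30001' into host and port."""
    host, sep, port = endpoint.rpartition(':')
    if not sep or not host:
        raise ValueError(f'Expected host:port, got {endpoint!r}')
    return host, int(port)


def is_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection succeeds within timeout seconds."""
    host, port = parse_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unroutable addresses.
        return False
```

Run from a machine on the tailnet, `is_reachable` would be expected to return True for the endpoint printed by `sky serve status --endpoint`; from a machine outside it, the call returns False after the timeout instead of hanging.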

@github-actions github-actions bot added the Stale label Sep 3, 2024
@cblmemo cblmemo removed the Stale label Sep 3, 2024
@skypilot-org skypilot-org deleted a comment from github-actions bot Sep 9, 2024
@cblmemo cblmemo mentioned this pull request Sep 20, 2024