
Commit

Docs: update SkyServe docs. (#2894)
* Docs: update SkyServe docs.

* Rewording
concretevitamin authored Dec 24, 2023
1 parent d6f57cc commit 7c514ba
Showing 2 changed files with 29 additions and 26 deletions.
40 changes: 18 additions & 22 deletions docs/source/serving/service-yaml-spec.rst
@@ -10,12 +10,12 @@ Available fields:

.. code-block:: yaml
# The `service` section turns a SkyPilot task YAML into a service YAML.
service:
# Readiness probe (required). Used by SkyServe to check whether your service
# replicas are ready to accept traffic. If the readiness probe returns a 200,
# SkyServe will start routing traffic to that replica.
readiness_probe:
# Path to probe (required).
path: /v1/models
@@ -28,9 +28,9 @@ Available fields:
# based on your service's startup time.
initial_delay_seconds: 1200
# Simplified version of the readiness probe that only specifies the probe
# path. To use the GET method and the default initial delay, use the
# following syntax:
readiness_probe: /v1/models
# One of the two following fields (replica_policy or replicas) is required.
@@ -41,10 +41,10 @@ Available fields:
# Minimum number of replicas (required).
min_replicas: 1
# Maximum number of replicas (optional). If not specified, SkyServe will
# use a fixed number of replicas (the same as min_replicas) and ignore
# any QPS threshold specified below.
max_replicas: 3
# Thresholds below describe when to scale up or down.
# QPS threshold for scaling up (optional). If the QPS of your service
# exceeds this threshold, SkyServe will scale up your service by one
# replica. If not specified, SkyServe will **NOT** scale up your service.
@@ -54,23 +54,19 @@ Available fields:
# replica. If not specified, SkyServe will **NOT** scale down your service.
qps_lower_threshold: 2
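# An illustrative reading of the two thresholds above (a simplified sketch,
# not the exact autoscaler formula): with these values, service QPS above 10
# adds one replica (up to max_replicas), and QPS below 2 removes one replica
# (down to min_replicas).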
# Simplified version of the replica policy that uses a fixed number of
# replicas:
replicas: 2
# Controller resources (optional). This describes the resources to use for
# the controller. Defaults to a 4+ vCPU instance with a 100 GB disk.
controller_resources:
cloud: aws
region: us-east-1
instance_type: p3.2xlarge
disk_size: 256
# Besides the `service` section, the rest is a regular SkyPilot task YAML.
resources:
# Port to run your service on each replica (required). This port will be
# automatically exposed to the public internet by SkyServe.
ports: 8080
# Other resources config...
# Other fields of your SkyPilot task YAML...
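
For reference, a minimal service YAML assembled from the fields shown above might look like the following sketch. The run command, accelerator choice, and module name are illustrative assumptions rather than values prescribed by the spec:

.. code-block:: yaml

   service:
     # Simplified readiness probe: GET on this path, default initial delay.
     readiness_probe: /v1/models
     replica_policy:
       min_replicas: 1
       max_replicas: 3
       qps_upper_threshold: 10
       qps_lower_threshold: 2

   resources:
     ports: 8080            # Port served by each replica; exposed by SkyServe.
     accelerators: A100:1   # Assumed accelerator; any valid SkyPilot value works.

   run: |
     # Assumed placeholder; replace with whatever serves your model on port 8080.
     python -m my_model_server --port 8080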
15 changes: 11 additions & 4 deletions docs/source/serving/sky-serve.rst
@@ -22,9 +22,9 @@ Why SkyServe?
How it works:

- Each service gets an endpoint that automatically redirects requests to its replicas.
- Replicas of the same service can run in different regions and clouds — reducing cloud costs and increasing availability.
- SkyServe handles the load balancing, recovery, and autoscaling of the replicas (a minimal workflow sketch follows).
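
Put together, the basic workflow is a single launch command followed by a status check. A minimal sketch, assuming the service YAML above is saved as ``service.yaml``:

.. code-block:: console

   $ sky serve up service.yaml   # Launch the service and provision its replicas.
   $ sky serve status            # Inspect the endpoint and replica status.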

.. GPU availability has become a critical bottleneck for many AI services. With Sky
.. Serve, we offer a lightweight control plane that simplifies deployment across
@@ -74,9 +74,14 @@ Use :code:`sky serve status` to check the status of the service:

<div style="height: 20px;"></div>

.. tip::

Notice that the two replicas are launched in different regions/clouds for the lowest cost and highest GPU availability.
This is performed automatically, like a regular ``sky launch``.

If the :code:`STATUS` column shows :code:`READY`, the service is ready to accept traffic!

Simply ``curl -L`` the service endpoint --- for the above example, use
``44.211.131.51:30001`` which automatically load-balances across the two replicas:

.. code-block:: console
@@ -85,6 +90,7 @@ Simply ``curl`` the service endpoint --- for the above example, use
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-H 'Content-Type: application/json'
# Example output:
{"generated_text":"\n\nDeep learning is a subset of machine learning that uses artificial neural networks to model and solve"}
@@ -302,6 +308,7 @@ Send a request using the following cURL command:
-X POST \
-d '{"model":"vicuna-13b-v1.3","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Who are you?"}],"temperature":0}' \
-H 'Content-Type: application/json'
# Example output:
{"id":"chatcmpl-gZ8SfgUwcm9Xjbuv4xfefq","object":"chat.completion","created":1702082533,"model":"vicuna-13b-v1.3","choices":[{"index":0,"message":{"role":"assistant","content":"I am Vicuna, a language model trained by researchers from Large Model Systems Organization (LMSYS)."},"finish_reason":"stop"}],"usage":{"prompt_tokens":19,"total_tokens":43,"completion_tokens":24}}
