initial changes for IG raw deployment mode #340

Draft · wants to merge 1 commit into base: main
3 changes: 3 additions & 0 deletions docs/admin/kubernetes_deployment.md
@@ -1,6 +1,9 @@
# Kubernetes Deployment Installation Guide
KServe supports `RawDeployment` mode to enable `InferenceService` deployment with Kubernetes resources [`Deployment`](https://kubernetes.io/docs/concepts/workloads/controllers/deployment), [`Service`](https://kubernetes.io/docs/concepts/services-networking/service), [`Ingress`](https://kubernetes.io/docs/concepts/services-networking/ingress) and [`Horizontal Pod Autoscaler`](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale). Compared to serverless deployment, it removes Knative limitations such as the restriction on mounting multiple volumes; on the other hand, `Scale down to and from Zero` is not supported in `RawDeployment` mode.
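For illustration, here is a minimal sketch of selecting `RawDeployment` mode through the `serving.kserve.io/deploymentMode` annotation (the service name and storage URI are illustrative placeholders, not part of this guide):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris                               # illustrative name
  annotations:
    # Selects raw Kubernetes resources instead of Knative for this service
    serving.kserve.io/deploymentMode: "RawDeployment"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```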

**Note:** Starting with the KServe vx.xx release, `InferenceGraph` also supports `RawDeployment` mode. See the release notes for details.

Kubernetes 1.22 is the minimum required version. Please check the following recommended Istio versions for the corresponding
Kubernetes version.

88 changes: 88 additions & 0 deletions docs/blog/articles/2024-XX-XX-KServe-X.XX-release.md
@@ -0,0 +1,88 @@
# Announcing: KServe vx.xx

We are excited to announce the release of KServe x.xx. In this release we made enhancements to the KServe control plane, in particular bringing `RawDeployment` mode to `InferenceGraph` as well. Previously, `RawDeployment` was supported only for `InferenceService`.

Here is a summary of the key changes:

## KServe Core Inference Enhancements

- Inference Graph enhancements to support `RawDeployment`, along with autoscaling configuration directly within the `InferenceGraphSpec`

IG `RawDeployment` makes the deployment lightweight by using native Kubernetes resources. See the comparison below:

![Inference graph Knative based deployment](../../images/2024-xx-xx-Kserve-x-xx-release/ig_knative_deployment.png)

![Inference graph raw deployment](../../images/2024-xx-xx-Kserve-x-xx-release/ig_raw_deployment.png)

Autoscaling configuration fields were introduced to support scaling needs in
`RawDeployment` mode. These fields are optional and, when set, take effect only when the `serving.kserve.io/autoscalerClass` annotation does not point to `external`.
See the following example with the autoscaling fields `minReplicas`, `maxReplicas`, `scaleTarget` and `scaleMetric`:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: graph-with-switch-node
  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: "rootStep1"
          nodeName: node1
          dependency: Hard
        - name: "rootStep2"
          serviceName: {{ success_200_isvc_id }}
    node1:
      routerType: Switch
      steps:
        - name: "node1Step1"
          serviceName: {{ error_404_isvc_id }}
          condition: "[@this].#(decision_picker==ERROR)"
          dependency: Hard
  minReplicas: 5
  maxReplicas: 10
  scaleTarget: 50
  scaleMetric: "cpu"
```
For more details please refer to the [issue](https://github.com/kserve/kserve/issues/2454).
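As a complement to the example above, here is a minimal sketch of the opposite case described earlier (the graph name and the `example-isvc` service are hypothetical): when the `serving.kserve.io/autoscalerClass` annotation points to `external`, the autoscaling fields are ignored and scaling is left to an autoscaler you manage yourself.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: graph-with-external-autoscaler   # hypothetical name
  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
    # With autoscalerClass set to "external", the minReplicas/maxReplicas/
    # scaleTarget/scaleMetric fields are not acted upon by KServe.
    serving.kserve.io/autoscalerClass: "external"
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: "rootStep1"
          serviceName: example-isvc      # hypothetical InferenceService
```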

-

### Enhanced Python SDK Dependency Management

-
-

### KServe Python Runtimes Improvements
-

### LLM Runtimes

#### TorchServe LLM Runtime

#### vLLM Runtime

## ModelMesh Updates

### Storing Models on Kubernetes Persistent Volumes (PVC)

### Horizontal Pod Autoscaling (HPA)

### Model Metrics, Metrics Dashboard, Payload Event Logging

## What's Changed? :warning:

## Join the community

- Visit our [Website](https://kserve.github.io/website/) or [GitHub](https://github.com/kserve)
- Join the Slack ([#kserve](https://kubeflow.slack.com/?redir=%2Farchives%2FCH6E58LNP))
- Attend our community meeting by subscribing to the [KServe calendar](https://wiki.lfaidata.foundation/display/kserve/calendars).
- View our [community GitHub repository](https://github.com/kserve/community) to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!


Thanks to all the contributors who have made commits to this release!

The KServe Working Group
57 changes: 57 additions & 0 deletions docs/reference/api.md
@@ -524,6 +524,63 @@ Kubernetes core/v1.Affinity
<em>(Optional)</em>
</td>
</tr>

<tr>
<td>
<code>minReplicas</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>Minimum number of replicas, defaults to 1 but can be set to 0 to enable scale-to-zero.</p>
</td>
</tr>
<tr>
<td>
<code>maxReplicas</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>Maximum number of replicas for autoscaling.</p>
</td>
</tr>
<tr>
<td>
<code>scaleTarget</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>ScaleTarget specifies the integer target value of the metric type the Autoscaler watches for.
Concurrency and RPS targets are supported by the Knative Pod Autoscaler
(<a href="https://knative.dev/docs/serving/autoscaling/autoscaling-targets/">https://knative.dev/docs/serving/autoscaling/autoscaling-targets/</a>).</p>
</td>
</tr>
<tr>
<td>
<code>scaleMetric</code><br/>
<em>
<a href="#serving.kserve.io/v1beta1.ScaleMetric">
ScaleMetric
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>ScaleMetric defines the scaling metric type watched by the autoscaler.
Possible values are concurrency, rps, cpu and memory; concurrency and rps are supported via the
Knative Pod Autoscaler (<a href="https://knative.dev/docs/serving/autoscaling/autoscaling-metrics">https://knative.dev/docs/serving/autoscaling/autoscaling-metrics</a>).</p>
</td>
</tr>


</tbody>
</table>
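For illustration, a minimal sketch (hypothetical name and values) of how these optional autoscaling fields can be set on an `InferenceGraph` spec in `RawDeployment` mode:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: example-graph                    # hypothetical name
  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
spec:
  minReplicas: 1       # defaults to 1; 0 enables scale-to-zero
  maxReplicas: 4
  scaleTarget: 60      # target value for the chosen metric
  scaleMetric: "cpu"   # concurrency, rps, cpu or memory
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: "step1"
          serviceName: example-isvc      # hypothetical InferenceService
```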
<h3 id="serving.kserve.io/v1alpha1.InferenceGraphStatus">InferenceGraphStatus