Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a Higher Cluster Wide MaximumMaxReplicas category #426

Open
AVSBharadwaj opened this issue Dec 16, 2024 · 5 comments
Open

Introduce a Higher Cluster Wide MaximumMaxReplicas category #426

AVSBharadwaj opened this issue Dec 16, 2024 · 5 comments
Assignees
Labels
kind/feature New feature or request

Comments

@AVSBharadwaj
Copy link
Collaborator

WHAT

We would like to introduce a second category of the tortoise-controller spec MaximumMaxReplicas called HigherMaximumMaxReplicas which will have value >= MaximumMaxReplicas. This will also be a Int value and will be defined beforehand ad a cluster wide constant.

WHY

We at mercari recently observed a internal outage which happened basically due to the fact that the service was in need of Replicas more than the cluster wide limit specified in MaximumMaxReplicas. So we want to bring another higher category but limit it only to certain pre configured services defined in the HigherMaximumMaxReplicasServiceWhitelist

@AVSBharadwaj
Copy link
Collaborator Author

Created initial PR for the same for ref: #425

@sanposhiho
Copy link
Collaborator

sanposhiho commented Dec 16, 2024

-1, the proposed API looks weird and not general enough for the future.

So, essentially, the problem is that we have MaximumMaxReplicas, but it's difficult to define the one value for it that fits all services because there're various sized services.
Probably, we should define a group of services in the config, and change the config to define a different MaximumMaxReplicas for each.

So, the config would be like:

type Config struct {
  // serviceGroups defines the groups that are used in other configs.
  ServiceGroups []ServiceGroup

  MaximumMaxReplicas []MaximumMaxReplicasPerGroup
}

type MaximumMaxReplicasPerGroup struct {
  // ServiceName refers to one ServiceGroup at Config.ServiceGroups
  // It can be nil; if it's nil, this MaximumMaxReplica would apply to all services.
  ServiceName *string 

  MaximumMaxReplica int32
}

// ServiceGroup defines a group of services.
// Only one of Namespace or LabelSelector should be specified.
type ServiceGroup struct {
  // Name is the group's name. (e.g., big-service, fintech-service, etc)
  Name string

  Namespace *string

  // ... Or at the first step, we can just support Namespace and skip implementing LabelSelector. It's up to you.
  LabelSelector *metav1.LabelSelector
}

Then, in the future, we can do the same with MaximumCPURequest etc, if we want.

What do you think?

@sanposhiho sanposhiho added the kind/feature New feature or request label Dec 16, 2024
@AVSBharadwaj
Copy link
Collaborator Author

AVSBharadwaj commented Dec 17, 2024

-1, the proposed API looks weird and not general enough for the future.

So, essentially, the problem is that we have MaximumMaxReplicas, but it's difficult to define the one value for it that fits all services because there're various sized services. Probably, we should define a group of services in the config, and change the config to define a different MaximumMaxReplicas for each.

So, the config would be like:

type Config struct {
  // serviceGroups defines the groups that are used in other configs.
  ServiceGroups []ServiceGroup

  MaximumMaxReplicas []MaximumMaxReplicasPerGroup
}

type MaximumMaxReplicasPerGroup struct {
  // ServiceName refers to one ServiceGroup at Config.ServiceGroups
  // It can be nil; if it's nil, this MaximumMaxReplica would apply to all services.
  ServiceName *string 

  MaximumMaxReplica int32
}

// ServiceGroup defines a group of services.
// Only one of Namespace or LabelSelector should be specified.
type ServiceGroup struct {
  // Name is the group's name. (e.g., big-service, fintech-service, etc)
  Name string

  Namespace *string

  // ... Or at the first step, we can just support Namespace and skip implementing LabelSelector. It's up to you.
  LabelSelector *metav1.LabelSelector
}

Then, in the future, we can do the same with MaximumCPURequest etc, if we want.

What do you think?

It makes sense. This is the most general approach of having multiple categories of MaximumMaxReplicas instead of just having 2 categories. It makes sense to right away implement this general approach. Let me discuss with @randytqwjp as well.

Also i want to propose some changes:

type Config struct {
  // serviceGroups defines the groups that are used in other configs.
  ServiceGroups []ServiceGroup

  MaximumMaxReplicas []MaximumMaxReplicasPerGroup
}

type MaximumMaxReplicasPerGroup struct {
  // ServiceName refers to one ServiceGroup at Config.ServiceGroups
  // It can be nil; if it's nil, this MaximumMaxReplica would apply to all services.
  ServiceGroupName *string       <------ name change from `ServiceName to ServiceGroupName`

  MaximumMaxReplica int32
}

// below structure has been changed 

// ServiceGroup defines a group of services.
// Namespace represents a Kubernetes namespace and its associated label selectors.
type Namespace struct {
    Name           string                  // Namespace name
    LabelSelectors []*metav1.LabelSelector // Slice of label selectors within this namespace
}

// ServiceGroup represents a collection of services grouped together with namespace awareness.
type ServiceGroup struct {
    // Name is the group's name (e.g., big-service, fintech-service, etc).
    Name string

    // Namespaces represent multiple namespaces with their label selectors.
    Namespaces []Namespace // A slice of Namespace structs
}

@sanposhiho
Copy link
Collaborator

👍 Your proposed change also makes sense.

@AVSBharadwaj
Copy link
Collaborator Author

logic is implemented in this PR: #428

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants