Descheduling support for FlinkDeployments #5987

Open
deefreak opened this issue Dec 27, 2024 · 14 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@deefreak

What would you like to be added:
Currently, the descheduler only supports Deployment objects.
For our use case, we wanted to support FlinkDeployments, so I made the necessary changes, and they have been working fine in our environment.

Why is this needed:
This is needed so that the descheduler can support FlinkDeployments and reschedule them to other clusters if their pods are unschedulable.

I would like to contribute this feature.

@deefreak deefreak added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 27, 2024
@XiShanYongYe-Chang
Member

Hi @deefreak, thanks for your feedback!

> For our use case, we wanted to support FlinkDeployments, so I made the necessary changes, and they have been working fine in our environment.

What are your main changes? Can you explain them briefly?

@deefreak
Author

deefreak commented Dec 28, 2024

Hi @XiShanYongYe-Chang two changes are required:

  1. Add the FlinkDeployment GVK here.
  2. The descheduler counts unschedulable pods to decide whether to reschedule, so I added a case for FlinkDeployments here.
    A FlinkDeployment has both JobManager and TaskManager pods, so we need to list both and check whether any pod is unschedulable, the same way we already do for Deployment objects.
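
The two changes above can be sketched roughly as follows. This is only an illustration, not the actual descheduler code: `GVK`, `Pod`, and `PodCondition` are simplified stand-ins for the Kubernetes types, `supportedGVKs` and `countUnschedulableFlinkPods` are hypothetical names, and the `app` / `component` pod labels are an assumption about how the JobManager and TaskManager pods are labelled.

```go
package main

import "fmt"

// GVK is a simplified stand-in for schema.GroupVersionKind (k8s.io/apimachinery).
type GVK struct{ Group, Version, Kind string }

// supportedGVKs is a hypothetical list of workload kinds the descheduler handles.
// The FlinkDeployment entry is the proposed addition; flink.apache.org/v1beta1
// is the API group/version used by the Flink Kubernetes Operator.
var supportedGVKs = []GVK{
	{Group: "apps", Version: "v1", Kind: "Deployment"},
	{Group: "flink.apache.org", Version: "v1beta1", Kind: "FlinkDeployment"},
}

// PodCondition and Pod are trimmed-down stand-ins for the corev1 types.
type PodCondition struct{ Type, Status, Reason string }

type Pod struct {
	Name       string
	Labels     map[string]string
	Conditions []PodCondition
}

// isUnschedulable reports whether the pod carries the condition
// PodScheduled=False with reason Unschedulable.
func isUnschedulable(p Pod) bool {
	for _, c := range p.Conditions {
		if c.Type == "PodScheduled" && c.Status == "False" && c.Reason == "Unschedulable" {
			return true
		}
	}
	return false
}

// countUnschedulableFlinkPods counts the unschedulable JobManager and
// TaskManager pods that belong to the FlinkDeployment called name.
// Assumption: pods are labelled app=<name> and component=jobmanager|taskmanager.
func countUnschedulableFlinkPods(pods []Pod, name string) int {
	count := 0
	for _, p := range pods {
		if p.Labels["app"] != name {
			continue
		}
		switch p.Labels["component"] {
		case "jobmanager", "taskmanager":
			if isUnschedulable(p) {
				count++
			}
		}
	}
	return count
}

func main() {
	pending := []PodCondition{{Type: "PodScheduled", Status: "False", Reason: "Unschedulable"}}
	placed := []PodCondition{{Type: "PodScheduled", Status: "True"}}
	pods := []Pod{
		{Name: "demo-jm", Labels: map[string]string{"app": "demo", "component": "jobmanager"}, Conditions: pending},
		{Name: "demo-tm-1", Labels: map[string]string{"app": "demo", "component": "taskmanager"}, Conditions: pending},
		{Name: "demo-tm-2", Labels: map[string]string{"app": "demo", "component": "taskmanager"}, Conditions: placed},
		{Name: "other-jm", Labels: map[string]string{"app": "other", "component": "jobmanager"}, Conditions: pending},
	}
	fmt.Println(supportedGVKs[1].Kind, countUnschedulableFlinkPods(pods, "demo"))
}
```

In a real implementation the pods would come from a lister with a label selector rather than an in-memory slice; the slice just keeps the sketch self-contained.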

@XiShanYongYe-Chang
Member

Thanks @deefreak for your quick response.

Would you like to share it at a community meeting?

@deefreak
Author

@XiShanYongYe-Chang sure.

@deefreak
Author

@XiShanYongYe-Chang Will we have this meeting on 6th January 2025? Should I add it to the community meeting document?

@XiShanYongYe-Chang
Member

> Will we have this meeting on 6th January 2025?

Yes, we will

> Should I add it to the community meeting document?

Yes, you can do it.

@deefreak
Author

deefreak commented Jan 4, 2025

@XiShanYongYe-Chang I don't have edit access to the document; I have requested it.
Can you add this as an item for the meeting on the 7th?

@XiShanYongYe-Chang
Member

Hi @deefreak, you can do it yourself:

By joining the Google group you will be able to edit the meeting notes.
Join the Google group mailing list: https://groups.google.com/forum/#!forum/karmada

@RainbowMango
Member

> This is needed so that the descheduler can support FlinkDeployments and reschedule them to other clusters if their pods are unschedulable.

Hi @deefreak, have you looked at the Application State Preservation feature, which was designed precisely for the FlinkDeployment failover scenario?

By the way, why is the FlinkDeployment unschedulable? As far as I know, the Karmada scheduler selects a cluster with enough available resources, which should reduce the chance that the workload cannot start after being deployed to a cluster.

@deefreak
Author

deefreak commented Jan 6, 2025

@RainbowMango I wasn't aware of this. I just went through it. So basically we define some failover conditions in the propagation policy itself, and it makes rescheduling decisions accordingly (mostly when the application is "Unhealthy").

I wasn't using this feature. Instead, I looked at the descheduler code that decides when to trigger rescheduling: it checks whether any pod in the cluster is unschedulable, and it only supports Deployments. I tried adding support for FlinkDeployments there, which means the descheduler will trigger rescheduling if none of the pods belonging to a FlinkDeployment (JobManager and TaskManager pods) is schedulable.

@RainbowMango
Member

> So basically we define some failover conditions in the propagation policy itself, and it makes rescheduling decisions accordingly (mostly when the application is "Unhealthy").

Yes, you can find an example in #5788 (comment).

@RainbowMango
Member

Hi @deefreak, I see you added an agenda item to this week's community meeting. I'd love to meet you there; I just want to know which time zone you are in. I'm thinking of moving the meeting 1 hour earlier because it is rather late (midnight, 00:00) for me.

@deefreak
Author

deefreak commented Jan 6, 2025

My timezone is IST. 1 hour earlier is fine for me.

@RainbowMango
Member

OK. I will send an email to the Karmada mail group and then update the calendar.

3 participants