User pod launching timeout with large single user image when cluster scaling up #3010

Open
xcompass opened this issue Feb 2, 2023 · 7 comments

@xcompass
Contributor

xcompass commented Feb 2, 2023

Bug description

During cluster scale-up, once a new node is provisioned and reported Ready in k8s, user pods may get scheduled onto it. However, if the continuous image puller is still pulling the single-user image, because the image is large (a few GB) or the network is slow, users get a timeout error in the UI whenever the pull does not finish within the configured timeout. Increasing the timeout is a quick fix, but it makes for a poor user experience, since users may have to wait a long time to get access (for us, up to 10 minutes).

Expected behaviour

The user pods should not be scheduled to the new node before image pulling is done.

Actual behaviour

See Bug description

How to reproduce

  1. Create a large single-user image, or use a slow registry/network
  2. Trigger a scale-up event
  3. Spawn a few user pods; some of them will be scheduled to the new node as soon as it is ready
  4. The user sees a timeout error in the UI

Our current workaround

Add a taint before starting the image puller (or when provisioning the node) to prevent user pods from being scheduled to the node, then remove the taint after the image puller has finished.
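
For illustration, here is a minimal sketch of the taint/toleration mechanics this workaround relies on. The taint key is hypothetical, and how the taint is applied and removed depends on your provisioning tooling (cloud provider node pool config, a small controller, or kubectl):

```yaml
# Node spec fragment: a hypothetical taint placed on a freshly provisioned
# node so that user pods, which do not tolerate it, cannot be scheduled
# there until the taint is removed.
spec:
  taints:
    - key: hub.jupyter.org/image-pulling   # hypothetical key
      effect: NoSchedule
---
# Pod spec fragment: a toleration the image puller pods would need so they
# can still run on the tainted node and pre-pull the single-user image.
tolerations:
  - key: hub.jupyter.org/image-pulling
    operator: Exists
    effect: NoSchedule
```

Once the image puller completes on the node, the taint is removed and user pods can be scheduled there as usual.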

We have the code ready to send a PR, but want to check first whether there is a better way to solve this. Happy to send the PR anytime.

@xcompass xcompass added the bug label Feb 2, 2023
@welcome

welcome bot commented Feb 2, 2023

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@consideRatio
Member

KubeSpawner registers the pod with the k8s api-server, and at that point in time JupyterHub's server startup timeout starts counting. You can configure this timeout, and I typically increase it to 20 minutes when large images are involved.

https://z2jh.jupyter.org/en/stable/resources/reference.html#singleuser-starttimeout
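
For reference, this timeout maps to the following Helm values for the jupyterhub chart (the value is in seconds; 1200 reflects the 20 minutes mentioned above):

```yaml
# values.yaml for the jupyterhub Helm chart
singleuser:
  startTimeout: 1200  # seconds to wait for the user server (pod) to start
```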

@xcompass
Contributor Author

xcompass commented Feb 3, 2023

Thanks for the reference. We tried increasing the timeout, but then our users have to wait a long time (in our case around 10 minutes) to reach the UI after logging in, even though there are still resources on the existing nodes to assign to them. This happens with user placeholders enabled and node-density scheduling, as well as with default scheduling without placeholders. Since the new node has no pods running, user pods are scheduled there as soon as it becomes Ready (before the single-user image has been pulled).

This is somewhat of an edge case, since it is not triggered very often. It has to satisfy all of the following conditions:

  1. A new node is being provisioned (scale-up)
  2. User placeholders are in use (if using node-density scheduling)
  3. A large single-user image is in use and is pre-pulled by the continuous image puller
  4. A user tries to log in after the new node is ready but before the single-user image has been pulled

@consideRatio
Member

This is a tricky topic. I've considered it in the past, and I recall I couldn't settle on a strategy to resolve it that was simple and robust enough to be worth working towards.

  • If the strategy used to solve this isn't robust enough, it may cause other problems for users.
  • If the strategy used to solve this isn't simple enough, it may add long-term maintenance complexity - and capacity to maintain open source projects is always limited.

If you have a strategy to handle this situation that you believe will be robust and simple - for example, one that doesn't assume too much about the cloud providers or require users to set up node pools with various taints etc. - then I'm open to reviewing it!

I haven't re-read all my notes from past considerations of this, but it is a topic I've considered in depth in the past without reaching a clear resolution. Here are some past topics that relate to some degree:

@xcompass
Contributor Author

xcompass commented Feb 3, 2023

Thanks @consideRatio. I wasn't aware there had been such extensive discussion. I think my solution is simple enough, so I'm sending in PR #3011 for review. It adds one small Go binary to manage the taints, currently hosted in our repo. Happy to donate that to jupyterhub as well.

@consideRatio
Member

I understand now from the reproduction steps that we were thinking about two different situations: you considered the case where there are multiple nodes available for the user pod to schedule on, while I considered the case where the user pod's start triggers a scale-up because it can't fit on any existing node.

Hmmm, if you are using the user-scheduler provided by the jupyterhub chart, you can make user pods pack onto the most used node, which may help you. Then the use of user-placeholder pods complicates things as well.
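
As a rough sketch, the relevant chart options look like this (see the z2jh configuration reference for details; the replica count is just an example):

```yaml
scheduling:
  userScheduler:
    enabled: true      # pack user pods onto the most utilized nodes
  podPriority:
    enabled: true      # lets real user pods evict placeholder pods
  userPlaceholder:
    enabled: true
    replicas: 4        # example headroom; tune to your expected burst
```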

Are you currently using user-scheduler and/or user-placeholder pods @xcompass? I'd like to pinpoint the problems this new strategy could resolve that aren't already resolved by existing techniques. That will be required for the documentation we need for a feature like this anyhow.

@xcompass
Contributor Author

xcompass commented Feb 7, 2023

Yes, we are using user-scheduler and user-placeholder. I remember we did the testing with both enabled. I'll redo a test to make sure.
