This repository has been archived by the owner on Dec 19, 2022. It is now read-only.

Cocalc project restarts. How to investigate? #21

Open
Debilski opened this issue Nov 6, 2020 · 2 comments
Debilski commented Nov 6, 2020

We are experimenting with CoCalc (a slightly slimmed image with fewer kernels and increased memory defaults) for remote teaching/pair programming. (It works pretty well!) I am currently seeing three different types of crashes and would like a hint on how to find out why each crash occurred, how to fix it, and where to see the logs.

  1. Python kernel crashes. This seems to occur when I allocate too much memory, for example in a NumPy array. The relevant cell gets a red tag with the kernel killed message. That is understandable and I can live with it. (Although I wouldn’t mind seeing this somewhere in some project admin/server admin logs.)

  2. The project pod sometimes gets killed. All I see is a Killed event in kubectl get events. It doesn’t happen very often, so it is not too bad, but I’d still like to get an idea why.

  3. Project restarts without notice. Sometimes this happens every 10 minutes while people are working on a project, so it doesn’t seem to be some idle timeout. (I figured it’s not the worst thing that can happen for teaching, as it clears all hidden variables and gives the student a clean state. ;) ) This is the nastiest problem as the reason is very unclear to me and I wouldn’t know where to look (and which limit to increase).

Any hints?
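
For items 2 and 3, one possible starting point is sketched below. It is not something confirmed in this thread, only an illustration under stated assumptions: it reads the JSON that `kubectl get pods -o json` produces and lists every container that has a recorded previous termination, together with its reason (for example `OOMKilled`) and exit code. The function name `summarize_terminations` and the sample pod data are hypothetical; the field names (`containerStatuses`, `lastState.terminated`, `restartCount`) are the standard Kubernetes pod status fields. If the whole pod is deleted and recreated instead of a container restarting in place, this status will not show the old termination, and the events are the only trace.

```python
import json


def summarize_terminations(pod_list):
    """Return one record per container that has a recorded previous termination.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    records = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            # lastState.terminated is only present if the container has
            # been terminated at least once and then restarted.
            terminated = status.get("lastState", {}).get("terminated")
            if not terminated:
                continue
            records.append({
                "pod": pod_name,
                "container": status.get("name"),
                "restarts": status.get("restartCount", 0),
                "reason": terminated.get("reason"),
                "exit_code": terminated.get("exitCode"),
                "finished_at": terminated.get("finishedAt"),
            })
    return records


# Hypothetical sample data, shaped like `kubectl get pods -o json` output.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "project-example-a"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "project",
                        "restartCount": 3,
                        "lastState": {
                            "terminated": {
                                "reason": "OOMKilled",
                                "exitCode": 137,
                                "finishedAt": "2020-11-06T10:15:00Z",
                            }
                        },
                    }
                ]
            },
        },
        {
            "metadata": {"name": "project-example-b"},
            "status": {
                "containerStatuses": [
                    {"name": "project", "restartCount": 0, "lastState": {}}
                ]
            },
        },
    ]
}

for record in summarize_terminations(SAMPLE):
    print(json.dumps(record))
```

To run this against a real cluster, the sample could be replaced with `json.load(sys.stdin)` and the script fed from `kubectl get pods -o json`. A reason of `OOMKilled` with exit code 137 would point at the container memory limit; `kubectl describe pod` and `kubectl logs --previous` for the affected pod may give more detail.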

@williamstein
Contributor

It might be that just updating the images would fix the problem. I don't know. Note that I spent about a month last year creating cocalc-kubernetes based on how cocalc-docker worked, but we've had a grand total of zero customers for cocalc-kubernetes (compared to quite a few for cocalc-docker). Thus development on cocalc-kubernetes has stalled, due to lack of demand from serious customers.

@Debilski
Author

Thanks for the info. We are already running an updated and slightly patched image (the /health endpoint wouldn’t work, causing even earlier crashes). I hadn’t looked into cocalc-docker though. Maybe it would already be sufficient for the next edition of our course.
