
Kernel nanny proposal #14

Closed
wants to merge 2 commits into from

Conversation

takluyver
Member

As discussed at the dev meeting. There are a few TODOs which have not yet been decided. We can bikeshed about them, or whoever gets to implementing the relevant bits first can try their preferred option ;-).

Pinging @JanSchulz, who was interested in this for IRkernel logging.

@jankatins

jankatins commented Apr 18, 2016

Not sure if I understand the details, but currently the notebook isn't very good at shutting down the R kernel on Windows, because the R kernel is not a single process but rather a chain like R.exe -> cmd -> rterm.exe [see https://github.com/jupyter/jupyter_client/issues/104]. I'm not sure whether the "nanny" can detect such a thing without a heartbeat.

[Such things might happen even for python kernels, if you use a batch file with activate <env> & python <kernel startup line>, which is needed to get the correct PATH in a kernel...]

@takluyver
Member Author

I suspect it won't make much of a difference either way in that situation. Both currently and with the kernel nanny, it will send shutdown_request to ask the kernel to shut itself down, and if it doesn't shut down within some time period, it will terminate it more forcefully. I'd guess that second bit is where it goes wrong, since it only knows about the top-level process that it started.

Besides fiddling with the time we wait for the kernel to shut itself down, I'm not sure what we could do to improve that.
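
To make that concrete, here is a rough sketch of the flow described above, assuming the kernel was started with subprocess.Popen; `client.shutdown()` is a placeholder for whatever sends shutdown_request, not a real API:

```python
# Illustrative sketch only: `client` and its shutdown() method stand in
# for whatever sends shutdown_request over zmq; they are not a real API.
import subprocess

def shutdown_kernel(client, proc, timeout=5.0):
    client.shutdown()                # polite: ask the kernel to exit itself
    try:
        proc.wait(timeout=timeout)   # give it some time to comply
    except subprocess.TimeoutExpired:
        # Forceful fallback: this only terminates the top-level process
        # (e.g. R.exe or a batch file), not grandchildren like rterm.exe.
        proc.terminate()
        proc.wait()
```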

instructing the nanny to shut down the kernel.
* A new message type on the control channel from the frontend to the nanny,
instructing the nanny to signal/interrupt the kernel. (*TODO: Expose all Unix
signals, or just SIGINT?*)
Member

A signal_request message makes the most sense. I don't think there's a reason to limit to interrupt/term/kill, all of which we probably want.
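
Purely as an illustration of what that might look like, the content of such a message could be as simple as the sketch below; neither the message name nor the field is part of the current spec:

```python
# Hypothetical content for a signal_request message on the control channel.
signal_request_content = {
    "signum": 2,  # POSIX signal number, e.g. 2 for SIGINT
}
```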

Member Author

For Unix systems that certainly makes sense. For Windows, should we just pick some numbers to refer to the available ways we have of interrupting/stopping the kernel process?

Member

I think only one or two signals work on Windows reliably, but they are still integers, aren't they?

Member Author

AIUI Windows doesn't really have signals at all, but Python exposes certain similar operations through the same interface it uses for signals on Windows. The description of os.kill has some useful info:

https://docs.python.org/3/library/os.html#os.kill

We could quite reasonably expose the same set of options with the same meanings as Python does, of course.
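
A minimal sketch of how the nanny could dispatch such a request while mirroring Python's os.kill() semantics; the function name and the exact set of values accepted on Windows are assumptions for illustration:

```python
# Sketch: deliver a requested signal to the kernel process, following the
# os.kill() semantics linked above. On Windows, os.kill() only supports
# CTRL_C_EVENT / CTRL_BREAK_EVENT (for processes started in a new process
# group); any other value terminates the process via TerminateProcess.
import os
import signal
import sys

def deliver_signal(kernel_pid, signum):
    if sys.platform == "win32":
        allowed = {signal.CTRL_C_EVENT, signal.CTRL_BREAK_EVENT, signal.SIGTERM}
        if signum not in allowed:
            raise ValueError("unsupported signal on Windows: %r" % signum)
    os.kill(kernel_pid, signum)
```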


Given that the nanny process is going to run on the same machine as the kernel, it makes sense for the frontend to ask the nanny to interrupt the kernel via a message similar to shutdown_request; the nanny then interrupts the kernel process by sending the appropriate signal.

Member Author

Right, that's exactly how this will work. We're just trying to work out what form the message will take. If all the world was Unix, we'd almost certainly just call it signal_request, and pass a signal number or name. But things get a bit more complicated when we consider kernels running on Windows.


For the Windows problems, see here: jupyter/jupyter_client#104

@minrk
Member

minrk commented Apr 19, 2016

Thanks for writing this up, @takluyver!


When a frontend wants to start a kernel, it currently instantiates a `KernelManager`
object which reads the kernelspec to find how to start the kernel, writes a
connection file, and launches the kernel process. With this process, it will
Member

Perhaps for clarity:
"With this proposed process, the frontend will..."

Member Author

Thanks, tweaked

@willingc
Member

@takluyver Nicely designed and clearly written. I made a drawing by hand of the frontends, nanny, kernel, and channels; let me know if you would like a copy. 😄

- There will be a consistent way to start kernels without a frontend
(`jupyter kernel --kernel x`).
- Kernel stdout & stderr can be captured at the OS level, with real-time updates
of output.
Member

👍

@takluyver
Member Author

Thanks all!

@willingc, yes, it would be good to see your drawing, to check if the explanation conveyed what I was thinking clearly.

@willingc
Member

@takluyver Here's the link to the drawing's folder on Dropbox: https://www.dropbox.com/sh/kzc9bom60c9e57x/AAAWcdlGo8RZB9cklEv7jC2ua?dl=0

@takluyver
Member Author

Thanks, that looks good.

@willingc
Member

@takluyver Great. You detailed things out very clearly 🔑

advantages over the current situation, including:

- Kernels will no longer need to implement the 'heartbeat' for frontends to
check that they are still alive.


How would the nanny process check the kernel is alive?

Member Author

e.g. subprocess.Popen.poll(), but depending on how it's written, there may well be smarter ways. On Unix, the parent process is sent SIGCHLD when one of its children dies.
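
A quick sketch of both options; the kernel command line is just an example:

```python
# Sketch: two ways a nanny could notice the kernel dying without the
# kernel implementing any heartbeat socket. The command line is illustrative.
import signal
import subprocess

kernel = subprocess.Popen(
    ["python", "-m", "ipykernel_launcher", "-f", "connection.json"]
)

# Option 1: poll. Returns None while the child is alive, else its exit code.
def kernel_is_alive():
    return kernel.poll() is None

# Option 2 (Unix only): get notified as soon as a child process exits.
def _on_sigchld(signum, frame):
    if kernel.poll() is not None:
        print("kernel exited with code", kernel.returncode)

if hasattr(signal, "SIGCHLD"):  # not available on Windows
    signal.signal(signal.SIGCHLD, _on_sigchld)
```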

@n-riesco

I would suggest that this proposal is split into four proposals:

  1. a proposal to replace SIGINT.
  2. a proposal for kernels to capture the low level stderr and stdout streams and forward them to the frontend.
  3. a proposal to introduce the command jupyter kernel --kernel x
  4. a proposal (specific to IPython) that will use a nanny process to implement the first two proposals.

@takluyver
Member Author

a proposal (specific to IPython) that will use a nanny process to implement the first two proposals.

This is absolutely not specific to IPython - the proposal is for the nanny process to be used for all kernels.

Your 1 & 2 don't really make sense without a nanny process. 3 is doable, but it's a more incidental benefit. I don't see the benefit of splitting this up into smaller pieces: it's one change to the architecture that lets us do a number of useful things, which seems like exactly the right scope for a JEP.

This was also discussed at the in-person dev meeting, and while I don't want to suggest that it's closed for discussion, we did spend some time hashing out what we wanted, and I'd really hope that the remaining issues to work out are details, not the fundamental nature of the proposal.

@n-riesco

On 21/04/16 17:54, Thomas Kluyver wrote:

Your 1 & 2 don't really make sense without a nanny process.

I think the kernel is in a better position to handle 1 and especially 2 than an agnostic nanny process:

  • a kernel, if really needed, can implement its own nanny process to handle 1 and 2
  • the nanny process cannot determine the origin of stdout and stderr without the kernel's help
  • a kernel can always capture the low level stdout/stderr

@lbustelo
Contributor

@takluyver How about for kernels that are remote? I know this is not officially supported by the notebook server, but it is something that we've experimented with and could be a requirement in certain deployments.

@jasongrout
Member

@n-riesco - the idea in the proposal is that rather than having every kernel implement the capturing and signal/interrupt logic, we'd implement it once outside of the kernel and everyone automatically benefits. As for capturing output, that's opt-in for a kernel, so a kernel absolutely can do its own input/output instead of having the nanny handle it. The nanny makes it much easier to have this automatically taken care of.

@jasongrout
Member

Another concern brought up in the meeting was the latency introduced in forwarding messages through the nanny. Can you mention that in the proposal? I thought @minrk said he might run some tests to get some idea about how much the latency on messages would be impacted by this proposal.

@n-riesco

n-riesco commented Apr 21, 2016

@takluyver I'm sorry for suggesting a proposal split.

Here are 2 suggestions to the current proposal:

  • make the nanny an opt-in feature declared in the kernel spec (currently, IJavascript can run in the official Docker images for Node.js; if the nanny were made compulsory, then the nanny and all its dependencies would have to be installed in the Docker container).
  • consider extending the jupyter protocol with a message for the kernel to record log messages (so that frontends have access to them).

@jasongrout
Member

@minrk - how heavy-weight do you see the nanny being? I imagined either a python file, or a lightweight OS-specific C program with zeromq as a dependency.

@takluyver
Member Author

@lbustelo The idea is that the nanny and the kernel are always running together on the same system. They may both be remote from the frontend (e.g. the notebook server), and this will work much like it already does - zmq messages sent over the network. One of the key advantages of this is that it will allow interrupting remote kernels, which is currently impossible.

@n-riesco @jasongrout I definitely see the nanny as being a lightweight thing with few dependencies. In the first instance, it will likely be written in Python, because that's what we can write and debug most effectively, but I may later use it as an excuse to brush up on a language like Rust or Go, which will make it even lighter.

consider extending the jupyter protocol with a message for the kernel to record log messages (so that frontends have access to them).

The logging system is what you'll want to rely on to debug problems with the messaging, so I want it to be a) a separate mechanism, and b) as simple as possible, like 'open this file and write to it'. We can still arrange things so that the frontend can get the kernel logs by making the log file a named pipe, and having the frontend read it. I like this Unix-y approach here, because it provides a lot of flexibility while requiring very little complexity in the kernels.
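
A rough, Unix-only sketch of the named-pipe arrangement; the path and function name are made up for illustration:

```python
# Sketch: the kernel is told "open this file and write log lines to it";
# the frontend (or nanny) reads the other end of the FIFO in real time.
import os

log_path = "/tmp/kernel-log.fifo"  # illustrative per-kernel log location
if not os.path.exists(log_path):
    os.mkfifo(log_path)  # Unix only

def tail_kernel_log():
    # Opening a FIFO for reading blocks until a writer (the kernel) opens it;
    # lines then arrive as soon as they are written.
    with open(log_path) as pipe:
        for line in pipe:
            print("[kernel log]", line, end="")
```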

@n-riesco

On 22/04/16 14:32, Thomas Kluyver wrote:

[...] We can still arrange things so that the frontend can get the kernel logs by making the log file a named pipe, and having the frontend read it.

How would that work in the case of remote kernels?

@takluyver
Member Author

There are advantages either way; this way allows the nanny to be running as a service, so remote frontends have a standard way to connect, see which kernel specs are available, and start one. Neither of us felt strongly, but I want to pick something so I can go and prototype it.

Even with nanny-per-system, I'd still like to provide an entry point to start the nanny and immediately start one kernel, for the cases where you're spinning up a VM or container to run a single kernel.

@jasongrout
Member

I want to pick something so I can go and prototype it.

Definitely +1!

@lbustelo
Contributor

lbustelo commented Dec 9, 2016

When thinking about remote frontends, it starts to feel like this may be something that belongs in kernel gateway. See https://github.com/jupyter/kernel_gateway_demos/tree/master/nb2kg.

@takluyver
Member Author

The proposed 'kernel nanny', now that we're leaning towards one nanny managing multiple kernels, is indeed quite like kernel gateway. The key difference is that this would expose a ZeroMQ interface, rather than an HTTP/websockets interface.

  1. Do we still want to do this, or should we always use HTTP for managing remote kernels? I see some potential advantages of exposing a ZMQ interface, but I don't know if they're compelling enough:
  • Security could be simpler in some situations as it doesn't rely on SSL certificates.
  • There may be more (or better supported) zmq libraries for some languages than websocket client libraries, as the main purpose of websockets is to talk to a browser. I haven't surveyed zmq & websocket client libraries, though.
  2. If we do want a ZMQ interface for managing kernels, would it share much code with kernel gateway? Would it make sense for it to be part of the kernel gateway project?

@tristanz

As a consumer, I find ZMQ and custom encryption to be a cost compared to WebSockets + SSL.

@takluyver
Member Author

Right, there are definitely situations where that's easier. But I could imagine other situations where dealing with SSL certificates is more complex than a simple shared secret for the connection.

@lbustelo
Contributor

@takluyver Check out the support in kernel gateway for personalities. It might be possible to enhance it to expose ZMQ.

@parente
Member

parente commented Feb 17, 2017

The kernel gateway is using the notebook server code as a package and so only knows how to communicate with the outside world using HTTP/Websocket based protocols. Adding a zmq personality would be quite an undertaking.

Also, I'm not sure what the future holds for KG in general. Notebook has subsumed some of its features (e.g., token auth, kernel activity monitoring) and jupyter_server is on the enhancement proposal table for separating the frontend and backend bits of notebook.

@minrk
Member

minrk commented Feb 17, 2017

websockets definitely have some nice characteristics when talking over remote networks (dealing with all of the connections over a single port is a big one). I think supporting additional transports for remote kernels is a bit of a separate discussion, though. Adding the KernelNanny should make such a proposal easier to implement, since we would know that we always have a process running next to the kernel which could serve as that multiplexer. Part of the point of the nanny proposal is that it is transparent to both the kernel and the client.

The most basic functionality that I want from KernelNanny is the original idea in IPEP 12:

  • KernelClient gets the interrupt/restart/etc methods of KernelManager that can be expected to work for all kernels.

Since the kernel client API is all zmq, it makes the most sense to me for these to be zmq request/reply messages like all the rest, rather than adding additional HTTP requests to an otherwise zmq API. If we went with HTTP, these would presumably be regular HTTP requests, though, not websockets, so the bar is lower for clients to talk to them. It does seem internally inconsistent to have some zmq requests and some HTTP requests for the same API, though.

To me, starting remote kernels, which is solved by KG, is not part of the nanny proposal. But they do creep closer together when we make the nanny a singleton kernel-providing service rather than a simple manager of one kernel, since it has to gain start/list APIs in addition to the more basic stop/restart.

@takluyver
Member Author

I was roughly thinking that with local kernels, the server would integrate the nanny functionality (i.e. it would keep handling interrupts as it currently does), and the remote case would be handled by nb2kg (or a ZMQ equivalent if we wrote one). I suspect it would be no harder to write a KernelManager/KernelClient pair wrapping the HTTP/websockets interface than to write a ZMQ kernel gateway.

Maybe this points back to the nanny being per kernel, if we've already got KG as the multiple kernel manager.

I think I need to think more carefully about what situations we actually want remote kernels in, and how we manage them (e.g. putting kernels in docker lends itself to one kernel per container; does the nanny process need to be inside the container with the kernel? Or can it do what it needs from outside?).

@minrk
Member

minrk commented Feb 21, 2017

I was roughly thinking that with local kernels, the server would integrate the nanny functionality (i.e. it would keep handling interrupts as it currently does)

I think that makes sense. The QtConsole and jupyter_console and all other entrypoints would also need to run the nanny or talk to an existing one.

putting kernels in docker lends itself to one kernel per container; does the nanny process need to be inside the container with the kernel? Or can it do what it needs from outside?

I think it can go either way. The simplest version is the nanny in the container with the kernel, where there's really no awareness that docker is involved. The more sophisticated version is a "DockerKernelNanny", akin to the DockerSpawner in JupyterHub, where 'raw' kernels are in containers and the nanny does all of its activities via the docker API.

@thomasjm

Hi all -- sorry to comment on an old thread, but I'm hoping to get clarification about something...what is the current status of the deprecation of kernel heartbeats? There are some mentions of deprecating the heartbeat in this proposal, but the latest version of the messaging spec that I can find still contains them.

This is troubling me because I just noticed a kernel (gophernotes) which seems to have dropped heartbeat support in July 2017. I somehow missed the memo about this whole thing, so could someone let me know if heartbeats are dead and the nanny process is how things work now? Is there another source of documentation/truth besides http://jupyter-client.readthedocs.io ? Thanks!

@takluyver
Member Author

No problem!

The Jupyter notebook doesn't rely on heartbeats, and since that's the interface most people use, some kernels only worry about supporting it. We could make all interfaces work without the heartbeat so long as the kernel runs on the same machine (which it normally does). I don't remember which ones already work like this.

Getting rid of the heartbeat entirely was waiting on this proposal because it would have provided another way to tell if a remote kernel is still alive.

@thomasjm

Got it, thanks! Is there some official way to keep up-to-date on the latest version of the messaging spec? Speaking as someone working on an alternate Jupyter frontend, dropping heartbeats seems like a major change that would merit a new major version number on the "Messaging in Jupyter" page...

@takluyver
Member Author

If we ever do officially drop them, that would certainly be a new protocol version. For now, kernels are theoretically expected to implement it, but in typical cases there's no need, so many don't. We should probably clarify that in the messaging doc.

@thomasjm

Cool, thanks for the clarification. Tagging @dwhitena so he sees these comments.

thomasjm referenced this pull request in gopherdata/gophernotes Oct 25, 2017
@dwhitena

Thanks @thomasjm. We will work on supporting this in gophernotes.

@sushantd195

sushantd195 commented Jun 18, 2022

a proposal for kernels to capture the low level stderr and stdout streams and forward them to the frontend.

This is still an issue, as seen in the attached screenshot. I'm on the latest IRkernel and reticulate.

[screenshot attached]

@cladosphaero

cladosphaero commented Oct 14, 2022

Hi @takluyver @jasongrout was this ever implemented? It looks like OS-level output is still suppressed by Jupyter, as evidenced by reticulate not displaying Python stdout in IRkernel. If not, do you have any suggestions for manually rerouting OS-level output to Jupyter frontend?

@krassowski
Member

There is an adjacent pre-proposal which addresses some of the same issues (but not IRkernel logging issue) over at #117.

@Zsailer
Member

Zsailer commented Mar 4, 2024

Hi all 👋, Zach here from the @jupyter/software-steering-council.

We're working through old JEPs and closing proposals that are no longer active or may not be relevant anymore. Under Jupyter's new governance model, we have an active Software Steering Council which reviews JEPs weekly. We are catching up on the backlog now. Since there has been no active discussion on this JEP in a while, I'd propose we close it here (we'll leave it open for two more weeks in case you'd like to revive the conversation). If you would like to re-open the discussion after we close it, you are welcome to do that too.
