Migrated some old spark-kernel docs to Toree. #4

Open. Wants to merge 1 commit into base: OverhaulSite.
5 changes: 4 additions & 1 deletion _data/documentation.yml
@@ -18,11 +18,14 @@
section_id: "user"
section_url: "/documentation/user/quick-start"

- section_name: "User -- Advanced Topics"
section_id: "advanced"
section_url: "/documentation/advanced/advanced-topics"

- section_name: "Developer"
section_id: "developer"
section_url: "/documentation/developer/contributing-to-the-project"

- section_name: "References"
section_id: "references"
section_url: "/documentation/references/scaladocs"

@@ -2,7 +2,7 @@
layout: docpage
title: Advanced Topics
type: doc
section: advanced
weight: 60
tagline: Apache Project !
---
@@ -11,3 +11,4 @@ tagline: Apache Project !

- Comm API

{% include_relative sharing-spark-context.md %}
199 changes: 199 additions & 0 deletions documentation/advanced/comm-api.md
@@ -0,0 +1,199 @@
---
layout: docpage
title: Comm API
type: doc
section: advanced
weight: 60
tagline: Apache Project !
---

The Comm API, exposed by both the Toree Kernel Client and the Toree Kernel,
provides a clean channel of communication between the Toree Kernel and its
clients.

The API provides the ability to create and send custom messages, with a
focus on synchronizing data between a kernel and its clients, although that
use case is not enforced.

Access to the Comm API is made available for the client via
`<client_instance>.comm` and for the kernel via `kernel.comm`.

Example of Registration and Communication
-----------------------------------------

The following example demonstrates the _client_ connecting to the _kernel_,
receiving a response, and then closing its connection.

This is an example of registering an open callback on the _kernel_ side:

    // Register the callback to respond to being opened from the client
    kernel.comm.register("my target").addOpenHandler {
        (commWriter, commId, targetName, data) =>
            commWriter.writeMsg(Map("response" -> "Hello World!"))
    }

This is the corresponding example of registering a message receiver on the
_client_ and initiating the Comm connection via _open_:

    val client: SparkKernelClient = /* Created elsewhere */

    // Register the callback to receive a message from the kernel, print it
    // out, and then close the connection
    client.comm.register("my target").addMsgHandler {
        (commWriter, commId, data) =>
            println(data("response"))
            commWriter.close()
    }

    // Initiate the Comm connection
    client.comm.open("my target")

Comm Events
-----------

The Comm API provides three types of events that can be captured:

1. Open

    - Triggered when the client/kernel receives an open request for a target
      that has been registered

2. Msg

    - Triggered when the client/kernel receives a Comm message for an open
      Comm instance

3. Close

    - Triggered when the client/kernel receives a close request for an open
      Comm instance

### Registering Callbacks ###

To register callbacks that are triggered during these events, the following
function is provided:

    register(<target name>)

This function, when invoked, registers the provided target on the
client/kernel, but does not add any callbacks. To add functions to be called
during events, you can chain methods onto the register function.
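
For example, a kernel-side sketch that chains two handlers onto a single
registration (the handler signatures follow the descriptions in the sections
below; the data values are illustrative):

    kernel.comm.register("my target")
        .addOpenHandler { (commWriter, commId, targetName, data) =>
            // Illustrative: tell the client the Comm was opened
            commWriter.writeMsg(Map("status" -> "opened"))
        }
        .addMsgHandler { (commWriter, commId, data) =>
            // Illustrative: echo received data back to the client
            commWriter.writeMsg(data)
        }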

#### Adding Open Callbacks ####

To add an open callback, use the `addOpenHandler(<function>)` method:

    register(<target name>).addOpenHandler(<function>)

The function is given the following four arguments:

- CommWriter

    - The instance of the Comm-based writer that can send messages back

- CommId

    - The id associated with the new Comm instance

- TargetName

    - The name of the Comm that is created

- Data (_Optional_)

    - The map of key/value pairs representing data associated with the new
      Comm instance
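
As a sketch, a kernel-side open handler that uses all four arguments (the
reply keys are illustrative):

    kernel.comm.register("my target").addOpenHandler {
        (commWriter, commId, targetName, data) =>
            // Confirm to the client which target and Comm id were opened
            commWriter.writeMsg(Map("target" -> targetName, "id" -> commId))
    }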

#### Adding Message Callbacks ####

To add a message callback, use the `addMsgHandler(<function>)` method:

    register(<target name>).addMsgHandler(<function>)

The function is given the following three arguments:

- CommWriter

    - The instance of the Comm-based writer that can send messages back

- CommId

    - The id associated with the Comm instance

- Data

    - The map of key/value pairs representing data associated with the
      received message
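
For instance, a kernel-side sketch that replies using a value from the
received data, mirroring the client-side example above (the "message" and
"response" keys are illustrative):

    kernel.comm.register("my target").addMsgHandler {
        (commWriter, commId, data) =>
            // Illustrative: reply with a value pulled from the received data
            commWriter.writeMsg(Map("response" -> data("message")))
    }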

#### Adding Close Callbacks ####

To add a close callback, use the `addCloseHandler(<function>)` method:

    register(<target name>).addCloseHandler(<function>)

The function is given the following three arguments:

- CommWriter

    - Unused, as the Comm instance associated with the writer has been closed

- CommId

    - The id associated with the Comm instance that was closed

- Data

    - The map of key/value pairs representing data associated with the
      received message
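
A client-side sketch, reusing the `client` instance from the earlier example
(the writer argument is ignored, as noted above):

    client.comm.register("my target").addCloseHandler {
        (commWriter, commId, data) =>
            // The Comm instance is already closed; just record the event
            println(s"Comm $commId closed")
    }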

Comm Messaging
--------------

The Comm API exposes an _open_ method that initiates a new Comm instance on
both sides of the connection:

    open(<target name>)

This returns an instance of _CommWriter_ that can be used to send data via
the Comm protocol.

The kernel would initiate the connection via `kernel.comm.open(<target name>)`
while the client would start via `<client instance>.comm.open(<target name>)`.

As per the IPython protocol definition, the Comm instance can be opened from
either side.
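
For instance, a kernel-side sketch that opens a Comm and immediately pushes
data to the client over the returned writer (the target name and keys are
illustrative):

    // Open a Comm from the kernel and send an initial message
    val commWriter = kernel.comm.open("my target")
    commWriter.writeMsg(Map("status" -> "ready"))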

### Using the Comm Writer ###

The Comm API provides an implementation of [java.io.Writer][1] that is used to
send _open_, _msg_, and _close_ Comm messages from the client to the kernel or
vice versa.

The following methods are available with _CommWriter_ implementations (a
combined sketch follows this list):

1. `writeOpen(<target name> [, data])`

    - Sends an open request with the given target name and optional map of data

2. `writeMsg(<data>)`

    - Sends the map of data as a Comm message

3. `write(<character array>, <offset>, <length>)`

    - Sends the character array as a Comm message (in the same form as a
      _Writer's_ write(...) method) with the key for the data as "message"

    - E.g. `commWriter.write(<array>, 0, <array length>)` translates to

            Data("message": "<array>")

4. `writeClose([data])`

    - Sends a close request with the optional map of data

5. `close()`

    - Sends a close request with no data
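
Putting these together, a sketch using a writer obtained from `open` (the
target name, keys, and values are illustrative):

    val writer = kernel.comm.open("my target")

    // Send a map of data as a Comm message
    writer.writeMsg(Map("temperature" -> "22"))

    // Send a character array; it arrives keyed under "message"
    val chars = "hello".toCharArray
    writer.write(chars, 0, chars.length)

    // Close the Comm with an optional map of data
    writer.writeClose(Map("reason" -> "done"))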

[1]: http://docs.oracle.com/javase/7/docs/api/java/io/Writer.html
@@ -7,32 +7,32 @@ weight: 0
tagline: Apache Project !
---

Apache Toree provides a pluggable interface for magics that allows
developers to write their own magics. This guide will focus on the technical details of implementing your own magics; for an introduction and conceptual overview of magics, see [Overview of Magics for Toree](overview-of-magics).

In this guide we'll look at the dependencies required to develop a magic, walk through creating a line magic and a cell magic, and discuss some useful magic features.

### Dependencies ###

In order to write a magic, you need to add the _kernel-api_ and _protocol_
modules of Apache Toree to your project.

In _sbt_, you can add the following lines:

    libraryDependencies ++= Seq(
      "org.apache.toree" %% "kernel-api" % "0.1.0",
      "org.apache.toree" %% "protocol" % "0.1.0"
    )

As the modules are not hosted on any repository, you will also need to build
and publish them locally. From the root of Apache Toree, you can execute
the following to compile and make available the Apache Toree modules:

    sbt compile && sbt publishLocal

## Developing Magics

A magic is implemented by extending either the ```LineMagic``` or ```CellMagic``` trait provided by Apache Toree. Each trait consists of a single function, ```execute```, that defines the magic's functionality.

### Developing a Line Magic ###

@@ -61,7 +61,7 @@ kernel.magics.helloLineMagic("foo bar")
Behind the scenes, the ```execute``` method of ```HelloLineMagic``` gets called with ```"foo bar"``` as input.
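
A minimal sketch of such a line magic, assuming only the ```LineMagic``` trait
described above and that ```LineMagicOutput``` requires no explicit return
value (as in the argument-parsing example later in this guide):

```scala
// A hypothetical line magic; prints whatever follows its name
class HelloLineMagic extends LineMagic {
  override def execute(code: String): LineMagicOutput = {
    // A line magic receives everything after its name as `code`
    println(s"Hello from a line magic: $code")
  }
}
```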

### Developing a Cell Magic ###
A cell magic receives an entire cell of code as input and returns a mapping of MIME types to data. This mapping, defined by the type ```CellMagicOutput```, can be used to distinguish different data types produced by the magic. In an IPython setting, the ```CellMagicOutput``` mapping will influence the way a cell is rendered.

#### An HTML Cell Magic ####
As a concrete example, we'll develop an ```HTML``` cell magic that causes a cell to render its contents as HTML.
@@ -70,8 +70,8 @@ To create a cell magic, we extend the `CellMagic` trait, and override its `execute` method:

```scala
class Html extends CellMagic {
  override def execute(code: String): CellMagicOutput = {
    // TODO
  }
}
```
@@ -80,7 +80,7 @@ In this case, we want to package the code that the magic receives as HTML. To do this, we return a ```CellMagicOutput``` that maps ```MIMEType.TextHtml``` to the received code:

```scala
class Html extends CellMagic {
  override def execute(code: String): CellMagicOutput = {
    CellMagicOutput(MIMEType.TextHtml -> code)
  }
}
```
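
Assuming the class above is available to the kernel, it can be exercised
directly; direct construction is shown purely for illustration, since in a
notebook the magic is invoked by name:

```scala
// Hypothetical direct invocation of the cell magic defined above
val output = new Html().execute("<h1>Hello World</h1>")
// output maps MIMEType.TextHtml to "<h1>Hello World</h1>"
```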
@@ -119,7 +119,7 @@ class HelloParsing extends LineMagic with ArgumentParsingSupport {
    .withOptionalArg()
    .ofType(classOf[Boolean])
    .defaultsTo(true)

  override def execute(code: String): LineMagicOutput = {
    val args = parseArgs(code)
    if (args(0)) // do something
@@ -171,8 +171,8 @@ should become
}
```

### Adding an external magic to Apache Toree ###
In order to use an external magic, we first need a `.jar` containing a magic in the `org.apache.toree.magic.builtin` package. Assuming we have such a `.jar` at `/src/path/to/my/exampleMagic.jar`, the `kernel.json` file needs to be changed to supply the path to the external magic. The command-line argument we need to add is `--magic-url`, which takes a string:

```json
{
```

@@ -197,4 +197,4 @@ For some example implementations, check out the ```com.ibm.spark.magic.builtin```

### Other Notes ###

There is a limitation with the current magic implementation that forces magic invocations to be case-sensitive unless they are defined in the package _org.apache.toree.magic.builtin_.
63 changes: 63 additions & 0 deletions documentation/advanced/sharing-spark-context.md
@@ -0,0 +1,63 @@
---
layout: docpage
title: Sharing Spark contexts
type: doc
section: advanced
weight: 60
tagline: Apache Project !
---


# Sharing a Spark context from Toree with ipykernel

## Rationale

Sometimes you want to connect to the Spark context used by an Apache Toree
kernel and do some plotting in matplotlib, making use of features exposed
by `ipykernel`.

Luckily, this is relatively easy to do from Python.

## Example

For this you will need to run two notebooks: one with a Toree kernel and the
other running ipykernel.

### In the Toree kernel

```scala

import py4j.GatewayServer
// Creates a Py4J Gateway server with our spark context as the entry point
val gatewayServer: GatewayServer = new GatewayServer(kernel.javaSparkContext, 0)
gatewayServer.start()

// Get the port assignment for the py4j gateway server
val boundPort: Int = gatewayServer.getListeningPort
boundPort
```

### In an ipykernel kernel

This assumes that you can import pyspark in your Python kernel; for easier
ways to do that, see `findspark` or `spylon`.

Suppose, for example, we bound the py4j gateway to port `45678`.

```python
import os
from pyspark.java_gateway import launch_gateway
from pyspark import SparkContext

# Set the port that spark looks for.
os.environ["PYSPARK_GATEWAY_PORT"] = "45678"

jvm_gateway = launch_gateway()
jsc = jvm_gateway.entry_point

# Ensure that we bind the gateway we created to the current spark context
sc = SparkContext(jsc=jsc, gateway=jvm_gateway)
```

Once you've run this, you should have a shared Spark context available in
both kernels.