From fa78c9768b03049f17d225a5d3077e8b7e5f24eb Mon Sep 17 00:00:00 2001
From: Marius van Niekerk
Date: Fri, 4 Nov 2016 13:48:40 -0400
Subject: [PATCH] Migrated some old spark-kernel docs to Toree. Mostly just
 renames

---
 _data/documentation.yml                            |   5 +-
 .../{user => advanced}/advanced-topics.md          |   3 +-
 documentation/advanced/comm-api.md                 | 199 ++++++++++++++++++
 .../developing-magics.md                           |  34 +--
 .../advanced/sharing-spark-context.md              |  63 ++++++
 .../developer/development-workflow.md              | 122 +++++++++++
 .../test-structure-of-project.md                   |   4 +-
 .../old/quick-start/development-workflow.md        |  10 +-
 .../overview-of-magics.md                          |   8 +-
 9 files changed, 418 insertions(+), 30 deletions(-)
 rename documentation/{user => advanced}/advanced-topics.md (66%)
 create mode 100644 documentation/advanced/comm-api.md
 rename documentation/{old/quick-start => advanced}/developing-magics.md (83%)
 create mode 100644 documentation/advanced/sharing-spark-context.md
 create mode 100644 documentation/developer/development-workflow.md
 rename documentation/{old/quick-start => developer}/test-structure-of-project.md (97%)
 rename documentation/{old/quick-start => user}/overview-of-magics.md (93%)

diff --git a/_data/documentation.yml b/_data/documentation.yml
index fc17a44..39ca76c 100644
--- a/_data/documentation.yml
+++ b/_data/documentation.yml
@@ -18,6 +18,10 @@
   section_id: "user"
   section_url: "/documentation/user/quick-start"

+- section_name: "User -- Advanced Topics"
+  section_id: "advanced"
+  section_url: "/documentation/advanced/advanced-topics"
+
- section_name: "Developer"
  section_id: "developer"
  section_url: "/documentation/developer/contributing-to-the-project"
@@ -25,4 +29,3 @@
- section_name: "References"
  section_id: "references"
  section_url: "/documentation/references/scaladocs"
-
diff --git a/documentation/user/advanced-topics.md b/documentation/advanced/advanced-topics.md
similarity index 66%
rename from documentation/user/advanced-topics.md
rename to documentation/advanced/advanced-topics.md
index 5fcd2f4..b5b59ef 100644
--- a/documentation/user/advanced-topics.md
+++ b/documentation/advanced/advanced-topics.md
@@ -2,7 +2,7 @@
layout: docpage
title: Advanced Topics
type: doc
-section: user
+section: advanced
weight: 60
tagline: Apache Project !
---
@@ -11,3 +11,4 @@ tagline: Apache Project !

- Comm API

+{% include_relative sharing-spark-context.md %}
diff --git a/documentation/advanced/comm-api.md b/documentation/advanced/comm-api.md
new file mode 100644
index 0000000..e504dd4
--- /dev/null
+++ b/documentation/advanced/comm-api.md
@@ -0,0 +1,199 @@
+---
+layout: docpage
+title: Comm API
+type: doc
+section: advanced
+weight: 60
+tagline: Apache Project !
+---
+
+The Comm API exposed by the Toree Kernel Client and Toree Kernel serves to
+provide a clean method of communication between the Toree Kernel and its
+clients.
+
+The API provides the ability to create and send custom messages with the
+focus on synchronizing data between a kernel and its clients, although that
+use case is not enforced.
+
+Access to the Comm API is made available for the client via
+`client.comm` and for the kernel via `kernel.comm`.
+
+Example of Registration and Communication
+-----------------------------------------
+
+The following example demonstrates the _client_ connecting to the _kernel_,
+receiving a response, and then closing its connection.
+
+This is an example of registering an open callback on the _kernel_ side:
+
+    // Register the callback to respond to being opened from the client
+    kernel.comm.register("my target").addOpenHandler {
+        (commWriter, commId, targetName, data) =>
+            commWriter.writeMsg(Map("response" -> "Hello World!"))
+    }
+
+This is the corresponding example of registering a message receiver on the
+_client_ and initiating the Comm connection via _open_:
+
+    val client: SparkKernelClient = /* Created elsewhere */
+
+    // Register the callback to receive a message from the kernel, print it
+    // out, and then close the connection
+    client.comm.register("my target").addMsgHandler {
+        (commWriter, commId, data) =>
+            println(data("response"))
+            commWriter.close()
+    }
+
+    // Initiate the Comm connection
+    client.comm.open("my target")
+
+Comm Events
+-----------
+
+The Comm API provides three types of events that can be captured:
+
+1. Open
+
+   - Triggered when the client/kernel receives an open request for a target
+     that has been registered
+
+2. Msg
+
+   - Triggered when the client/kernel receives a Comm message for an open
+     Comm instance
+
+3. Close
+
+   - Triggered when the client/kernel receives a close request for an open
+     Comm instance
+
+### Registering Callbacks ###
+
+To register callbacks that are triggered during these events, the following
+function is provided:
+
+    register(<target name>)
+
+This function, when invoked, registers the provided target on the
+client/kernel, but does not add any callbacks. To add functions to be called
+during events, you can chain methods onto the register function.
+
+#### Adding Open Callbacks ####
+
+To add an open callback, use the `addOpenHandler()` method:
+
+    register(<target name>).addOpenHandler(<callback function>)
+
+The function is given the following four arguments:
+
+- CommWriter
+
+  - The instance of the Comm-based writer that can send messages back
+
+- CommId
+
+  - The id associated with the new Comm instance
+
+- TargetName
+
+  - The name of the Comm that is created
+
+- Data (_Optional_)
+
+  - The map of key/value pairs representing data associated with the new
+    Comm instance
+
+#### Adding Message Callbacks ####
+
+To add a message callback, use the `addMsgHandler()` method:
+
+    register(<target name>).addMsgHandler(<callback function>)
+
+The function is given the following three arguments:
+
+- CommWriter
+
+  - The instance of the Comm-based writer that can send messages back
+
+- CommId
+
+  - The id associated with the Comm instance
+
+- Data
+
+  - The map of key/value pairs representing data associated with the
+    received message
+
+#### Adding Close Callbacks ####
+
+To add a close callback, use the `addCloseHandler()` method:
+
+    register(<target name>).addCloseHandler(<callback function>)
+
+The function is given the following three arguments:
+
+- CommWriter
+
+  - Unused as the Comm instance associated with the writer has been closed
+
+- CommId
+
+  - The id associated with the Comm instance that was closed
+
+- Data
+
+  - The map of key/value pairs representing data associated with the
+    received message
+
+Comm Messaging
+--------------
+
+The Comm API exposes an _open_ method that initiates a new Comm instance on
+both sides of the connection:
+
+    open(<target name>)
+
+This returns an instance of _CommWriter_ that can be used to send data via
+the Comm protocol.
+
+The kernel would initiate the connection via `kernel.comm.open(<target name>)`
+while the client would start via `client.comm.open(<target name>)`.
+
+As per the IPython protocol definition, the Comm instance can be opened from
+either side.
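+
+For instance, a kernel-initiated connection might look like the following
+sketch, where the target name and map contents are purely illustrative:
+
+    // Open a Comm instance from the kernel for a target the client registered
+    val commWriter = kernel.comm.open("my target")
+
+    // Push a message to the listening client
+    commWriter.writeMsg(Map("status" -> "ready"))
+
+    // Tear down the Comm instance when finished
+    commWriter.close()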
+
+### Using the Comm Writer ###
+
+The Comm API provides an implementation of [java.io.Writer][1] that is used to
+send _open_, _msg_, and _close_ Comm messages to the client or kernel (client
+to kernel or vice versa).
+
+The following methods are available with _CommWriter_ implementations:
+
+1. `writeOpen(<target name> [, data])`
+
+   - Sends an open request with the given target name and optional map of data
+
+2. `writeMsg(<data>)`
+
+   - Sends the map of data as a Comm message
+
+3. `write(<char array>, <offset>, <length>)`
+
+   - Sends the character array as a Comm message (in the same form as a
+     _Writer's_ write(...) method) with the key for the data as "message"
+
+   - E.g. `commWriter.write(<message>, 0, <message length>)` translates to
+
+     Data("message": "<message>")
+
+4. `writeClose([data])`
+
+   - Sends a close request with the optional map of data
+
+5. `close()`
+
+   - Sends a close request with no data
+
+[1]: http://docs.oracle.com/javase/7/docs/api/java/io/Writer.html
diff --git a/documentation/old/quick-start/developing-magics.md b/documentation/advanced/developing-magics.md
similarity index 83%
rename from documentation/old/quick-start/developing-magics.md
rename to documentation/advanced/developing-magics.md
index b67b0ff..11212e0 100644
--- a/documentation/old/quick-start/developing-magics.md
+++ b/documentation/advanced/developing-magics.md
@@ -7,32 +7,32 @@ weight: 0
tagline: Apache Project !
---

-The Spark Kernel provides a pluggable interface for magics that allows
-developers to write their own magics. This guide will focus on the technical details of implementing your own magics; for an introduction and conceptual overview of magics, see [Overview of Magics for the Spark Kernel](https://github.com/ibm-et/spark-kernel/wiki/Overview-of-Magics-for-the-Spark-Kernel).
+Apache Toree provides a pluggable interface for magics that allows
+developers to write their own magics. This guide will focus on the technical details of implementing your own magics; for an introduction and conceptual overview of magics, see [Overview of Magics for Toree](overview-of-magics).

In this guide we'll look at the dependencies required to develop a magic, walk through creating a line magic and a cell magic, and discuss some useful magic features.

### Dependencies ###

-In order to write a magic, you need to add the _kernel-api_ and _protocol_
-modules of the Spark Kernel to your project.
+In order to write a magic, you need to add the _kernel-api_ and _protocol_
+modules of Apache Toree to your project.

In _sbt_, you can add the following lines:

    libraryDependencies ++= Seq(
-        "com.ibm.spark" %% "kernel-api" % "0.1.1-SNAPSHOT",
-        "com.ibm.spark" %% "protocol" % "0.1.1-SNAPSHOT"
+        "org.apache.toree" %% "kernel-api" % "0.1.0",
+        "org.apache.toree" %% "protocol" % "0.1.0"
    )

As the modules are not hosted on any repository, you will also need to build
-and publish them locally. From the root of the Spark Kernel, you can execute
-the following to compile and make available the Spark Kernel modules:
+and publish them locally. From the root of Apache Toree, you can execute
+the following to compile and make available the Apache Toree modules:

    sbt compile && sbt publishLocal

## Developing Magics

-A magic is implemented by extending either the ```LineMagic``` or ```CellMagic``` trait provided by the Spark Kernel. Each trait consists of a single function, ```execute```, that defines the magic's functionality.
+A magic is implemented by extending either the ```LineMagic``` or ```CellMagic``` trait provided by Apache Toree.
Each trait consists of a single function, ```execute```, that defines the magic's functionality. ### Developing a Line Magic ### @@ -61,7 +61,7 @@ kernel.magics.helloLineMagic("foo bar") Behind the scenes, the ```execute``` method of ```HelloLineMagic``` gets called with ```"foo bar"``` as input. ### Developing a Cell Magic ### -A cell magic receives an entire cell of code as input and returns a mapping of MIME types to data. This mapping, defined by the type ```CellMagicOutput```, can be used to distinguish different data types produced by the magic. In an IPython setting, the ```CellMagicOutput``` mapping will influence the way a cell is rendered. +A cell magic receives an entire cell of code as input and returns a mapping of MIME types to data. This mapping, defined by the type ```CellMagicOutput```, can be used to distinguish different data types produced by the magic. In an IPython setting, the ```CellMagicOutput``` mapping will influence the way a cell is rendered. #### An HTML Cell Magic ### As a concrete example, we'll develop an ```HTML``` cell magic that causes a cell to render its contents as HTML. @@ -70,8 +70,8 @@ To create a cell magic, we extend the `CellMagic` trait, and override its `execu ```scala class Html extends CellMagic { - override def execute(code: String): CellMagicOutput = { - // TODO + override def execute(code: String): CellMagicOutput = { + // TODO } } ``` @@ -80,7 +80,7 @@ In this case, we want to package the code that the magic receives as HTML. To do ```scala class Html extends CellMagic { - override def execute(code: String): CellMagicOutput = { + override def execute(code: String): CellMagicOutput = { CellMagicOutput(MIMEType.TextHtml -> code) } } @@ -119,7 +119,7 @@ class HelloParsing extends LineMagic with ArgumentParsingSupport { .withOptionalArg() .ofType(classOf[Boolean]) .defaultsTo(true) - + override def execute(code: String): LineMagicOutput = { val args = parseArgs(code) if (args(0)) // do something @@ -171,8 +171,8 @@ should become } ``` -### Adding an external magic to the Spark Kernel ### -In order to use an external magic we first need a `.jar` containing a magic in the `com.ibm.spark.magic.builtin` package. Assuming we have such a `.jar` at location `/src/path/to/my/exampleMagic.jar` the `kernel.json` file needs to be changed to supply the path to the external magic. The command-line argument we need to add is `--magic-url` which takes a string: +### Adding an external magic to Apache Toree ### +In order to use an external magic we first need a `.jar` containing a magic in the `org.apache.toree.magic.builtin` package. Assuming we have such a `.jar` at location `/src/path/to/my/exampleMagic.jar` the `kernel.json` file needs to be changed to supply the path to the external magic. The command-line argument we need to add is `--magic-url` which takes a string: ```json { @@ -197,4 +197,4 @@ For some example implementations, check out the ```com.ibm.spark.magic.builtin`` ### Other Notes ### -There is a limitation with the current magic implementation that will force magic invocations to be case sensitive unless defined in the package _com.ibm.spark.magic.builtin_. +There is a limitation with the current magic implementation that will force magic invocations to be case sensitive unless defined in the package _org.apache.toree.magic.builtin_. 
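+
+For reference, the following is a minimal, hypothetical sketch of what a magic
+packaged into such a `.jar` might look like. The class name `HelloExternal` is
+made up for illustration, and the `IncludeOutputStream` dependency trait
+(assumed here to be available from the _kernel-api_ module) is what supplies
+the `outputStream` used for printing back to the notebook:
+
+```scala
+package org.apache.toree.magic.builtin
+
+import java.io.PrintStream
+
+import org.apache.toree.magic.LineMagic
+import org.apache.toree.magic.dependencies.IncludeOutputStream
+
+class HelloExternal extends LineMagic with IncludeOutputStream {
+  // Write to the kernel's output stream so results show up in the notebook
+  private lazy val printStream = new PrintStream(outputStream)
+
+  override def execute(code: String): Unit =
+    printStream.println(s"Hello from an external magic: $code")
+}
+```
+
+Once the packaged `.jar` is supplied via `--magic-url` as shown above, the
+magic should be invocable as `%helloExternal` or programmatically via
+`kernel.magics.helloExternal("...")`.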
diff --git a/documentation/advanced/sharing-spark-context.md b/documentation/advanced/sharing-spark-context.md
new file mode 100644
index 0000000..90946b1
--- /dev/null
+++ b/documentation/advanced/sharing-spark-context.md
@@ -0,0 +1,63 @@
+---
+layout: docpage
+title: Sharing Spark Contexts
+type: doc
+section: advanced
+weight: 60
+tagline: Apache Project !
+---
+
+
+# Sharing a Spark context from Toree with ipykernel
+
+## Rationale
+
+Sometimes you want to be able to connect to the Spark context used by an
+Apache Toree kernel and do some plotting with matplotlib, making use of some
+of the features exposed by `ipykernel`.
+
+Luckily, this is relatively easy to do for Python.
+
+## Example
+
+For this you will need to run two notebooks: one with a Toree kernel and the
+other running ipykernel.
+
+### In the Toree kernel
+
+```scala
+
+import py4j.GatewayServer
+// Create a Py4J gateway server with our Java Spark context as the entry point
+val gatewayServer: GatewayServer = new GatewayServer(kernel.javaSparkContext, 0)
+gatewayServer.start()
+
+// Get the port assignment for the py4j gateway server
+val boundPort: Int = gatewayServer.getListeningPort
+boundPort
+```
+
+### In the ipykernel notebook
+
+This assumes that you can `import pyspark` in the ipykernel notebook; for
+easier ways to set that up, see `findspark` or `spylon`.
+
+Suppose, for example, that the py4j gateway above was bound to port `45678`.
+
+```python
+import os
+from pyspark.java_gateway import launch_gateway
+from pyspark import SparkContext
+
+# Point pyspark at the existing py4j gateway instead of launching a new JVM
+os.environ["PYSPARK_GATEWAY_PORT"] = "45678"
+
+jvm_gateway = launch_gateway()
+jsc = jvm_gateway.entry_point
+
+# Bind a Python SparkContext to the shared gateway and its JavaSparkContext
+sc = SparkContext(jsc=jsc, gateway=jvm_gateway)
+```
+
+Once you have run both snippets, you should have a shared Spark context
+available in both kernels.
diff --git a/documentation/developer/development-workflow.md b/documentation/developer/development-workflow.md
new file mode 100644
index 0000000..9b0fd65
--- /dev/null
+++ b/documentation/developer/development-workflow.md
@@ -0,0 +1,122 @@
+---
+layout: docpage
+title: Development Workflow
+type: doc
+section: developer
+weight: 0
+tagline: Apache Project !
+---
+
+While it is not necessary to follow this guide for development, it is being
+documented to encourage some form of standard practice for this project.
+
+### Tooling ###
+
+Most of the developers for Apache Toree thus far have chosen to use
+_IntelliJ_ as their means of development. Because of this, a plugin for _sbt_
+is included in our project to allow easy construction of an IntelliJ project
+that contains all of the modules.
+
+Obviously, _git_ is used as the source control for the project.
+
+Finally, we use _sbt_ as our build tool and test runner. You can find more
+information about compiling/testing in the main README.
+
+### Building IntelliJ Project ###
+
+To build the IntelliJ project using _sbt_, you can trigger the plugin by
+executing the following from the root of the project:
+
+    sbt gen-idea
+
+This should create *.idea/* and *.idea\_modules/* directories.
+
+From there, you should be able to open (not import) the project using IntelliJ.
+
+### Using Branches for Development ###
+
+When we tackle defects or features in Apache Toree, we typically break the
+problems up into the smallest pieces possible.
Once we have something simple +like "I need the kernel to print out hello world when it starts," we create a +branch from our development branch (in the case of this project, it is +typically master). For this example, let's call the branch +"AddHelloWorldDuringBoot" and use it for our feature. + +Once development has finished, it is good practice to ensure that all tests +are still passing. To do this, run `sbt test` from the root of project. + +If everything passes, we want to ensure that our branch is up-to-date with the +latest code in the kernel. So, move back to the development branch (master in +our case) and pull the latest changes. If there are changes, we want to rebase +our branch on top of those new changes. From the _AddHelloWorldDuringBoot_ +branch, run `git rebase master` to bring the branch up to speed with master. + +The advantage of using rebase on a _local_ branch is that it makes merging back +with _master_ much cleaner for the maintainers. If your branch has been pushed +remotely, you want to avoid rebasing in case someone else has branched off of +your branch. Tricky stuff! + +After rebasing on top of master, it is a good idea to rerun the tests for your +branch to ensure that nothing has broken from the changes: `sbt test` + +Finally, if the tests pass, switch back to the development branch (master) and +merge the changes: `git merge AddHelloWorldDuringBoot`. As a last check, +rerun the tests to ensure that the merge went well (`sbt test` in master). If +those tests still pass, the changes can be pushed! + +### Writing proper unit tests ### + +The goal of our unit tests was to be isolated. This means that absolutely _no_ +external logic is needed to run the tests. This includes fixtures and any +possible dependencies referenced in the code. We use _Mockito_ to provide +mocking facilities for our dependencies and try our best to isolate dependency +creation. + + class MyClass { + // Bad design + val someDependency = new SomeDependency() + + // ... + } + +instead move it to the constructor + + class MyClass(someDependency: SomeDependency) { + // ... + } + +or use traits to mix in dependencies + + trait MyDependency { + val someDependency = new SomeDependency() + } + + class MyClass extends MyDependency { + + } + +For testing, we use _ScalaTest_ with the _FunSpec_ to provide the basic +structure of our tests (in a BDD manner). Typically, _Matchers_ from +_ScalaTest_ are also included to provide a better flow. + + class MyClassSpec extends FunSpec with Matchers { + describe("MyClass") { + describe("#someMethod") { + it("should indicate success by default") { + val myClass = new MyClass(new SomeDependency()) + val expected = true + val actual = myClass.someMethod() + + actual should be (expected) + } + } + } + } + +The above structure is to use a _describe_ block to represent the name of the +class being tested. We nest a second layer of _describe_ blocks to indicate +tests for individual public methods. Finally, _it_ blocks are structured to +test single cases (such as different logical routes to be encountered). + +We have attempted to keep the majority of our tests clear and concise. +Typically, we avoid helper functions because they can obfuscate the tests. 
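+
+As a sketch of how _Mockito_ fits into the structure above, the dependency in
+the earlier example can be replaced with a mock. The `isReady` method stubbed
+below is made up purely for illustration; substitute whatever behavior your
+class actually relies on:
+
+    import org.mockito.Mockito.{mock, when}
+
+    class MyClassSpec extends FunSpec with Matchers {
+      describe("MyClass") {
+        describe("#someMethod") {
+          it("should indicate success when its dependency is ready") {
+            // Stub only the behavior this test needs
+            val someDependency = mock(classOf[SomeDependency])
+            when(someDependency.isReady()).thenReturn(true)
+
+            val myClass = new MyClass(someDependency)
+
+            myClass.someMethod() should be (true)
+          }
+        }
+      }
+    }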
diff --git a/documentation/old/quick-start/test-structure-of-project.md b/documentation/developer/test-structure-of-project.md similarity index 97% rename from documentation/old/quick-start/test-structure-of-project.md rename to documentation/developer/test-structure-of-project.md index 66a7c0f..0d0bd47 100644 --- a/documentation/old/quick-start/test-structure-of-project.md +++ b/documentation/developer/test-structure-of-project.md @@ -2,12 +2,12 @@ layout: docpage title: Test Structure of Project type: doc -section: quick-start +section: developer weight: 0 tagline: Apache Project ! --- -### Prerequisites +### Prerequisites You must install the [library dependencies][1] to properly run the tests. diff --git a/documentation/old/quick-start/development-workflow.md b/documentation/old/quick-start/development-workflow.md index c68dbfd..32ac7da 100644 --- a/documentation/old/quick-start/development-workflow.md +++ b/documentation/old/quick-start/development-workflow.md @@ -12,7 +12,7 @@ documented to encourage some form of standard practice for this project. ### Tooling ### -Most of the developers for the Spark Kernel thus far have chosen to use +Most of the developers for Apache Toree thus far have chosen to use _IntelliJ_ as their means of development. Because of this, a plugin for _sbt_ is included in our project to allow easy construction of an IntelliJ project that contains all of the modules. @@ -25,7 +25,7 @@ information about compiling/testing in the main RAEDME. ### Building IntelliJ Project ### To build the IntelliJ project using _sbt_, you can trigger the plugin by -executing the following from the root of the Spark Kernel project: +executing the following from the root of the Apache Toree project: sbt gen-idea @@ -35,15 +35,15 @@ From there, you should be able to open (not import) the project using IntelliJ. ### Using Branches for Development ### -When we tackle defects or features in the Spark Kernel, we typically break the +When we tackle defects or features in Apache Toree, we typically break the problems up into the smallest pieces possible. Once we have something simple like "I need the kernel to print out hello world when it starts," we create a branch from our development branch (in the case of this project, it is -typically master). For this example, let's call the branch +typically master). For this example, let's call the branch "AddHelloWorldDuringBoot" and use it for our feature. Once development has finished, it is good practice to ensure that all tests -are still passing. To do this, run `sbt test` from the root of the Spark Kernel +are still passing. To do this, run `sbt test` from the root of the Apache Toree project. If everything passes, we want to ensure that our branch is up-to-date with the diff --git a/documentation/old/quick-start/overview-of-magics.md b/documentation/user/overview-of-magics.md similarity index 93% rename from documentation/old/quick-start/overview-of-magics.md rename to documentation/user/overview-of-magics.md index 15fece0..34ca054 100644 --- a/documentation/old/quick-start/overview-of-magics.md +++ b/documentation/user/overview-of-magics.md @@ -3,7 +3,7 @@ layout: docpage title: Overview of Magics type: doc section: quick-start -weight: 0 +weight: 30 tagline: Apache Project ! --- @@ -32,13 +32,13 @@ import com.google.common.base.Strings._ ``` ### Other Things to Note -- Magic names are case insensitive; if a line magic `AddJar` exists, then `%addjar`, `%ADDJar`, and all other variants are valid. 
- Magic names are case insensitive; if a line magic `AddJar` exists, then `%addjar`, `%ADDJar`, and all other variants are valid.
- Each magic has its own arguments; usage information can be obtained for a magic by typing `%<magic_name>`.
- Line magics receive the _literal_ rest of the line as arguments, so the following string interpolation will not work:

```scala
-for(i <- (1 to 10))
+for(i <- (1 to 10))
  %addDeps s"com.google.guava guava $i"
```
@@ -57,7 +57,7 @@ As an example, the `%%HTML` cell magic renders the contents of the cell as HTML:

# Programmatic Magic Usage
### Description
-There exists a programmatic API for those who do not wish to use the IPython-esque `%` and `%%` syntax. The Spark Kernel exposes a `kernel` object which provides programmatic invocation of magic code in the form:
+There exists a programmatic API for those who do not wish to use the IPython-esque `%` and `%%` syntax. Apache Toree exposes a `kernel` object which provides programmatic invocation of magic code in the form:

```scala
//magicName is case insensitive
kernel.magics.<magicName>("<arguments>")