Cocoon is designed around task automation, but offers instant feedback through interactive visualisations, making it especially attractive for tasks involving large datasets.
Though tasks are defined declaratively in YAML, Cocoon comes with a feature-rich, browser-based editor that lets users build complex automation workflows through direct manipulation.
Design goals for Cocoon are:
- Interactive: Exploring and working with large datasets should be a fun experience, through rich visualisations and instant feedback.
- Extensible: Cocoon leverages the npm ecosystem for creating and importing task nodes and visualisations.
- Modern: Using JS/TypeScript and React means that most web developers will be right at home when extending Cocoon's functionality.
- Fast: Cocoon's editor uses a dedicated Node.js instance for processing (that can even be run remotely), to ensure that the UI is always responsive.
Here's a visual rundown of Cocoon's main functionality:
Each data processing operation in Cocoon happens in a node; together, the nodes form a graph that is visually represented in the editor.
The graph can be built with simple direct manipulation techniques, like drag & drop, right in the browser editor.
The data at each node can be inspected in the browser's developer console.
Visualisations can be attached to nodes in order to facilitate in-depth exploration of the data at any step in the process.
Visualisations are fully interactive and can interact with the node's state, allowing visual definitions of complex filter criteria.
By attaching visualisations to connected nodes, Cocoon automatically synchronises them, creating a powerful brushing & linking environment.
Cocoon's biggest emphasis is on extensibility. Custom nodes are simple JavaScript objects wrapping a function, and code changes are reflected immediately.
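To give a rough idea, here is a minimal, hypothetical sketch of what such a node object might look like. The property names (`in`, `out`, `process`) and the `context.ports` helper are assumptions for illustration only, not Cocoon's documented API; see the node reference for the real interface.

```js
// Hypothetical custom node -- the shape shown here (`in`, `out`,
// `process`, `context.ports`) is assumed for illustration and may
// differ from Cocoon's actual node API.
module.exports.ScalePrices = {
  in: {
    data: { required: true },      // rows to transform
    factor: { defaultValue: 1.2 }, // multiplier, configurable in the YAML definition
  },
  out: {
    data: {},
  },

  // The wrapped function: read input ports, transform, write output ports.
  async process(context) {
    const data = context.ports.read('data');
    const factor = context.ports.read('factor');
    const scaled = data.map(item => ({ ...item, price: item.price * factor }));
    context.ports.write({ data: scaled });
    return `Scaled ${scaled.length} prices`;
  },
};
```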
More features coming soon.
Interested in giving Cocoon a try yourself? While we're not ready to fully open source Cocoon quite yet, there is a free distribution version hosted on npm.
Follow these instructions to run the examples in this repository, or to build your own workflow:
- Make sure to have a recent version of Node.js installed.
- Install the dependencies by running `npm install` or `yarn`.
- Run any of the examples. To learn the basics, it is recommended to start with `npm run example:simple-api`.

If you want to create a new workflow, simply create an empty `.yml` file and point your browser to it.
While there's no step-by-step tutorial for Cocoon, the examples contain extensive documentation that explains the various concepts and is aimed at beginners. They can technically be studied in any order, but some of the basics are only explained in the simpler examples, to avoid repetition. The recommended order is:
- Teaches the basics of creating a custom dataflow by querying an API, along with re-shaping, inspecting and visualising the data.
- Shows how custom nodes and views can be implemented in Cocoon using JavaScript and React (a rough sketch follows after this list).
- By linking different visualisations on the same data together, brushing becomes a powerful data exploration tool.
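As a companion to the custom-node sketch above, here is an equally hypothetical sketch of a React view component. The props (`viewData`, `viewState`, `syncViewState`) are assumed names for illustration, not the actual view API; the reference documentation below describes the real interface.

```jsx
// Hypothetical React view -- the props (`viewData`, `viewState`,
// `syncViewState`) are assumed for illustration and may differ from
// Cocoon's actual view API.
import React from 'react';

export const PriceList = ({ viewData, viewState, syncViewState }) => (
  <ul>
    {viewData.map(item => (
      <li
        key={item.id}
        style={{
          fontWeight: viewState.selected === item.id ? 'bold' : 'normal',
        }}
        // Clicking an item writes back into the node's state -- this is
        // the mechanism by which a view could define filter criteria
        // visually and drive brushing & linking.
        onClick={() => syncViewState({ selected: item.id })}
      >
        {item.name}: {item.price}
      </li>
    ))}
  </ul>
);
```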
Reference documentation for nodes and views can be found here.
Cocoon was initially developed for internal purposes only. Even in a small team of data scientists, data processing scripts are often hard to read and even more difficult to maintain. For many projects, one ends up having to make sense of a clutter of Python and Bash scripts, Excel sheets and databases on various servers.
The purpose of Cocoon isn't to replace any of these tools, but rather to unify them in a self-documenting way. Adopting Cocoon shouldn't mean migrating your existing scripts and resources, but rather automating their usage while, at the same time, documenting them and making them more accessible to new developers.
Cocoon is not the first flow-based data processing environment, of course, so make sure that the following, more mature tools don't fit your needs better:
Flow-based, built with Node.js, using JSON. Node-RED has a strong focus on interacting with APIs and IoT devices. Unlike Cocoon it supports real-time streaming of data, but it doesn't have any integrated data mining or visualisation capabilities. While somewhat similar from a technical perspective, the project's aim and direction are very different.
Cocoon is heavily inspired by KNIME, a flow-based data mining tool with a huge community and an impressive collection of extensions and integrations. If KNIME's extensions fit the bill, it is almost certainly the better choice. Cocoon was mainly born out of frustration with KNIME's lack of extensibility, dated UI and lackluster UX.
What makes Luna special is that it is a functional language with a visual mapping. If the prospect of writing Haskell-like code that can also be represented and edited visually excites you, have a look at this impressive project. (If you're more of an OO kind of person, check out Julia instead.) It is worth noting, though, that Cocoon can be extended using Elm, Reason or any other language that compiles to JS.
Although the team behind Cocoon has been using it in production for many months now, we are still in the early stages of development.
If you think you have a good use-case for Cocoon or want to support its development, or if you have questions/feedback, we'd be eager to hear from you.
Most commercial applications will likely require custom nodes and visualisations to get the most out of Cocoon. We're happy to consult, or take full-time or half-time positions to tailor the workflow to your needs.
All of Cocoon's developers have master's degrees (AI/computer science) with a strong background in visual analytics. If you can offer one or more PhD positions where Cocoon could aid through data mining, machine learning or visual analytics, we are interested in hearing about it!
For questions, feedback and offers, either open an issue in this repository or write directly to aengl.