Graphical visualization of Categoricals using Spark Notebook #23

Open
nightscape opened this issue Apr 9, 2015 · 4 comments

@nightscape

Hi Daniel,

I've finally managed to create a visualization for Bayesian Networks constructed from Categoricals.
Check out the README of the Gist here:
https://gist.github.com/nightscape/c2fcccac859b3ae34c99#file-readme-md

Could you check if it runs on your machine?
If so we can think about how to maybe integrate this into bayes-scala :)

Best and thanks again for your help!
Martin

@danielkorzekwa
Owner

Error on cell 11: <console>:17: error: not found: value dk
import dk.bayes.dsl.infer

In [6]:

:local-repo /tmp/snb/repo

res10: String = Repo changed to /tmp/snb/repo!

Out[6]:
Repo changed to /tmp/snb/repo!
In [7]:

:remote-repo sonasnap % default % https://oss.sonatype.org/content/repositories/snapshots/

res11: String = Remote repo added: sonasnap % default % https://oss.sonatype.org/content/repositories/snapshots/!

Out[7]:
Remote repo added: sonasnap % default % https://oss.sonatype.org/content/repositories/snapshots/!
In [10]:

:dp com.github.danielkorzekwa % bayes-scala_2.11 % 0.5-SNAPSHOT

warning: there were 2 feature warning(s); re-run with -feature for details
jars: Array[String] = [Ljava.lang.String;@4cdddd
res20: List[String] =
List(/tmp/snb/repo/com/googlecode/efficient-java-matrix-library/ejml/0.20/ejml-0.20.jar,
/tmp/snb/repo/com/github/fommil/netlib/netlib-native_ref-linux-armhf/1.1/netlib-native_ref-linux-armhf-1.1-natives.jar,
/tmp/snb/repo/net/sf/opencsv/opencsv/2.3/opencsv-2.3.jar,
/tmp/snb/repo/com/github/fommil/netlib/netlib-native_system-linux-x86_64/1.1/netlib-native_system-linux-x86_64-1.1-natives.jar,
/tmp/snb/repo/com/github/fommil/netlib/netlib-native_ref-win-i686/1.1/netlib-native_ref-win-i686-1.1-natives.jar,
/tmp/snb/repo/org/spire-math/spire_2.11/0.7.4/spire_2.11-0.7.4.jar,
/tmp/snb/repo/org/scalanlp/breeze_2.11/0.11.2/breeze_2.11-0.11.2.jar,
/tmp/snb/repo/org/scalanlp/breeze-macros_2.11/0.11.2/breeze-macros_2.11-0.11.2....

Out[10]:

In [11]:

import notebook.front.third.d3._
import notebook._, front._, widgets._
import notebook.JsonCodec._
import play.api.libs.json._
import dk.bayes.dsl.variable.Categorical
import dk.bayes.dsl.infer
type CategoricalWithInfo = (String, Categorical, Seq[String])
val loadedCode = {
  val source = scala.io.Source.fromURL("https://gist.githubusercontent.com/nightscape/c2fcccac859b3ae34c99/raw/d3_bayesian_network.js")
  val res = source.mkString
  source.close()
  res
}
import play.api.libs.json._
import play.api.libs.functional.syntax._
import dk.bayes.dsl.variable.Categorical
import dk.bayes.dsl.infer

case class ConditionalProbabilityTable(node: BnNode, parents: Seq[BnNode], probabilities: Seq[Double])
case class BnNode(name: String, states: Seq[String], currentState: Option[String] = None)
case class BnEdge(source: Int, target: Int)

// Builds, for every Categorical, a node plus two tables (its CPT and its inferred marginal),
// and one edge per parent -> child relation.
def categoricalsToNetwork(
    marginalizer: Categorical => Seq[Double],
    cptExtractor: Categorical => Seq[Double] = _.cpd)(
    categoricalsWithNames: Seq[CategoricalWithInfo])
    : (Seq[(BnNode, ConditionalProbabilityTable, ConditionalProbabilityTable)], Seq[BnEdge]) = {
  import breeze.linalg._
  import breeze.numerics._

  val nodes = categoricalsWithNames.map { case (name, categorical, states) =>
    val currentState = categorical.getValue().map(states.apply)
    BnNode(name, states, currentState)
  }
  val categoricals = categoricalsWithNames.map(_._2)
  val nodeMap = categoricals.zip(nodes).toMap

  val cpts = categoricals.zip(nodes).map { case (cat, node) =>
    val parents = cat.parents.map(nodeMap)
    val numCols = node.states.size
    val cpd = cptExtractor(cat)
    val inferredCpd = infer(cat).cpd
    val numRows = cpd.size / numCols
    val cptArray = cpd.toArray
    val cpt = new DenseMatrix(numCols, numRows, cptArray).t
    (ConditionalProbabilityTable(node, parents, cpt.toArray),
      ConditionalProbabilityTable(node, Seq(), inferredCpd.toArray))
  }

  val edges = nodeMap.flatMap { case (cat, node) =>
    val parents = cat.parents.map(nodeMap)
    parents.map(p => BnEdge(nodes.indexOf(p), nodes.indexOf(node)))
  }.toSeq

  (nodes.zip(cpts).map { case (n, (c, m)) => (n, c, m) }, edges)
}

object ConditionalProbabilityTable {
  implicit val conditionalProbabilityTableWrites: Writes[ConditionalProbabilityTable] =
    Json.writes[ConditionalProbabilityTable]
}

object BnNode {
  implicit val nodeWrites: Writes[BnNode] = Json.writes[BnNode]
}

object BnEdge {
  implicit val edgeWrites: Writes[BnEdge] = Json.writes[BnEdge]
}

// Serializes the nodes (each with "cpt" and "marginalized" fields) and the edges to JSON.
def networkToJson(
    nodesWithCpts: Seq[(BnNode, ConditionalProbabilityTable, ConditionalProbabilityTable)],
    edges: Seq[BnEdge]): JsObject = {
  val nodeJs = nodesWithCpts.map { case (node, cpt, marginalized) =>
    Json.toJson(node).asInstanceOf[JsObject] + ("cpt", Json.toJson(cpt)) +
      ("marginalized", Json.toJson(marginalized))
  }
  Json.obj("nodes" -> Json.toJson(nodeJs), "edges" -> Json.toJson(edges))
}

val convertCategoricals: Seq[CategoricalWithInfo] => JsObject =
  (categoricalsToNetwork({cat: Categorical => infer(cat).cpd}) _).andThen((networkToJson _).tupled)

// Only the Scala -> JSON direction (decode) is implemented; encode returns an empty Seq.
implicit val categoricalsCodec = new Codec[JsValue, Seq[CategoricalWithInfo]] {
  def encode(x: JsValue): Seq[CategoricalWithInfo] = Seq()
  def decode(x: Seq[CategoricalWithInfo]): JsValue = convertCategoricals(x)
}

val playgroundCode = s"""
function(dataPipe, e) {
  $loadedCode
  var bnGraph = BayesianNetworkGraph(e)
  bnGraph(this.dataInit[0])
  dataPipe.subscribe(function(d) {
    bnGraph(d[0])
  })
}
"""

()

<console>:17: error: not found: value dk
import dk.bayes.dsl.infer
^
<console>:16: error: not found: value dk
import dk.bayes.dsl.variable.Categorical
^

@nightscape
Author

Ah, you're probably using the Scala 2.10 download of spark-notebook, right?
I had the same problem, and I think @andypetrella is fixing this as we speak :)
In the meantime, you can use the Scala 2.11 download; that works for me.
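
One thing worth checking in this situation: the `_2.11` suffix in `bayes-scala_2.11` is the Scala binary version the artifact was built for, so it should match the Scala version of the notebook build. Below is a minimal sketch for checking that from a notebook cell, assuming Scala 2.x version strings. `binaryVersion` and `matchesKernel` are hypothetical helper names used for illustration; they are not part of spark-notebook or bayes-scala.

```scala
// Scala version of the running REPL / notebook kernel, for example "2.11.6"
val kernelVersion: String = scala.util.Properties.versionNumberString

// Hypothetical helper: reduce a full Scala 2.x version such as "2.11.6"
// to its binary version "2.11"
def binaryVersion(full: String): String =
  full.split('.').take(2).mkString(".")

// Hypothetical helper: does an artifact name such as "bayes-scala_2.11"
// carry the suffix for the given Scala version?
def matchesKernel(artifact: String, scalaVersion: String): Boolean =
  artifact.endsWith("_" + binaryVersion(scalaVersion))

println(s"Kernel runs Scala $kernelVersion")
println(s"bayes-scala_2.11 matches: ${matchesKernel("bayes-scala_2.11", kernelVersion)}")
```

If the second line prints `false`, the dependency was built for a different Scala version than the one the notebook runs. That may or may not be the whole story behind the `not found: value dk` error, but it is a cheap check.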

@andypetrella

@nightscape you're right, man. @danielkorzekwa, if you want, you can also clone the current master branch and use it right away.
I will probably release it (0.4.1) soon, but I want to be sure the people who noticed the problem are happy with the current fixes :-D

@danielkorzekwa
Owner

Works for me, graphs are generated.
