bokeh_project_one

Messing with demo data from <People> to make a cool viz.

Assorted possibly-useful analysis libraries

Things we could throw at the data to perhaps produce something useful. NOT something necessarily more useful than

skl-groups for clusters of text documents with identical authors.
MIT-LL's pLSA topic clustering To get a topic model of prominent text themes. Always handy as a way for showing relationships between documents. (Newman incorporates MIT's pLSA tools)
If we have a particular type of document that we'd like to identify ActiveSearch would be useful when we spend some time classifying stuff.
NILS isn’t explained clearly enough to understand whether it’s very clever or (more likely) a replication of some sort of network calculation already in iGraph. Could be easy fodder for a network viz.
snap.py and igraph both include a lot of network analytics written in C and with python front ends, so if we want to calculate networks stats on any FERC data we can use either one. Since snap.py doesn't support python 3 I'm inclined towards igraph, but either is fine. The SNAP group at Stanford is kind of hot.
We could try to use Blaze as a drop-in substitute for pandas. It's kind of clunky still, but should make it easier to work with the data interactively while on disk.

Assorted possibly-useful visualization platforms

Use VizLinc instead of / in addition to Newman? Another let's-find-insight-in-text tool that we could toss the raw data into.
UpSet visualization of different entity characteristics? We'd need to consider what sort of set relationships we might find in-between the different documents
Neon is yet another one-size-fits-all text analysis dashboard. It might also be worth an ingest.
Bokeh is less a platform than a particular visualization library, but that's honestly better. Let's stick with the examples, turn off interactivity unless we really want it, and call that that.

Some candidate workflows:

Dump data into SQLite / SQL / something so that it gets a bit preprocessed.
Use blaze and dask to do some parallel processing of data frames to generate new fields
Render the results.
Dump data into neon
Visualize it, get some insights.
Generate some relational sets.
Put the data into UpSet.

Things I'd like from Sotera

Ingest of all the data into Newman!

Random call notes

"need to stick to the established model" (Database, GUI, etc.)
Mike wants to represent the volume of emails across multiple targets the goal
Want to stick with the XDATA catalog?
Why not Newman?
- several issues with getting full-featured stuff out of the XDATA catalog.
  - need to find out if something is full featured
  - we're biting off a big support challenge if we choose to use something.
- outlook messages are super compressed.
According to Hugh, Sotera will deal with ingesting the data into Newman.

"Not a lot of data if it's to, from, and count"

elastic for the messages
igraph / snappy to turn elastic messages into graphs
Tangelo working with loaded graph data.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
data		data
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

bokeh_project_one

Assorted possibly-useful analysis libraries

Assorted possibly-useful visualization platforms

Some candidate workflows:

Things I'd like from Sotera

Random call notes

About

Releases

Packages

Contributors 2

giantoak/bokeh_project_one

Folders and files

Latest commit

History

Repository files navigation

bokeh_project_one

Assorted possibly-useful analysis libraries

Assorted possibly-useful visualization platforms

Some candidate workflows:

Things I'd like from Sotera

Random call notes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Packages