This repository was archived by the owner on May 21, 2022. It is now read-only.

Save/load model/state and Specs #9

Open
@tbreloff

Description

Frequently it's not enough to have an in-memory representation of a model or the optimization algorithm state. We want to serialize the structure, parameters, and state of the models and sub-learners that we're working with.

I'd like to take this one step further and define a generic "spec" for these stateful items — one that we can both serialize/deserialize and load into another "backend" easily. I'm using the same terminology as Plots on purpose... I think there are a lot of similarities in the problems we're looking to solve.

Some examples of backends:

  • Transformations, etc
  • Optim
  • TensorFlow
  • Theano
  • BayesNet

The idea is that, where there is overlapping functionality, there is the opportunity to generalize. Suppose we have a general concept "I want a 3 layer neural net with these numbers of nodes and relu activations, and these initial weight values". Lots of software implements this. If we build this information into a generic spec (similar to the Plot object in Plots), then we only need to connect the spec to a backend's constructor, and we have the ability to convert and transfer models between backends.
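As a minimal sketch of what such a spec might look like — all names here (`LayerSpec`, `NetSpec`, `build`) are hypothetical, not an existing API:

```julia
# Hypothetical sketch: a backend-agnostic description of a feedforward net.
struct LayerSpec
    nodes::Int
    activation::Symbol   # e.g. :relu
end

struct NetSpec
    layers::Vector{LayerSpec}
end

# Each backend provides a `build` method; dispatching on a backend symbol
# picks the right constructor, much like Plots maps one Plot object to
# many backends. Here the "generic" backend just echoes the structure.
build(::Val{:generic}, spec::NetSpec) =
    [(l.nodes, l.activation) for l in spec.layers]   # stand-in for a real graph

spec = NetSpec([LayerSpec(128, :relu), LayerSpec(64, :relu), LayerSpec(10, :relu)])
net  = build(Val(:generic), spec)
```

Connecting a new backend would then only mean adding one more `build` method, e.g. `build(::Val{:tensorflow}, spec)`, without touching the spec itself.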

The same goes for optimization routines... many times there is a 1-1 mapping between, for example, an Adam optimizer in TensorFlow and Theano. But each backend reimplements the concept in a different way, and reinvents the wheel completely. The Plots model applies here as well. Define a generic concept Adam updater, then map from that to a backend's implementation.
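A sketch of what that mapping could look like — `AdamSpec` and `lower` are illustrative names, and the backend "constructors" are shown as plain tuples rather than real TensorFlow/Theano API calls:

```julia
# Hypothetical sketch: one generic Adam spec, lowered per-backend.
Base.@kwdef struct AdamSpec
    lr::Float64    = 0.001
    beta1::Float64 = 0.9
    beta2::Float64 = 0.999
end

# Per-backend "lowering": the same generic concept maps to each backend's
# own constructor call (tuples here stand in for the real backend APIs).
lower(::Val{:tensorflow}, a::AdamSpec) = (:AdamOptimizer, a.lr, a.beta1, a.beta2)
lower(::Val{:theano},     a::AdamSpec) = (:adam,          a.lr, a.beta1, a.beta2)

a = AdamSpec(lr = 0.01)
```

The 1-1 correspondence between backends lives entirely in the `lower` methods, so the spec stays backend-agnostic.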

The end result is that I'd like for models and learning algorithms to be built from specs, which then get mapped to sub-learners specific to a particular backend. This would allow us to serialize/deserialize a backend-agnostic spec, not actual Julia objects.
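For instance, a spec could be kept as plain data and round-tripped through a language-neutral format. This sketch assumes the JSON.jl package is available; the layout of the Dict is illustrative:

```julia
# Hypothetical sketch: the spec is plain data (a Dict), so anything that
# reads JSON — another backend, another language — can reconstruct it.
using JSON

spec = Dict(
    "type"   => "feedforward",
    "layers" => [Dict("nodes" => n, "activation" => "relu") for n in (128, 64, 10)],
)

json_str  = JSON.json(spec)        # serialize the spec, not Julia objects
roundtrip = JSON.parse(json_str)   # any consumer can do this step
```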

For example, it would allow us to experiment with structures/models/algos in something more flexible (JuliaML I hope), and then convert to a TensorFlow graph for pounding the GPU or sharing with other stubborn researchers who aren't using Julia.

This is closely related to JuliaPlots/Plots.jl#390, and I think the design decisions can be shared between the two.
