Commit f33bb72 · 1 parent d07fea2 · 4 changed files with 103 additions and 49 deletions.
# Implementing New SPEN Applications
See main.lua for examples of various SPEN applications. The SPEN code is quite modular. The only thing that needs to be implemented is the load_problem function that is built in main.lua. It returns the following application-specific items.

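In outline, a load_problem implementation might look like the following sketch. The helper names (make_spen_model, make_evaluator_factory, make_batcher) and the exact return signature are hypothetical illustrations, not the repository's authoritative interface; each returned item is described below.

```lua
-- Hypothetical sketch of load_problem's overall shape; the helper functions
-- and the return order are assumptions for illustration.
local function load_problem(params)
   local model = make_spen_model(params)              -- obeys the SPEN API (below)
   local y_shape = {params.batchsize, 10}             -- e.g., batchsize x num_labels
   local evaluator_factory = make_evaluator_factory() -- see the sketch below
   local preprocess_func = nil                        -- nil: no preprocessing applied
   local train_batcher = make_batcher(params, 'train')
   local test_batcher = make_batcher(params, 'test')
   return model, y_shape, evaluator_factory, preprocess_func, train_batcher, test_batcher
end
```
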
`model` is an object that obeys the SPEN API, described below.

`y_shape` is a table containing the shape of y, the iterates for gradient-based SPEN optimization.

`evaluator_factory` is a function that takes a batcher and a soft predictor and returns an object that implements an evaluate(timestep) method, used for evaluating and logging performance.

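For instance, an evaluator_factory might be implemented along these lines. The metric computation is application-specific and elided; this is a hedged sketch rather than the repository's code, and it assumes the batcher contract described under train_batcher below, i.e., that the iterator packs each minibatch as {y, x, num_actual_examples}.

```lua
-- Hypothetical evaluator_factory: the returned object's evaluate(timestep)
-- runs the soft predictor over one finite pass of the batcher's data.
local function evaluator_factory(batcher, soft_predictor)
   local evaluator = {}
   function evaluator:evaluate(timestep)
      local iterator = batcher:get_iterator() -- finite pass; see the batchers below
      local num_batches = 0
      while true do
         local batch = iterator()
         if batch[1] == nil then break end -- {nil, nil, nil} marks the end of the data
         local y, x, num_actual_examples = batch[1], batch[2], batch[3]
         local prediction = soft_predictor:forward(x)
         -- ...accumulate an application-specific metric over the first
         -- num_actual_examples rows of prediction and y
         num_batches = num_batches + 1
      end
      print(string.format('evaluation at timestep %d: %d batches', timestep, num_batches))
   end
   return evaluator
end
```
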
`preprocess_func` is a function that takes (y, x, num_examples) and returns optionally preprocessed versions of the data (e.g., expanding integer indices for y to a one-hot representation). If preprocess_func is nil, then no such transformation will be applied.

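As a concrete illustration, a preprocess_func that expands integer class indices into one-hot vectors might look like the sketch below; the label count of 10 and the convention of returning the transformed triple are assumptions.

```lua
require 'torch'

-- Hypothetical preprocess_func: expands integer labels in y to a one-hot
-- matrix; x passes through unchanged. num_labels = 10 is an assumed value.
local function preprocess_func(y, x, num_examples)
   local num_labels = 10
   local y_onehot = torch.zeros(y:size(1), num_labels)
   y_onehot:scatter(2, y:long():view(-1, 1), 1) -- write a 1 at each label index
   return y_onehot, x, num_examples
end
```
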
`train_batcher` is an object that provides two methods: get_iterator() and get_ongoing_iterator(). The first is a finite iterator, typically used for test data, which returns {nil, nil, nil} when it reaches the end of the data. The second is an infinite iterator (i.e., it loops back to the beginning of the dataset once it reaches the end). Each method returns a Lua iterator that can be called repeatedly and returns {y, x, num_actual_examples}. The outer dimension of y and x is always expected to be params.batchsize. If there isn't enough data to fill a tensor of this size, the batcher may zero-pad the data, in which case num_actual_examples gives the number of real examples. This is useful at test time to make sure that inflated accuracy numbers are not computed on the padding.

`test_batcher` is a similar batcher, but for test data.

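Both batchers share this contract. A consumption sketch for the infinite training iterator follows; the step count is illustrative, and it assumes each call returns the minibatch packed as {y, x, num_actual_examples}, per the contract above.

```lua
-- Hypothetical training-style consumption: the ongoing iterator never
-- terminates, so the caller decides how many steps to take.
local iterator = train_batcher:get_ongoing_iterator()
for step = 1, 1000 do
   local batch = iterator()
   local y, x, num_actual_examples = batch[1], batch[2], batch[3]
   -- ...take a training step on {y, x}; only the first num_actual_examples
   -- rows are real data, the rest may be zero-padding
end
```
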
## The SPEN API
SPEN applications extend the SPEN class, given in model/SPEN.lua. See model/ for various examples.
### Block-Structured Y

For some problems, there are multiple blocks of optimization variables. This was supported in earlier versions of SPEN, but not anymore. If you need this functionality, let me know and maybe we can reboot it.
### Methods that SPEN Subclasses Must Implement
`SPEN:features_net()` returns a network that takes in x and returns features F(x).

`SPEN:unary_energy_net()` returns a network that takes F(x) and returns a set of 'local potentials' such that the local energy term is given by the inner product between the output of this network and self:convert_y_for_local_potentials(y). This network is used as a term in the SPEN energy. It is also used as the local classifier for pretraining the features, and optionally as a way to initialize the guess y0 when performing iterative optimization to form predictions.

`SPEN:convert_y_for_local_potentials(y)` takes an nngraph node and returns an nngraph node used for taking inner products with the 'local potentials' of the unary energy network. Typically, this can be set to the identity.

`SPEN:global_energy_net()` returns a network that takes {y, F(x)} and returns a number. The total SPEN energy is the sum of this term and the unary_energy_net, with the global term weighted by config.global_term_weight.

`SPEN:uniform_initialization_net()` returns a network that takes no inputs and returns an initial guess y0 for iterative optimization. A default implementation, using nn.Constant, is provided in SPEN.lua. Only override this if necessary.

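Putting these together, a minimal subclass might look like the sketch below. It assumes a flat multi-label setup; the class name MySPEN, the config fields input_dim, feature_dim, label_dim, and hidden_dim, and the use of self.config are illustrative assumptions rather than the repository's exact conventions.

```lua
require 'nn'
require 'nngraph'

-- Hypothetical minimal SPEN subclass; assumes model/SPEN.lua has been loaded
-- so the 'SPEN' class is registered, and that the constructor stashes the
-- config table as self.config.
local MySPEN, parent = torch.class('MySPEN', 'SPEN')

function MySPEN:features_net()
   -- x -> F(x): a one-hidden-layer feature extractor
   return nn.Sequential()
      :add(nn.Linear(self.config.input_dim, self.config.feature_dim))
      :add(nn.ReLU())
end

function MySPEN:unary_energy_net()
   -- F(x) -> per-label 'local potentials'
   return nn.Linear(self.config.feature_dim, self.config.label_dim)
end

function MySPEN:convert_y_for_local_potentials(y)
   -- the identity is typically sufficient
   return y
end

function MySPEN:global_energy_net()
   -- {y, F(x)} -> a single global energy value per minibatch element
   local y = nn.Identity()()
   local features = nn.Identity()()
   local joined = nn.JoinTable(2)({y, features})
   local hidden = nn.SoftPlus()(nn.Linear(
      self.config.label_dim + self.config.feature_dim, self.config.hidden_dim)(joined))
   local energy = nn.Linear(self.config.hidden_dim, 1)(hidden)
   return nn.gModule({y, features}, {energy})
end
```
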
## SPEN Members and Methods that Outside Code Accesses

`spen.initialization_network` takes x and returns a guess for y for iterative optimization. This may or may not be the same as the classifier network.

`spen.features_network` takes x and returns features F(x).

`spen.energy_network` takes {y, F(x)} and returns an energy value (to be minimized with respect to y).

`spen.classifier_network` takes x and returns a guess for y. This is used for pretraining.

`spen.global_potentials_network` takes {y, F(x)} and returns the value of the global energy terms.

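Outside code can then drive these pieces roughly as follows; this is a hedged sketch in which spen is a constructed model and x is a minibatch tensor.

```lua
-- Hypothetical use of the exposed networks on one minibatch x.
local features = spen.features_network:forward(x)           -- F(x)
local y0 = spen.initialization_network:forward(x)           -- initial guess for y
local energy = spen.energy_network:forward({y0, features})  -- value minimized wrt y
local local_guess = spen.classifier_network:forward(x)      -- pretraining predictions
```
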
`spen:set_feature_backprop(value)` takes a boolean value. If value is true, then no backprop will be performed through the features network during training. This prevents the parameters of the features network from being updated.

`spen:set_unary_backprop(value)` similarly prevents any updates to both the features network and the local potentials, which form a term in the energy function and may also be used for the initialization_network.

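Following the stated semantics, a pretraining script might clamp parts of the model like this (illustrative usage):

```lua
-- Per the descriptions above, passing true disables the corresponding backprop.
spen:set_feature_backprop(true) -- clamp the feature network's parameters
spen:set_unary_backprop(true)   -- additionally clamp the local potentials
```
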
### Config options for SPEN
The SPEN constructor takes two tables, config and params, where the first is for application-specific options and the second contains general options for the entire SPEN software package.

`params.use_cuda` whether to use the GPU.

`params.use_cudnn` whether to use cudnn implementations for certain nn layers.

`params.init_at_local_prediction` whether gradient-based prediction should initialize y0 uniformly or using the local classifier network.

`config.y_shape` a table for the shape of y, the input to the energy function.

`config.logit_iterates` whether gradient-based optimization of the energy is done in logit space or in normalized space.

`config.global_term_weight` the weight to place on global energy terms vs. local energy terms when composing the full energy function.

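Finally, constructing a model from these options might look like the following sketch; the specific values, the extra config dimensions, and the MySPEN constructor call are assumptions carried over from the subclass sketch above.

```lua
-- Hypothetical construction of a SPEN model from config and params tables.
local params = {
   use_cuda = false,                -- CPU-only run
   use_cudnn = false,
   init_at_local_prediction = true, -- initialize y0 from the local classifier
   batchsize = 32,
}
local config = {
   y_shape = {32, 10},              -- batchsize x assumed number of labels
   logit_iterates = true,           -- optimize the energy in logit space
   global_term_weight = 1.0,
   input_dim = 100, feature_dim = 50, label_dim = 10, hidden_dim = 25,
}
local model = MySPEN(config, params) -- MySPEN from the subclass sketch above
```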