
Commit

updates to docs
davidBelanger committed Jan 5, 2017
1 parent d07fea2 commit f33bb72
Showing 4 changed files with 103 additions and 49 deletions.
65 changes: 52 additions & 13 deletions Applications.md
# Implementing New SPEN Applications

See main.lua for examples of various SPEN applications. The SPEN code is quite modular: the only thing that needs to be written for a new application is a load_problem method in main.lua, which returns the following application-specific items (a sketch of such a function appears after this list).

`model` is an object that obeys the SPEN API, described below.

`y_shape` is a table containing the shape of y, the iterates for gradient-based SPEN optimization.

`evaluator_factory` is a function that takes a batcher and a soft predictor and returns an object that implements an evaluate(timestep) method used for evaluating and logging performance.

`preprocess_func` is a function that takes (y, x, num_examples) and returns optionally preprocessed versions of the data (e.g., expanding int indices for y to a one-hot representation). If preprocess_func is nil, no such transformation is applied.

`train_batcher` is an object that provides two methods: get_iterator() and get_ongoing_iterator(). The first returns an iterator, typically used for test data, that yields {nil, nil, nil} when it reaches the end of the data. The second returns an infinite iterator (i.e., it loops back to the beginning of the dataset once it reaches the end). Each method returns a Lua iterator that, when called, returns {y, x, num_actual_examples}. The outer dimension of y and x is always expected to be params.batchsize. If there isn't enough data to fill a tensor of this size, the batcher may zero-pad the data, in which case num_actual_examples gives the number of real examples. This is useful at test time to ensure that accuracy numbers are not inflated by the padding.

`test_batcher` Similar batcher, but for test data.
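
Below is a minimal, hypothetical sketch of what a load_problem implementation might return. The return order and the helper classes (MyBatcher, MyEvaluator, MySPEN, convert_to_one_hot) are assumptions for illustration only; see main.lua for the real implementations.

```lua
-- Hypothetical sketch of load_problem; not the actual API in main.lua.
require 'nn'

local function load_problem(params)
  -- shape of the iterates y; the exact convention is application-specific
  local y_shape = {params.batchsize, params.label_dim}

  -- batchers exposing get_iterator() and get_ongoing_iterator()
  local train_batcher = MyBatcher(params.train_file, params.batchsize) -- assumed class
  local test_batcher  = MyBatcher(params.test_file,  params.batchsize)

  -- optionally preprocess raw minibatches, e.g. int labels -> one-hot
  local preprocess_func = function(y, x, num_examples)
    return convert_to_one_hot(y, params.label_dim), x, num_examples -- assumed helper
  end

  -- evaluators must implement evaluate(timestep)
  local evaluator_factory = function(batcher, soft_predictor)
    return MyEvaluator(batcher, soft_predictor) -- assumed class
  end

  local config = {y_shape = y_shape, logit_iterates = true, global_term_weight = 1.0}
  local model = MySPEN(config, params) -- assumed SPEN subclass from model/

  return model, y_shape, evaluator_factory, preprocess_func, train_batcher, test_batcher
end
```

Training code would then draw minibatches from the iterator returned by train_batcher:get_ongoing_iterator(), which yields {y, x, num_actual_examples} as described above.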


## The SPEN API

SPEN applications extend the SPEN class, given in model/SPEN.lua. See model/ for various examples; a minimal sketch of a subclass appears after the method list below.

### Block-Structured Y
For some problems, there are multiple blocks of optimization variables. This was supported in earlier versions of SPEN, but not anymore. If you need this functionality, let me know and maybe we can reboot it.

### Methods that SPEN Subclasses Must Implement

`SPEN:features_net()` returns a network that takes in x and returns features F(x)

`SPEN:unary_energy_net()` returns a network that takes F(x) and returns a set of 'local potentials' such that the local energy term is given by the inner product between the output of this network and self:convert_y_for_local_potentials(y). This network is used as a term in the SPEN energy. It is also used as the local classifier for pretraining the features, and optionally as a means to initialize a guess for y0 when performing iterative optimization to form predictions.

`SPEN:convert_y_for_local_potentials(y)` takes an nngraph Node and returns an nngraph node used for taking inner products with the 'local potentials' of the unary energy network. Typically, this can be set to the identity.

`SPEN:global_energy_net()` returns a network that takes {y,F(x)} and returns a number. The total SPEN energy is the sum of this and the unary_energy_net, where the global term is weighted by config.global_term_weight.

`SPEN:uniform_initialization_net()` returns a network that takes no inputs and returns an initial guess y0 for iterative optimization. A default implementation, using nn.Constant, is implemented in SPEN.lua. Only override this if necessary.
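
For illustration, here is a minimal sketch of a subclass implementing these methods for a problem with a fixed number of binary labels. The class name, the config fields (feature_dim, hid_dim, label_dim, energy_hid_dim), and the assumption that the constructor stores its options at self.config are placeholders rather than the actual code; see model/MLCSPEN.lua and model/DepthSPEN.lua for real implementations.

```lua
-- Hypothetical SPEN subclass; config field names are assumptions.
require 'nn'
require 'nngraph'

local MySPEN, parent = torch.class('MySPEN', 'SPEN')

function MySPEN:features_net()
  -- F(x): a small feed-forward feature extractor
  return nn.Sequential()
    :add(nn.Linear(self.config.feature_dim, self.config.hid_dim))
    :add(nn.ReLU())
end

function MySPEN:unary_energy_net()
  -- maps F(x) to one 'local potential' per label
  return nn.Linear(self.config.hid_dim, self.config.label_dim)
end

function MySPEN:convert_y_for_local_potentials(y)
  -- take inner products between the local potentials and y directly
  return y
end

function MySPEN:global_energy_net()
  -- E_global(y, F(x)): a nonlinear function of the labels and features
  local y = nn.Identity()()
  local features = nn.Identity()()
  local joined = nn.JoinTable(2)({y, features})
  local hid = nn.SoftPlus()(
    nn.Linear(self.config.label_dim + self.config.hid_dim, self.config.energy_hid_dim)(joined))
  local energy = nn.Linear(self.config.energy_hid_dim, 1)(hid)
  return nn.gModule({y, features}, {energy})
end
```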

## SPEN Members and Methods that Outside Code Accesses
`spen.initialization_network` takes x and returns a guess for y for iterative optimization. May or may not be the same as the classifier network.

`spen.features_network` takes x and returns features F(x)

`spen.energy_network` takes {y,F(x)} and returns an energy value (to be minimized wrt y)

`spen.classifier_network` takes x and returns a guess for y. This is used for pretraining.

`spen.global_potentials_network` takes {y,F(x)} and returns the value of the global energy terms.

`spen:set_feature_backprop(value)` takes a boolean value. If value is true, then no backprop will be performed through the features network during training. This prevents the parameters of the features network from being updated.

`spen:set_unary_backprop(value)` Similarly, this prevents updates to both the features network and the local potentials, which appear as a term in the energy function and may also be used by the initialization_network.
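
As a rough illustration (not the actual prediction loop; tensor shapes and calling conventions are assumptions), outside code might use these members as follows:

```lua
-- Hypothetical use of the SPEN members; x is a minibatch of inputs.
local features = spen.features_network:forward(x)          -- F(x)
local y0 = spen.initialization_network:forward(x)          -- initial guess for y
local energy = spen.energy_network:forward({y0, features}) -- energy to be minimized wrt y

-- The classifier network can be used on its own, e.g. for pretraining the features:
local local_prediction = spen.classifier_network:forward(x)
```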

### Config options for SPEN

The SPEN constructor takes two tables, config and params, where the first contains application-specific options and the second contains general options for the entire SPEN software package.

`params.use_cuda` whether to use the GPU.

`params.use_cudnn` whether to use cudnn implementations for certain nn layers.

`params.init_at_local_prediction` whether gradient-based prediction should initialize y0 uniformly or using the local classifier network.

`config.y_shape` a table for the shape of y, the input to the energy function.

`config.logit_iterates` whether gradient-based optimization of the energy is done in logit space or in normalized space.

`config.global_term_weight` weight to place on global energy terms vs. local energy terms when composing the full energy function.
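
For illustration, the two tables might look as follows; the values and any field not documented above are placeholders rather than recommended settings.

```lua
-- Hypothetical config/params tables for constructing a SPEN subclass.
local params = {
  use_cuda = true,
  use_cudnn = false,
  init_at_local_prediction = true,
  batchsize = 32,                   -- assumed additional option
}

local config = {
  y_shape = {params.batchsize, 10}, -- exact shape convention is application-specific
  logit_iterates = true,
  global_term_weight = 1.0,
}

local spen = MySPEN(config, params) -- assumed subclass, as in the sketch above
```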


33 changes: 17 additions & 16 deletions Denoising.md

# Image Denoising with SPENs

To run a self-contained example of image denoising, cd to the base directory for SPEN, and then execute

`wget https://www.cics.umass.edu/~belanger/depth_denoise.tar.gz`

`tar -xvf depth_denoise.tar.gz`

`sh depth_cmd.sh`


This downloads a preprocessed version of a small amount of the depth denoising data from this [paper](http://www.cs.toronto.edu/~slwang/proximalnet.pdf), made available [here](https://bitbucket.org/shenlongwang/), and then fits a SPEN. The associated SPEN architecture is defined in model/DepthSPEN.lua.

Note that this isn't a traditional denoising task where we assume a parametric noise model, which can be used for producing training pairs of noisy and clean images.

Let x be the input blurry image and y be the sharpened image we seek to predict. We recover y by MAP inference, where we find the y that maximizes P(x | y) P(y). We assume a Gaussian noise model, so that the negative log-likelihood -log P(x | y) is a scaled mean squared error. There are various parametrizations for the prior distribution P(y). Many previous works have employed a 'field of experts' model: P(y) \propto exp(\sum_i \sum_{xy} w_i \rho(f_i(x,y))), where f_1(\cdot,\cdot), \ldots, f_k(\cdot,\cdot) are a set of localized linear filter responses and \rho is a nonlinearity.

This depends on the torch gm package, will requires you to install graphicsmagick. The images can be any format loadable by graphicsmagick.
Early work estimated the weights w_i and the filters by maximizing the likelihood of a dataset of sharp images. Inference in the field of experts model is intractable, and thus practitioners employed approximate methods such as contrastive divergence.

Each line of file_list is of the form `<blurry_image>\s<sharp image>`
An alternative line of work estimated the parameters using end-to-end approaches, applying automatic differentiation to the procedure of iteratively solving the MAP objective for a fixed number of iterations.

We employ this end-to-end approach, but consider substantially more expressive prior distributions over y than a field of experts built from linear filters. Namely, we consider an arbitrary deep network: P(y) \propto exp(D(y)).
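
Concretely, under the Gaussian noise model (writing \sigma^2 for the assumed noise variance), the MAP problem that is unrolled and differentiated through is:

```latex
\hat{y} \;=\; \arg\max_y \; \log P(x \mid y) + \log P(y)
        \;=\; \arg\min_y \; \frac{1}{2\sigma^2}\,\lVert x - y \rVert^2 \;-\; D(y)
```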


### Related Work
> Justin Domke. "Generic Methods for Optimization-Based Modeling." AISTATS 2012.
### Applications
Besides providing an effective image denoising network, this learning procedure produces a standalone network P(y), which returns the prior log-probability of a given image. This may be useful in various downstream tasks. You could even sample from the space of images using, for example, Hamiltonian Monte Carlo.

### Data Processing
You will need a large number of pairs of noisy and clean images. Some helpful utility code is:

`scripts/im_pairs_to_torch.lua <file_list> <num examples per file> <output_dir> <num_total_examples>`

This depends on the torch gm package, which requires you to install graphicsmagick. The images can be any format loadable by graphicsmagick.

Each line of file_list is of the form `<blurry_image>\s<sharp image>`
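
For example, a hypothetical invocation (file names and counts are placeholders) is `th scripts/im_pairs_to_torch.lua image_pairs.txt 1000 data/denoise/ 5000`, which would convert the image pairs listed in image_pairs.txt into torch tensor files under data/denoise/.
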
15 changes: 11 additions & 4 deletions MultiLabelClassification.md
# Multi-Label Classification with SPENs


To run a self-contained example of multi-label classification, cd to the base directory for SPEN, and then execute

`wget http://www.cics.umass.edu/~belanger/icml_mlc_data.tar.gz`

`tar -xvf icml_mlc_data.tar.gz`

`sh mlc_cmd.sh`

The SPEN architecture for MLC is described in detail in our [paper](https://people.cs.umass.edu/~belanger/belanger_spen_icml.pdf). It is implemented in MLCSPEN.lua. See main.lua for the load_problem implementation for MLC. This also instantiates data loading, evaluation, etc.

We evaluate using evaluate/MultiLabelEvaluation.lua, which computes F1 score. This depends on a threshold, between 0 and 1, for converting soft decisions to hard decisions. If you use the -predictionThresh argument (e.g., when evaluating on your test set), then a single threshold is used. Otherwise, the code tries a range of thresholds and reports the best F1.

Note that our new code does not reproduce the configuration of the ICML experiments. The evaluation is the same, but the training method is substantially different. Even if you train with an SSVM loss, there are various configuration differences (e.g., how we detect convergence of the inner prediction problem).

### Data Processing
For new data, it will be useful to use the conversion script

`scripts/ml2torch.lua <features_file> <labels_file> <out_file>`
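
For example, a hypothetical invocation (file names are placeholders) is `th scripts/ml2torch.lua train_features.txt train_labels.txt data/mlc_train.torch`.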
