Exploring Jsonnet

scarecrow1123 commented Jul 6, 2019

I found Jsonnet through AllenNLP, hence a few words on that first. We use AllenNLP for writing our Deep Learning experiments in our team. It is primarily built for doing NLP research on top of PyTorch. However, the abstractions in the library are so well designed and easily extensible that it can actually be used to build fairly straightforward neural network experiments of any kind. Perhaps I will write a separate post on how to adopt AllenNLP for non-NLP experiments, but here is an example of how it has been used for Computer Vision.

Jsonnet

Jsonnet is a DSL for creating data templates and comes in handy for generating JSON-based configuration data. It ships with a standard library, std, that includes features like list comprehension, string manipulation, etc. It is primarily meant for generating configuration files, and std includes a bunch of manifestation utilities that convert a template into other target formats such as .ini and .yaml. For a robust templating language with more complex needs, however, I'd suggest using the awesome StringTemplate.
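
To get a quick feel for the language, here is a tiny sketch (all names made up) showing comprehension, string formatting and YAML manifestation:

demo.jsonnet
// A small taste of the std library
local sizes = [2 * i for i in std.range(1, 3)];   // => [2, 4, 6]
{
    "layers": sizes,
    "name": "mlp-%d" % std.length(sizes),         // => "mlp-3"
    "as_yaml": std.manifestYamlDoc({ "a": 1 })    // renders a value as YAML text
}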

AllenNLP uses Jsonnet for writing experiment configurations. In other words, the dependencies for running an AllenNLP experiment are specified in a .jsonnet file and the objects are constructed using built-in factories. Below is a section from a configuration that defines a simple feedforward MNIST classifier network:

Sample Configuration

mnist_feedforward.jsonnet
{
    // .....
    "model": {
        "mnist_encoder": {
            "num_layers": 2,
            "activations": ["relu", "relu"],
            "input_dim": 784,
            "hidden_dims": 512,
            "dropout": 0.25
        },
        "projection_layer": {
            "num_layers": 1,
            "activations": "relu",
            "input_dim": 512,
            "hidden_dims": 64,
            "dropout": 0.25
        },
        "final_layer": {
            "num_layers": 1,
            "activations": "linear",
            "input_dim": 64,
            "hidden_dims": 10
        }
    }
    // .....
}

Variables

One minor quibble with the above config: when running multiple experiments, the most obvious thing to do is to change those numbers in every layer. Say the output dimension of mnist_encoder is increased; the input_dim of projection_layer needs to be adjusted too. For a more complex architecture, there is a good chance this leads to a chain of changes that must be made manually.

The obvious fix is to use variables. In the example below, notice how the same variable configures both the output size of mnist_encoder and the input size of projection_layer.

mnist_feedforward.jsonnet
local INPUT_SIZE = 784;
local INPUT_ENCODER_OUTPUT_SIZE = 512;
local PROJECTION_SIZE = 64;
local FINAL_OUTPUT_SIZE = 10;
local DROPOUT = 0.25;

{
    // .....
    "model": {
        "mnist_encoder": {
            "num_layers": 2,
            "activations": ["relu", "relu"],
            "input_dim": INPUT_SIZE,
            "hidden_dims": INPUT_ENCODER_OUTPUT_SIZE,
            "dropout": DROPOUT
        },
        "projection_layer": {
            "num_layers": 1,
            "activations": "relu",
            "input_dim": INPUT_ENCODER_OUTPUT_SIZE,
            "hidden_dims": PROJECTION_SIZE,
            "dropout": 0.25
        },
        "final_layer": {
            "num_layers": 1,
            "activations": "linear",
            "input_dim": PROJECTION_SIZE,
            "hidden_dims": FINAL_OUTPUT_SIZE
        }
    }
    // .....
}
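
Beyond file-local variables, values can also be injected from outside at evaluation time through std.extVar, which is handy for sweeping a setting from the shell without editing the file. A minimal sketch (the file name is made up; --ext-code is the standard jsonnet CLI flag):

dropout.jsonnet
// Evaluate with: jsonnet --ext-code DROPOUT=0.25 dropout.jsonnet
local DROPOUT = std.extVar("DROPOUT");
{
    "model": { "dropout": DROPOUT }
}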

Objects

The next natural step of the experiment is to try different types of layers. In the above example, mnist_encoder is a feedforward block. It could instead be a convolution-based encoder, as below:

mnist_conv.jsonnet
local ConvSpec = {
    "num_layers": 2,
    "input_dim": 784,
    "kernels": [[3, 3], [3, 3]],
    "stride": [1, 1],
    "activations": ["relu", "relu"],
    "output_channels": [32, 64]
};
{
    // ....
    "model": {
        "mnist_encoder": {
            "type": "conv2d"
            "num_layers": ConvSpec.num_layers,
            "input_dim": ConvSpec.input_dim,
            "output_channels": ConvSpec.output_channels,
            "kernels": ConvSpec.kernels,
            "stride": ConvSpec.stride,
            "activations": ConvSpec.activations
        },
        // ....
    },
    // .....
}

Here ConvSpec is a jsonnet object with a bunch of member attributes that define how the convolution block is used in the classifier. Jsonnet also supports inheritance of objects as shown here.
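
For instance, here is a minimal sketch of inheritance (a made-up variant, not part of the experiment): the + operator lets a wider spec reuse ConvSpec and override a single field:

local ConvSpec = { "num_layers": 2, "output_channels": [32, 64] };
// Fields on the right-hand side override fields on the left
local WideConvSpec = ConvSpec + { "output_channels": [64, 128] };
{ "wide": WideConvSpec }   // num_layers is inherited, output_channels overridden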

Functions

The subsequent layers such as projection_layer and final_layer are going to be present in the conv example too, and projection_layer needs an input size to be defined. This will be the output size of the conv layer, and hardcoding this number is going to cause the same set of problems we saw in the first example. Let's define a simple function that computes the output size of each layer in a conv block.

// Standard conv output-size formula (as in the torch.nn.Conv2d docs)
local conv_output(dim, kernel, stride, padding, dilation) =
    std.floor((dim + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1);

local compute_conv_output_sizes(conv_, input_size, axis, curr_idx=0, sizes=[]) =
    if curr_idx >= conv_.num_layers then
        sizes
    else
        // Feed the previous layer's output size (or the input size) forward
        local prev = if std.length(sizes) == 0 then input_size
                     else sizes[std.length(sizes) - 1];
        compute_conv_output_sizes(
            conv_, input_size, axis, curr_idx + 1,
            sizes + [conv_output(
                prev,
                conv_.kernels[curr_idx][axis],
                conv_.stride[curr_idx][axis],
                conv_.padding[curr_idx][axis],
                conv_.dilation[curr_idx][axis]
            )]
        );

compute_conv_output_sizes is a tail-recursive function that calculates the output size of each conv layer from the defined kernel size, stride, padding and dilation.
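
As a quick sanity check, with the two definitions above in the same file, a 28-wide input through two 3x3 kernels with stride 1, padding 0 and dilation 1 (toy values, using the PyTorch defaults for padding and dilation) shrinks to 26 and then 24:

local spec = {
    "num_layers": 2,
    "kernels": [[3, 3], [3, 3]],
    "stride": [[1, 1], [1, 1]],
    "padding": [[0, 0], [0, 0]],
    "dilation": [[1, 1], [1, 1]]
};
compute_conv_output_sizes(spec, 28, 0)   // => [26, 24]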

Let's incorporate this definition in the mnist_encoder example:

mnist_conv.jsonnet
local ConvSpec = {
    "num_layers": 2,
    "input_dim": 784,
    "kernels": [[3, 3], [3, 3]],
    "stride": [[1, 1], [1, 1]],
    // padding and dilation are required by compute_conv_output_sizes
    "padding": [[0, 0], [0, 0]],
    "dilation": [[1, 1], [1, 1]],
    "activations": ["relu", "relu"],
    "output_channels": [32, 64],
    "output_sizes": compute_conv_output_sizes(self, self.input_dim, 0),
    // Jsonnet arrays have no negative indexing, so take the last element explicitly
    "output_size": self.output_sizes[std.length(self.output_sizes) - 1]
};
{
    // ....
    "model": {
        "mnist_encoder": {
            "type": "conv2d"
            "num_layers": ConvSpec.num_layers,
            "input_dim": ConvSpec.input_dim,
            "output_channels": ConvSpec.output_channels,
            "kernels": ConvSpec.kernels,
            "stride": ConvSpec.stride,
            "activations": ConvSpec.activations
        },
        // ....
    },
    // .....
}

Now a layer that follows mnist_encoder can make use of ConvSpec.output_size to configure its input size.
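
For example (a sketch that assumes ConvSpec from above is in scope; this fragment slots into the "model" object):

"projection_layer": {
    "num_layers": 1,
    "activations": "relu",
    "input_dim": ConvSpec.output_size,   // tracks the conv output automatically
    "hidden_dims": 64,
    "dropout": 0.25
}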

Imports

So we have defined two variants of encoders for the classifier. Except for the encoder, all the other parts of the model configuration and the training configuration are going to be the same. Let's put the base scaffolding that defines the classifier into a .libsonnet file. This will serve as an importable library that the two different experiments can use.

lib-mnist.libsonnet
{
    MNIST(encoder, projection_size, final_output_size, dropout, num_projection_layers):: {
        // .....
        "model": {
            "mnist_encoder": encoder,
            "projection_layer": {
                "num_layers": num_projection_layers,
                "activations": "relu",
                // Conv encoders carry a "type" field and expose output_size;
                // the feedforward encoder exposes hidden_dims instead
                "input_dim": if std.objectHas(encoder, "type") then encoder.output_size else encoder.hidden_dims,
                "hidden_dims": projection_size,
                "dropout": dropout
            },
            "final_layer": {
                "num_layers": 1,
                "activations": "linear",
                "input_dim": projection_size,
                "hidden_dims": final_output_size
            }
        }
        // .....
    }
}

Notice above how the MNIST classifier has been parameterized so that it can be reused by different variants of the architecture. Now, let's redefine the feedforward variant to use the import.

mnist_feedforward.jsonnet
local lib = import "lib-mnist.libsonnet";

local INPUT_SIZE = 784;
local INPUT_ENCODER_OUTPUT_SIZE = 512;
local PROJECTION_SIZE = 64;
local FINAL_OUTPUT_SIZE = 10;
local DROPOUT = 0.25;
local NUM_PROJECTION_LAYERS = 1;

local ENCODER = {
    "spec": {
        "num_layers": 2,
        "activations": ["relu", "relu"],
        "input_dim": INPUT_SIZE,
        "hidden_dims": INPUT_ENCODER_OUTPUT_SIZE,
        "dropout": DROPOUT
    }
};

lib.MNIST(ENCODER.spec, PROJECTION_SIZE, FINAL_OUTPUT_SIZE, DROPOUT, NUM_PROJECTION_LAYERS)

mnist_conv.jsonnet
local lib = import "lib-mnist.libsonnet";

local INPUT_SIZE = 784;
local PROJECTION_SIZE = 64;
local FINAL_OUTPUT_SIZE = 10;
local DROPOUT = 0.25;
local NUM_PROJECTION_LAYERS = 1;

local ConvSpec = {
    // conv_output and compute_conv_output_sizes (defined earlier) are assumed
    // to be declared in this file or imported from a helper .libsonnet
    "num_layers": 2,
    "input_dim": INPUT_SIZE,
    "kernels": [[3, 3], [3, 3]],
    "stride": [[1, 1], [1, 1]],
    "padding": [[0, 0], [0, 0]],
    "dilation": [[1, 1], [1, 1]],
    "activations": ["relu", "relu"],
    "output_channels": [32, 64],
    "output_sizes": compute_conv_output_sizes(self, self.input_dim, 0),
    "output_size": self.output_sizes[std.length(self.output_sizes) - 1]
};

local ConvEncoder = {
    "spec": {
        "type": "conv2d",
        "num_layers": ConvSpec.num_layers,
        "input_dim": ConvSpec.input_dim,
        "output_channels": ConvSpec.output_channels,
        "kernels": ConvSpec.kernels,
        "stride": ConvSpec.stride,
        "activations": ConvSpec.activations,
        // Hidden field (::): readable by lib.MNIST but left out of the final JSON
        "output_size":: ConvSpec.output_size
    }
};

lib.MNIST(ConvEncoder.spec, PROJECTION_SIZE, FINAL_OUTPUT_SIZE, DROPOUT, NUM_PROJECTION_LAYERS)

Now this kind of setup enables rapid prototyping and experimentation: just replace the encoder with an RNN or self-attention based layer and pass it to lib.MNIST.
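
For instance, a sketch of such a variant (the "lstm" type name and its fields are illustrative; the actual registered names depend on the AllenNLP version):

mnist_rnn.jsonnet
local lib = import "lib-mnist.libsonnet";

local RnnEncoder = {
    "spec": {
        "type": "lstm",           // illustrative encoder type
        "input_size": 28,
        "hidden_size": 128,
        "num_layers": 2,
        // Hidden field read by lib.MNIST to size the projection layer
        "output_size":: 128
    }
};

lib.MNIST(RnnEncoder.spec, 64, 10, 0.25, 1)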

As we have seen above, Jsonnet brings a lot of flexibility to defining configurations. There are a lot more interesting things that could be done when configuring neural net experiments with Jsonnet. Imagine writing a grid search procedure in Jsonnet that generates all possible configurations for the different hyperparameter combinations; that wouldn't be too difficult, I guess. A lot of interesting example experiment configurations can be found in AllenNLP's source here.
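
As a rough sketch of that idea (all names made up), nested list comprehension yields one complete config per hyperparameter combination:

grid.jsonnet
local lib = import "lib-mnist.libsonnet";

local DROPOUTS = [0.1, 0.25, 0.5];
local PROJECTION_SIZES = [32, 64];

local ENCODER = {
    "num_layers": 2,
    "activations": ["relu", "relu"],
    "input_dim": 784,
    "hidden_dims": 512,
    "dropout": 0.25
};

// 3 dropouts x 2 projection sizes => 6 complete configurations
[
    lib.MNIST(ENCODER + { "dropout": d }, p, 10, d, 1)
    for d in DROPOUTS
    for p in PROJECTION_SIZES
]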
