A Radial Basis Function (RBF) neural network is a specialised type of feedforward neural network, formulated by Broomhead and Lowe in 1988, that uses radial basis functions as its activation functions. It has a single hidden layer, and there are no weights between the input and hidden layers: each hidden neuron applies a radial basis function to the distance between the input and that neuron's centre, and the output is a weighted sum of these activations. It is used for function approximation and for modelling the non-linear behaviour of static processes. Conceptually, it is quite similar to a kernel SVM: it uses radial basis functions to map a dataset that is not linearly separable into a space of different (lower or higher) dimension, where a linear separator can be found. Despite this analogy, the architecture is not limited to classification; it is equally suited to regression, which is how it is used in this repo.
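To make the kernel-SVM analogy concrete, here is a small sketch (not taken from the repo) on the XOR problem, which no straight line can separate in the input plane. Two Gaussian basis functions with hand-picked centres at (0, 0) and (1, 1) map the four points into a 2-D feature space in which a single threshold on the sum of the activations separates the classes. The centres, the spread and the threshold of 0.9 are illustrative assumptions.

```python
import numpy as np

# XOR: the two classes cannot be separated by a line in the input space
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Illustrative choice: two Gaussian units centred on the class-0 points
centres = np.array([[0.0, 0.0], [1.0, 1.0]])
sigma = 1 / np.sqrt(2)  # so that 2 * sigma**2 == 1

def rbf_features(X, centres, sigma):
    # Squared Euclidean distance from every sample to every centre
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

Phi = rbf_features(X, centres, sigma)  # shape (4, 2): one column per hidden unit

# In the feature space, a linear rule (threshold on phi_1 + phi_2) separates XOR
scores = Phi.sum(axis=1)
pred = (scores < 0.9).astype(int)
print(Phi.round(4))
print(pred)
```

The class-0 points each sit on a centre and score about 1.135, while the class-1 points score about 0.736, so any threshold between the two values works.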
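Here, we see a Gaussian radial basis function. Mathematically, it is given as

$$\phi(\mathbf{x}) = \exp\left(-\frac{\lVert \mathbf{x} - \boldsymbol{\mu} \rVert^2}{2\sigma^2}\right)$$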
Here, mu is the centre and sigma is the standard deviation of the Gaussian. In an RBF network, the sigma values of the hidden neurons' activation functions are called spread parameters; together with the mu values (the centres), they are hyperparameters of the model. They can be set manually by the user to standard values or calculated using statistical methods.
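A minimal sketch of a single Gaussian activation, assuming the common form exp(-||x - mu||^2 / (2 sigma^2)) (some texts drop the factor 2). The centre and test point are hypothetical; the point is placed at distance 0.5 from the centre to show how the spread controls how quickly a neuron's response decays.

```python
import numpy as np

def gaussian_rbf(x, mu, sigma):
    """Gaussian radial basis function exp(-||x - mu||^2 / (2 sigma^2))."""
    x, mu = np.asarray(x, dtype=float), np.asarray(mu, dtype=float)
    return np.exp(-np.sum((x - mu) ** 2) / (2.0 * sigma ** 2))

mu = np.array([0.0, 0.0])  # hypothetical centre
x = np.array([0.3, 0.4])   # hypothetical point at distance 0.5 from mu

print(gaussian_rbf(mu, mu, 0.5))  # 1.0: maximum response at the centre
# A small spread gives a very local neuron; a larger spread responds further out
for sigma in (0.1, 0.25, 0.5):
    print(sigma, gaussian_rbf(x, mu, sigma))
```

With sigma = 0.1 the response at distance 0.5 is practically zero, while with sigma = 0.5 it is exp(-0.5), roughly 0.61, which is why the spread parameter has such a strong effect on the fit.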
In this repo, we aim to use an RBF NN to approximate a simple function defined as follows
We use 441 input samples, defined below
The resulting distribution is plotted.
The data is split using the `train_test_split()` function from the Scikit-Learn library into 2 parts in an 8:2 ratio, one for training and one for testing. The training set has 352 observations, while the test set has 89.
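The split can be reproduced with placeholder data. The sketch below uses 441 evenly spaced 1-D points and a sine target as stand-ins, not the repo's actual function, and `random_state=0` is an arbitrary choice. It shows where 352 and 89 come from: scikit-learn rounds the test fraction up, so the test set gets ceil(0.2 × 441) = 89 samples and the training set gets the remaining 352.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 441 samples, NOT the function used in the repo
X = np.linspace(-1.0, 1.0, 441).reshape(-1, 1)
y = np.sin(np.pi * X[:, 0])

# 8:2 split; random_state only makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)  # (352, 1) (89, 1)
```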
We implement 3 different approaches:
- All training points are set as centres for the activation functions of the hidden neurons. This means that the hidden layer has 352 neurons, equal to the number of training points. The spread parameter is kept constant for all neurons, and we compare the performance as this constant spread parameter is varied.
- 150 random data points are selected as the centres, resulting in 150 hidden neurons. Again, the spread parameter is constant for all neurons and is varied over a defined range for comparison.
- 150 centres are calculated using the K-means clustering algorithm, a widely used unsupervised machine learning technique. The spread parameter is set in the same way as in the two approaches above.
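The three centre-selection strategies can be sketched as follows, assuming the random centres are drawn from the training points without replacement and that K-means is scikit-learn's `KMeans`. The training inputs here are random placeholders with the same size as the repo's training set, and the seeds are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Placeholder training inputs with the same count as the repo's training set
X_train = rng.uniform(-1.0, 1.0, size=(352, 1))

# Approach 1: every training point is a centre -> 352 hidden neurons
centres_all = X_train.copy()

# Approach 2: 150 training points chosen at random, without replacement
idx = rng.choice(len(X_train), size=150, replace=False)
centres_random = X_train[idx]

# Approach 3: 150 K-means cluster centres (not necessarily data points)
kmeans = KMeans(n_clusters=150, n_init=10, random_state=0).fit(X_train)
centres_kmeans = kmeans.cluster_centers_

print(centres_all.shape, centres_random.shape, centres_kmeans.shape)
```

Unlike the first two approaches, K-means centres are cluster means, so they usually do not coincide with any training point and tend to cover the input space more evenly than a random subset.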
The spread parameter is varied from 0.1 to 0.5 in increments of 0.05.
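This grid contains nine values. A sketch of building it with NumPy, using `np.linspace` rather than `np.arange` because a floating-point step of 0.05 can make it ambiguous whether `arange` includes the 0.5 endpoint:

```python
import numpy as np

# 0.10, 0.15, ..., 0.50: nine evenly spaced spread values, endpoint included
spreads = np.linspace(0.1, 0.5, 9)
print(spreads)
```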
The metric chosen for comparison is the Mean Squared Error (MSE), since our objective is to observe how well the network approximates the function.
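This README does not say how the output-layer weights are trained. A common choice for RBF networks, assumed in this sketch, is to solve for them (plus a bias) by linear least squares once the centres and spread are fixed. The sketch does this for approach 1 across the spread grid and scores each model with scikit-learn's `mean_squared_error`, i.e. MSE = (1/n) Σ (yᵢ - ŷᵢ)². The sine target is placeholder data, and `design_matrix`, `fit_rbf` and `predict_rbf` are hypothetical helper names, not functions from the notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def design_matrix(X, centres, sigma):
    # Gaussian activation of every hidden neuron for every sample
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbf(X, y, centres, sigma):
    # Assumed training rule: least-squares output weights plus a bias term
    Phi = np.hstack([design_matrix(X, centres, sigma), np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict_rbf(X, centres, sigma, w):
    Phi = np.hstack([design_matrix(X, centres, sigma), np.ones((len(X), 1))])
    return Phi @ w

# Placeholder data, NOT the repo's function
X = np.linspace(-1.0, 1.0, 441).reshape(-1, 1)
y = np.sin(np.pi * X[:, 0])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

centres = X_train  # approach 1: every training point is a centre
results = {}
for sigma in np.linspace(0.1, 0.5, 9):
    w = fit_rbf(X_train, y_train, centres, sigma)
    y_pred = predict_rbf(X_test, centres, sigma, w)
    results[round(float(sigma), 2)] = mean_squared_error(y_test, y_pred)

for sigma, mse in results.items():
    print(f"sigma={sigma:.2f}  test MSE={mse:.3e}")
```

Replacing `centres` with the random or K-means centres from the previous step gives the other two approaches. `np.linalg.lstsq` is used instead of an explicit matrix inverse because the design matrix becomes badly conditioned when many centres overlap at large spreads.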
The Jupyter Notebook titled rbfnn.ipynb is the main file, containing all of the code implementation and graphical representations. A report, report.pdf, is also included and discusses the results in more depth. The image files supplement this README.