L-layer DNN implementation using only NumPy and the Python standard library!

mqasim41/DNN_From_Scratch

Deep Neural Networks and Backpropagation

Deep Neural Networks (DNNs) are a type of artificial neural network (ANN) with multiple hidden layers between the input and output layers. These networks are capable of learning complex patterns and representations from data, making them suitable for a wide range of tasks such as image recognition, natural language processing, and speech recognition.

Structure of Deep Neural Networks

A DNN consists of multiple layers of interconnected neurons, organized into three main types of layers (see the code sketch after this list):

  1. Input Layer: This layer consists of neurons that receive input data. Each neuron represents a feature or attribute of the input data.

  2. Hidden Layers: These layers are responsible for learning and extracting meaningful features from the input data. Deep neural networks have multiple hidden layers, hence the term "deep". Each hidden layer performs transformations on the input data using weighted connections and activation functions.

  3. Output Layer: The final layer of the network produces the output predictions. The number of neurons in this layer depends on the type of problem being solved. For binary classification tasks, there may be a single neuron with a sigmoid activation function, while for multi-class classification tasks, there may be multiple neurons with softmax activation.
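
The layer sizes above can be captured as a simple list, one entry per layer. As a minimal sketch (the function name `initialize_parameters`, the fan-in scaling, and the example sizes are illustrative assumptions, not necessarily how this repository does it), the weights and biases of an L-layer network can be set up with NumPy like this:

```python
import numpy as np

def initialize_parameters(layer_dims, seed=0):
    """Initialize weights and biases for an L-layer network.

    layer_dims: e.g. [n_features, n_hidden_1, ..., n_output]
    Returns a dict with keys "W1", "b1", ..., "WL", "bL".
    """
    rng = np.random.default_rng(seed)
    parameters = {}
    L = len(layer_dims) - 1  # number of weight layers
    for l in range(1, L + 1):
        # Small random weights scaled by the fan-in; biases start at zero.
        parameters[f"W{l}"] = rng.standard_normal(
            (layer_dims[l], layer_dims[l - 1])) * np.sqrt(1.0 / layer_dims[l - 1])
        parameters[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return parameters

# Example: 4 input features, two hidden layers of 5 and 3 units, 1 output unit.
params = initialize_parameters([4, 5, 3, 1])
```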

Activation Functions

Activation functions introduce non-linearity to the network, enabling it to learn complex mappings between inputs and outputs. Some commonly used activation functions in DNNs (see the NumPy sketch after this list) include:

  1. Sigmoid: $f(z) = \frac{1}{1 + e^{-z}}$
  2. ReLU (Rectified Linear Unit): $f(z) = \max(0, z)$
  3. Tanh (Hyperbolic Tangent): $f(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
  4. Softmax: $f(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$
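
As a quick NumPy illustration (the function names are my own and may not match this repository), the four activations can be written directly; the softmax subtracts the per-column maximum before exponentiating for numerical stability:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def tanh(z):
    return np.tanh(z)

def softmax(z):
    # Columns are samples; subtract the max for numerical stability,
    # then normalize over the class axis so each column sums to 1.
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)
```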

Backpropagation

Backpropagation is a key algorithm used to train deep neural networks. It involves computing the gradient of the loss function with respect to the network's parameters (weights and biases) and updating these parameters to minimize the loss. The process consists of two main steps (see the code sketch after this list):

  1. Forward Pass: During the forward pass, input data is fed through the network, and predictions are made. The output of each layer is computed using the input data, weights, biases, and activation functions.

  2. Backward Pass: In the backward pass, the gradient of the loss function with respect to each parameter in the network is computed using the chain rule of calculus. This gradient indicates how much the loss would change with a small change in the parameter. The gradients are then used to update the parameters using optimization algorithms such as gradient descent.
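
To make the two passes concrete, here is a minimal one-hidden-layer sketch (ReLU hidden layer, sigmoid output, binary cross-entropy); the variable names and column-wise sample layout are assumptions for illustration, not this repository's exact code:

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    # Forward pass: linear step, then activation, layer by layer.
    Z1 = W1 @ X + b1
    A1 = np.maximum(0.0, Z1)            # ReLU
    Z2 = W2 @ A1 + b2
    A2 = 1.0 / (1.0 + np.exp(-Z2))      # sigmoid output
    return Z1, A1, Z2, A2

def backward(X, Y, Z1, A1, A2, W2):
    # Backward pass: apply the chain rule layer by layer, from the output back.
    m = X.shape[1]
    dZ2 = A2 - Y                        # dL/dZ2 for sigmoid + binary cross-entropy
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)       # ReLU derivative
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2
```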

Backpropagation Formulas

Loss Function:

Let $L$ denote the loss function, $y_i$ the true label of sample $i$, and $\hat{y}_i$ the predicted probability for sample $i$. For binary classification, the commonly used loss function is binary cross-entropy:

$$ L(y, \hat{y}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] $$

For multi-class classification, cross-entropy loss or categorical cross-entropy is typically used:

$$ L(y, \hat{y}) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} \log(\hat{y}_{ij}) $$

Where:

  • $N$ is the number of samples
  • $C$ is the number of classes
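
Both losses take only a few lines of NumPy. A minimal sketch, assuming 1-D arrays for the binary case, a one-hot $(N, C)$ matrix for the multi-class case, and a small clamp to avoid $\log(0)$:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true, y_pred: arrays of shape (N,); clamp predictions away from 0 and 1.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot matrix of shape (N, C); y_pred: predicted probabilities (N, C).
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
```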

Gradient Calculation:

The gradients of the loss function with respect to the parameters of the network are computed using the chain rule. For a weight $w_{ij}^{(l)}$ in layer $l$, the gradient is given by:

$$ \frac{\partial L}{\partial w_{ij}^{(l)}} = \frac{\partial L}{\partial a_{j}^{(l)}} \frac{\partial a_{j}^{(l)}}{\partial z_{j}^{(l)}} \frac{\partial z_{j}^{(l)}}{\partial w_{ij}^{(l)}} $$

Where:

  • $a_{j}^{(l)}$ is the activation of neuron $j$ in layer $l$
  • $z_{j}^{(l)}$ is the weighted sum of inputs to neuron $j$ in layer $l$
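
In vectorized form, the chain rule above yields the familiar per-layer gradients. A sketch for one ReLU layer, assuming column-wise samples and cached values from the forward pass (the function and variable names are illustrative):

```python
import numpy as np

def layer_gradients(dA, Z, A_prev, W, m):
    """Gradients for one ReLU layer, given dL/dA for that layer.

    dA: dL/dA^(l), shape (n_l, m)    Z: pre-activation Z^(l)
    A_prev: activations A^(l-1)      W: weights W^(l)    m: number of samples
    """
    dZ = dA * (Z > 0)                            # dL/dZ = dL/dA * dA/dZ (ReLU)
    dW = dZ @ A_prev.T / m                       # dL/dW = dL/dZ * dZ/dW
    db = np.sum(dZ, axis=1, keepdims=True) / m   # dL/db
    dA_prev = W.T @ dZ                           # dL/dA for the previous layer
    return dW, db, dA_prev
```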

Parameter Update:

The parameters of the network (weights and biases) are updated using an optimization algorithm such as gradient descent. The update rule for a parameter $w_{ij}^{(l)}$ is given by:

$$ w_{ij}^{(l)} = w_{ij}^{(l)} - \alpha \frac{\partial L}{\partial w_{ij}^{(l)}} $$

Where:

  • $\alpha$ is the learning rate
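
Applied to every weight and bias, the update is a one-line loop. A minimal sketch, assuming the parameter and gradient dictionaries use matching keys (e.g. "W1" and "dW1"), as in the initialization sketch earlier:

```python
def gradient_descent_update(parameters, grads, alpha):
    # In-place gradient descent step: w <- w - alpha * dL/dw for every W and b.
    for key in parameters:              # keys like "W1", "b1", "W2", ...
        parameters[key] -= alpha * grads["d" + key]
    return parameters
```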

Conclusion

Deep neural networks are powerful models capable of learning complex patterns from data. Backpropagation is a fundamental algorithm for training these networks, allowing them to learn from labeled data by adjusting their parameters to minimize a given loss function. Understanding the concepts and mathematics behind deep neural networks and backpropagation is essential for effectively designing and training neural network models for various tasks.
