Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added dqn algo #2011

Merged
merged 2 commits into from
Nov 10, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
164 changes: 164 additions & 0 deletions docs/machine-learning/deep-q-networks.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,164 @@
---
id: deep-q-networks
title: Deep Q-Networks (DQN) Algorithm
sidebar_label: Deep Q-Networks
description: "An introduction to Deep Q-Networks, a reinforcement learning technique that combines Q-Learning with deep neural networks to handle complex, high-dimensional state spaces."
tags: [machine learning, reinforcement learning, DQN, deep learning, algorithms]
---

<Ads />

### Definition:
**Deep Q-Networks (DQN)** is a reinforcement learning algorithm that extends Q-Learning by using deep neural networks to approximate the Q-function. This allows DQN to handle high-dimensional state spaces that are not feasible with traditional tabular Q-Learning. The approach was popularized by DeepMind's success in applying DQN to play Atari games at a superhuman level.

### Characteristics:
- **Combines Deep Learning with Reinforcement Learning**:
DQN leverages neural networks to estimate Q-values, enabling agents to make decisions in environments with complex, high-dimensional state representations.

- **Experience Replay**:
To improve training stability, DQN stores experiences (state, action, reward, next state) in a replay buffer and samples mini-batches from this buffer to train the network. This reduces correlations between consecutive experiences.

- **Fixed Target Network**:
DQN uses a separate target network to provide stable Q-value updates. This network is periodically updated with the weights of the main Q-network, preventing harmful feedback loops during training.

<Ads />

### How It Works:
DQN follows the same principles as Q-Learning but uses a deep neural network $ Q(s, a; \theta) $ parameterized by weights $ \theta $ to approximate Q-values. The network is trained to minimize the loss function:

$$
L(\theta) = \mathbb{E}_{(s, a, r, s') \sim \text{ReplayBuffer}} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right)^2 \right]
$$

- **$ \theta $**: Weights of the current Q-network
- **$ \theta^- $**: Weights of the target network (held fixed for stability)
- **$ \gamma $**: Discount factor for future rewards

### Steps Involved:
1. **Initialize Replay Buffer and Networks**:
Initialize the replay buffer, the Q-network with weights $ \theta $, and the target network with weights $ \theta^- $ (set $ \theta^- = \theta $).

2. **Choose an Action**:
Select an action using an epsilon-greedy policy based on the Q-values predicted by the Q-network.

3. **Take Action and Store Experience**:
Execute the action, observe the reward, and store the experience (state, action, reward, next state) in the replay buffer.

4. **Sample Mini-Batch**:
Randomly sample a mini-batch of experiences from the replay buffer for training.

5. **Compute Target and Update Weights**:
Compute the target Q-value and update the Q-network weights $ \theta $ by minimizing the loss function.

6. **Update Target Network**:
Periodically update the target network weights $ \theta^- $ to match the Q-network weights $ \theta $.

<Ads />

### Problem Statement:
Develop a DQN agent capable of learning policies in environments where the state space is large or continuous, such as video games or robotics.

### Key Concepts:
- **Neural Network Architecture**:
A multi-layer neural network that takes the state as input and outputs Q-values for each action.

- **Experience Replay Buffer**:
Stores past experiences to break the temporal correlations between consecutive training samples and improve training efficiency.

- **Epsilon-Greedy Strategy**:
Balances exploration and exploitation by selecting random actions with probability $ \epsilon $ and greedy actions based on the Q-network with probability $ 1 - \epsilon $.

### Example:
Consider an agent learning to play a simple game with pixel-based input:

- **States**: Raw pixel data from the game screen.
- **Actions**: Available moves (e.g., left, right, jump).
- **Rewards**: +1 for progressing in the game, -1 for losing a life.

<Ads />

**Training Steps**:
1. Preprocess the state (e.g., resize, grayscale).
2. Feed the state into the Q-network to predict Q-values.
3. Select an action using the epsilon-greedy strategy.
4. Observe the reward and next state.
5. Store the experience in the replay buffer.
6. Sample a mini-batch from the buffer and train the network.

### Python Implementation:
Below is a simplified version of a DQN implementation in Python using **TensorFlow/Keras**:

```python
import numpy as np
import tensorflow as tf
from collections import deque
import random

# Neural network model for the Q-network
def build_model(state_size, action_size):
model = tf.keras.Sequential([
tf.keras.layers.Dense(24, input_dim=state_size, activation='relu'),
tf.keras.layers.Dense(24, activation='relu'),
tf.keras.layers.Dense(action_size, activation='linear')
])
model.compile(optimizer='adam', loss='mse')
return model

# Replay buffer
replay_buffer = deque(maxlen=2000)

# Parameters
state_size = 4 # Example state space size
action_size = 2 # Example action space size
gamma = 0.95 # Discount factor
epsilon = 1.0 # Exploration rate
epsilon_min = 0.01
epsilon_decay = 0.995
batch_size = 32
update_target_frequency = 10

# Initialize models
q_network = build_model(state_size, action_size)
target_network = build_model(state_size, action_size)
target_network.set_weights(q_network.get_weights())

# Training loop
for episode in range(num_episodes):
state = env.reset()
done = False
while not done:
# Choose action using epsilon-greedy strategy
if np.random.rand() < epsilon:
action = env.action_space.sample()
else:
action = np.argmax(q_network.predict(state))

# Take action, observe reward and next state
next_state, reward, done, _ = env.step(action)
replay_buffer.append((state, action, reward, next_state, done))
state = next_state

# Sample mini-batch and train
if len(replay_buffer) > batch_size:
minibatch = random.sample(replay_buffer, batch_size)
for state, action, reward, next_state, done in minibatch:
target = reward
if not done:
target += gamma * np.max(target_network.predict(next_state))
target_f = q_network.predict(state)
target_f[0][action] = target
q_network.fit(state, target_f, epochs=1, verbose=0)

# Update epsilon
if epsilon > epsilon_min:
epsilon *= epsilon_decay

# Update target network
if episode % update_target_frequency == 0:
target_network.set_weights(q_network.get_weights())

print("DQN training completed.")
```

### Conclusion:
DQN has significantly advanced the capabilities of reinforcement learning, making it practical to apply RL algorithms in complex environments with large state spaces. By leveraging deep neural networks, DQN has paved the way for applications ranging from gaming to real-world tasks like robotics.