
Distributed User Connectivity Maximization in UAV-Based Communication Networks

Overview

This project implements a multi-agent reinforcement learning (MARL) approach to maximize user connectivity in UAV-based communication networks (UCNs). The project is based on the paper "Distributed User Connectivity Maximization in UAV-Based Communication Networks" by Saugat Tripathi, Ran Zhang, and Miao Wang from the Department of Electrical and Computer Engineering, Miami University, Oxford, US.

Abstract

Multi-agent reinforcement learning has been applied to Unmanned Aerial Vehicle (UAV) based communication networks (UCNs) to effectively solve the problem of time-coupled sequential decision making while achieving scalability. This project studies a distributed user connectivity maximization problem in a UCN, aiming to obtain a trajectory design to optimally guide UAVs’ movements in a time horizon to maximize the accumulated number of connected users.

System Model

Network Model

The network model consists of a set of UAVs flying over a target region at a constant altitude to provide communication services to ground users. Each UAV is equipped with directional antennas, and the ground coverage of each UAV is a disk with a specific radius. Users are distributed in the region, with a percentage being distributed in hot spots and the rest uniformly distributed throughout the region.
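
The ground coverage radius follows directly from the UAV altitude and the antenna coverage angle. Below is a minimal sketch of this geometry, assuming the --theta argument (60 degrees by default) is the half-angle of the antenna's coverage cone and --uav-height (350 by default) is the altitude in meters:

import math

def coverage_radius(uav_height_m: float = 350.0, theta_deg: float = 60.0) -> float:
    # Ground coverage disk radius of a UAV with a directional antenna,
    # assuming theta is the half-angle of the coverage cone: r = h * tan(theta)
    return uav_height_m * math.tan(math.radians(theta_deg))

print(f"Coverage radius: {coverage_radius():.1f} m")  # ~606.2 m with the defaults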

Spectrum Access

All UAVs share the same spectrum, and users access the spectrum of UAVs using Orthogonal Frequency Division Multiple Access (OFDMA). The allocable bandwidth of individual UAVs is partitioned into orthogonal resource blocks (RBs). Users are assigned a certain number of RBs based on their throughput requirements and channel conditions.
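
The README does not spell out the rate model, so the sketch below assumes a Shannon-capacity rate per RB; bw_rb_hz corresponds to the --bw-rb argument (180 kHz) and the total UAV bandwidth to --bw-uav (4 MHz). The function name and SNR input are illustrative, not taken from the code base:

import math

def rbs_needed(throughput_req_bps: float, snr_linear: float, bw_rb_hz: float = 180e3) -> int:
    # Number of RBs a user needs, assuming each RB delivers
    # rate = bw_rb * log2(1 + SNR) bits per second
    rate_per_rb = bw_rb_hz * math.log2(1.0 + snr_linear)
    return math.ceil(throughput_req_bps / rate_per_rb)

total_rbs_per_uav = int(4e6 // 180e3)                        # --bw-uav / --bw-rb -> 22 RBs per UAV
print(rbs_needed(throughput_req_bps=1e6, snr_linear=10.0))   # -> 2 RBs under these assumptions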

User Admission and Association

A user is considered connected if it is admitted and assigned RBs by a UAV. A two-stage user association policy is considered: users first send connection requests to the UAV providing the best channel gain, and UAVs admit users based on spectrum availability; unadmitted users then send connection requests to alternative UAVs.
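
As a hedged sketch (the exact admission logic lives in the environment code), the two-stage policy can be read as: in the first stage each user requests its best-gain UAV, which admits requests while it still has free RBs; in the second stage, users left unadmitted request their next-best UAV. All names below are illustrative:

def two_stage_association(users, uavs, rb_capacity, rb_demand, channel_gain):
    # users, uavs: lists of ids; rb_capacity[u]: free RBs at UAV u;
    # rb_demand[k]: RBs user k needs; channel_gain[k][u]: gain of user k to UAV u.
    # Returns a dict mapping each user to its serving UAV (or None if unconnected).
    association = {k: None for k in users}
    unserved = list(users)
    for stage in range(2):                      # stage 0: best UAV, stage 1: next-best UAV
        still_unserved = []
        for k in unserved:
            ranked = sorted(uavs, key=lambda u: channel_gain[k][u], reverse=True)
            candidate = ranked[stage] if stage < len(ranked) else None
            if candidate is not None and rb_capacity[candidate] >= rb_demand[k]:
                rb_capacity[candidate] -= rb_demand[k]   # the UAV admits the user
                association[k] = candidate
            else:
                still_unserved.append(k)
        unserved = still_unserved
    return association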

Problem Formulation

Multi-Agent Deep Q Learning (MA-DQL) Algorithm

State Space

The state space for each UAV is defined by its position in a discretized grid. The positions of UAVs are discretized to grid intersections, and the state space dimension is equal to the number of grid intersections.
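
With the default --coverage-xy of 1000 and --grid-space of 100, each axis has 11 grid intersections, giving 11 × 11 = 121 possible positions per UAV. A small sketch of this discretization (the actual state encoding in the code may differ):

GRID_SPACE = 100                              # --grid-space
COVERAGE_XY = 1000                            # --coverage-xy
GRID_PTS = COVERAGE_XY // GRID_SPACE + 1      # 11 intersections per axis

def state_index(gx: int, gy: int) -> int:
    # Flatten a UAV's grid coordinate (gx, gy) into a single state index;
    # with the defaults this yields 121 states
    return gy * GRID_PTS + gx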

Action Space

Each UAV has five possible horizontal movements: forward, backward, right, left, and hover. The action space controls the movements of each UAV over the entire time horizon.
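
A hedged mapping of the five actions to grid movements is shown below; the actual index-to-direction ordering in main.py may differ:

ACTIONS = {
    0: (0, 1),    # forward
    1: (0, -1),   # backward
    2: (1, 0),    # right
    3: (-1, 0),   # left
    4: (0, 0),    # hover
}

def step_position(gx: int, gy: int, action: int, grid_pts: int = 11):
    # Apply the chosen movement and clip to the grid so the UAV stays inside the region
    dx, dy = ACTIONS[action]
    return min(max(gx + dx, 0), grid_pts - 1), min(max(gy + dy, 0), grid_pts - 1)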

Information Exchange and Reward Function

Four different levels of information exchange are proposed:

  • Implicit Information Exchange (Level 1): UAVs do not explicitly communicate but interact through user association.
  • Exchange of Individual Reward Information (Level 2): UAVs share their local rewards with others.
  • Exchange of Problem-Specific Information (Level 3): UAVs share their updated positions to calculate individual rewards with distance-based penalties (see the sketch after this list).
  • Exchange of Stepwise State Information (Level 4): UAVs share complete state information with others.
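
The exact Level-3 reward is defined in the paper and the code; the snippet below is only a sketch of the idea, assuming the individual reward is the UAV's own connected-user count minus a penalty that grows as other UAVs come within --uav-dis-th (1000 by default), weighted by --dist-pri-param (1/5 by default). The linear penalty form is an assumption:

import math

def level3_reward(own_connected_users, own_pos, other_positions,
                  dist_threshold=1000.0,      # --uav-dis-th
                  dist_pri_param=1 / 5):      # --dist-pri-param
    # Individual reward = locally connected users minus a distance-based
    # penalty for each other UAV closer than the threshold (assumed linear form)
    penalty = 0.0
    for pos in other_positions:
        d = math.dist(own_pos, pos)
        if d < dist_threshold:
            penalty += dist_pri_param * (1.0 - d / dist_threshold)
    return own_connected_users - penalty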

Implementation

The MA-DQL algorithm is a distributed algorithm where each UAV is an agent with its own DQN. During training and policy execution, agents exchange information, update their DQNs, and make decisions in a synchronized manner. After training, each UAV's individual policy is represented by its respective DQN.
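
Below is a minimal sketch of one agent's Q-network, matching the --layers (2) and --nodes (400) defaults; the actual architecture and input features in main.py may differ (here the state is assumed to be the 2-D grid position):

import torch
import torch.nn as nn

class AgentDQN(nn.Module):
    # Per-UAV Q-network: a small MLP mapping the state to Q-values
    # for the five movement actions
    def __init__(self, state_dim, num_actions=5, layers=2, nodes=400):
        super().__init__()
        dims = [state_dim] + [nodes] * layers
        hidden = []
        for i in range(layers):
            hidden += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
        self.net = nn.Sequential(*hidden, nn.Linear(nodes, num_actions))

    def forward(self, state):
        return self.net(state)

# Each UAV agent keeps its own main and target network
main_net = AgentDQN(state_dim=2)
target_net = AgentDQN(state_dim=2)
target_net.load_state_dict(main_net.state_dict())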

Simulation Results

Simulations are conducted to compare the convergence performance of different levels of information exchange in both stationary and dynamic user distributions. The results show that Level 3, which leverages task-specific knowledge, achieves the best convergence performance.

How to Clone, Set Up, and Run the Project

Clone the Repository

git clone https://github.com/saugat76/UAV-Subband-Allocation-DQN-Pytorch.git
cd UAV-Subband-Allocation-DQN-Pytorch

Dependency installation and steps to run in a Conda environment

1) Install Anaconda (omit this step if already installed)

Install Anaconda on your machine. For further information, follow this link: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html

2) Open the Anaconda Prompt and create a new conda environment using the environment.yml file

conda env create -f environment.yml
conda activate uavenv

3) Install any additional dependencies if required

If there are any additional dependencies not covered in the environment.yml file, you can install them manually.

4) Open your IDE through the conda environment

Preferred method

If you already have Visual Studio Code installed, you can enter the following command in the same conda environment to open the IDE:

code

5) Adjust the memory limit

Adjust the memory limit and the GPU configuration according to your system's capability.

6) Run the project

To run the project, use the following command:

python main.py

Arguments

  • --exp-name: Name of this experiment (default: "madql", choices: ['madql', 'maql', 'sample_limited_madql', 'sample_limited_maql'])
  • --user-distribution: Set the user distribution, static/dynamic user mobility (default: "static", choices: ['static', 'dynamic'])
  • --dynamic-user-step: Step count where user position changes (default: 10, choices: [50, 10])
  • --seed: Seed of experiment to ensure reproducibility (default: 1)
  • --torch-deterministic: If toggled, sets torch.backends.cudnn.deterministic=False (default: True)
  • --cuda: If toggled, CUDA will be enabled by default (default: True)
  • --wandb-track: If toggled, this experiment will be tracked with Weights and Biases project (default: False)
  • --wandb-name: Project name in Weights and Biases (default: "UAV_Subband_Allocation_DQN_Pytorch")
  • --wandb-entity: Entity (team) for Weights and Biases project (default: None)
  • --env-id: ID of developed custom environment (default: "ma-custom-UAV-connectivity")
  • --num-env: Number of parallel environments (default: 1)
  • --num-episode: Number of episodes (default: 351)
  • --num-steps: Number of steps/epoch used in every episode (default: 100)
  • --learning-rate: Learning rate of the DQL algorithm used by every agent (default: 3.5e-4)
  • --gamma: Discount factor used for the calculation of Q-value, can prioritize future reward if kept high (default: 0.95)
  • --batch-size: Batch sample size used in a training batch (default: 512)
  • --epsilon: Epsilon to set the exploration vs exploitation (default: 0.1)
  • --update-rate: Steps at which the target network updates its parameter from the main network (default: 10)
  • --buffer-size: Size of replay buffer of each individual agent (default: 125000)
  • --epsilon-min: Minimum value of the exploration parameter epsilon, only used when epsilon decay is set to True (default: 0.1)
  • --epsilon-decay: If epsilon decay is used, exploration is prioritized in early episodes and exploitation in later episodes (default: False)
  • --epsilon-decay-steps: Set the rate at which epsilon decays; the value equals the number of steps at which epsilon reaches its minimum (default: 1)
  • --layers: Set the number of layers for the target and main neural network (default: 2)
  • --nodes: Set the number of nodes for the target and main neural network layers (default: 400)
  • --covered-user-as-input: If set to true, the state will include the covered user count as one additional value and use it as input to the neural network (default: False)
  • --time-as-input: If set to true, time will be used as one additional state input (default: False)
  • --info-exchange-lvl: Information exchange level between UAVs: 1 -> implicit, 2 -> reward, 3 -> position with distance penalty, 4 -> state (default: 1)
  • --num-user: Number of users in defined environment (default: 100)
  • --num-uav: Number of UAVs for the defined environment (default: 5)
  • --generate-user-distribution: If true, generate a new user distribution; set to true when changing the number of users (default: False)
  • --carrier-freq: Set the frequency of the carrier signal in GHz (default: 2)
  • --coverage-xy: Set the length of target area (square) (default: 1000)
  • --uav-height: Define the altitude for all UAVs (default: 350)
  • --theta: Angle of coverage for a UAV in degrees (default: 60)
  • --bw-uav: Actual bandwidth of the UAV (default: 4e6)
  • --bw-rb: Bandwidth of a resource block (default: 180e3)
  • --grid-space: Separating space for grid (default: 100)
  • --uav-dis-th: Distance threshold that defines which UAV agents share information (default: 1000)
  • --dist-pri-param: Distance penalty priority parameter used in level 3 info exchange (default: 1/5)
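
As a hedged illustration (not the actual main.py code), a subset of these flags could be parsed with argparse as follows:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--exp-name", type=str, default="madql",
                    choices=["madql", "maql", "sample_limited_madql", "sample_limited_maql"])
parser.add_argument("--user-distribution", type=str, default="static", choices=["static", "dynamic"])
parser.add_argument("--num-episode", type=int, default=351)
parser.add_argument("--num-steps", type=int, default=100)
parser.add_argument("--learning-rate", type=float, default=3.5e-4)
parser.add_argument("--info-exchange-lvl", type=int, default=1, choices=[1, 2, 3, 4])
args = parser.parse_args()
print(args.exp_name, args.info_exchange_lvl)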

Example:

python main.py --exp-name madql --user-distribution dynamic --dynamic-user-step 50 --seed 42 --cuda True --num-episode 500 --num-steps 200

NOTE: This code uses the GPU version of the PyTorch library for faster training.

Citation

If you use this code, please cite and credit the original paper: "Distributed User Connectivity Maximization in UAV-Based Communication Networks" by Saugat Tripathi, Ran Zhang, and Miao Wang. Available at IEEE Xplore.

Contact

For any questions or inquiries, please contact:

Saugat Tripathi: [email protected], [email protected]
