Commit

add back files in CycleMLP
skpig committed Dec 9, 2021
1 parent caf5f7d commit c6be596
Showing 4 changed files with 4 additions and 632 deletions.
53 changes: 4 additions & 49 deletions image_classification/CycleMLP/README.md
@@ -126,7 +126,6 @@ python main_multi_gpu.py \
</details>

## Training
### Training with single GPU
To train the CycleMLP model on ImageNet2012 with a single GPU, run the following script from the command line:
```shell
sh run_train.sh
@@ -141,9 +140,11 @@ python main_single_gpu.py \
-data_path='/dataset/imagenet' \
```

### Training with multi-GPU
Run training on multiple GPUs:
<details>

<summary>
Run training on multiple GPUs:
</summary>


```shell
@@ -159,55 +160,9 @@ python main_multi_gpu.py \
-data_path='/dataset/imagenet' \
```



### Training with multi-node
PaddleViT also supports multi-node distributed training in collective mode.

Suppose you have 2 hosts (nodes) with 4 GPUs each, and the nodes' IP addresses are
`192.168.0.16` and `192.168.0.17`.

Then modify the following lines of `run_train_multi_node.sh`:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 # IDs of the GPUs to use on this node

-ips='192.168.0.16,192.168.0.17' # node IPs, separated by commas
```
Run the training script on every node:
```shell
sh run_train_multi_node.sh
```
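
The two values edited above follow directly from the cluster layout. As a rough illustration only (the helper `launch_settings` is hypothetical and not part of PaddleViT), the following Python sketch derives the `CUDA_VISIBLE_DEVICES` value, the comma-separated `-ips` string, and the total number of training processes from a list of node addresses, assuming one process per GPU:

```python
def launch_settings(node_ips, gpus_per_node):
    """Derive multi-node launch values (hypothetical helper, not a PaddleViT API)."""
    if not node_ips or gpus_per_node < 1:
        raise ValueError("need at least one node and one GPU per node")
    # GPU IDs visible on each node, e.g. "0,1,2,3" for 4 GPUs.
    visible_devices = ",".join(str(i) for i in range(gpus_per_node))
    # Node IPs joined by commas, with surrounding whitespace stripped.
    ips = ",".join(ip.strip() for ip in node_ips)
    # Assumption: one training process is started per GPU.
    world_size = len(node_ips) * gpus_per_node
    return visible_devices, ips, world_size


devices, ips, world_size = launch_settings(["192.168.0.16", "192.168.0.17"], 4)
print(f"CUDA_VISIBLE_DEVICES={devices}")
print(f"-ips='{ips}'")
print(f"total processes: {world_size}")
```

For the two-node, four-GPU example this yields `0,1,2,3`, `192.168.0.16,192.168.0.17`, and 8 processes in total.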

<details>
<summary>Multi-node training is possible even when you have only one machine</summary>

1. Install Docker and PaddlePaddle. For more details, refer to the
[installation guide](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/docker/fromdocker.html).

2. Create a network between docker containers.
```shell
docker network create -d bridge paddle_net
```
3. Create multiple containers as virtual hosts/nodes. For example, create 2 containers
with 2 GPUs each.
```shell
# A space before each trailing backslash keeps the image name a separate argument.
docker run --name paddle0 -it -d --gpus "device=0,1" --network paddle_net \
    paddlepaddle/paddle:2.2.0-gpu-cuda10.2-cudnn7 /bin/bash
docker run --name paddle1 -it -d --gpus "device=2,3" --network paddle_net \
    paddlepaddle/paddle:2.2.0-gpu-cuda10.2-cudnn7 /bin/bash
```
> Notes:
> 1. The same GPU device can be assigned to different containers, but this may cause out-of-memory (OOM) errors since multiple models will run on the same GPU.
> 2. Use `-v` to mount the PaddleViT repository into the container.

4. Modify `run_train_multi_node.sh` as described above and run the training script in every container.

> Note: Use the `ping` or `ip a` command to check the containers' IP addresses.
</details>
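
Step 3 of the single-machine setup splits the machine's GPUs into disjoint groups, one per container, so that no two containers compete for the same device. A minimal Python sketch of that split is shown here; `partition_gpus` is a hypothetical helper introduced only for illustration, and it assumes GPU IDs are numbered consecutively from 0:

```python
def partition_gpus(total_gpus, gpus_per_container):
    """Split GPU IDs 0..total_gpus-1 into disjoint groups (hypothetical helper)."""
    if total_gpus < 1 or gpus_per_container < 1:
        raise ValueError("GPU counts must be positive")
    if total_gpus % gpus_per_container != 0:
        raise ValueError("total_gpus must be a multiple of gpus_per_container")
    groups = []
    for start in range(0, total_gpus, gpus_per_container):
        ids = range(start, start + gpus_per_container)
        # Each group is a comma-separated ID list such as "0,1".
        groups.append(",".join(str(i) for i in ids))
    return groups


# Container names follow the paddle0, paddle1 pattern used in the example.
for index, devices in enumerate(partition_gpus(4, 2)):
    print(f"paddle{index}: device={devices}")
```

With 4 GPUs and 2 GPUs per container, the groups are `0,1` and `2,3`, matching the two `docker run` commands in the example.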


## Attention Map Visualization
**(coming soon)**
