bash scripts/download_data.sh path/to/clip-filtered-dataset
python convert_instructp2p.py --data-dir /path/to/clip-filtered-dataset/ --output-dir /path/to/output-dir/ --num-process 64
wget https://storage.googleapis.com/openimages/2018_04/image_ids_and_rotation.csv
python convert_openimage.py --data-dir /path/to/image_ids_and_rotation.csv --output-dir /path/to/output-dir/ --num-process 8 --cuda_device [0, 1, 2, 3, 4, 5, 6, 7]
If you want to preprocess the data on multiple nodes, you need to specify the --num-machine and --machine-id arguments. For example, to preprocess the data on 8 nodes, run the following command on node 0:
python convert_openimage.py --data-dir /path/to/image_ids_and_rotation.csv --output-dir /path/to/output-dir/ --num-process 8 --cuda_device [0, 1, 2, 3, 4, 5, 6, 7] --num-machine 8 --machine-id 0
Then run the following command on node 1:
python convert_openimage.py --data-dir /path/to/image_ids_and_rotation.csv --output-dir /path/to/output-dir/ --num-process 8 --cuda_device [0, 1, 2, 3, 4, 5, 6, 7] --num-machine 8 --machine-id 1
and so on for the remaining nodes, increasing --machine-id by one on each node.
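Typing out the command by hand for each node is error-prone, so the sketch below prints the command for each node of an 8-node run. It reuses the flags and placeholder paths from the commands above unchanged. The names NUM_MACHINES, BASE_COMMAND and build_command are illustrative and not part of the repository, and it assumes machine IDs run from 0 to one less than the number of machines, as the node 0 and node 1 examples suggest.

```python
# Sketch: print the per-node preprocessing command for a multi-node run.
# Flags and paths are copied verbatim from the commands above.
# NUM_MACHINES, BASE_COMMAND and build_command are hypothetical helper names.
NUM_MACHINES = 8

BASE_COMMAND = (
    "python convert_openimage.py "
    "--data-dir /path/to/image_ids_and_rotation.csv "
    "--output-dir /path/to/output-dir/ "
    "--num-process 8 "
    "--cuda_device [0, 1, 2, 3, 4, 5, 6, 7]"
)


def build_command(machine_id: int, num_machines: int = NUM_MACHINES) -> str:
    """Return the command line for one node.

    Assumption: valid machine IDs are 0 .. num_machines - 1.
    """
    if not 0 <= machine_id < num_machines:
        raise ValueError(
            f"machine_id must be in [0, {num_machines - 1}], got {machine_id}"
        )
    return f"{BASE_COMMAND} --num-machine {num_machines} --machine-id {machine_id}"


if __name__ == "__main__":
    # One command per node; copy each to the matching machine.
    for machine_id in range(NUM_MACHINES):
        print(f"# node {machine_id}")
        print(build_command(machine_id))
```

Every node gets the same --num-machine value and a distinct --machine-id, so only the last argument differs between the printed commands.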