For cuDNN, please use cuDNN v7.0.5 or v7.0.4 instead of cuDNN 7.1. cuDNN 7.1 is not compatible with TensorFlow and will cause issues. If you have installed cuDNN 7.1, you can use
sudo apt-get remove libcudnn7
to remove it. Then follow the same instructions to install cuDNN 7.0. Sorry for any trouble this may cause.
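To see which cuDNN version is currently installed before deciding whether to remove it, a check along the following lines may help. This is a minimal sketch: `cudnn_is_supported` is a hypothetical helper name, and the check assumes cuDNN was installed from NVIDIA's `libcudnn7` Debian package, whose version string begins with the cuDNN version (for example `7.0.5.15-1+cuda9.0`).

```shell
# Hypothetical helper: succeeds only for version strings in the cuDNN 7.0.x series.
cudnn_is_supported() {
  case "$1" in
    7.0.*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Read the installed libcudnn7 version; stays empty if the package (or dpkg) is absent.
installed=$(dpkg-query -W -f='${Version}' libcudnn7 2>/dev/null || true)

if [ -z "$installed" ]; then
  echo "libcudnn7 is not installed"
elif cudnn_is_supported "$installed"; then
  echo "cuDNN $installed is in the 7.0.x series"
else
  echo "cuDNN $installed is not 7.0.x; remove it and install 7.0.5 or 7.0.4"
fi
```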
Settings:
- Zone: us-east1-d
- Machine type: click Customize
  - 8 vCPU
  - 30 GB Memory
  - CPU platform: Intel Haswell or later
  - GPU: 1 NVIDIA Tesla K80
- Boot disk:
  - OS image: Ubuntu 16.04 LTS
  - Boot disk type: Standard persistent disk
  - Size: 200 GB
- Link your billing account to the credit you received. (If you use the $300 free trial credit in your account, you will not be able to use a GPU.)
- In the notifications at the top right of your browser, click Request increase.
- In Quotas, find NVIDIA K80 GPUs for us-east1. Select it and click EDIT QUOTAS at the top.
- Enter your information.
- Enter 1 as the limit.
- In the description, say you will use the GPU for the Columbia CS 4995 Deep Learning for Computer Vision course project.
- The quota should be approved almost instantly.
- Web terminal
- SSH: generate a public/private key pair. Use your UNI for USERNAME; USERNAME will be used again later.
ssh-keygen -t rsa -f ~/.ssh/[KEY_FILENAME] -C [USERNAME]
cat ~/.ssh/[KEY_FILENAME].pub
Save the printed public key to Metadata, then connect with
ssh -i ~/.ssh/[KEY_FILENAME] [USERNAME]@[EXTERNAL_IP_ADDRESS]
- Cyberduck/WinSCP/PuTTY
An instance with or without a GPU is fine, but models usually train faster on an instance with a GPU. Choose as you like.
The following script helps you install all the dependencies for Keras.
We highly recommend that you run the following code manually, line by line,
to avoid problems.
You need to run the code in the given order to avoid dependency issues.
* Linux Essentials
* NVIDIA Driver
* CUDA
* cuDNN
* TensorFlow
* Keras
# essential
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential cmake g++ gfortran
sudo apt-get install git pkg-config python-dev
sudo apt-get install software-properties-common wget
sudo apt-get autoremove
sudo rm -rf /var/lib/apt/lists/*
# install driver
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
# CUDA 9.0 needs driver 384.81 or newer, so install the 384 series rather than 375
sudo apt-get install nvidia-384
# reboot your machine
sudo reboot
# check driver is installed correctly
nvidia-smi
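Besides checking that `nvidia-smi` runs, it can be worth confirming that the driver is new enough for CUDA 9.0. The sketch below assumes that 384.81, the driver version bundled with the CUDA 9.0 installer, is the minimum that CUDA 9.0 accepts, and that GNU `sort -V` is available. `version_at_least` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Relies on `sort -V` (version sort) putting the smaller version first.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Query the driver version only when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
  driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
  if version_at_least "$driver" 384.81; then
    echo "driver $driver is new enough for CUDA 9.0"
  else
    echo "driver $driver is older than 384.81; install a newer driver first"
  fi
fi
```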
# install cuda
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
sudo chmod +x cuda_9.0.176_384.81_linux-run
sudo sh cuda_9.0.176_384.81_linux-run
# Executing the CUDA installation:
# 1. Scroll to the bottom of the license agreement using d
# 2. Do you accept the previously read EULA?: accept
# 3. Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?: no (we've installed the driver manually)
# 4. Install the CUDA 9.0 Toolkit?: yes
# 5. Enter Toolkit Location: use the default (press Enter)
# 6. Do you want to install a symbolic link at /usr/local/cuda?: yes
# 7. Install the CUDA 9.0 Samples?: no
# change environment for cuda
echo 'export PATH=/usr/local/cuda-9.0/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
# verify CUDA is installed correctly
nvcc -V
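One caveat with the `echo ... >> ~/.bashrc` lines above is that running them a second time appends duplicate entries. A possible way around this is sketched below. `append_once` is a hypothetical helper name, and the demonstration writes to a scratch file rather than to `~/.bashrc`.

```shell
# Hypothetical helper: append LINE to FILE unless FILE already contains that exact line.
append_once() {
  line=$1
  file=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a scratch file so that ~/.bashrc is left untouched.
rc_file=$(mktemp)
append_once 'export PATH=/usr/local/cuda-9.0/bin:$PATH' "$rc_file"
append_once 'export PATH=/usr/local/cuda-9.0/bin:$PATH' "$rc_file"   # second call changes nothing
cat "$rc_file"
```

To use it for real, pass `"$HOME/.bashrc"` as the second argument for each of the three export lines.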
# install cuDNN
# You need to register online, download the file to your local machine, then upload it to the cloud instance
# https://developer.nvidia.com/cudnn
# Download cuDNN 7.0 for CUDA 9.0 on Ubuntu 16.04 (the .deb package)
sudo apt install ./[CUDNN_DEB_FILE]
# install python stuff
sudo apt-get update
sudo apt-get install git python-dev python3-dev python3-numpy build-essential python-pip python3-pip python-virtualenv swig python-wheel libcurl3-dev
sudo apt-get install -y libfreetype6-dev libpng12-dev
python3 -m pip install -U pip
pip3 install -U matplotlib ipython[all] jupyter pandas scikit-image
# install tensorflow
pip3 install --upgrade tensorflow-gpu
# keras
pip3 install keras
# pytorch
pip3 install http://download.pytorch.org/whl/cu90/torch-0.3.1-cp35-cp35m-linux_x86_64.whl
pip3 install torchvision
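After the installs above, a quick way to confirm that TensorFlow can actually see the GPU is to list its devices. The following is a sketch only: `listing_has_gpu` is a hypothetical helper name, and it assumes the device listing printed by `device_lib.list_local_devices()` contains the text `device_type: "GPU"` when a GPU is usable.

```shell
# Hypothetical helper: reads a device listing on stdin and succeeds if it mentions a GPU.
listing_has_gpu() {
  grep -q 'device_type: "GPU"'
}

# Ask TensorFlow for its devices only when it can be imported.
if python3 -c 'import tensorflow' >/dev/null 2>&1; then
  if python3 -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())' 2>/dev/null | listing_has_gpu; then
    echo "TensorFlow sees a GPU"
  else
    echo "TensorFlow does not see a GPU; recheck the driver, CUDA and cuDNN steps"
  fi
fi
```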
# helpful tools
# tmux: keeps a session in the background, so it keeps running even if the SSH connection drops.
# nohup: similar to tmux; keeps a command running after you log out. Output goes to nohup.out unless redirected.
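As an illustration of the nohup pattern, the sketch below starts a short stand-in job in the background, sends its output to a log file, and records its process ID. The `sh -c '...'` command and the scratch log file are placeholders for a real training command and its log.

```shell
# Scratch log file standing in for a real training log.
log_file=$(mktemp)

# Start the job in the background; stdout and stderr both go to the log.
nohup sh -c 'echo "training started"; sleep 1; echo "training finished"' > "$log_file" 2>&1 &
job_pid=$!

# Block until the job exits (a real training run would simply be left running).
wait "$job_pid"
cat "$log_file"
```

The trailing `&` is what puts the job in the background; without it, `nohup` still survives a hangup but the terminal stays blocked until the command finishes.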
wget https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
bash Anaconda3-5.1.0-Linux-x86_64.sh
# Select yes to all options during the setup process
# install pytorch
pip3 install http://download.pytorch.org/whl/cpu/torch-0.3.1-cp35-cp35m-linux_x86_64.whl
pip3 install torchvision
- Go to the Google Cloud console, open the 'VPC Network' panel, and select 'Firewall rules'. Add a rule as follows.
  - Name: tensorboard
  - Network: default
  - Priority: 1000
  - Direction: Ingress
  - Action on match: Allow
  - Source filters: IP ranges, 0.0.0.0/0
  - Protocols and ports: tcp:6006;udp:6006
- Then click your own instance, go to 'Edit', and check the box 'Enable connecting to serial ports'.
- Run the following code over SSH:
# Get the code from tensorflow
git clone https://github.com/tensorflow/tensorflow.git
# Mnist is chosen as demo
cd tensorflow/tensorflow/examples/tutorials/mnist
# run it in the background, output is stored in train.log
nohup python3 mnist_with_summaries.py --max_steps=1000000 > train.log 2>&1 &
# you can now close this terminal and open another one
# run TensorBoard in the background
nohup tensorboard --logdir=/tmp/tensorflow/mnist &
- Close the terminal
- Open your browser and go to '[EXTERNAL_IP_ADDRESS]:6006'; you should see TensorBoard. ([EXTERNAL_IP_ADDRESS] can be found on the VM instances page.)
Reference: https://bicepjai.github.io/machine-learning/2016/08/22/tensorboard-on-gcloud.html
A couple of students informed us that after going through the tutorial linked below, they had issues with tensorflow-gpu. After inspection, the problem turned out to be the outdated Anaconda version installed in that tutorial. When installing Anaconda, make sure you install the newest version listed on the official website.
See this link. https://towardsdatascience.com/running-jupyter-notebook-in-google-cloud-platform-in-15-min-61e16da34d52