TensorFlow Engineering with CUDA GPU for Deep Learning

2017-03-13. Category & Tags: Deep Learning, TensorFlow, GPU, CUDA, TF

Install #

Summary: install CUDA first, then TF.
Ref: TF 1.0 doc
Ref: NVIDIA doc (up to step 3)

requirements #

64-bit Linux
Python 2.7 or 3.3+ (3.5 is used in this blog)
NVIDIA CUDA 7.5 (8.0 for Pascal GPUs)
NVIDIA cuDNN v4.0 or newer (v5.1 recommended)
NVIDIA GPU with compute capability 3.0 or higher
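The non-GPU items in this list (OS, 64-bit, Python version) can be checked from Python itself. The following is only a minimal sketch: the function name check_host is made up for illustration, and it does not look at CUDA, cuDNN or the GPU's compute capability.

```python
import platform
import struct
import sys


def check_host(system, bits, py_version):
    """Return a list of unmet requirements (an empty list means OK)."""
    problems = []
    if system != "Linux":
        problems.append("OS is %s, expected Linux" % system)
    if bits != 64:
        problems.append("interpreter is %d-bit, expected 64-bit" % bits)
    # Accept Python 2.7, or any version from 3.3 upwards.
    ok_py = py_version[:2] == (2, 7) or py_version[:2] >= (3, 3)
    if not ok_py:
        problems.append("Python %d.%d is not 2.7 or 3.3+" % py_version[:2])
    return problems


# Check the machine this script runs on.
# struct.calcsize("P") is the pointer size in bytes (8 on a 64-bit build).
problems = check_host(
    platform.system(), struct.calcsize("P") * 8, tuple(sys.version_info)
)
print(problems or "host requirements look fine")
```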

steps #

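1. Manually download “cuDNN v6.0 Library for Linux” (the tar command in step 3 uses the v5.1 archive name; adjust it to match the file you downloaded).
2. Install CUDA on Ubuntu 16.04.2 with the automated Bash script; it can be chained (&& \) with the commands below.
3. Install cuDNN and pip: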

sudo apt-get install -y curl git tofrodos dos2unix libcupti-dev && \
sudo tar -xvf cudnn-8.0-linux-x64-v5.1.tgz -C /usr/local && \
sudo apt-get install -y python-pip python3-pip python-dev python3-dev && \
pip install --upgrade pip && \
pip3 install --upgrade pip
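To confirm that the cuDNN archive was unpacked where expected, the version macros in the header can be read back. This is a rough sketch, assuming the archive unpacks into /usr/local/cuda and that cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL (true for the v5/v6 headers as far as I know); cudnn_version is a hypothetical helper name.

```python
import os
import re


def cudnn_version(cuda_root):
    """Read (major, minor, patch) from <cuda_root>/include/cudnn.h, or None."""
    header = os.path.join(cuda_root, "include", "cudnn.h")
    if not os.path.isfile(header):
        return None
    with open(header) as fh:
        text = fh.read()
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % name, text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Prints None when the header is missing or has an unexpected layout.
print(cudnn_version("/usr/local/cuda"))
```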

~~Nvidia also asked for java on its old web (plz ignore).~~

4. Install TF
Method (1): install with pip inside a Python virtualenv (recommended)
OBS: if you don’t want to install the binary version with pip, see: Determine how to install

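sudo apt-get install -y python-pip python-dev python-virtualenv && \
virtualenv --system-site-packages -p python3 ~/tensorflowEnv && \
. ~/tensorflowEnv/bin/activate

# run inside the activated virtualenv, without sudo (sudo would install into the system Python instead)
pip3 install --upgrade tensorflow-gpu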
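Whether the interpreter actually runs inside the virtualenv can be checked from Python. This is a small sketch, not part of the TF instructions; in_virtualenv is a made-up helper name.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a virtualenv or venv."""
    # Classic virtualenv sets sys.real_prefix; venv (and newer virtualenv)
    # make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("interpreter:", sys.executable)
print("inside virtual environment:", in_virtualenv())
```

If the second line prints False after activation, packages installed with pip will not end up in ~/tensorflowEnv.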

Method (2): compile TF from source using Bazel (OBS: outdated commands from NVIDIA)

echo "deb http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list && \
curl https://storage.googleapis.com/bazel-apt/doc/apt-key.pub.gpg | sudo apt-key add - && \
sudo apt-get update && \
sudo apt-get install -y bazel && \
git clone https://github.com/tensorflow/tensorflow && \
cd tensorflow  && \
git reset --hard 70de76e && \
dos2unix configure && \
./configure && \
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package && \
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

OBS: method 2 causes a lot of problems because the files use Windows/DOS line endings (hence the dos2unix call above).
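For files that dos2unix was not run on, the same conversion can be sketched in Python. This is an illustration under the assumption that only CRLF endings need fixing; the function names are hypothetical and convert_file rewrites the file in place.

```python
def to_unix_eol(data):
    """Convert CRLF line endings in a bytes object to LF."""
    return data.replace(b"\r\n", b"\n")


def has_dos_eol(data):
    """True if the bytes contain at least one CRLF line ending."""
    return b"\r\n" in data


def convert_file(path):
    """Rewrite one file in place with LF endings; return True if it changed."""
    with open(path, "rb") as fh:
        original = fh.read()
    if not has_dos_eol(original):
        return False
    with open(path, "wb") as fh:
        fh.write(to_unix_eol(original))
    return True


# In-memory demonstration on a made-up script fragment.
sample = b"#!/bin/bash\r\necho configure\r\n"
print(has_dos_eol(sample), to_unix_eol(sample))
```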
Method (3): Docker. See the TF doc.

5. Validate the installation

export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH && \
export CUDA_HOME=/usr/local/cuda-8.0 && \
export PATH=/usr/local/cuda-8.0/bin:$PATH && \
. ~/tensorflowEnv/bin/activate && \
python3

(OBS: “/usr/local/cuda” is different from “/usr/local/cuda-8.0”.)

import tensorflow as tf
print(tf.__version__)
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
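If the import fails with a missing-library error, the three exports from step 5 are a good first thing to check. The sketch below assumes the cuda-8.0 paths used above; missing_cuda_settings is a hypothetical helper, and it only inspects environment variables, not the files themselves.

```python
import os


def missing_cuda_settings(env, cuda_home="/usr/local/cuda-8.0"):
    """Return the names of variables that do not point at cuda_home."""
    missing = []
    if env.get("CUDA_HOME") != cuda_home:
        missing.append("CUDA_HOME")
    if cuda_home + "/lib64" not in env.get("LD_LIBRARY_PATH", "").split(":"):
        missing.append("LD_LIBRARY_PATH")
    if cuda_home + "/bin" not in env.get("PATH", "").split(":"):
        missing.append("PATH")
    return missing


# An empty list means all three exports are visible to this process.
print(missing_cuda_settings(os.environ))
```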

Usage #

TensorBoard
image classifier using Inception
//TODO more

TF Serving SavedModel() #

install #

ref

cd
wget https://github.com/bazelbuild/bazel/releases/download/0.5.4/bazel-0.5.4-installer-linux-x86_64.sh
chmod +x bazel-0.5.4-installer-linux-x86_64.sh
./bazel-0.5.4-installer-linux-x86_64.sh --user
export PATH="$PATH:$HOME/bin" # optional, usually already exists
sudo pip install grpcio || pip install --user grpcio
sudo apt-get update && sudo apt-get install -y \
        build-essential \
        curl \
        libcurl3-dev \
        git \
        libfreetype6-dev \
        libpng12-dev \
        libzmq3-dev \
        pkg-config \
        python-dev \
        python-numpy \
        python-pip \
        software-properties-common \
        swig \
        zip \
        zlib1g-dev
pip install tensorflow-serving-api
echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install tensorflow-model-server

Q&A during installation:
Q: Bazel “ERROR The ‘build’ command is only supported from within a workspace.”
A: Create an empty WORKSPACE file in the directory where Bazel is run: touch WORKSPACE (ref)

training & saving #

Use WineQualityClassificationSavedModel.py or its .ipynb to train and save the model in SavedModel format. (Info: the model contains two Keras Dense layers, as can be seen in WineQualityClassification.py.)
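TensorFlow Serving loads a model from numeric version subdirectories under the model base path (for example WineQuality/1/saved_model.pb). Whether the training script already writes that layout is not stated here, so the following is only a sketch for inspecting an export directory; saved_model_versions and the relative path are illustrative assumptions.

```python
import os


def saved_model_versions(base_path):
    """Return sorted version numbers under base_path that hold a saved_model.pb."""
    versions = []
    if not os.path.isdir(base_path):
        return versions
    for name in os.listdir(base_path):
        candidate = os.path.join(base_path, name)
        # Only numeric directory names count as model versions.
        if name.isdigit() and os.path.isfile(os.path.join(candidate, "saved_model.pb")):
            versions.append(int(name))
    return sorted(versions)


# Hypothetical location; prints [] if nothing servable is found there.
print(saved_model_versions("savedmodels/WineQuality"))
```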

serving #

serve via simple_tensorflow_serving #

(Tested simple_tensorflow_serving v0.5.0 with TF v1.10.0 and CUDA v9.0. See this gist for details.)
Recommended. It uses JSON for both requests and responses. [ref]

pip2 install simple_tensorflow_serving
simple_tensorflow_serving --model_base_path=/path_to_model_such_as/fdp-tensorflow-python-examples/savedmodels/WineQuality/

Note:

Use pip2 to install, and make sure a py2 message appears when the server starts. Running simple_tensorflow_serving under py3 produces a series of errors, such as “TypeError: the JSON object must be str, not ‘bytes’”. (The response is either HTTP 500, or HTTP 200 with error info.)
--model_name=WineQuality is optional; model_name is "default" by default.

While the server is running in a terminal, open localhost:8500 in a browser to see its status in the GUI. Test JSON (as well as test clients) can be generated by the server, e.g. curl http://localhost:8500/v1/models/default/gen_json. With Insomnia, POST the JSON to http://localhost:8500
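The same round trip can be scripted instead of using curl and Insomnia. This is a rough Python 3 client sketch (the py2 restriction above concerns the server, not the client). It assumes that the body returned by gen_json can be POSTed back unchanged; the helper names are made up, and nothing is sent until predict_with_generated_json() is called.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8500"


def build_post(url, payload):
    """Build (without sending) a JSON POST request for the server."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def fetch_json(request_or_url, timeout=5):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def predict_with_generated_json(base_url=BASE_URL, model_name="default"):
    """Ask the server for a sample request, then POST it back for inference."""
    sample = fetch_json("%s/v1/models/%s/gen_json" % (base_url, model_name))
    return fetch_json(build_post(base_url, sample))
```

With the server running, print(predict_with_generated_json()) should show the JSON response, or raise an HTTPError if the server answers with HTTP 500.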

Q&A for simple_tensorflow_serving:

Q/ERR: tf has no attribute ‘Session’.
A: (Seems to be a version incompatibility between py, tf, etc.) Solution: pip3 install --upgrade --force-reinstall tensorflow-gpu

Q/ERR: ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
A: locate libcublas.so shows CUDA 9.1, which is too new for this TF build (it looks for 9.0). Compact code solution (with root/sudo) from ref. A reboot is needed.
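What locate does by hand can be sketched in Python: list the installed libcublas files and pull out their versions, to compare against the 9.0 that the error message asks for. The glob pattern assumes the default /usr/local/cuda* install locations, and cublas_versions is a hypothetical helper.

```python
import glob
import os
import re


def cublas_versions(filenames):
    """Extract sorted (major, minor) versions from libcublas.so.* file names."""
    found = set()
    for name in filenames:
        match = re.search(r"libcublas\.so\.(\d+)\.(\d+)", os.path.basename(name))
        if match:
            found.add((int(match.group(1)), int(match.group(2))))
    return sorted(found)


# Prints [] when no versioned libcublas is found in the assumed locations.
candidates = glob.glob("/usr/local/cuda*/lib64/libcublas.so*")
print(cublas_versions(candidates))
```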

serve via tensorflow_model_server #

NOT recommended (tests failed); please use simple_tensorflow_serving.
The gRPC-based method is not considered here.
TF Serving has supported a RESTful API alongside gRPC since v1.8, enabled with the --rest_api_port parameter.

tensorflow_model_server --port=9000 --rest_api_port=8500 --model_name=WineQuality --model_base_path=/path_to_model_such_as/fdp-tensorflow-python-examples/savedmodels/WineQuality/

ref

consuming the API
General: POST JSON to http://localhost:8500/v1/models/<model_name>:<classify|regress|predict> (8500 is the REST port set by --rest_api_port above; 9000 is the gRPC port).
E.g. POST {"instances":[[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0],[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]]} to http://localhost:8500/v1/models/WineQuality:classify.

Note:

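Tests failed with “JSON Value not formatted correctly”. A likely cause: the "instances" body is the request format of the predict API, while classify and regress expect an "examples" list.
‘v1’ in the URL is the version of the REST API, not of the model (and not of TF).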
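For comparison, here is a sketch of how the same eleven-feature rows would be sent to the predict endpoint, which is the one that takes an "instances" body. It assumes the REST port 8500 from the command above and that the exported model has a usable predict signature, which was not verified here; model_url and predict_request are made-up helper names.

```python
import json
import urllib.request


def model_url(model_name, verb, host="localhost", rest_port=8500):
    """Build the REST URL for one of the three TF Serving verbs."""
    if verb not in ("classify", "regress", "predict"):
        raise ValueError("unknown verb: %s" % verb)
    return "http://%s:%d/v1/models/%s:%s" % (host, rest_port, model_name, verb)


def predict_request(model_name, rows):
    """Build (without sending) a predict request in the row-based format."""
    body = json.dumps({"instances": rows}).encode("utf-8")
    return urllib.request.Request(
        model_url(model_name, "predict"),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Two rows of eleven features, as in the example above.
rows = [[1.0] * 11, [1.0] * 11]
req = predict_request("WineQuality", rows)
print(req.full_url)
print(req.data.decode("utf-8"))
```

With tensorflow_model_server running, urllib.request.urlopen(req) would send it.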