E2E Networks GPU Instances

A brief overview to get you up to speed with the components used in the G series images.

1. What is needed to get TensorFlow working for machine learning?

NVIDIA drivers

Drivers needed to run the GPU. The proprietary drivers from NVIDIA are required.

Link: http://www.nvidia.com/Download/index.aspx

TensorFlow

An open source machine learning framework. We will be using its CUDA backend to process data.

Link: https://www.tensorflow.org/

CUDA

CUDA is a parallel computing platform and programming model that makes using a GPU for general purpose computing simple and elegant. It is required to use TensorFlow’s GPU backend.

Link: https://developer.nvidia.com/cuda-zone

cuDNN

NVIDIA’s GPU-accelerated library of primitives for deep neural networks (DNNs).

Link: https://developer.nvidia.com/cudnn

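NOTE: TensorFlow is strict about the versions of the components it needs. Each release is built against specific CUDA and cuDNN versions, and a mismatch usually prevents the GPU backend from loading. Refer here for the required versions.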
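To illustrate why the versions matter, here is a minimal sketch that compares a set of installed versions with the combination listed for the G series image in this article (TensorFlow 1.7, CUDA 9.0, cuDNN 7.1). The `EXPECTED` table and the `check_versions` helper are hypothetical names made up for this example, and the table is not an official compatibility matrix.

```python
# Hypothetical helper: compares installed versions with the combination
# listed for the G series image in this article. Not an official matrix.
EXPECTED = {"tensorflow": "1.7", "cuda": "9.0", "cudnn": "7.1"}


def check_versions(installed, expected=EXPECTED):
    """Return a list of (component, installed, expected) mismatches.

    Only the leading major.minor part of each version is compared.
    """
    mismatches = []
    for name, want in expected.items():
        have = installed.get(name, "missing")
        # Reduce e.g. "9.0.176" to "9.0" before comparing.
        have_prefix = ".".join(have.split(".")[:2])
        if have_prefix != want:
            mismatches.append((name, have, want))
    return mismatches


# The versions reported for the image in this article give no mismatches.
print(check_versions({"tensorflow": "1.7.0", "cuda": "9.0.176", "cudnn": "7.1.1"}))
```

An empty list means the installed set matches the expected major.minor versions. Anything else names the component to look at first.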

2. What software is pre-installed in the E2E Networks G series image, and how do you check it?

lspci – device identification

# lspci -vvv | grep -i nvidia | grep -A3 VGA
01:01.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation Device 11bf
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
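If you would rather do the same check from a script, the sketch below runs plain `lspci` and keeps only the lines that mention NVIDIA. The function names `find_nvidia_devices` and `run_lspci` are made up for this example. It assumes `lspci` is on the PATH and returns nothing if it is not.

```python
import subprocess


def find_nvidia_devices(lspci_output):
    """Return the lspci lines that mention NVIDIA (case-insensitive)."""
    return [line.strip() for line in lspci_output.splitlines()
            if "nvidia" in line.lower()]


def run_lspci():
    """Run plain `lspci`; return "" if the tool is unavailable or fails."""
    try:
        return subprocess.check_output(["lspci"]).decode("utf-8", "replace")
    except (OSError, subprocess.CalledProcessError):
        return ""


if __name__ == "__main__":
    for device in find_nvidia_devices(run_lspci()):
        print(device)
```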

nvidia-smi – GPU stats

# nvidia-smi 

Fri Apr 13 11:43:23 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:01.0 Off |                  N/A |
| 20%   30C    P0    N/A /  75W |      0MiB /  4040MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
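The table above is meant for reading, not parsing. For scripts, `nvidia-smi` also offers a CSV mode through its `--query-gpu` and `--format` options. The sketch below uses it to collect the GPU name, driver version and total memory. The helper names `parse_gpu_csv` and `query_gpus` and the choice of fields are assumptions for this example.

```python
import subprocess

# Fields requested from nvidia-smi; chosen for illustration only.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(csv_text):
    """Turn `nvidia-smi --format=csv,noheader` rows into dictionaries."""
    gpus = []
    for row in csv_text.strip().splitlines():
        values = [value.strip() for value in row.split(",")]
        # Skip rows that do not have one value per requested field.
        if len(values) == len(QUERY_FIELDS):
            gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Ask nvidia-smi for the fields above; return [] if it cannot run."""
    cmd = ["nvidia-smi",
           "--query-gpu=" + ",".join(QUERY_FIELDS),
           "--format=csv,noheader"]
    try:
        output = subprocess.check_output(cmd).decode("utf-8", "replace")
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(output)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

An empty result means either that `nvidia-smi` is not installed or that the driver is not loaded, so the `lspci` check is a good next step.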

TensorFlow

# python -c 'import tensorflow as tf; print(tf.__version__)'

1.7.0

CUDA path

/usr/local/cuda-9.0/

CUDA version – 9.0.176

# cat /usr/local/cuda-9.0/version.txt

9.0.176
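The same check can be scripted. This is a minimal sketch that reads the version file and extracts the dotted version number with a regular expression, so it works whether the file holds a bare number or a longer line containing it. The helper names are made up for this example, and the default path is the one used in this article.

```python
import re


def parse_cuda_version(text):
    """Extract a dotted version such as 9.0.176 as a tuple of integers."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def read_cuda_version(path="/usr/local/cuda-9.0/version.txt"):
    """Read and parse the version file; return None if it is not readable."""
    try:
        with open(path) as handle:
            return parse_cuda_version(handle.read())
    except IOError:
        return None


if __name__ == "__main__":
    print(read_cuda_version())
```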

cuDNN version – 7.1.1 for CUDA 9.0

# cat /usr/local/cuda-9.0/include/cudnn.h | grep CUDNN_MAJOR -A 2

#define CUDNN_MAJOR 7

#define CUDNN_MINOR 1

#define CUDNN_PATCHLEVEL 1

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
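The three `#define` lines carry the version, and the `CUDNN_VERSION` macro combines them into one number. Here is a small sketch of the same arithmetic, using the values shown above as sample input. The function names are hypothetical.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h-style #define lines."""
    parts = []
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(r"#define\s+CUDNN_%s\s+(\d+)" % key, header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def cudnn_version_number(version):
    """Combine the parts the same way the CUDNN_VERSION macro does."""
    major, minor, patch = version
    return major * 1000 + minor * 100 + patch


# Sample input copied from the header lines shown in this article.
sample = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 1
"""
print(cudnn_version_number(parse_cudnn_version(sample)))  # 7101
```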

3. Diagnostic information about TensorFlow

[root@localhost ~]# python
Python 2.7.5 (default, Aug  4 2017, 00:39:18)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> sess = tf.Session()
2018-04-12 06:39:25.131127: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-12 06:39:25.719195: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-12 06:39:25.720202: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:01:01.0
totalMemory: 3.95GiB freeMemory: 3.89GiB
2018-04-12 06:39:25.720221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-04-12 06:39:25.910375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3633 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:01.0, compute capability: 6.1)
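Instead of reading the log lines, you can ask TensorFlow directly which devices it sees. The sketch below uses `device_lib.list_local_devices()` from `tensorflow.python.client` and keeps the entries whose device type is GPU. The wrapper name `list_gpu_names` is made up for this example, and it returns an empty list when TensorFlow is not importable.

```python
def list_gpu_names():
    """Return TensorFlow device names of type GPU, or [] if none are found.

    Also returns [] when TensorFlow itself cannot be imported.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return [device.name for device in device_lib.list_local_devices()
            if device.device_type == "GPU"]


if __name__ == "__main__":
    names = list_gpu_names()
    print(names if names else "No GPU visible to TensorFlow")
```

If the list is empty even though `nvidia-smi` shows the card, the CUDA and cuDNN versions are the first thing to check.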

Check out our GPU Computing – Series G plans to get started!

Updated on May 5, 2018


Comments

  1. Guys, any plan on supporting Windows Server in GPU instances? Would love to use your GPU service.

    1. Thanks for the kind words. We are working on Windows images for GPU instances on the cloud. In the meantime, we can provision a dedicated server with a GPU card and Windows Server 2016. Our sales team will reach out to you first thing in the morning to assist.
