How to use 2 NVIDIA GPUs to speed up Keras/TensorFlow deep learning training

April 2, 2025


Introduction to Keras and TensorFlow

Keras is an open-source neural network library written in Python. It is built to be modular, fast, and convenient for developers. Keras does not handle low-level computation itself; it delegates that work to a backend such as TensorFlow.

The Keras high-level API takes care of how we build models, define layers, and set up models with multiple inputs and outputs.

TensorFlow, on the other hand, is one of the most popular deep learning frameworks, developed by the Google Brain team. It is a Python-friendly, open-source library that makes machine learning faster and easier.

Benefits of Keras

  • User-friendly and fast deployment

Keras is a user-friendly library that makes it easier for developers to create neural network models. It is well suited to implementing a wide range of deep learning algorithms, including natural language processing models.

  • Pre-trained models

Keras ships with a number of pre-trained models along with their pre-trained weights. You can use these models for fine-tuning, feature extraction, and prediction. When a model is instantiated, Keras downloads the weights automatically (see the short example after this list).

  • Composable and modular

When building a Keras model, you connect different building blocks together. This keeps the work simple, composable, and modular, so you can work more efficiently and with fewer restrictions.

  • Multiple GPU support

Keras lets you train your neural network model on a single GPU or multiple GPUs. It offers built-in support for data parallelism, allowing you to process a massive amount of data in a shorter span of time.
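To illustrate the pre-trained models point from the list above, here is a minimal sketch of loading one of the Keras applications with its ImageNet weights, which are downloaded and cached automatically the first time the model is instantiated:

import tensorflow as tf

# Instantiate InceptionResNetV2 with its pre-trained ImageNet weights;
# Keras downloads and caches the weight file automatically on first use
model = tf.keras.applications.InceptionResNetV2(weights="imagenet")
model.summary()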

Wondering how to use multiple GPUs to train your model? Read on to find out.

How to use 2 NVIDIA GPUs to speed up Keras/TensorFlow deep learning training?

Training a deep learning model on a single GPU can take a very long time. Distributing the work across multiple GPUs is the most direct way to cut that training time down.

A researcher reduced the training time on the ImageNet dataset from two weeks to just 18 minutes by training on multiple GPUs. He also used hundreds of GPUs to train a Transformer-XL model, reducing the training time from an estimated four years to two weeks.

We all want faster training iterations and shorter training times, and scaling training out to multiple GPUs is how we get there. Read on to find out how you can use 2 NVIDIA GPUs to speed up Keras/TensorFlow deep learning training.

Multiple-GPU support

The technique described below lets you train with between 1 and 8 GPUs on a single host. Training across a larger number of GPUs on multiple hosts is also possible, but it requires a slightly different approach (for example, tf.distribute.MultiWorkerMirroredStrategy) than the one shown here.

Let's first list the GPUs available on the system.

from tensorflow.python.client import device_lib

devices = device_lib.list_local_devices()

# Format a byte count as a human-readable string (e.g. 10.5 GiB)
def sizeof_fmt(num, suffix='B'):
    for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
        if abs(num) < 1024.0:
            return "%3.1f %s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, 'Yi', suffix)

# Print the name, type, and memory limit of every local device
for d in devices:
    t = d.device_type
    name = d.physical_device_desc
    l = [item.split(':', 1) for item in name.split(", ")]
    name_attr = dict([x for x in l if len(x) == 2])
    dev = name_attr.get('name', 'Unnamed device')
    print(f" {d.name} || {dev} || {t} || {sizeof_fmt(d.memory_limit)}")

Obtaining the Dataset

Now you can obtain your dataset (for example, the cats vs. dogs dataset from TensorFlow Datasets) and choose the GPUs to train on (2 in our case):

import tensorflow as tf
import tensorflow_datasets as tfds

BATCH_SIZE = 32
GPUS = ["GPU:0", "GPU:1"]

# Resize and normalize each image
def process(image, label):
    image = tf.image.resize(image, [299, 299]) / 255.0
    return image, label

# Mirror the model across the two GPUs
strategy = tf.distribute.MirroredStrategy(GPUS)
print('Number of devices: %d' % strategy.num_replicas_in_sync)

# Scale the global batch size by the number of replicas so that
# each GPU still receives BATCH_SIZE examples per step
batch_size = BATCH_SIZE * strategy.num_replicas_in_sync

dataset = tfds.load("cats_vs_dogs", split=tfds.Split.TRAIN, as_supervised=True)
dataset = dataset.map(process).shuffle(500).batch(batch_size)
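One optional tweak that often helps keep both GPUs busy is adding prefetching to the input pipeline, so preprocessing overlaps with training. A minimal sketch, assuming the dataset variable defined above (older TensorFlow releases use tf.data.experimental.AUTOTUNE instead of tf.data.AUTOTUNE):

# Overlap input preprocessing with GPU execution
dataset = dataset.prefetch(tf.data.AUTOTUNE)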

Set up distributed training 

Training with multiple GPUs is quite similar to training with a single GPU. All you need to do is wrap the model creation and compilation inside the MirroredStrategy scope.

Here is how you can do it:

import tensorflow as tf
import tensorflow_datasets as tfds
import time

# Nicely formatted time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

EPOCHS = 5
LR = 0.001

tf.get_logger().setLevel('ERROR')

start = time.time()

# Model creation and compilation must happen inside the strategy scope
with strategy.scope():
    model = tf.keras.applications.InceptionResNetV2(weights=None, classes=2)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LR),
        loss=tf.keras.losses.sparse_categorical_crossentropy,
        metrics=[tf.keras.metrics.sparse_categorical_accuracy]
    )

model.fit(dataset, epochs=EPOCHS)

elapsed = time.time() - start
print(f'Training time: {hms_string(elapsed)}')
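To measure the actual speedup on your hardware, you can rerun the same script with only one device in the strategy and compare the printed training times. A minimal sketch, reusing the variables defined above:

# Single-GPU baseline: rebuild the strategy with just one device, then
# repeat the model creation, compile, and fit steps inside its scope
single_gpu_strategy = tf.distribute.MirroredStrategy(["GPU:0"])
print('Number of devices: %d' % single_gpu_strategy.num_replicas_in_sync)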

Wrapping up

Keras makes it simple to use more than one GPU to train your neural network model faster. Not every model needs multiple GPUs, though. Generally, models with larger batch sizes and more complex network architectures are the ones that benefit from training on multiple GPUs.

This article showed how you could train your deep learning model on 2 NVIDIA GPUs and speed up the training process.

I hope you found it useful. 

Kindly sign up here for a free trial: https://bit.ly/344Ai4a
