How to Implement Convolutional Variational Autoencoder in PyTorch with CUDA?

January 17, 2023

Neural networks are remarkably efficient tools for solving a number of difficult problems. Their first applications usually revolved around classification. But what if we want to encode an image into a compact representation and then decode it back? We can use a simple feed-forward neural network or, when we deal with images, a convolutional neural network, which usually performs better.

Autoencoders are becoming increasingly popular in AI and machine learning due to their ability to learn complex representations of data. They are a type of neural network that learns to compress and decompress data: the autoencoder first encodes the data into a lower-dimensional representation, then reconstructs it back to its original form. They can be used for a variety of tasks, such as denoising, anomaly detection, and feature extraction. Because they learn features from unlabeled data, they are a popular choice for unsupervised learning.

They have also been used for generative modeling, such as image generation, and for transfer learning. Because they can learn complex nonlinear structure in data, they are an effective tool for dimensionality reduction and for compressing data for storage or further processing. Additionally, they are relatively easy to implement and can be trained using standard backpropagation.

What are Autoencoders?

Autoencoders are a type of neural network that generates a compact coding of the given input and attempts to reconstruct the input from that code. The architecture is divided into the encoder, the decoder, and the latent space between them, also known as the “bottleneck”. To learn representations of the input, the network is trained on unlabeled data. The compressed representations then go through a decoding process in which the input is reconstructed. An autoencoder can be viewed as a regression task that models an identity function: the target is the input itself.
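To make the encoder, bottleneck, and decoder structure concrete, here is a minimal sketch of a fully connected autoencoder in PyTorch. The class name `TinyAutoencoder`, the layer widths, and the 784-dimensional input (a flattened 28x28 image) are illustrative assumptions, not values prescribed by this article.

```python
import torch
from torch import nn


class TinyAutoencoder(nn.Module):
    """Fully connected autoencoder: input -> bottleneck code -> reconstruction."""

    def __init__(self, input_dim: int = 784, code_dim: int = 32):
        super().__init__()
        # Encoder: compresses the input down to the bottleneck code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        # Decoder: reconstructs the input from the code.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid(),  # outputs in (0, 1), matching inputs scaled to [0, 1]
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code


torch.manual_seed(0)
model = TinyAutoencoder()
x = torch.rand(16, 784)  # a batch of 16 random stand-in "images"
recon, code = model(x)

# The training target is the input itself (identity function as regression).
loss = nn.functional.mse_loss(recon, x)
print(recon.shape, code.shape, loss.item())
```

The code is much smaller than the input (32 values versus 784), so the network has to learn which information is worth keeping.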

Convolutional Variational Autoencoder

A convolutional variational autoencoder (CVAE) is a type of deep generative model that combines a variational autoencoder (VAE) with a convolutional neural network (CNN). The CVAE learns a latent representation of the data by encoding it into a lower-dimensional latent space and decoding it back into the original data space. It can be used for image generation, image reconstruction, and anomaly detection. Its main advantage is that the convolutional layers capture the spatial relationships between pixels in an image, which a standard VAE built from fully connected layers does not explicitly exploit. This tends to improve the quality of image reconstruction and generation.

When we regularize an autoencoder so that its latent representation is fitted not to individual data points but to the entire data distribution, we can sample randomly from the latent space and generate unseen images from that distribution. This is what makes the autoencoder ‘variational’. To achieve this, we add a KL divergence term to the loss function.
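The KL divergence term can be sketched as follows, assuming the common setup in which the encoder outputs a mean `mu` and a log-variance `logvar` of a diagonal Gaussian and the prior is a standard normal. The function names and the `beta` weight are illustrative choices, not part of this article.

```python
import torch
from torch import nn


def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps so gradients can flow through mu and logvar."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims, averaged over the batch."""
    kl_per_sample = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    return kl_per_sample.mean()


def vae_loss(recon, target, mu, logvar, beta=1.0):
    """Reconstruction term plus a weighted KL term (beta is a hypothetical knob)."""
    recon_term = nn.functional.binary_cross_entropy(
        recon, target, reduction="sum"
    ) / target.size(0)
    return recon_term + beta * kl_to_standard_normal(mu, logvar)


torch.manual_seed(0)
mu = torch.zeros(4, 3)
logvar = torch.zeros(4, 3)
z = reparameterize(mu, logvar)
print(z.shape)
# The KL term vanishes when the approximate posterior equals the prior.
print(abs(kl_to_standard_normal(mu, logvar).item()))
```

The reconstruction term pulls the latent code toward whatever reproduces the input best, while the KL term pulls it toward the prior; sampling from the prior at generation time works only because of the second term.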

Sample Example

The first applications of neural networks usually revolved around classification problems. Classification means that we have an image as the input and the output is a simple decision, for example whether the image depicts a cat or a dog. The input layer has as many nodes as there are pixels in the input image, and the output layer has two units; we look at which of the two fires more strongly to decide whether the network thinks it is a cat or a dog.

Between these two, there are hidden layers where the neural network is asked to build an inner representation of the problem that is efficient at recognizing animals. An autoencoder is an interesting variant of this setup, with two important changes:

  • First, the number of neurons is the same in the input and the output. We can therefore expect the output to be an image that is not only the same size as the input, but actually the same image. On its own this wouldn’t make any sense: why would we want to invent a neural network to do the job of a copying machine?
  • Second, we have a bottleneck in one of the layers. The number of neurons in that layer is much smaller than we would normally see, so the network has to find a way to represent the data with far fewer neurons. If you have a smaller budget, you have to let go of all the fluff and concentrate on the bare essentials. We therefore can’t expect the output image to be identical to the input, but the two are hopefully quite close. Such autoencoders create compact representations of the input data and can therefore be used for image compression. In practice, basic autoencoders offer no tangible advantage over classical image compression algorithms like JPEG. However, as a crumb of comfort, many different variants exist that are useful for tasks other than compression.

What is a convolutional encoder?

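In the context of autoencoders, a convolutional encoder is the encoder half of the network, built from convolutional layers. It slides a set of learned filters over the input image and, through strided convolutions or pooling, progressively reduces the spatial resolution while increasing the number of feature channels, until the image is summarized by a compact latent representation. This should not be confused with the convolutional encoders used in digital communication and data storage systems, which apply convolutional codes to the data before transmission or storage.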
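In the autoencoder setting of this article, a convolutional encoder can be sketched as a stack of strided convolutions followed by a linear layer that produces the latent vector. The channel counts, the 28x28 single-channel input, and the name `ConvEncoder` are assumptions made for illustration.

```python
import torch
from torch import nn


class ConvEncoder(nn.Module):
    """Maps a 1x28x28 image to a latent vector using strided convolutions."""

    def __init__(self, in_channels: int = 1, latent_dim: int = 16):
        super().__init__()
        self.features = nn.Sequential(
            # 28x28 -> 14x14 (stride 2 halves the spatial resolution)
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            # 14x14 -> 7x7
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.to_latent = nn.Linear(32 * 7 * 7, latent_dim)

    def forward(self, x):
        h = self.features(x)
        return self.to_latent(h.flatten(start_dim=1))


torch.manual_seed(0)
encoder = ConvEncoder()
images = torch.rand(8, 1, 28, 28)
feature_maps = encoder.features(images)
latent = encoder(images)
print(feature_maps.shape, latent.shape)
```

Each convolution trades spatial resolution for channels, so the feature maps get smaller but richer before being flattened into the latent vector.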


A sample problem: 

Suppose we have a large collection of landscape pictures. Can a network learn to generate new landscape pictures from it? We want to train a neural network model on all of these pictures, not to classify them or make predictions about them, but to paint a new landscape of its own. In other words, we want the neural network to generate new samples of the kind of data it was trained on. We have seen that neural networks can classify and perform regression; now we want to see whether they can be trained as generative models that produce new data of the kind they have seen, even when that data is arbitrarily complex, like paintings of landscapes. Before we get there, the important question is: what is a generative model, and what do we need to know about generative models?

Let’s define generative models:

  • A model for the probability distribution of the data x (e.g. multinomial, Gaussian, etc.)
  • Computational equivalent: a model that can be used to “generate” data with a distribution similar to the given data x
      1. Typical setting: a box that takes in random seeds and outputs random samples like x
      2. How do we generate random seeds? Typically by drawing them from a simple, known distribution such as a standard Gaussian.
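The "box" picture of a generator can be sketched in a few lines. Here the random seeds are draws from a standard Gaussian, and the box is a fixed affine map; in a real generative model the map would be a trained neural network. The matrix `A`, the offset `b`, and the function name `generate` are hypothetical values chosen only for this illustration.

```python
import torch


def generate(num_samples: int, seed: int) -> torch.Tensor:
    """A toy generator: random seeds z ~ N(0, I) go in, samples x come out."""
    gen = torch.Generator().manual_seed(seed)
    z = torch.randn(num_samples, 2, generator=gen)  # the random seeds

    # Hypothetical "box": an affine map, so x ~ N(b, A A^T).
    A = torch.tensor([[2.0, 0.0], [1.0, 0.5]])
    b = torch.tensor([3.0, -1.0])
    return z @ A.T + b


samples = generate(20000, seed=0)
print(samples.shape)
print(samples.mean(dim=0))  # close to b = (3, -1)
```

Fixing the seed of the random number generator makes the samples reproducible, which is convenient when comparing models.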

Topic: Neural Network Generator’s Model

AIM: Generate samples from the distribution of “landscape” images

Learning a generative model for data:

  • Let’s say you are given a set of observed data X = {x}
  • You choose a model P(x; θ) for the distribution of x
  • θ are the parameters of the model
  • Estimate θ such that P(x; θ) best fits the observations X = {x}
  • The hope is that the model will also represent data outside the training set
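As a small worked example of estimating θ, the sketch below fits a one-dimensional Gaussian P(x; θ) with θ = (μ, σ) to synthetic observations by minimizing the negative log-likelihood with gradient descent. The true parameters (5 and 2), the optimizer settings, and the number of steps are assumptions chosen for the demonstration; for a Gaussian the maximum-likelihood estimate is also available in closed form, which the last lines use as a check.

```python
import torch

torch.manual_seed(0)
# Observed data X, drawn here from a Gaussian with mean 5 and std 2.
X = 5.0 + 2.0 * torch.randn(2000)

# Parameters theta = (mu, log_sigma); optimizing log_sigma keeps sigma positive.
mu = torch.zeros((), requires_grad=True)
log_sigma = torch.zeros((), requires_grad=True)
optimizer = torch.optim.Adam([mu, log_sigma], lr=0.05)

for step in range(2000):
    optimizer.zero_grad()
    model = torch.distributions.Normal(mu, log_sigma.exp())
    nll = -model.log_prob(X).mean()  # negative log-likelihood of the data
    nll.backward()
    optimizer.step()

print("fitted mu, sigma:", mu.item(), log_sigma.exp().item())
# Closed-form maximum-likelihood estimates for comparison.
print("sample mean, std:", X.mean().item(), X.std(unbiased=False).item())
```

A VAE follows the same recipe, except that P(x; θ) is defined through a neural network decoder and the likelihood is optimized through a lower bound rather than directly.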

Ex: Multinomials

Our full Multinomial VAE model is given as follows: 

$$p(x \mid \eta) = \mathrm{Mult}\big(\phi(\Psi \eta)\big)$$

$$p(\eta \mid z; \theta_{dec}) = \mathcal{N}\big(W z + \mu,\ \sigma^2 I_{d-1}\big)$$

$$q(z \mid x; \theta_{enc}) = \mathcal{N}\big(F_L(\Psi^T(\log(f(x)) - \mu)),\ D\big) \qquad (*)$$

where $\theta_{dec} = \{W, \sigma^2\}$ denotes the decoder parameters, $\theta_{enc} = \{F_L, D\}$ denotes the encoder parameters, and $\mu \in \mathbb{R}^{d-1}$ is a bias parameter.

Here, $q(z \mid x; \theta_{enc})$ denotes the variational posterior distribution of z given by the encoder, which is represented as an L-layer dense neural network $F_L$ with appropriate activations. This encoder is directly used to evaluate $p(\eta \mid z; \theta_{dec})$. Furthermore, flat priors are assumed for all variables except z. It is important to note potentially challenging modeling issues when designing the encoder. The ILR transform is not directly applicable to count data, since log(0) is undefined. A common approach to this problem is to introduce a pseudocount before applying the logarithm, which we denote as $\log(f(x)) = \log(x + 1)$. The choice of pseudocount is arbitrary and can introduce biases. To alleviate this issue, we introduce the deep encoder neural network highlighted in equation (*) above; we expect that the universal approximation theorem applies here and that the accuracy of estimating the latent representation z will improve with more complex neural networks. This is supported in our simulation benchmarks: more complex encoder architectures can better remove the biases induced by the pseudocounts.

Implementing Autoencoders 

1. Prepare the data: Before implementing an autoencoder, it is important to prepare the data for the model. This includes normalizing the data and splitting the data into a training and test set. 

2. Choose an architecture: Autoencoders come in many different architectures, such as convolutional autoencoders, variational autoencoders, and denoising autoencoders. Choose the architecture that best suits your data and problem. 

3. Choose a loss function: The loss function measures the difference between the autoencoder's reconstruction and the original input; no labels are involved, because the input itself is the target. The most commonly used loss functions are mean squared error and binary cross-entropy. 

4. Choose an optimization algorithm: The optimization algorithm is used to update the weights of the model during training. Popular optimization algorithms include stochastic gradient descent and Adam. 

5. Train the model: Train the model using the training data. Monitor the loss to ensure the model is learning properly. 

6. Evaluate the model: Once the model is trained, evaluate it on the test data to see how well it performs. 
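The data preparation and loss selection steps above might look like the following in PyTorch. Random integers stand in for 8-bit images, the 800/200 split is an arbitrary choice, and `fake_recon` is a placeholder for a model output; all of these are assumptions for illustration only.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Step 1: prepare the data. Random 8-bit values stand in for real images.
raw = torch.randint(0, 256, (1000, 1, 28, 28)).float()
data = raw / 255.0  # normalize pixel values to [0, 1]
dataset = TensorDataset(data)
train_set, test_set = random_split(
    dataset, [800, 200], generator=torch.Generator().manual_seed(0)
)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

# Step 3: choose a loss. The target is the input batch itself, not a label.
(batch,) = next(iter(train_loader))
fake_recon = torch.sigmoid(torch.randn_like(batch))  # placeholder reconstruction in (0, 1)
mse = nn.functional.mse_loss(fake_recon, batch)
bce = nn.functional.binary_cross_entropy(fake_recon, batch)
print(len(train_set), len(test_set), batch.shape)
print("MSE:", mse.item(), "BCE:", bce.item())
```

Binary cross-entropy expects both the reconstruction and the target to lie in [0, 1], which is one reason to normalize the pixels and end the decoder with a sigmoid; mean squared error has no such restriction.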

In an autoencoder, we take an input image and encode it into a low-dimensional embedding; we then decode that embedding to reconstruct the original image as faithfully as possible. That is the whole mechanism: original image, encoded representation, reconstructed image. 

A frequently cited application is image and video compression. Instead of sending each full image over the network, the sender transmits only the encoded data, and the receiver, which stores the decoder, reconstructs the image. This can reduce bandwidth costs and transmission time, although the reconstruction is lossy. 

Types of Autoencoders:

  • Standard Autoencoders: They reconstruct data/images from the latent vector, but they are trained to replicate the input rather than to generate new data.
  • Variational Autoencoders: They are good at generating new images from the latent vector. The generated data/images are new, but they remain very similar to the data the model was trained on.

How to Implement Autoencoders in PyTorch? 

1. Begin by importing the necessary libraries and modules such as PyTorch, NumPy, Matplotlib, and Scikit-Learn. 

2. Next, define a class for the Autoencoder model. This class should contain the necessary layers and functions for building the model. 

3. Define the forward pass of the model, which chains the encoder and the decoder. The loss function is usually defined alongside the model or in the training loop. 

4. After defining the model, create an instance of the class and set up the optimizer and training hyperparameters such as the learning rate. 

5. Next, define the training loop which should include iterating through the dataset, calculating the loss, updating the parameters, and saving the model. 

6. Finally, evaluate the model on the test set and visualize the results.
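The steps above can be sketched end to end as follows. The synthetic data (points on an 8-dimensional subspace of a 64-dimensional space), the layer sizes, the hyperparameters, and the checkpoint filename are all assumptions made to keep the example small and self-contained; plotting with Matplotlib is omitted.

```python
import os
import tempfile

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class Autoencoder(nn.Module):
    def __init__(self, input_dim: int = 64, code_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 32), nn.ReLU(), nn.Linear(32, code_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 32), nn.ReLU(), nn.Linear(32, input_dim)
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


def train(model, loader, epochs: int = 5, lr: float = 1e-2):
    """Run a basic training loop and return the average loss per epoch."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    history = []
    for _ in range(epochs):
        total = 0.0
        for (x,) in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), x)  # the input is its own target
            loss.backward()
            optimizer.step()
            total += loss.item() * x.size(0)
        history.append(total / len(loader.dataset))
    return history


torch.manual_seed(0)
# Synthetic data on an 8-dimensional subspace, so a code of size 8 can represent it.
basis = torch.randn(8, 64)
data = torch.randn(512, 8) @ basis
train_loader = DataLoader(TensorDataset(data[:400]), batch_size=32, shuffle=True)
test_x = data[400:]

model = Autoencoder()
history = train(model, train_loader)

# Save the trained weights (hypothetical filename in the system temp directory).
checkpoint = os.path.join(tempfile.gettempdir(), "autoencoder_demo.pt")
torch.save(model.state_dict(), checkpoint)

# Evaluate on held-out data without tracking gradients.
model.eval()
with torch.no_grad():
    test_loss = nn.functional.mse_loss(model(test_x), test_x).item()
print("train loss per epoch:", history)
print("test loss:", test_loss)
```

Saving only the `state_dict` keeps the checkpoint independent of the training script; to reuse it, construct an `Autoencoder` with the same sizes and call `load_state_dict`.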

Follow the steps here:

How to Implement Convolutional Autoencoder in PyTorch with CUDA? 

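1. Install all necessary packages for the implementation of a Convolutional Autoencoder in PyTorch with CUDA. This includes a CUDA-enabled build of PyTorch, NumPy, and a compatible NVIDIA driver. The official PyTorch binaries typically bundle the CUDA runtime they need, so a separate CUDA Toolkit installation is usually required only when building from source or compiling custom CUDA extensions. 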

2. Create a new project directory and change the current working directory to the new project directory.

3. Create the necessary Python files for our Convolutional Autoencoder. This will include a model file, a data loader file, a training script and a main file. 

4. Write the necessary code for the Convolutional Autoencoder model. This should include the definition of the encoder, decoder and autoencoder architectures, as well as the forward pass of the model. 

5. Create the data loader class that will be responsible for loading and preprocessing the data. 

6. Write the training script that will be responsible for training the autoencoder. This should include instantiating the model, data loader, optimizer, and loss function, as well as the training loop. 

7. Create the main file to initialize the training process. 

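8. Select the CUDA device to utilize the GPU (falling back to the CPU if no GPU is available), and move the model and each batch of data to that device. 

9. Run the main file to start training. 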
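Putting the pieces together, the following is a minimal sketch of a convolutional variational autoencoder that runs on a CUDA device when one is available and on the CPU otherwise. The architecture (two strided convolutions, a 16-dimensional latent space, 28x28 single-channel inputs), the class name `ConvVAE`, and the random stand-in batch are assumptions for illustration; a real project would load an image dataset and loop over many batches.

```python
import torch
from torch import nn

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class ConvVAE(nn.Module):
    def __init__(self, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 14 -> 7
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(32 * 7 * 7, latent_dim)
        self.fc_logvar = nn.Linear(32 * 7 * 7, latent_dim)
        self.fc_decode = nn.Linear(latent_dim, 32 * 7 * 7)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),                                             # 7 -> 14
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),                                          # 14 -> 28
        )

    def decode(self, z):
        return self.decoder(self.fc_decode(z).view(-1, 32, 7, 7))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decode(z), mu, logvar


torch.manual_seed(0)
model = ConvVAE().to(device)                      # move parameters to the device
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.rand(32, 1, 28, 28).to(device)      # stand-in batch, moved to the device

# One training step: reconstruction loss plus KL divergence to N(0, I).
optimizer.zero_grad()
recon, mu, logvar = model(batch)
bce = nn.functional.binary_cross_entropy(recon, batch, reduction="sum")
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss = (bce + kl) / batch.size(0)
loss.backward()
optimizer.step()

# Generate new images by decoding samples from the prior.
model.eval()
with torch.no_grad():
    samples = model.decode(torch.randn(4, 16, device=device))
print(device, recon.shape, samples.shape, loss.item())
```

The only CUDA-specific lines are the device selection and the two `.to(device)` calls; tensors created later, such as the latent samples, must be created on the same device as the model.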


In this article, we discussed autoencoders and how to implement them in PyTorch. We looked at what a convolutional variational autoencoder does and how it does it, with a view to developing an intuition for its working principle. We then outlined the steps for defining a custom autoencoder of your own and training it on a GPU with CUDA.
