A generative adversarial network (GAN) is a class of machine learning framework designed by Ian Goodfellow and his colleagues in 2014. In a GAN, two neural networks (NNs) contest with each other, typically in the form of a zero-sum game. Given a training set, the model learns to generate new data with the same statistics as the training set.
Applications of Generative Adversarial Network
GANs have a number of specific uses. These include:
- Generating videos
- Generating voices
- Generating images
The generator is considered the heart of the GAN. It is the model that actually produces new data, and with enough training it can achieve very high performance. The objective of the generator is to produce convincing mock data from a particular input. The generator will produce a different sample on every instance, from an input that contains different random values. This set of random values is called the noise vector.
Let’s take an example: suppose the noise vector 2, 1, 1.5, 5, 5, 5, 2 is fed in as input. After the training process is over, the generator model maps this vector to a new sample; feed in a different noise vector and it produces a different sample. In other words, the generator learns a transformation from random noise vectors to data points that resemble the training set.
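The mapping described above can be sketched in a few lines of NumPy. This is only an illustration with a random, untrained weight matrix standing in for a learned generator; a real generator is a deep neural network, but the principle is the same: each noise vector is transformed into an output sample.

```python
import numpy as np

# Illustrative sketch only: `W` is a random stand-in for the generator's
# learned weights, not a trained model.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 7))              # maps a 7-value noise vector to a 4-value sample

noise_a = np.array([2, 1, 1.5, 5, 5, 5, 2], dtype=float)  # the noise vector from the text
noise_b = rng.normal(size=7)             # a different random noise vector

sample_a = np.tanh(W @ noise_a)          # one "generated sample"
sample_b = np.tanh(W @ noise_b)          # a different noise vector gives a different sample
```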
The discriminator is nothing more than a classifier. Its aim is to distinguish authentic training data from the counterfeit data produced by the generator. This type of classifier can work on images, videos and other data types. As the discriminator gets better at spotting fakes, the generator is pushed to produce better examples over time.
How to generate your own dataset?
A programmer can produce their own dataset by using a GAN. A reference dataset is required for this tutorial; it can be a dataset of images, videos or numbers, whatever you want. For this example, you can use Jupyter as your IDE.
Given below are some of the packages that will be used to implement a basic GAN in Python/Keras.
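A minimal set of imports for this kind of Keras/TensorFlow GAN might look like the following. The exact package list is an assumption of this sketch (install with `pip install tensorflow pillow` if needed).

```python
# Assumed imports for a basic Keras/TensorFlow GAN with image data.
import os
import time

import numpy as np
from PIL import Image                       # image loading and saving
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Dense, Dropout, Flatten, Input,
                                     LeakyReLU, Reshape, UpSampling2D)
```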
Run the code given below in the IDE of your choice. For convenience, we are using Google Colab along with Google Drive.
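If you are in Colab, mounting Google Drive lets the dataset and checkpoints persist between sessions. The sketch below guards the mount so the same notebook also runs locally (the Drive path is an assumption; adjust it to your setup).

```python
# Mount Google Drive when running inside Colab; skip gracefully elsewhere.
try:
    from google.colab import drive          # only importable inside Colab
    drive.mount('/content/drive')
    IN_COLAB = True
except ImportError:
    IN_COLAB = False                        # local Jupyter/VS Code: use a local path instead
```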
These are the constants that define how the GAN will be created for this example. The higher the resolution, the more memory will be needed. Higher resolution will also result in longer run times. Note that the resolution is specified as a multiple of 32, so a GENERATE_RES of 1 is 32, 2 is 64, and so on.
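The constants might be defined as follows. The specific values (preview grid size, seed size, batch size, data path) are illustrative assumptions; tune them for your hardware and dataset.

```python
# Training configuration (illustrative values; adjust for your hardware).
GENERATE_RES = 3                       # resolution multiplier: 1 -> 32px, 2 -> 64px, 3 -> 96px
GENERATE_SQUARE = 32 * GENERATE_RES    # rows/cols of each generated image
IMAGE_CHANNELS = 3                     # RGB

PREVIEW_ROWS = 4                       # grid of preview images saved during training
PREVIEW_COLS = 7
PREVIEW_MARGIN = 16

SEED_SIZE = 100                        # length of the input noise vector
DATA_PATH = '/content/drive/MyDrive/gan_data'   # assumed location; change to yours
EPOCHS = 50
BATCH_SIZE = 32
BUFFER_SIZE = 60000                    # shuffle buffer for tf.data
```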
To run this, you will need training data. The training data can be any collection of images. I suggest using training data from the following two locations. Simply unzip and combine to a common directory. The constant DATA_PATH defines where these images are stored.
Next, the images will be loaded and preprocessed, which takes some time. The processed data is then stored in a binary file, so the processed training data can be reloaded quickly on later runs.
We will make use of a TensorFlow Dataset object, which acts as a container for the images. This data can be shuffled and divided into suitable batch sizes for training.
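Wrapping the preprocessed array in a `tf.data.Dataset` is a one-liner; the random array below is a stand-in for the real preprocessed images.

```python
import numpy as np
import tensorflow as tf

BATCH_SIZE = 32
BUFFER_SIZE = 60000

# Stand-in for the preprocessed training images loaded in the previous step.
training_data = np.random.uniform(-1, 1, (64, 96, 96, 3)).astype('float32')

# Shuffle the samples and group them into training batches.
train_dataset = (tf.data.Dataset.from_tensor_slices(training_data)
                 .shuffle(BUFFER_SIZE)
                 .batch(BATCH_SIZE))
```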
The next step is building both the generator and the discriminator. Both are trained with the Adam optimizer.
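One possible DCGAN-style layout is sketched below: the generator upsamples from a small feature map to the target resolution, and the discriminator downsamples with strided convolutions to a single real/fake logit. The exact filter counts and the 6x6 starting size are assumptions of this sketch.

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Dense, Dropout, Flatten, Input,
                                     LeakyReLU, Reshape, UpSampling2D)

SEED_SIZE = 100
GENERATE_SQUARE = 96        # 32 * GENERATE_RES with GENERATE_RES = 3
IMAGE_CHANNELS = 3

def build_generator(seed_size=SEED_SIZE, channels=IMAGE_CHANNELS):
    # Upsampling path: 6x6 -> 12 -> 24 -> 48 -> 96 (one possible layout).
    model = Sequential([Input(shape=(seed_size,)),
                        Dense(6 * 6 * 256, activation='relu'),
                        Reshape((6, 6, 256))])
    for filters in (256, 128, 64, 32):
        model.add(UpSampling2D())
        model.add(Conv2D(filters, kernel_size=3, padding='same'))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation('relu'))
    model.add(Conv2D(channels, kernel_size=3, padding='same'))
    model.add(Activation('tanh'))         # matches the [-1, 1] pixel scaling
    return model

def build_discriminator(image_shape=(GENERATE_SQUARE, GENERATE_SQUARE,
                                     IMAGE_CHANNELS)):
    model = Sequential([Input(shape=image_shape)])
    for filters in (32, 64, 128, 256):    # downsample 96 -> 48 -> 24 -> 12 -> 6
        model.add(Conv2D(filters, kernel_size=3, strides=2, padding='same'))
        model.add(LeakyReLU(0.2))
        model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(1))                   # raw logit; the loss applies the sigmoid
    return model

generator = build_generator()
discriminator = build_discriminator()
```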
As training proceeds over the dataset (images, in this case), preview images will be produced to indicate progress. The rendered image data will show how the generator's capability advances over time.
This result is obtained as tensors. You also need to define loss functions, because they allow the discriminator and generator to be trained in an adversarial manner. Since the two neural networks are trained against different targets (real versus generated data), the training must be done in separate passes as well.
As mentioned above, both the generator and the discriminator use Adam, with the same learning rate and momentum for each. If you use a GENERATE_RES greater than 3 (a higher resolution), you may need to tune these hyperparameters to keep training stable.
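The two optimizers might be configured as follows; the specific learning rate and beta_1 values are common DCGAN choices, not prescribed by the text.

```python
import tensorflow as tf

# Separate Adam optimizers with matching learning rate and momentum (beta_1).
generator_optimizer = tf.keras.optimizers.Adam(learning_rate=1.5e-4, beta_1=0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=1.5e-4, beta_1=0.5)
```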
The following function is where most of the training takes place for both the discriminator and the generator. This function is based on the GAN provided in the TensorFlow Keras examples documentation. It is annotated with @tf.function, so it is precompiled into a TensorFlow graph, which improves the performance of the GAN.
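A training step in that style is sketched below. The tiny stand-in models and losses at the top only make the sketch self-contained; in the real notebook they are the generator, discriminator, losses and optimizers defined in the earlier steps.

```python
import tensorflow as tf

BATCH_SIZE = 32
SEED_SIZE = 100

# Tiny stand-ins so this sketch runs on its own; replace with the real
# models, losses and optimizers built in the previous steps.
generator = tf.keras.Sequential([tf.keras.layers.Input((SEED_SIZE,)),
                                 tf.keras.layers.Dense(8 * 8 * 3),
                                 tf.keras.layers.Reshape((8, 8, 3))])
discriminator = tf.keras.Sequential([tf.keras.layers.Input((8, 8, 3)),
                                     tf.keras.layers.Flatten(),
                                     tf.keras.layers.Dense(1)])
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
generator_loss = lambda fake: cross_entropy(tf.ones_like(fake), fake)
discriminator_loss = lambda real, fake: (cross_entropy(tf.ones_like(real), real)
                                         + cross_entropy(tf.zeros_like(fake), fake))
generator_optimizer = tf.keras.optimizers.Adam(1.5e-4, 0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(1.5e-4, 0.5)

@tf.function  # precompiles the step into a TensorFlow graph for speed
def train_step(images):
    seed = tf.random.normal([BATCH_SIZE, SEED_SIZE])
    # Record both forward passes so each network gets its own gradients.
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(seed, training=True)
        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)
        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)
    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(zip(gen_grads, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(disc_grads, discriminator.trainable_variables))
    return gen_loss, disc_loss
```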
Beginning the training of the GAN
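The outer loop simply iterates over the dataset for the configured number of epochs, calling the training step on each batch and reporting progress. The stand-ins below only make the sketch runnable on its own; in the real notebook, use the train_step, train_dataset and EPOCHS defined in the previous steps.

```python
import time

import numpy as np
import tensorflow as tf

# Stand-ins so this sketch runs on its own; replace with the real
# train_step, train_dataset and EPOCHS from the previous steps.
def train_step(image_batch):
    return tf.constant(0.7), tf.constant(1.3)   # (gen_loss, disc_loss)

train_dataset = tf.data.Dataset.from_tensor_slices(
    tf.zeros([8, 96, 96, 3])).batch(4)
EPOCHS = 1

def train(dataset, epochs):
    # One epoch = one full pass over the shuffled batches.
    for epoch in range(epochs):
        start = time.time()
        gen_losses, disc_losses = [], []
        for image_batch in dataset:
            g_loss, d_loss = train_step(image_batch)
            gen_losses.append(float(g_loss))
            disc_losses.append(float(d_loss))
        print(f'Epoch {epoch + 1}/{epochs}: '
              f'gen loss {np.mean(gen_losses):.4f}, '
              f'disc loss {np.mean(disc_losses):.4f}, '
              f'{time.time() - start:.1f}s')
    return gen_losses, disc_losses

gen_losses, disc_losses = train(train_dataset, EPOCHS)
```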
After this, you can run the training and inspect the generated output images.
All of this code is written in Python. You can run it in Jupyter or in Visual Studio Code as well.
To sum up, hopefully you now have a clear idea of the process of generating your own dataset using a GAN. If you wish to learn more, or need any infrastructural support, visit the official website of E2E Networks for more details.