Machine Learning Models: Unveiling Security Vulnerabilities and Fortifying Robustness

August 7, 2023

Introduction: Machine Learning in a Security Context

Machine learning, a key component of the current wave of digital transformation, is expanding what is possible across industries such as healthcare, automotive, law and finance. This rapid progress, however, brings a number of complex security concerns. This article identifies the main weaknesses of machine learning models and suggests ways to make them more robust.

As machine learning models become more sophisticated and more widely deployed, they also become more attractive targets for attack. Because these models learn from data, corrupted or manipulated data can trick a model into making incorrect predictions. The security risks discussed in this article can have serious consequences, such as financial losses, identity theft, and even physical harm. It is therefore essential to take steps to secure machine learning models.

Many of the vulnerabilities in machine learning models originate in the datasets on which they are trained. Data, in essence, mirrors our society, and thus inevitably absorbs the biases that permeate it. Because datasets are human constructs (collected, labelled, and applied by us), they become a prism through which our implicit and explicit biases are refracted. Bias may seep into the data collection process when we subconsciously select or exclude certain pieces of information. Similarly, during data labelling, our perspectives and prejudices can influence the ways we categorise and classify data. The applications we choose for these datasets can also reflect our personal or societal biases, as we might favour certain outcomes over others. It is therefore crucial to remember that no dataset is a perfect, impartial snapshot of reality. Each carries the traces of human bias, underlining the importance of diversity, equity, and transparency in all stages of data handling.

Common Security Vulnerabilities in ML Models

Let's probe further into this issue. Among the most prevalent security risks associated with machine learning models are data poisoning, adversarial attacks, and model inversion and extraction. This section explains what each term means, how each attack works, and the main variants of each.

Data Poisoning

Data poisoning poses a significant risk to machine learning models. In a data poisoning attack, the training data used by a model is deliberately tampered with. The attacker modifies the training set or adds data to it, leading the model to internalise incorrect or biased information. Consequently, the model may make inaccurate or misleading predictions.

To illustrate, consider a machine learning model that is trained to distinguish between cats and dogs using a dataset of images. An attacker could alter this dataset by including photoshopped images of cats that look like dogs, or by removing some dog images altogether. When the model trains on this manipulated dataset, its ability to differentiate between cats and dogs diminishes, which is exactly what a successful data poisoning attack aims for.

This kind of attack can have severe consequences, which is why robust data validation and sanitisation processes are so important for mitigating it. Data validation and other ways of making machine learning models robust are discussed in depth later in this article.


The image above, taken from 'Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks' by Ali Shafahi et al., illustrates an example data poisoning scheme. When a spam filtering model is trained on the poisoned data, it learns to classify the poisoned emails as non-spam. As a result, a test email that is actually spam is classified as non-spam, and similar spam emails will be able to pass through the filter in the future.

There are two types of data poisoning attacks: direct and indirect. The definitions below explain how each works and how they differ.

Direct Poisoning Attacks

In this kind of attack, the attacker intentionally introduces harmful data into the training dataset in an effort to change the model's output for specific inputs.

For example: A machine learning model designed to filter out spam emails can be tricked by an attacker who adds carefully crafted spam emails to the training set. These spam emails look like regular emails and enter the training set labelled as legitimate, so the model learns to classify them as non-spam. This means that similar spam emails will be able to pass through the filter in the future.
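The spam example can be sketched in a few lines. This is a minimal illustration under loud assumptions, not a real spam filter: emails are reduced to two made-up numeric features, the cluster centres `HAM` and `SPAM` are hypothetical, and the "model" is a simple nearest-centroid classifier. It only shows the mechanism, namely that spam-like points injected with a non-spam label drag the non-spam centroid towards the spam region.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(center, n):
    """Draw n two-dimensional points around a cluster centre."""
    return rng.normal(loc=center, scale=1.0, size=(n, 2))

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    """Assign each point to the class with the closest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Hypothetical feature-space centres for legitimate mail and spam.
HAM, SPAM = (-2.0, 0.0), (2.0, 0.0)

# Clean training set: 100 legitimate emails (label 0), 100 spam (label 1).
X_clean = np.vstack([sample(HAM, 100), sample(SPAM, 100)])
y_clean = np.array([0] * 100 + [1] * 100)

# Direct poisoning: 300 spam-like points injected with the WRONG label 0.
X_poisoned = np.vstack([X_clean, sample(SPAM, 300)])
y_poisoned = np.concatenate([y_clean, np.zeros(300, dtype=int)])

# Held-out spam that the filter should catch.
X_test_spam = sample(SPAM, 1000)

clean_recall = (predict(fit_centroids(X_clean, y_clean), X_test_spam) == 1).mean()
poisoned_recall = (predict(fit_centroids(X_poisoned, y_poisoned), X_test_spam) == 1).mean()
print(f"spam caught, clean training set:    {clean_recall:.2%}")
print(f"spam caught, poisoned training set: {poisoned_recall:.2%}")
```

With these made-up numbers the filter trained on poisoned data lets noticeably more spam through, even though none of the original training records were touched.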

Indirect Poisoning Attacks

In an indirect poisoning attack, the attacker shifts the data distribution across the complete training set, thereby eventually swaying the decisions made by the model.

For example: Consider a model trained to suggest personalised movie recommendations to users. An attacker, aiming to promote a particular movie, could subtly manipulate the data distribution by adding numerous slightly altered user profiles showing a strong preference for that movie. Over time, this skews the model's understanding of general user preference, leading it to recommend that particular movie more frequently, even to users with different movie tastes.
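A toy version of the movie example makes the distribution shift concrete. Everything here is an assumption made for illustration: the ratings, the three movie names, and the "recommender", which simply suggests the movie with the highest mean rating.

```python
import numpy as np

movies = ["A", "B", "C"]

# Ratings from genuine users: rows are users, columns are movies A, B, C (1-5 stars).
genuine = np.array([
    [5, 2, 4],
    [4, 1, 5],
    [5, 2, 4],
    [4, 3, 4],
])

def top_movie(ratings):
    """Toy recommender: suggest the movie with the highest mean rating."""
    return movies[int(ratings.mean(axis=0).argmax())]

# The attacker floods the system with near-identical fake profiles that love movie B.
fake_profiles = np.tile([3, 5, 3], (20, 1))
poisoned = np.vstack([genuine, fake_profiles])

print("mean ratings before:", genuine.mean(axis=0), "->", top_movie(genuine))
print("mean ratings after: ", poisoned.mean(axis=0), "->", top_movie(poisoned))
```

No genuine rating was modified; the attacker only changed the overall distribution, which is what distinguishes this from the direct attack described earlier.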

Adversarial Attacks

Imagine you're in your school's photography club and you're learning how to edit pictures using software. Now, suppose your mischievous friend decides to play a prank on you and subtly alters some pixels in one of your photos. The changes are so minor that you don't notice them with the naked eye. But when you submit the photo to a photo recognition contest, the recognition model used by the judges identifies your picture of a dog as a cat because of those few changed pixels. Frustrating, right? This is very similar to an adversarial attack in machine learning. An attacker makes small changes that are almost imperceptible but can make a highly accurate model fail at its task, like mistaking a dog for a cat.

More formally, an adversarial attack is a scenario where an attacker introduces meticulously crafted noise into the input data, such as a targeted manipulation of an image that changes just enough pixels to cause an otherwise accurate image recognition model to misinterpret it. These adversarial inputs are designed to exploit the model's decision boundaries, making the model classify them into the wrong category. In non-critical applications, the consequences might be minimal. However, when such models are used in crucial sectors like healthcare for disease diagnostics or in the automotive industry for autonomous vehicles, the results could be catastrophic. Understanding and mitigating adversarial attacks should therefore be a high priority for those in the cybersecurity field.

The above image illustrates one of the most famous adversarial attacks, the Fast Gradient Sign Method (FGSM). Advis.js is a platform that brings adversarial example generation and dynamic visualisation to the browser for real-time exploration, and is described as the first to do so.

White Box Attacks

White-box attacks are the most powerful type of adversarial attack because the adversary has complete knowledge of the model, including its architecture, parameters, training method, and data. This allows them to craft the most effective adversarial examples, which are inputs carrying small, often imperceptible perturbations that cause the model to misclassify them. Examples of white-box attacks include the Fast Gradient Sign Method (FGSM) and the Jacobian-based Saliency Map Attack (JSMA).

In the example above, the adversarial attack is FGSM, a white-box attack that works by adding small, imperceptible perturbations to an image. We start with an image of a panda. If you feed this image to a machine learning model, the model correctly classifies it as a panda. However, if you use FGSM to add small perturbations to the image, the model may misclassify it as a gibbon.

The perturbations are calculated using the gradient of the model's loss function with respect to the input image. The gradient tells you how the loss will change if you change each pixel of the input. FGSM moves every pixel by a small fixed amount in the direction of the sign of that gradient, which increases the loss and can push the image across the model's decision boundary. The perturbations are too small to be visible to the naked eye, yet they are enough to cause the model to misclassify the image.
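The FGSM update can be written as x_adv = x + epsilon * sign(gradient of the loss with respect to x). The sketch below applies it to a tiny logistic-regression classifier rather than an image network, so that the input gradient has a closed form. The weights `w`, the bias `b`, the input `x` and the value of `epsilon` are all made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, already-trained linear classifier: p(y=1 | x) = sigmoid(w.x + b).
w = np.array([1.0, -2.0, 0.5, 1.5])
b = 0.1

def predict_proba(x):
    return sigmoid(w @ x + b)

def loss_gradient_wrt_input(x, y):
    """Gradient of the binary cross-entropy loss with respect to the INPUT x.
    For logistic regression this has the closed form (p - y) * w."""
    return (predict_proba(x) - y) * w

def fgsm(x, y, epsilon):
    """Fast Gradient Sign Method: step of size epsilon along the gradient's sign."""
    return x + epsilon * np.sign(loss_gradient_wrt_input(x, y))

x = np.array([0.2, -0.1, 0.4, 0.1])   # clean input whose true label is 1
y = 1
x_adv = fgsm(x, y, epsilon=0.3)

print("clean prediction:      ", round(float(predict_proba(x)), 3))
print("adversarial prediction:", round(float(predict_proba(x_adv)), 3))
print("largest change to any feature:", np.max(np.abs(x_adv - x)))
```

In this toy setting no feature moves by more than `epsilon`, yet the predicted class flips. In the image case the same bound applies per pixel, which is why the perturbation can stay invisible.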

Black Box Attacks

Black-box attacks are a type of adversarial attack where the adversary does not have complete knowledge of the model, such as its architecture, parameters, or training data. This means that they cannot use the same methods as white-box attacks to craft adversarial examples. Instead, they need to rely on trial and error, or on techniques that exploit the transferability of adversarial examples. Transferability of adversarial examples is the idea that an adversarial example created for one model can often fool another model. This is because both models are likely to be vulnerable to the same types of perturbations. This makes black-box attacks more challenging to defend against, as the adversary does not need to know the specific details of the target model. 

An illustration of a black-box attack is the zeroth-order optimization attack. This method iteratively searches for an adversarial example by optimising an attack objective computed from the target model's outputs, for example by lowering the confidence the model assigns to the correct class. Because the attacker cannot access the gradient of this objective, they estimate it from repeated queries using a zeroth-order (finite-difference) approximation. This makes the attack slower than white-box attacks, but it can still find potent adversarial examples.

Grey Box Attacks

Grey-box attacks fall between white-box and black-box attacks. The attacker has some knowledge about the model, for example its architecture but not its trained parameters, but not complete information.

For those keen to explore more about adversarial attacks, the research paper 'Explaining and Harnessing Adversarial Examples' by Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy comes highly recommended. The authors argue that the vulnerability of neural networks to adversarial examples is not primarily due to nonlinearity or overfitting, as had been previously thought. Instead, they suggest that the fundamental cause is the largely linear behaviour of these models in high-dimensional input spaces. The paper supports this argument with quantitative results. Moreover, it provides an explanation for why adversarial perturbations remain effective across different model architectures and training sets.

Model Inversion & Extraction

Model inversion and extraction attacks represent a significant threat to machine learning models, primarily because they can compromise the privacy of the training data and violate intellectual property rights. The two families are discussed separately below: inversion attacks come in two main types, and extraction attacks in three.

Inversion Attacks

Model inversion attacks were proposed by Fredrikson et al. in their paper 'Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures'. The attack uses a trained classification model as a tool to recover representations of the data on which the model was trained. This gives the attacker insight into the original training data without ever accessing it directly, and raises a broad range of security and privacy concerns, including unauthorised use of that data.


The image above illustrates image reconstruction with both baseline and XAI-aware (Explainable Artificial Intelligence) inversion attack models. The example uses the CelebA dataset, a large collection of celebrity face images widely used in machine learning research. The 'target task' is identification: the target model is trained to recognise specific individuals in CelebA. The 'attack task' is inversion: the attack model tries to reconstruct the original input images from what the target model releases. The baseline attack works from the target model's predictions, while the XAI-aware attacks also exploit the explanations released alongside those predictions. Comparing their reconstructions shows how much private information such outputs can leak, and that explanations intended to make AI models more transparent and trustworthy can themselves widen the attack surface. The figure therefore underlines the need to consider AI security and explainability together.

For a more in-depth discussion of the image above, see the research paper 'Exploiting Explanations for Model Inversion Attacks' by Xuejun Zhao et al.

Black Box Inversion Attacks

In these attacks, the adversary uses only the outputs of the model (such as prediction probabilities) to reconstruct inputs used for training. Crucially, the attacker has no access to the model's parameters or its architecture.

White Box Inversion Attacks

In contrast to the black-box variant, white-box model inversion attacks give the attacker complete access to both the model's parameters and its architecture. This additional information can enable a more precise reconstruction of the training data.
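A white-box inversion can be sketched as gradient ascent on the input: start from a neutral input and repeatedly nudge it so that the model becomes more confident in the target class. The two-class softmax model below, its weights `W`, and the four-feature input are hypothetical stand-ins for a real image classifier.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical white-box target: softmax regression whose weights the attacker can read.
W = np.array([[ 2.0, -2.0, -2.0,  2.0],
              [-2.0,  2.0,  2.0, -2.0]])
b = np.zeros(2)

def confidence(x, label):
    """Probability the target model assigns to `label` for input x."""
    return softmax(W @ x + b)[label]

def invert(label, steps=200, lr=0.1):
    """Gradient ascent on log p(label | x), keeping features in [0, 1]."""
    x = np.full(W.shape[1], 0.5)          # start from a neutral 'grey' input
    for _ in range(steps):
        p = softmax(W @ x + b)
        grad = W[label] - p @ W           # d log p[label] / dx for softmax regression
        x = np.clip(x + lr * grad, 0.0, 1.0)
    return x

x_rec = invert(label=0)
print("reconstructed input for class 0:", np.round(x_rec, 2))
print("model confidence in class 0:", round(float(confidence(x_rec, 0)), 3))
```

The reconstruction drifts towards the feature pattern the model associates with class 0. With a face-recognition model, that pattern can resemble a recognisable face of the person behind the label.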

Extraction Attacks

Model extraction attacks are a type of attack where an adversary tries to steal the functionality of a machine learning model without having access to the model's parameters or training data. This is done by making queries to the model and observing the corresponding responses. The attacker can then use this information to train a replica of the model.

API-based Model Extraction Attacks

In this category of attack, the adversary continuously interacts with the model through an API. They utilise the subsequent responses to engineer a surrogate model that closely emulates the behavioural patterns of the original model. An example would be an attacker querying a language translation model offered as a service by a company, to generate an imitation model that behaves similarly.

Membership Inference Attacks

While not strictly a model extraction attack, membership inference is closely related. Here, the attacker analyses the model's outputs to discern whether a specific data point was part of the training set, thereby potentially breaching the privacy of individuals. For instance, an attacker could infer whether a certain medical record was included in the training data of a health prediction model.
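One simple form of this attack thresholds the model's confidence, on the assumption that an overfit model is more confident on records it has memorised. The sketch below is deliberately contrived: the "model" is a hypothetical stand-in whose confidence decays with the distance to the nearest training record, and the threshold of 0.9 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

train = rng.normal(size=(50, 5))       # private training records
outsiders = rng.normal(size=(50, 5))   # same distribution, never seen in training

def model_confidence(x):
    """Toy overfit model: confidence decays with distance to the nearest training record."""
    nearest = np.linalg.norm(train - x, axis=1).min()
    return float(np.exp(-nearest))

def infer_membership(x, threshold=0.9):
    """Attacker's guess: a very confident output suggests a training member."""
    return model_confidence(x) > threshold

member_hits = sum(infer_membership(x) for x in train)
false_alarms = sum(infer_membership(x) for x in outsiders)
print(f"members flagged: {member_hits}/50, outsiders flagged: {false_alarms}/50")
```

Real models leak far less cleanly than this, but the confidence gap between seen and unseen records is the same signal that practical membership inference attacks exploit.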

Model Stealing Attacks

In this scenario, the attacker attempts to replicate the structure of the model and its learned parameters, despite not having direct access to them. Often, this is achieved by using a series of input-output pairs to reverse-engineer the model. A classic example would be an attacker querying a facial recognition model with various images and using the received predictions to build a similar model.

Real Life Case Studies: Exploitations & Solutions

Tesla's Autopilot Adversarial Attack

In 2019, a team of researchers from Tencent's Keen Security Lab conducted an adversarial attack on Tesla's Autopilot system. They achieved this by strategically placing small stickers on the road with specific patterns, causing the Autopilot system to misinterpret lane markings and make unexpected lane changes. This exploit raised concerns as it could potentially jeopardise safety during real-world driving scenarios.

For the full methodology of the experiment, see the report 'Experimental Security Research of Tesla Autopilot' by Tencent Keen Security Lab.

Google's Cloud Vision API Vulnerability

Google's Cloud Vision API is a machine learning system that can classify and label images. In 2017, Hossein Hosseini, Baicen Xiao, and Radha Poovendran demonstrated in their paper 'Google's Cloud Vision API Is Not Robust to Noise' that adding sufficient noise to an image could make the API return labels unrelated to the original content, even though a human observer would still recognise the image.

The same paper describes the experiments in more depth.

Strategies for Robust Machine Learning Models 

Having explored the main types of security vulnerability and analysed real-world instances, we now turn to the strategies that can be deployed to develop more robust machine learning models.

Mitigating Data Poisoning

To thwart data poisoning attacks, rigorous data validation and anomaly detection mechanisms are essential. Such procedures help to spot and discard tainted data. For instance, an anomaly detection system might flag data points that deviate significantly from the norm, indicating potential poisoning. Furthermore, relying on dependable and secure data sources and protecting data with robust encryption, both in storage and in transit, can significantly reduce the susceptibility to data poisoning. An example of data validation is checking the integrity of data through checksums or other verification techniques before using it for model training.
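Both ideas fit in a short sketch. The checksum part uses SHA-256 from Python's standard library. The anomaly part uses a median-based (modified) z-score, chosen here because a plain mean-and-standard-deviation score can be distorted by the very outliers it is meant to catch. The sample values and the threshold of 3.5 are assumptions for illustration.

```python
import hashlib
import numpy as np

def sha256_of(data: bytes) -> str:
    """Hex digest used as a checksum for a dataset file's contents."""
    return hashlib.sha256(data).hexdigest()

def verify_integrity(data: bytes, expected_digest: str) -> bool:
    """True only if the data still matches the digest recorded at collection time."""
    return sha256_of(data) == expected_digest

def flag_outliers(values, threshold=3.5):
    """Boolean mask of points whose modified z-score (median/MAD based) exceeds threshold."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        return np.zeros(values.shape, dtype=bool)
    return np.abs(0.6745 * (values - median) / mad) > threshold

# Integrity check: any change to the bytes changes the digest.
original = b"label,feature\n1,10.1\n0,9.8\n"
recorded = sha256_of(original)
tampered = original.replace(b"9.8", b"55.0")
print("original verifies:", verify_integrity(original, recorded))
print("tampered verifies:", verify_integrity(tampered, recorded))

# Anomaly check: the last reading is far from the rest.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 55.0]
print("outlier mask:", flag_outliers(readings))
```

A checksum only proves that data has not changed since the digest was recorded, so it does not help if the data was already poisoned at collection time. That is why the two checks complement each other.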

Defending Against Adversarial Attacks

To fortify models against adversarial attacks, several methods can be implemented. One such method is adversarial training, where the model learns from a mixture of ordinary and adversarial examples, thereby enhancing its capacity to withstand such attacks. Consider, for example, a model learning to recognise different bird species. In adversarial training, alongside the regular images of birds, the model is also fed subtly modified images (which may slightly alter colour patterns or shapes but remain visually indistinguishable from the originals to humans) to help it learn to identify the species even under deceptive conditions.
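The training loop below sketches this idea for logistic regression. At every epoch it crafts FGSM examples against the current weights and trains on the clean and adversarial examples together. The synthetic two-cluster data, the perturbation budget `eps` and the other hyperparameters are assumptions. The sketch shows the mechanics only and makes no claim about how much robustness is gained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(X, y, w, b, eps):
    """Perturb every row of X by eps in the direction that raises its loss."""
    grad_x = (sigmoid(X @ w + b) - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad_x)

def train(X, y, eps=0.0, lr=0.1, epochs=200):
    """Logistic regression by gradient descent; eps > 0 enables adversarial training."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        if eps > 0:
            # Mix clean examples with adversarial ones crafted against the current weights.
            X_batch = np.vstack([X, fgsm(X, y, w, b, eps)])
            y_batch = np.concatenate([y, y])
        else:
            X_batch, y_batch = X, y
        err = sigmoid(X_batch @ w + b) - y_batch
        w -= lr * X_batch.T @ err / len(y_batch)
        b -= lr * err.mean()
    return w, b

def accuracy(X, y, w, b):
    return float(((sigmoid(X @ w + b) > 0.5) == (y == 1)).mean())

# Synthetic data: class 0 clustered around (-2, -2), class 1 around (2, 2).
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 2)) + np.where(y[:, None] == 1, 2.0, -2.0)

w_adv, b_adv = train(X, y, eps=0.3)
X_attacked = fgsm(X, y, w_adv, b_adv, eps=0.3)
print("clean accuracy:   ", accuracy(X, y, w_adv, b_adv))
print("attacked accuracy:", accuracy(X_attacked, y, w_adv, b_adv))
```

In deep-learning practice the same loop is usually applied per mini-batch, often with stronger multi-step attacks than FGSM for crafting the adversarial examples.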

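Other commonly cited techniques include defensive distillation and gradient masking. In defensive distillation, a second model is trained on the class probabilities (soft labels) produced by a first model rather than on hard labels, which smooths the model's decision surface and makes it less sensitive to small input perturbations. Gradient masking, on the other hand, aims to obscure the model's gradients, making it more challenging for adversaries to craft efficient adversarial inputs. Both should be treated as partial measures: stronger attacks have been shown to circumvent them, so they are best combined with other defences such as adversarial training rather than relied on alone.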
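Soft labels in distillation are typically produced with a temperature-scaled softmax: dividing the logits by a temperature greater than 1 flattens the distribution, so the second model also sees how the first model ranks the wrong classes. The logits and the temperature of 20 below are made-up values for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; higher values give softer outputs."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()                 # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

teacher_logits = [8.0, 2.0, 0.5]    # hypothetical first-model outputs for one input

hard_like = softmax(teacher_logits, temperature=1.0)
soft_labels = softmax(teacher_logits, temperature=20.0)

print("T = 1: ", np.round(hard_like, 3))
print("T = 20:", np.round(soft_labels, 3))
```

The predicted class is the same at both temperatures. Only the sharpness of the distribution changes, and it is the softer version that serves as the training target for the distilled model.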

Preventing Model Inversion & Extraction

Shielding against model inversion and extraction attacks requires a combination of data privacy safeguards and model defence strategies. Differential privacy protects sensitive data by adding calibrated noise, during training or to the model's outputs, which limits what can be learned about any individual record. Techniques like homomorphic encryption can keep data protected while still permitting computations on the encrypted data. For model protection, strategies such as model hardening and obfuscation can be used to deter unauthorised extraction of the model.
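The simplest differentially private building block is the Laplace mechanism applied to a counting query. Adding or removing one person changes a count by at most 1, so noise drawn from a Laplace distribution with scale 1/epsilon is sufficient. The records, the predicate and the value of `epsilon` below are assumptions for illustration, and this is a sketch of the mechanism rather than of private model training.

```python
import numpy as np

def private_count(records, predicate, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1)."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0               # one person changes the count by at most 1
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
ages = [34, 51, 29, 62, 47, 58, 33, 70]      # hypothetical sensitive records

# Each released answer is noisy; only the average over many runs approaches the truth.
answers = [private_count(ages, lambda a: a > 50, epsilon=0.5, rng=rng)
           for _ in range(2000)]
print("one noisy answer:", round(answers[0], 2))
print("mean of 2000 noisy answers:", round(float(np.mean(answers)), 2))
```

A smaller `epsilon` means more noise and stronger privacy. In practice each query also consumes part of a privacy budget, which is why the repeated querying shown here would not be allowed against a real system.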


Machine learning models offer vast potential and open the door to unprecedented innovation, but they also bring substantial security challenges with far-reaching impact. Only by understanding these vulnerabilities thoroughly and adopting assertive, resilient security practices can we use machine learning in a secure and accountable fashion. As machine learning security continues to evolve, we must keep working towards a more secure digital future.


References

  1. Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks - Ali Shafahi et al.
  2. Advis.js
  3. Explaining and Harnessing Adversarial Examples - Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy
  4. Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures - Fredrikson et al.
  5. Exploiting Explanations for Model Inversion Attacks - Xuejun Zhao et al.
  6. Experimental Security Research of Tesla Autopilot - Tencent Keen Security Lab
  7. Google's Cloud Vision API Is Not Robust to Noise - Hossein Hosseini, Baicen Xiao, Radha Poovendran
  8. Further Exploration: Amazon's Alexa Skill Evasion

For additional exploration in this field, see these further references:

  1. Awesome Model Inversion Attack Repository

Latest Blogs
October 18, 2022

A Complete Guide To Customer Acquisition For Startups

Customers are the lifeblood of any business, so a strategy for constantly bringing in new clients is an ongoing requirement. This is where a proper customer acquisition strategy becomes important.

So, if you are just starting your business, or planning to expand it, read on to learn more about this concept.

The problem with customer acquisition

As an organization, when working in a diverse and competitive market like India, you need to have a well-defined customer acquisition strategy to attain success. However, this is where most startups struggle. Now, you may have a great product or service, but if you are not in the right place targeting the right demographic, you are not likely to get the results you want.

To resolve this, companies typically invest in marketing, but if that investment is not channelled properly, it will be futile.

So, the best way out of this dilemma is to have a clear customer acquisition strategy in place.

How can you create the ideal customer acquisition strategy for your business?

  • Define what your goals are

You need to define your goals so that you can meet the revenue expectations you have for the current fiscal year. You need to set a target value for each of the following metrics:

  • MRR – Monthly recurring revenue, which tells you the predictable income your business generates each month across all its income channels.
  • CLV – Customer lifetime value, which tells you how much a customer is expected to spend with your business over the duration of your relationship.
  • CAC – Customer acquisition cost, which tells you how much your organization needs to spend to acquire a new customer.
  • Churn rate – The rate at which customers stop doing business with you.

All these metrics tell you how well you will be able to grow your business and revenue.

  • Identify your ideal customers

You need to understand who your current customers are and who your target customers are. Once you know your customer base, you can focus your energies in that direction and maximise the sales of your products or services. Analytics can also show you what your customers require, so that you can tailor your products and services to those needs.

  • Choose your channels for customer acquisition

The channels through which you acquire customers will eventually determine the scale and the rate at which you can expand your business. You could market and sell your products on social media channels like Instagram, Facebook and YouTube, or invest in paid marketing like Google Ads. You need to develop a unique strategy for each of these channels.

  • Communicate with your customers

If you know exactly what your customers have in mind, then you will be able to develop your customer strategy with a clear perspective in mind. You can do it through surveys or customer opinion forms, email contact forms, blog posts and social media posts. After that, you just need to measure the analytics, clearly understand the insights, and improve your strategy accordingly.

Combining these strategies with your long-term business plan will bring results. However, there will be challenges on the way, where you need to adapt as per the requirements to make the most of it. At the same time, introducing new technologies like AI and ML can also solve such issues easily. To learn more about the use of AI and ML and how they are transforming businesses, keep referring to the blog section of E2E Networks.

October 18, 2022

Image-based 3D Object Reconstruction State-of-the-Art and trends in the Deep Learning Era

3D reconstruction is one of the most complex problems tackled by deep learning systems. It has been researched extensively across computer vision, computer graphics and machine learning, for a long time with limited success. More recently, convolutional neural networks (CNNs) have been applied to the problem and have yielded some success.

The Main Objective of the 3D Object Reconstruction

The aim of this deep learning technology is to infer the shape of 3D objects from 2D images. Conducting such an experiment requires the following:

  • Well-calibrated cameras that photograph the object from various angles.
  • Large training datasets from which a model can learn to predict the geometry of the object to be reconstructed. These datasets can be collected from a database of images, or sampled from a video.

By using the apparatus and datasets, you will be able to proceed with the 3D reconstruction from 2D datasets.

State-of-the-art Technology Used by the Datasets for the Reconstruction of 3D Objects

The technology used for this purpose needs to stick to the following parameters:

  • Input

Training uses one or multiple RGB images together with the 3D ground truth and, in some methods, segmentation masks. The input could be one image, multiple images or even a video stream.

Testing uses the same kind of input, and the object may appear against a uniform background, a cluttered background, or both.

  • Output

The output may be volumetric, at high or low resolution, or surface-based, generated through parameterisation, template deformation or point clouds. The network may produce this output directly or via an intermediate representation.

  • Network architecture used

Architectures used in training include 3D-VAE-GAN, which has an encoder and a decoder, as well as TL-Net and conditional GANs. At test time, an architecture such as 3D-VAE, which also has an encoder and a decoder, is used.

  • Training used

Methods differ in the degree of supervision (2D versus 3D supervision, or weak supervision) and in the loss functions they use. Training procedures include adversarial training and training with joint 2D and 3D embeddings. The network architecture also strongly affects the processing speed and the quality of the output.

  • Practical applications and use cases

Reconstruction can be performed with either volumetric or surface representations, and it requires powerful computer systems.

Given below are some of the places where 3D Object Reconstruction Deep Learning Systems are used:

  • 3D reconstruction technology can be used in the Police Department for drawing the faces of criminals whose images have been procured from a crime site where their faces are not completely revealed.
  • It can be used for re-modelling ruins at ancient architectural sites. The rubble or the debris stubs of structures can be used to recreate the entire building structure and get an idea of how it looked in the past.
  • They can be used in plastic surgery where the organs, face, limbs or any other portion of the body has been damaged and needs to be rebuilt.
  • It can be used in airport security, where concealed shapes can be used for guessing whether a person is armed or is carrying explosives or not.
  • It can also help in completing DNA sequences.

So, if you are planning to implement this technology, you can rent the required infrastructure from E2E Networks and avoid investing in it. And if you plan to learn more about such topics, keep a tab on the blog section of the website.

October 18, 2022

A Comprehensive Guide To Deep Q-Learning For Data Science Enthusiasts

For all data science enthusiasts who would love to dig deep, we have composed a write-up about Q-Learning specifically for you. Deep Q-Learning and Reinforcement Learning (RL) are extremely popular these days. Both are commonly implemented with Python libraries like TensorFlow 2 and OpenAI's Gym environment.

So, read on to know more.

What is Deep Q-Learning?

Deep Q-Learning uses the principles of Q-learning, but replaces the Q-table with a neural network. The network takes the state as input and outputs an estimated Q-value for every possible action. The agent stores its previous experiences in memory as tuples in the following order:

State > Action > Reward > Next state

Experience replay stores these previous experiences and trains the network on random batches sampled from them, which improves training stability. A separate target network is used to calculate the target Q-values against which the Q-network's predicted Q-values are trained. The example environment here is Taxi-v3, which is provided by OpenAI Gym.

Now, any understanding of Deep Q-Learning is incomplete without talking about Reinforcement Learning.

What is Reinforcement Learning?

Reinforcement learning is a subfield of ML in which an agent interacts with an environment through a reward-based system and learns to act so as to maximize its rewards. It differs from supervised and unsupervised learning because it does not require labelled input/output pairs, and sub-optimal actions do not need to be explicitly corrected.

Reinforcement learning is usually formalised as a Markov Decision Process (MDP). In an MDP, each new state of the environment depends only on the previous state and the action taken in it. The agent observes the state, chooses an action and receives a reward, and its task is to maximize the cumulative reward. Solving the MDP means finding the optimal policy, that is, the best action to take in each state.

One way to solve an MDP is the Q-Learning algorithm, which is an extremely important part of data science and machine learning.

What is the Q-Learning Algorithm?

Q-Learning lets an agent learn from scratch, through its own experience, without a model of the environment. It involves defining the parameters, choosing an action in the current state, observing the reward and the next state, and building up a Q-table that is used to maximize the rewards.

The 4 steps that are involved in Q-Learning:

  1. Initializing parameters – The RL (reinforcement learning) model is set up with the states of the environment, the actions available to the agent in each state, and the learning parameters. The Q-table starts with initial values, for example zeros.
  2. Identifying current state – The model keeps its prior records so that it can determine the best action for maximizing the results. To act, the agent first identifies the state it is currently in.
  3. Choosing the optimal action set and gaining the relevant experience – The Q-table holds a value for each combination of state and action. The agent chooses an action from it, and the outcome is used to update the Q-table in the following step.
  4. Updating Q-table rewards and next state determination – Once the agent has acted, the environment returns a reward and the next state. The size of the reward determines how the Q-table entry is updated, and the next state becomes the starting point for the subsequent step.
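The four steps above can be sketched as a small tabular Q-Learning loop. This is a minimal sketch that assumes a hypothetical five-state corridor environment, in which the agent starts on the left and is rewarded for reaching the right end. The environment, the function names and the parameter values are illustrative and do not come from a specific library.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (hypothetical toy environment)
ACTIONS = [0, 1]      # 0 = move left, 1 = move right


def step(state, action):
    """Move along the corridor; reaching the last state pays 1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise the parameters and an all-zero Q-table.
    q_table = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False          # Step 2: identify the current state.
        while not done:
            # Step 3: choose an action (epsilon-greedy) and gain experience.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q_table[state])
                action = rng.choice([a for a in ACTIONS if q_table[state][a] == best])
            next_state, reward, done = step(state, action)
            # Step 4: update the Q-table from the reward and the next state.
            target = reward if done else reward + gamma * max(q_table[next_state])
            q_table[state][action] += alpha * (target - q_table[state][action])
            state = next_state
    return q_table


q_table = train()
```

After training, the greedy action in every non-goal state (the column with the larger Q-value) should be "move right". The table has one row per state, which is why this approach stops being practical once the number of states grows very large.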

When the Q-table is huge, building and updating it is a time-consuming process. This is the situation that calls for Deep Q-Learning.

Hopefully, this write-up has provided an outline of Deep Q-Learning and its related concepts. If you wish to learn more about such topics, then keep a tab on the blog section of the E2E Networks website.

October 13, 2022

GAUDI: A Neural Architect for Immersive 3D Scene Generation

The evolution of artificial intelligence in the past decade has been staggering, and now the focus is shifting towards AI and ML systems that can understand and generate 3D spaces. As a result, there has been extensive research on 3D generative models. In this regard, Apple’s AI and ML scientists have developed GAUDI, a method specifically for this job.

An introduction to GAUDI

The creators of the GAUDI 3D immersive technique named it after the famous architect Antoni Gaudí. This AI model uses a camera pose decoder, which enables it to predict the possible camera poses for a scene. The decoder, in turn, makes it possible to render the 3D scene from almost every angle.

What does GAUDI do?

GAUDI can perform multiple functions –

  • Generative models for 3D scenes have a tremendous effect on ML and computer vision, and they are highly useful in practice. They are applied in model-based reinforcement learning and planning (world models), SLAM, and 3D content creation.
  • Generative modelling for 3D objects and scenes has previously been done with GRAF, pi-GAN, and GSN, which incorporate a GAN (Generative Adversarial Network). The generator encodes radiance fields. A radiance field assigns a density scalar and an RGB value to each point in 3D space, so, given a camera pose, an image of the scene can be rendered from that viewpoint. This is learned from 2D camera views of 3D data. The model isolates various objects and scenes and combines them to render a new scene altogether.
  • GAUDI also removes GAN pathologies like mode collapse and improves on the GAN-based approaches.
  • GAUDI also uses this to train data on a canonical coordinate system. You can compare it by looking at the trajectory of the scenes.
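The radiance field idea from the list above, a density and an RGB value for each 3D point, can be illustrated with a toy example. The sketch below is not GAUDI's model. It assumes a hand-written field (a red sphere) in place of a neural network and uses the standard NeRF-style compositing rule to turn samples along one camera ray into a pixel colour. All names and constants are hypothetical.

```python
import math


def toy_radiance_field(point):
    """Hypothetical field: a solid red sphere of radius 1 centred at the origin.

    Returns (density, (r, g, b)) for a 3D point. A real model would replace
    this function with a neural network.
    """
    inside = math.sqrt(sum(c * c for c in point)) <= 1.0
    return (5.0 if inside else 0.0), (1.0, 0.0, 0.0)


def render_ray(origin, direction, field, near=0.0, far=4.0, n_samples=64):
    """Composite the colour along one camera ray using NeRF-style quadrature."""
    delta = (far - near) / n_samples
    transmittance = 1.0                    # fraction of light not yet absorbed
    colour = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta       # midpoint of the i-th ray segment
        point = [o + t * d for o, d in zip(origin, direction)]
        density, rgb = field(point)
        alpha = 1.0 - math.exp(-density * delta)   # opacity of this segment
        for channel in range(3):
            colour[channel] += transmittance * alpha * rgb[channel]
        transmittance *= 1.0 - alpha
    return colour


# A ray that passes through the sphere, and one that misses it.
hit = render_ray([0.0, 0.0, -2.0], [0.0, 0.0, 1.0], toy_radiance_field)
miss = render_ray([3.0, 0.0, -2.0], [0.0, 0.0, 1.0], toy_radiance_field)
```

Rendering a full image repeats this for one ray per pixel, with the ray origins and directions determined by the camera pose.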

How is GAUDI applied to the content?

The steps of application for GAUDI have been given below:

  • For each trajectory, which consists of a sequence of posed images from a 3D scene, a latent representation is created. This representation encodes the radiance field (what we refer to as the 3D scene) and the camera path in a disentangled way. The latents are treated as free parameters, and the problem is optimized by formulating a reconstruction objective.
  • This simple training process is then scaled to thousands of trajectories, which yields a large number of views. The model then samples radiance fields entirely from the prior distribution that it has learned.
  • New scenes are synthesized by interpolation within the latent space.
  • The approach scales to many 3D scenes that together contain thousands of images. During training, there is no issue related to canonical orientation or mode collapse.
  • A novel denoising optimization technique is used to find latent representations that jointly model the camera poses and the radiance field. With it, GAUDI reaches state-of-the-art performance in generating 3D scenes across multiple datasets, and the setup can be conditioned on images and text.
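The interpolation step in the list above can be illustrated with a short sketch. The article does not say which interpolation scheme GAUDI uses, so plain linear interpolation between two latent vectors is an assumption here, and the function name is hypothetical. In a real system, each interpolated vector would be passed to the trained decoders to produce a scene.

```python
def interpolate_latents(z_start, z_end, n_steps):
    """Return n_steps latent vectors evenly spaced from z_start to z_end (inclusive)."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same length")
    path = []
    for i in range(n_steps):
        w = i / (n_steps - 1)              # blend weight: 0.0 at start, 1.0 at end
        path.append([(1.0 - w) * a + w * b for a, b in zip(z_start, z_end)])
    return path


# Three latents on the straight line between two (made-up) scene latents.
path = interpolate_latents([0.0, 2.0], [1.0, 0.0], n_steps=3)
```

The first and last entries of `path` are the two original latents, and the entries in between correspond to intermediate scenes.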

To conclude, GAUDI has further capabilities and can also be used for sampling from various image and video datasets. Furthermore, it is likely to make a foray into AR (augmented reality) and VR (virtual reality). With GAUDI in hand, the sky is the limit in the field of media creation. So, if you enjoy reading about the latest developments in the field of AI and ML, then keep a tab on the blog section of the E2E Networks website.
