Step-by-Step Guide for E-Commerce Startups to Create 3D Product Catalogs Using E2E Cloud

Introduction

The recent integration of artificial intelligence with computer graphics has resulted in significant breakthroughs in the realm of digital content creation. Two of the most notable innovations in this field are Neuralangelo and NeRF (Neural Radiance Fields). These cutting-edge technologies have revolutionized our approach to image synthesis and the capture of 3D scenes, reshaping our understanding of these processes.

Neuralangelo

Neuralangelo, named after the Renaissance artist Michelangelo, is a neural surface reconstruction method from NVIDIA Research. Rather than generating images, it recovers detailed 3D surfaces from 2D images or video by combining multi-resolution hash-grid encoding with neural surface rendering. The reconstructed surfaces capture fine geometric detail, which makes the approach relevant for turning photographs of real objects into 3D assets.

NeRF

Neural Radiance Fields (NeRF) are a type of fully connected neural network that can generate new perspectives of complex 3D scenes from a subset of 2D images. They are trained to reproduce the appearance of a scene as seen in the input views by using a rendering loss function. To render a complete scene, NeRF interpolates between these input images, which represent different views of the scene. This makes NeRF a powerful tool for image creation in artificial intelligence.

NeRF networks use volume rendering to produce new views. They are trained to map a 5D input, consisting of a spatial location (x, y, z) and a viewing direction (θ, φ), to a 4D output, consisting of an RGB color and a volume density. However, NeRF is computationally intensive: training on a complex scene can take several hours or even days. Recent algorithmic developments have significantly improved its efficiency.
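
To make the input-to-output mapping concrete, here is a minimal sketch of a network with that signature. It assumes the viewing direction is passed as a 3D unit vector, so the 5D input arrives as six numbers. The class name `TinyRadianceField`, the hidden size, and the two output heads are illustrative choices, and refinements such as positional encoding are left out.

```python
import torch
import torch.nn as nn


class TinyRadianceField(nn.Module):
    """Illustrative MLP: (position, view direction) -> (RGB color, density)."""

    def __init__(self, hidden_features=64):
        super().__init__()
        # 3 position coordinates + 3 direction components = 6 inputs
        self.backbone = nn.Sequential(
            nn.Linear(6, hidden_features),
            nn.ReLU(),
            nn.Linear(hidden_features, hidden_features),
            nn.ReLU(),
        )
        self.color_head = nn.Linear(hidden_features, 3)
        self.density_head = nn.Linear(hidden_features, 1)

    def forward(self, positions, directions):
        features = self.backbone(torch.cat([positions, directions], dim=-1))
        rgb = torch.sigmoid(self.color_head(features))   # colors in [0, 1]
        sigma = torch.relu(self.density_head(features))  # density >= 0
        return rgb, sigma


field = TinyRadianceField()
positions = torch.rand(1024, 3)
directions = nn.functional.normalize(torch.randn(1024, 3), dim=-1)
rgb, sigma = field(positions, directions)
print(rgb.shape, sigma.shape)  # torch.Size([1024, 3]) torch.Size([1024, 1])
```

Each of the 1,024 query points gets its own color and density; rendering then combines many such queries along every camera ray.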

Synthetic views are generated by querying 5D coordinates along camera rays. The resulting colors and densities are then projected into an image using conventional volume rendering techniques. Because volume rendering is naturally differentiable, the only input needed to optimize the representation is a collection of images with known camera poses. Optimized this way, neural radiance fields can render new, photorealistic views of scenes with intricate geometry and appearance, and the original NeRF work reported results that surpassed earlier neural rendering and view synthesis methods.
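
The projection step can be sketched in a few lines. The example below is a simplified illustration, assuming a single ray with three hand-picked samples. It applies the standard discrete volume rendering rule, in which each sample's weight is its opacity multiplied by the fraction of light that reaches it. The function name `composite_ray` and the sample values are made up for the demonstration.

```python
import numpy as np


def composite_ray(colors, densities, distances):
    """Blend the samples along one ray into a single pixel color.

    colors:    (N, 3) RGB at each sample
    densities: (N,)   volume density (sigma) at each sample
    distances: (N,)   spacing between adjacent samples
    """
    # Opacity contributed by each sample
    alphas = 1.0 - np.exp(-densities * distances)
    # Transmittance: fraction of light that reaches each sample unblocked
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = transmittance * alphas
    return (weights[:, None] * colors).sum(axis=0), weights


# Three samples: empty space, then a dense red surface, then blue behind it
colors = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
densities = np.array([0.0, 50.0, 50.0])
distances = np.full(3, 0.5)
pixel, weights = composite_ray(colors, densities, distances)
print(pixel.round(3), weights.round(3))
```

The pixel comes out almost pure red: the first sample is empty, the red surface absorbs nearly all the light, and the blue sample behind it is hidden. Every operation here is differentiable, which is what allows the network to be trained from images alone.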

Our Problem Statement

A three-dimensional product catalog is a sophisticated way for customers to interact with products in an online store. It presents each item in three dimensions so they can view it from different perspectives. In contrast to traditional catalogs, with their static images and simple videos, a 3D product catalog immerses customers in a more dynamic and engaging shopping experience.

The use of 3D models – digital representations of real objects made with computer graphics techniques – is the primary characteristic of a 3D product catalog. These models are extremely realistic in their portrayal of the form, feel, and look of products in a virtual setting. When it comes to product presentation, 3D models provide more flexibility and versatility than just using traditional techniques like photography.

In this blog post, we’ll convert 2D product images into 3D by using NeRF on E2E’s Cloud GPU.

E2E Networks: Leveraging Its Cloud GPU

Running the Neural Radiance Fields (NeRF) model, or any other computationally intensive deep learning model, on local computers can be challenging, often necessitating the use of cloud-based GPU resources.

The necessity for high-powered GPUs in operating NeRF models stems from the model's architecture and training process, which involve extensive computational demands. A dedicated, high-powered GPU is essential to efficiently handle these requirements.

A typical GPU architecture is shown in the figure below. However, instead of buying advanced GPUs, developers can get access to the same capabilities through a cloud GPU platform.

E2E Networks is a leading hyperscaler from India that focuses on advanced Cloud GPU infrastructure. E2E provides accelerated cloud computing solutions, including cutting-edge Cloud GPUs like A100/H100 and the AI Supercomputer HGX 8xH100 GPUs. We offer a range of advanced cloud GPUs at extremely competitive rates. To learn about the products provided by E2E Networks, visit here. As for the best GPU for a NeRF implementation, it largely depends on your specific requirements and budget. I used a dedicated GPU compute node with an A100 80 GB.

The best cloud GPU architectures allow you to access the capabilities offered by the GPU stack, which includes GPU clusters, faster bandwidth, and memory efficiency.

To proceed with E2E Networks, add your SSH key by going to Settings.

Then create a node by going to Compute.

Launch Visual Studio Code and install the Remote Explorer and Remote SSH extensions. Open a fresh terminal. To connect to your node, enter the command below:

ssh root@

SSH logs you in to the remote node from your local computer. Let's begin putting the code into practice now.

Implementation with NeRF Model: Generating 3D Model Product Videos for E-Commerce

Let’s download a dataset from Kaggle using the Opendatasets library. It will require your Kaggle Username and API key, which you can access through your Kaggle account by going to Settings.

%pip install opendatasets

import opendatasets as od
od.download("https://www.kaggle.com/datasets/vikashrajluhaniwal/fashion-images")
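
Before building a data loader, it can help to confirm what was downloaded. The sketch below assumes the dataset was extracted into a local `fashion-images` folder; adjust the path if yours differs. It counts the image files in every subfolder. The helper `count_images` and the extension list are illustrative, not part of Opendatasets.

```python
import os
from collections import Counter

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return {folder: number of image files} for every folder under root."""
    counts = Counter()
    for folder, _, files in os.walk(root):
        n = sum(os.path.splitext(f)[1].lower() in IMAGE_EXTENSIONS for f in files)
        if n:
            counts[folder] = n
    return dict(counts)


# Assumed download location; the guard keeps the snippet safe to run anywhere
if os.path.isdir("fashion-images"):
    for folder, n in sorted(count_images("fashion-images").items()):
        print(f"{n:6d}  {folder}")
```

The folder names it prints are the ones to pass as `data_folder` when constructing the dataset later on.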

The following command installs the latest versions of PyTorch, Torchvision, and Matplotlib.

PyTorch is used because it is an open-source deep learning framework that provides tensor computation and GPU acceleration.

%pip install torch torchvision matplotlib
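
Once the install finishes, it is worth confirming that PyTorch can see the GPU on the node. This is a small check; the helper name `describe_device` is illustrative.

```python
import torch


def describe_device():
    """Return a short description of the device PyTorch will use."""
    if torch.cuda.is_available():
        return f"cuda: {torch.cuda.get_device_name(0)}"
    return "cpu"


print(describe_device())
```

If this prints `cpu` on a GPU node, the installed PyTorch build or the driver setup is the first thing to check.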

The Python environment in VS Code does not come with these libraries preinstalled, which is why we install them first. With that done, import everything the code needs.

# Imports

import os
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from PIL import Image
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

# Define a Neural Radiance Field (NeRF) model

class ComplexNeRF(nn.Module):
    def __init__(self, in_features=6, hidden_features=256, out_features=3):
        super().__init__()
        # Four fully connected layers; the last one maps to the output size
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.fc2 = nn.Linear(hidden_features, hidden_features)
        self.fc3 = nn.Linear(hidden_features, hidden_features)
        self.fc4 = nn.Linear(hidden_features, out_features)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        x = self.fc4(x)
        return x

The dataset class below loads each image and returns a dictionary with the image, RGB values, and 3D points for every sample. The 3D points and RGB values are randomly generated placeholders; replace them with values from your own dataset.

# Loading Synthetic Dataset (Customize based on your dataset)

class CustomDataset(Dataset):
    def __init__(self, data_folder, transform=None):
        self.data_folder = data_folder
        self.transform = transform
        self.image_list = os.listdir(data_folder)

    def __len__(self):
        return len(self.image_list)

    def __getitem__(self, idx):
        img_path = os.path.join(self.data_folder, self.image_list[idx])
        image = Image.open(img_path).convert('RGB')

        # PIL reports size as (width, height)
        width, height = image.size

        # Generate a random 3D point for each pixel, shape (height, width, 3)
        points_3d = np.random.rand(height, width, 3)

        # Generate random RGB values for each pixel
        rgb_values = np.random.randint(0, 256, size=(height, width, 3))

        sample = {'points_3d': points_3d, 'rgb_values': rgb_values, 'image': image}

        if self.transform:
            sample = self.transform(sample)
        return sample

This is the sample we received as output.

After completing the data processing, we need to develop a 360-degree video transformation feature for this e-commerce product.

The Rescale transformation resizes the image, preserving the aspect ratio when it is given a single integer size, and returns the transformed sample.

# Data processing Transformations

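class Rescale(object):
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, sample):
        image, points_3d, rgb_values = sample['image'], sample['points_3d'], sample['rgb_values']

        # PIL reports size as (width, height)
        w, h = image.size
        if isinstance(self.output_size, int):
            # The longer edge becomes output_size; the other edge keeps the aspect ratio
            if h > w:
                new_h, new_w = self.output_size, int(self.output_size * w / h)
            else:
                new_h, new_w = int(self.output_size * h / w), self.output_size
        else:
            new_h, new_w = self.output_size
        new_h, new_w = int(new_h), int(new_w)
        img = transforms.Resize((new_h, new_w))(image)

        # Resample the per-pixel arrays (nearest neighbor) so they stay aligned with the image
        rows = np.linspace(0, points_3d.shape[0] - 1, new_h).astype(int)
        cols = np.linspace(0, points_3d.shape[1] - 1, new_w).astype(int)
        points_3d = points_3d[rows][:, cols]
        rgb_values = rgb_values[rows][:, cols]
        return {'image': img, 'points_3d': points_3d, 'rgb_values': rgb_values}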
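
The aspect-ratio arithmetic can be checked on its own. The sketch below mirrors the branch logic of the Rescale transform in a standalone function; `rescaled_size` is a hypothetical helper that takes the width and height explicitly, and the sizes are example values.

```python
def rescaled_size(width, height, output_size):
    """Return (new_height, new_width) for a given target size."""
    if isinstance(output_size, int):
        # The longer edge becomes output_size; the other edge is scaled to match
        if height > width:
            return output_size, int(output_size * width / height)
        return int(output_size * height / width), output_size
    new_height, new_width = output_size
    return int(new_height), int(new_width)


print(rescaled_size(600, 800, 256))         # portrait image  -> (256, 192)
print(rescaled_size(800, 600, 256))         # landscape image -> (192, 256)
print(rescaled_size(800, 600, (128, 128)))  # explicit size   -> (128, 128)
```

With an integer target, portrait and landscape images end up with different shapes, so a tuple is the safer choice when samples must be stacked into one batch.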

The ToTensor transformation converts the image, RGB values, and 3D points into PyTorch tensors.

class ToTensor(object):
    def __call__(self, sample):
        image, points_3d, rgb_values = sample['image'], sample['points_3d'], sample['rgb_values']
        # Converts the PIL image to a float tensor of shape (3, H, W) scaled to [0, 1]
        img = transforms.ToTensor()(image)

        return {'image': img, 'points_3d': torch.tensor(points_3d, dtype=torch.float32),
                'rgb_values': torch.tensor(rgb_values, dtype=torch.float32)}

The training function minimizes the MSE loss between the predicted and ground-truth 3D points, using the Adam optimizer.

# Training function

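def train_complex_nerf(model, train_loader, num_epochs=10, lr=0.001):
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    device = next(model.parameters()).device  # train on whichever device holds the model

    for epoch in range(num_epochs):
        model.train()

        for batch in train_loader:
            # Images arrive as (B, 3, H, W); move channels last so each pixel is a 3-feature input
            inputs = batch['image'].permute(0, 2, 3, 1).to(device)
            targets = batch['points_3d'].to(device)
            outputs = model(inputs)
            loss = criterion(outputs, targets)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Reports the loss of the last batch in the epoch
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')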
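
The loss of the last batch alone can be noisy from epoch to epoch. As a minimal sketch, the same MSE and Adam loop is shown below on toy regression data, tracking the mean loss per epoch instead. The data, the small `nn.Sequential` model, and the hyperparameters are stand-ins chosen for illustration; none of it comes from the fashion dataset.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy regression data standing in for (input, target) pairs
inputs = torch.rand(512, 3)
targets = inputs * 2.0 - 1.0
loader = DataLoader(TensorDataset(inputs, targets), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 3))
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(20):
    running, batches = 0.0, 0
    for x, y in loader:
        loss = criterion(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running += loss.item()
        batches += 1
    # Mean over all batches gives a steadier signal than the last batch alone
    epoch_losses.append(running / batches)

print(f"first epoch: {epoch_losses[0]:.4f}, last epoch: {epoch_losses[-1]:.4f}")
```

A mean that keeps falling suggests the optimizer is making progress; a flat curve on real data usually points to a learning-rate or data problem.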

The visualization function plots the predicted 3D points against the ground truth, which shows how well the model reconstructs the 3D scene from the input images.

# Visualization function

def visualize_3d_reconstruction(model, test_loader):
    model.eval()
    device = next(model.parameters()).device
    with torch.no_grad():
        for batch in test_loader:
            # Same channels-last layout as in training
            inputs = batch['image'].permute(0, 2, 3, 1).to(device)
            # Flatten to (N, 3) so that each row is one 3D point
            targets = batch['points_3d'].reshape(-1, 3).numpy()
            outputs = model(inputs).reshape(-1, 3).cpu().numpy()

            # Visualizing the 3D reconstruction
            fig = plt.figure()
            ax = fig.add_subplot(111, projection='3d')
            ax.scatter(targets[:, 0], targets[:, 1], targets[:, 2], c='r', marker='o', label='Ground Truth')
            ax.scatter(outputs[:, 0], outputs[:, 1], outputs[:, 2], c='b', marker='s', label='Reconstruction')
            ax.set_xlabel('X')
            ax.set_ylabel('Y')
            ax.set_zlabel('Z')
            ax.legend()
            plt.show()
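
On a node reached over SSH there may be no display for `plt.show()` to open. One option, sketched below, is to switch Matplotlib to its non-interactive Agg backend and write the figure to a file. The helper `save_point_cloud`, the output filename, and the random points are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no display needed
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3D projection on older Matplotlib)


def save_point_cloud(points, path, label="points"):
    """Scatter an (N, 3) array in 3D and write the figure to an image file."""
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(points[:, 0], points[:, 1], points[:, 2], s=2, label=label)
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_zlabel("Z")
    ax.legend()
    fig.savefig(path, dpi=150)
    plt.close(fig)  # free the figure's memory when saving many plots
    return path


save_point_cloud(np.random.rand(500, 3), "reconstruction.png")
```

The saved image can then be opened from the VS Code file explorer or copied back to the local machine.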

The final script loads and prepares the datasets, trains the NeRF model, and visualizes the results of the 3D reconstruction.

# Data processing, applying transformations

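# A fixed (height, width) gives every sample the same shape, which the default DataLoader batching needs
data_transform = transforms.Compose([Rescale((256, 256)), ToTensor()])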

# Load train dataset

synthetic_dataset = CustomDataset(data_folder='/NERF/fashion-images/data/Apparel/Boys/Images/images_with_product_ids', transform=data_transform)
train_loader = DataLoader(synthetic_dataset, batch_size=64, shuffle=True)

# Load test dataset

test_dataset = CustomDataset(data_folder='/NERF/fashion-images/data/Apparel/Girls/Images/images_with_product_ids', transform=data_transform)
test_loader = DataLoader(test_dataset, batch_size=1, shuffle=False)

# Create and train a NeRF model

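# Use the GPU when one is available; each model input is one RGB pixel (3 features)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
complex_nerf_model = ComplexNeRF(in_features=3).to(device)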
train_complex_nerf(complex_nerf_model, train_loader)

# Visualize 3D reconstruction on the test dataset

visualize_3d_reconstruction(complex_nerf_model, test_loader)

Voila! The following are the 3D videos as sample outputs.

Product 1 - Rotating 3D video of a pair of trousers.

Product 2 - Rotating 3D video of a t-shirt.

This process can be used by any e-commerce firm to convert still images to engaging 3D videos.
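
A rotating product video is typically rendered from a sequence of camera poses that circle the object. The following is a minimal sketch of generating such an orbit, assuming the object sits at the origin with the z-axis pointing up. The functions `look_at` and `orbit_poses`, and the default radius, height, and frame count, are illustrative names and values, not part of the code above. Each pose would then be handed to a trained radiance field to render one frame.

```python
import numpy as np


def look_at(camera_position, target=np.zeros(3), world_up=np.array([0.0, 0.0, 1.0])):
    """Return a 4x4 camera-to-world matrix for a camera looking at target."""
    forward = target - camera_position
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, world_up)
    right = right / np.linalg.norm(right)
    up = np.cross(right, forward)
    pose = np.eye(4)
    # Columns: camera x (right), y (up), z (pointing backward, OpenGL-style), position
    pose[:3, 0] = right
    pose[:3, 1] = up
    pose[:3, 2] = -forward
    pose[:3, 3] = camera_position
    return pose


def orbit_poses(num_frames=60, radius=4.0, height=1.0):
    """Camera poses evenly spaced on a circle around the origin."""
    angles = np.linspace(0.0, 2.0 * np.pi, num_frames, endpoint=False)
    return np.stack([
        look_at(np.array([radius * np.cos(a), radius * np.sin(a), height]))
        for a in angles
    ])


poses = orbit_poses()
print(poses.shape)  # (60, 4, 4)
```

Sixty poses played back at 30 frames per second give a two-second full rotation; more frames produce a smoother or slower spin.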

Conclusion

In conclusion, training the NeRF model for e-commerce 3D content benefited greatly from E2E Networks' dedicated A100 80 GB GPU compute. The computational power of the A100 GPU effectively handled the complex model operations, leading to faster training and seamless processing.

The versatility of the A100 allowed for quick experimentation and effective model customization on our own dataset. It also cut down training times, which makes iterating on product visuals more responsive.

In summary, pairing E2E Networks’ A100 GPU with the NeRF model gave us an accessible, computationally efficient environment with accelerated model training, making the process of creating 3D content for e-commerce both efficient and pleasurable.
