Deep learning approaches have been very successful on problems of varying scale, and they have produced impressive results in image and video super-resolution as well.
In this blog, we will go over what image super-resolution is and various other aspects revolving around it.
What is Image Resolution?
The resolution of a digital image is commonly expressed in pixels per inch (PPI), the number of pixels displayed along one inch of the image. Resolution describes the level of detail in an image; a higher resolution indicates more image detail.
In digital imaging, the resolution is frequently expressed as a pixel count. A pixel (short for picture element) is a single point or small square in a graphic image, arranged in a rectangular grid. It is the smallest component of a digital image. The greater the number of pixels used to represent an image, the more closely the result can match the analogue original.
Figure: Image Resolution
Image Super-Resolution
The technique of recovering high-resolution (HR) images from low-resolution (LR) images is known as image super-resolution (SR). It is an important class of image processing techniques in computer vision, with several real-world applications including medical imaging, satellite imaging, surveillance and security, and astronomical imaging.
Deep learning-based super-resolution models have been intensively researched in recent years as deep learning techniques have advanced, and they frequently achieve state-of-the-art performance on several SR benchmarks. Methods ranging from early Convolutional Neural Network (CNN)-based models to more recent Generative Adversarial Network (GAN)-based approaches have been used to address SR tasks.
Why Image Super Resolution?
Uses of Image Super Resolution
Image super-resolution increases the resolution of an image from low-resolution (LR) to high-resolution (HR), a capability that is in demand across many industries.
It is commonly employed in the following applications:
Surveillance: The detection, identification, and recognition of faces in low-resolution images received from security cameras.
Figure: Super-Resolution in IR Surveillance Videos
Media: Super-resolution can be used to reduce server costs, as media can be sent at a lower resolution and upscaled on the fly.
Figure: Super Resolution in Broadcast Media
Medical: Obtaining high-resolution MRI images can be difficult due to scan time, spatial coverage, and signal-to-noise ratio (SNR) considerations. Super-resolution helps by generating high-resolution MRI images from low-resolution ones.
Figure: (a) Image before ISR, (b) Image after ISR
Mathematical Representation for ISR
A low-resolution image can be modelled as a degraded version of a high-resolution image:
$$I_x = D(I_y; \sigma)$$
Here $D$ is the degradation function, $I_y$ is the high-resolution image, $I_x$ is the low-resolution image, and $\sigma$ is the noise. Only the high-resolution image and its corresponding low-resolution image are provided; the degradation function $D$ and the noise $\sigma$ are unknown. The neural network's job is to approximate the inverse of the degradation function using just the HR and LR image data.
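To make the degradation model concrete, here is a minimal sketch that simulates an LR image from an HR one. It assumes one particular degradation (averaging each block of pixels, then adding Gaussian noise); the real degradation is unknown, as noted above, and the function name `degrade` and its parameters are made up for this illustration. Images are plain nested lists so the example runs with the standard library only.

```python
import random

def degrade(hr, scale=2, noise_std=0.0, seed=0):
    """Simulate an LR image: box-average downsampling plus Gaussian noise.

    This is one assumed choice of degradation function, not the true one.
    """
    rng = random.Random(seed)
    height, width = len(hr), len(hr[0])
    if height % scale or width % scale:
        raise ValueError("image sides must be divisible by scale")
    lr = []
    for top in range(0, height, scale):
        row = []
        for left in range(0, width, scale):
            # Average the scale x scale block that maps to one LR pixel.
            block = [hr[top + dy][left + dx]
                     for dy in range(scale) for dx in range(scale)]
            mean = sum(block) / len(block)
            # Add zero-mean Gaussian noise with standard deviation noise_std.
            row.append(mean + rng.gauss(0.0, noise_std))
        lr.append(row)
    return lr

hr_image = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [90, 90, 30, 30],
    [90, 90, 30, 30],
]
lr_clean = degrade(hr_image, scale=2, noise_std=0.0)
lr_noisy = degrade(hr_image, scale=2, noise_std=2.0, seed=42)
print(lr_clean)
print(lr_noisy)
```

Training pairs for supervised SR are often built this way: start from HR images and synthesise the LR inputs with an assumed degradation.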
Upsampling
Before we can grasp the rest of the theory behind super-resolution, we must first understand upsampling.
Upsampling is the process of increasing the spatial resolution of an image, that is, the number of pixel rows, columns, or both.
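As a small illustration, the sketch below implements nearest-neighbour upsampling, probably the simplest interpolation method: every pixel is repeated along both axes. The helper name `upsample_nearest` is hypothetical, and images are plain nested lists to keep the example dependency-free.

```python
def upsample_nearest(image, scale):
    """Repeat every pixel `scale` times along both axes."""
    upsampled = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(scale)]
        # Copy the widened row so each output row is an independent list.
        upsampled.extend(list(wide_row) for _ in range(scale))
    return upsampled

small = [[1, 2],
         [3, 4]]
large = upsample_nearest(small, 2)
for row in large:
    print(row)
```

The output has twice as many rows and columns, but no new detail: each 2x2 block simply repeats one source pixel.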
Learning Strategies for Super Resolution
Because image super-resolution is an ill-posed problem, the central question is how to perform upsampling (i.e., generating high-resolution output from low-resolution input). Based on the upsampling method used (interpolation-based or learning-based) and its position in the model, there are primarily three model frameworks.
- Pre-upsampling
In this family of algorithms, the LR input image is first upsampled to match the dimensions of the desired HR output. The upscaled LR image is then processed using a deep learning model. VDSR (Very Deep Super Resolution) is an early SR model that employs the pre-upsampling approach. The VDSR network is a very deep (20 weight layers) convolutional network inspired by VGG networks.
Deep networks typically converge slowly when the learning rate is small. However, raising the learning rate to speed up convergence can cause exploding gradients. To address this, VDSR uses residual learning and gradient clipping. Furthermore, VDSR handles multiple scale factors with a single network.
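The two ideas mentioned above can be sketched in a few lines. This is a generic illustration, not VDSR's exact training code: `clip_by_global_norm` shows one common form of gradient clipping (rescaling when the gradient norm exceeds a threshold), and `add_residual` shows residual learning, where the network only predicts the difference between the upsampled LR image and the HR target. Both function names are hypothetical.

```python
import math

def clip_by_global_norm(gradients, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in gradients))
    if norm <= max_norm or norm == 0.0:
        return list(gradients)
    factor = max_norm / norm
    return [g * factor for g in gradients]

def add_residual(upsampled_lr, predicted_residual):
    """Residual learning: final output = upsampled input + predicted detail."""
    return [[base + delta for base, delta in zip(base_row, delta_row)]
            for base_row, delta_row in zip(upsampled_lr, predicted_residual)]

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
restored = add_residual([[100, 100], [100, 100]], [[5, -5], [0, 2]])
print(clipped)   # direction is kept, length is capped at max_norm
print(restored)
```

Clipping bounds the size of each update so a larger learning rate can be used, and the residual formulation means the network does not have to reproduce the low-frequency content already present in its input.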
- Post-upsampling
Increasing the resolution of the LR images before the image enhancement phase (as pre-upsampling approaches do) increases the computational cost. This is especially problematic for convolutional networks, whose computational cost grows with the number of pixels in the input image. In addition, traditional interpolation methods, such as bicubic interpolation, add no new information to help solve the ill-posed reconstruction problem.
Thus, in the post-upsampling class of SR approaches, the LR image is first processed by a deep model at its original low resolution and only then upscaled to the HR dimensions at the end of the network, typically with learnable layers such as transposed convolution or sub-pixel convolution rather than fixed interpolation.
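The cost argument above can be checked with simple arithmetic. The sketch below counts multiply-accumulate operations (MACs) for a single convolution layer, assuming stride 1 and an output of the same spatial size as the input. The layer sizes (64x64 LR input, 3x3 kernels, 64 channels, 4x scale) are illustrative assumptions, and `conv_macs` is a hypothetical helper.

```python
def conv_macs(height, width, kernel, c_in, c_out):
    """MAC count for one stride-1 convolution whose output is height x width."""
    return height * width * kernel * kernel * c_in * c_out

lr_side, scale = 64, 4

# Pre-upsampling: the layer runs on the already enlarged image.
pre_cost = conv_macs(lr_side * scale, lr_side * scale, 3, 64, 64)
# Post-upsampling: the same layer runs on the original LR image.
post_cost = conv_macs(lr_side, lr_side, 3, 64, 64)

ratio = pre_cost // post_cost
print(pre_cost, post_cost, ratio)
```

For a scale factor s, every convolution applied after upsampling costs s squared times more than the same layer applied before it, which is why post-upsampling models keep most of their layers in LR space.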
- Progressive-upsampling
Both pre- and post-upsampling methods are useful. However, when LR images must be upscaled by large factors (say, 8x), the results tend to be unsatisfactory regardless of whether the upsampling is performed before or after passing through the deep SR network.
In such circumstances, rather than upscaling by 8x in one shot, it makes more sense to gradually upscale the LR image until it reaches the spatial dimensions of the HR output. Progressive upsampling methods are those that employ this learning strategy.
LapSRN, the Laplacian Pyramid Super-Resolution Network, is one such model: it progressively reconstructs the sub-band residuals of HR images. Sub-band residuals are the differences between the upsampled image and the ground truth HR image at each network level.
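To show what sub-band residuals look like, the sketch below builds a small Laplacian-style pyramid directly from a ground-truth image and then rebuilds the image in 2x steps. In a model such as LapSRN the residuals would be predicted by the network; here they are computed from the HR image purely for illustration. The 2x2 averaging and nearest-neighbour upsampling are simplifying assumptions, and all function names are hypothetical.

```python
def downsample2(image):
    """Halve both sides by averaging each 2x2 block."""
    return [[(image[y][x] + image[y][x + 1]
              + image[y + 1][x] + image[y + 1][x + 1]) / 4
             for x in range(0, len(image[0]), 2)]
            for y in range(0, len(image), 2)]

def upsample2(image):
    """Double both sides by repeating each pixel."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def build_pyramid(hr, levels):
    """Return the coarsest image and one residual per 2x level, coarse to fine."""
    residuals = []
    current = hr
    for _ in range(levels):
        coarse = downsample2(current)
        guess = upsample2(coarse)
        # Residual = what plain upsampling fails to recover at this level.
        residuals.append([[c - g for c, g in zip(c_row, g_row)]
                          for c_row, g_row in zip(current, guess)])
        current = coarse
    return current, residuals[::-1]

def reconstruct(coarsest, residuals):
    """Upscale by 2x per level, adding that level's residual each time."""
    image = coarsest
    for residual in residuals:
        guess = upsample2(image)
        image = [[g + r for g, r in zip(g_row, r_row)]
                 for g_row, r_row in zip(guess, residual)]
    return image

hr = [[(x * y) % 7 for x in range(8)] for y in range(8)]
coarsest, residuals = build_pyramid(hr, levels=3)   # 8x total: 1x1 -> 8x8
rebuilt = reconstruct(coarsest, residuals)
print([len(r) for r in residuals])                  # side length per level
```

Each level only has to add the detail that a single 2x step is missing, which is an easier task than recovering all 8x detail at once.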
Popular Architectures
Several Deep Learning-based models have been presented to handle the SR problem over the years, some of which were innovative at the time and served as stepping stones for future study in SR technology. Let us now look at some of the most prevalent SR architectures.
SRCNN
SRCNN (Super-Resolution Convolutional Neural Network) is a simple CNN architecture with three layers: one for patch extraction, one for non-linear mapping, and one for reconstruction. The patch extraction layer extracts dense patches from the input and uses convolutional filters to represent them. The non-linear mapping layer is made up of 1×1 convolutional filters that are used to change the number of channels and introduce non-linearity. The final reconstruction layer, as you might expect, reconstructs the high-resolution image.
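The role of the non-linear mapping layer is easy to see in code: a 1×1 convolution is a small linear map applied independently to the channel vector at each pixel, followed here by a ReLU. The sketch below is a plain-Python illustration with made-up weights and a hypothetical function name `conv1x1_relu`; it is not the trained SRCNN layer.

```python
def conv1x1_relu(features, weights, biases):
    """Apply a 1x1 convolution followed by ReLU.

    features: H x W x C_in nested lists
    weights:  C_out x C_in nested lists
    biases:   list of length C_out
    """
    output = []
    for row in features:
        out_row = []
        for pixel in row:
            out_pixel = []
            for w_row, b in zip(weights, biases):
                # Weighted sum over input channels at this single pixel.
                value = sum(w * p for w, p in zip(w_row, pixel)) + b
                out_pixel.append(max(0.0, value))  # ReLU non-linearity
            out_row.append(out_pixel)
        output.append(out_row)
    return output

features = [[[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]]]   # 1 x 2 image, 3 channels
weights = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]      # maps 3 channels to 2
biases = [0.0, 1.0]
mapped = conv1x1_relu(features, weights, biases)
print(mapped)
```

The spatial size is unchanged while the channel count goes from 3 to 2, which is exactly the "change the number of channels" behaviour described above.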
SRGAN
To generate visually appealing images, SRGAN employs a GAN-based architecture. It uses a multi-task loss to improve the results and uses the SRResNet network architecture as its backbone.
The loss is composed of three terms:
- An MSE loss, which captures pixel similarity.
- A perceptual similarity loss, which uses a deep network to capture high-level information.
- An adversarial loss from the discriminator.
Although the results had lower PSNR (peak signal-to-noise ratio) values, the model achieved a higher mean opinion score (MOS), implying higher perceptual quality in the generated images.
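Since PSNR comes up whenever SR models are compared, here is a minimal sketch of how it is computed from the mean squared error, assuming 8-bit images with a maximum pixel value of 255. The function name `psnr` and the tiny example images are made up for illustration.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    squared_errors = [(r - e) ** 2
                      for ref_row, est_row in zip(reference, estimate)
                      for r, e in zip(ref_row, est_row)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [[100, 120], [140, 160]]
close = [[101, 119], [141, 159]]   # every pixel off by 1
far = [[110, 110], [150, 150]]     # every pixel off by 10

psnr_close = psnr(reference, close)
psnr_far = psnr(reference, far)
print(round(psnr_close, 2), round(psnr_far, 2))
```

PSNR depends only on pixel-wise error, so an image with sharp but slightly misplaced texture can score lower than a blurry one. That is why a perceptual measure such as MOS can disagree with it.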
ESPCN
ESPCN or Efficient Sub-Pixel CNN is made up of feature extraction convolutional layers followed by sub-pixel convolution for upsampling.
Sub-pixel convolution works by translating depth to space: pixels from multiple channels of a low-resolution feature map are rearranged into a single channel of a high-resolution image. To illustrate, a 5×5×4 input (four channels of 5×5 pixels) can have the pixels of its four channels rearranged into a single channel, yielding a 10×10 HR image.
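The depth-to-space rearrangement can be sketched as follows for a single output channel. With an upscale factor r, the r×r input channels each supply one position inside every r×r block of the output. The channel-to-position ordering used here (row-major within each block) is one common convention and is an assumption of this sketch; `depth_to_space` is a hypothetical helper working on nested lists.

```python
def depth_to_space(channels, r):
    """Rearrange r*r maps of size H x W into one map of size (H*r) x (W*r)."""
    if len(channels) != r * r:
        raise ValueError("expected r*r input channels")
    height, width = len(channels[0]), len(channels[0][0])
    output = [[0] * (width * r) for _ in range(height * r)]
    for index, feature_map in enumerate(channels):
        # Channel `index` fills offset (dy, dx) inside every r x r output block.
        dy, dx = divmod(index, r)
        for y in range(height):
            for x in range(width):
                output[y * r + dy][x * r + dx] = feature_map[y][x]
    return output

# Four 1x1 channels become one 2x2 image.
tiny = depth_to_space([[[1]], [[2]], [[3]], [[4]]], r=2)
print(tiny)

# Four 5x5 channels become one 10x10 image.
lr_maps = [[[c * 100 + y * 5 + x for x in range(5)] for y in range(5)]
           for c in range(4)]
hr_map = depth_to_space(lr_maps, r=2)
print(len(hr_map), len(hr_map[0]))
```

No pixel values are computed in this step; the preceding convolution layers learn to produce channels whose interleaving forms a good HR image.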
Conclusion
Image Super-Resolution, which seeks to improve the resolution of a degraded/noisy image, is a critical Computer Vision task because of its numerous applications in health, astronomy, and security. Deep Learning has significantly contributed to the advancement of SR technology to its current status.
While SR technology has already produced excellent results, most of them have been obtained through fully supervised learning, which requires training a deep model on a massive amount of labeled data. Large amounts of data may not be readily available, particularly in applications such as medical imaging, where only qualified doctors may annotate the data. As a result, contemporary SR research has focused on reducing, if not completely eliminating, supervision from SR tasks.