The receptive field of a convolutional neural network is a crucial concept to keep in mind when designing new models or trying to understand existing ones. Knowing about it lets us dig deeper into the inner workings of the neural architecture we're interested in and helps us consider potential enhancements.
In this blog, we'll go over what the receptive field is and will touch on its various other aspects.
Specifically, this blog will take you through:
- What is a Receptive Field?
- Why is the Receptive field important?
- Receptive Field of Deep Convolutional Network
- Ways to increase receptive field
- Effective Receptive Field
- Conclusion
What is a Receptive Field?
The receptive field, or field of vision, of a unit in a given layer of the network is a fundamental notion in deep CNNs. In contrast to fully connected networks, where the value of each unit depends on the complete input to the network, the value of a unit in a convolutional network depends only on a subset of the input. This region of the input is the unit's receptive field.
In other words, the receptive field (RF) in deep learning is defined as the size of the region in the input that produces a feature. It is essentially a measure of the association of an output feature (of any layer) with an input region (patch).
Why is the Receptive field important?
Understanding and diagnosing how deep CNNs operate requires an understanding of the receptive field. Because everything in an input image outside the receptive field of a unit has no effect on its value, the receptive field must be carefully controlled to ensure that it covers the entire relevant image region.
Many tasks, particularly dense prediction tasks such as semantic image segmentation, stereo, and optical flow estimation, require each output pixel to have a large receptive field so that no significant information is missed when generating the prediction.
Receptive Field of Deep Convolutional Network
A convolutional unit is only affected by a small region (patch) of the input. We never speak of the RF of fully connected layers, because each of their units has access to the entire input region.
Shallow CNN architectures have small receptive fields. With stride-1 convolutions, the receptive field of a CNN grows linearly, one layer at a time; subsampling layers make it grow much faster. Moreover, the pixels near the center of a receptive field have a substantially larger influence on the output of a CNN: in the forward pass, the center pixels can transmit information to the output through many distinct paths, while the boundary pixels have very few paths through which to propagate their values. As a result, in the backward pass the center pixels receive a considerably larger gradient magnitude from that output.
The receptive field of a single-path network can be computed in closed form using the equations below.
For two consecutive convolutional layers $f_2$ and $f_1$, where layer $l$ has kernel size $k_l$ and stride $s_l$, the receptive field $r_1$ at the input of layer 2 relates to the receptive field $r_2$ at its output as:

$$r_1 = s_2 \cdot r_2 + (k_2 - s_2)$$

Alternatively, in a broader sense, for any layer $l$:

$$r_{l-1} = s_l \cdot r_l + (k_l - s_l)$$

This recursion can be applied repeatedly for $L$ layers. Starting from $r_L = 1$ (a single output unit) and unrolling the recursive equation yields a closed-form solution that depends only on the kernel sizes and strides:

$$r_0 = \sum_{l=1}^{L} \left( (k_l - 1) \prod_{i=1}^{l-1} s_i \right) + 1$$

where $r_0$ is the architecture's receptive field on the input.
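The closed-form solution can be sketched in a few lines of Python (the function name below is ours, chosen for illustration):

```python
def receptive_field(kernels, strides):
    """Closed-form RF of a single-path conv stack:
    r0 = sum over layers of (k_l - 1) * (product of earlier strides), plus 1."""
    r, jump = 1, 1  # jump = cumulative stride before the current layer
    for k, s in zip(kernels, strides):
        r += (k - 1) * jump
        jump *= s
    return r

print(receptive_field([3, 3, 3], [1, 1, 1]))        # 7: three 3x3 stride-1 convs
print(receptive_field([3, 2, 3, 2], [1, 2, 1, 2]))  # 10: conv, 2x2 pool, conv, 2x2 pool
```

Note how the two stride-2 pooling layers let the four-layer stack reach a larger receptive field than three stride-1 convolutions, even with the same small kernels.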
Ways to increase receptive field
In essence, there are several techniques for increasing the RF: increase the number of convolutional layers (make the network deeper), add pooling layers or strided convolutions (sub-sampling), use dilated convolutions, or use depthwise convolutions.
Increase the number of convolutional layers: each extra stride-1 layer increases the receptive field linearly by the kernel size (each layer adds k − 1 to it). However, it has been shown empirically that as the theoretical receptive field grows, the effective (actual) receptive field grows much more slowly and occupies an ever smaller fraction of it. Throughout, RF stands for receptive field and ERF for effective receptive field.
Subsampling: subsampling operations such as pooling, on the other hand, multiply the receptive-field growth of every subsequent layer by their stride. Modern designs such as ResNet combine both approaches.
Dilated convolutions: stacking dilated convolutions with successively increasing dilation rates grows the RF exponentially. Dilations create "holes" in a convolutional kernel, i.e., a spacing between the kernel values. While the number of weights in the kernel remains constant, the weights are no longer applied to spatially adjacent samples: a dilation rate r inserts r − 1 gaps between kernel elements, so a kernel of size k covers an extent of r·(k − 1) + 1 input samples.
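To make the exponential growth concrete, here is a small sketch (the helper name is ours): a dilation rate d spreads a size-k kernel over d·(k − 1) + 1 input samples, so a stride-1 stack with doubling dilation rates roughly doubles its receptive field at every layer.

```python
def dilated_extent(k, d):
    """Spatial extent covered by a size-k kernel with dilation rate d."""
    return d * (k - 1) + 1

# Stride-1 stack of 3-tap convolutions with doubling dilation rates:
r = 1
for d in (1, 2, 4, 8):
    r += dilated_extent(3, d) - 1  # each stride-1 layer adds (extent - 1)
    print(d, r)  # receptive field after this layer: 3, 7, 15, 31
```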
Depthwise convolutions: this tactic enhances the receptive field with a tiny computational footprint, making it a compact way to expand the receptive field with fewer parameters. A depthwise convolution is a channel-wise spatial convolution. It is important to note, however, that depthwise convolutions do not by themselves enlarge the receptive field any more than standard convolutions do.
However, because we use fewer parameters and cheaper computations, we can afford to add more layers. As a result, we can obtain a larger receptive field with roughly the same number of parameters.
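As a rough illustration of the parameter savings (the layer sizes below are arbitrary, picked only for this example), compare a standard 3x3 convolution with a depthwise-separable one, i.e. a depthwise 3x3 followed by a 1x1 pointwise convolution:

```python
def standard_conv_params(c_in, c_out, k):
    # one k x k filter per (input channel, output channel) pair
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # one k x k depthwise filter per input channel, then 1x1 pointwise mixing
    return c_in * k * k + c_in * c_out

print(standard_conv_params(128, 128, 3))        # 147456
print(depthwise_separable_params(128, 128, 3))  # 17536, roughly 8x fewer
```

The parameters saved per layer can be spent on extra layers, which is what actually grows the receptive field.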
Effective Receptive Field
Not all pixels in a receptive field contribute equally to the response of an output unit. Intuitively, pixels at the center of a receptive field have a much larger influence on the output because they have more "paths" through which to contribute to it.
As a result, the effective receptive field (ERF) of a feature can be defined through the relative importance of each input pixel: the ERF of a central output unit is the region containing the input pixels that have a non-negligible impact on that unit.
The contribution of the center pixels in the forward and backward passes can be understood intuitively: in the forward pass, center pixels can transmit information to the output through many distinct paths, whereas pixels in the outer region of the receptive field have very few paths through which to spread their influence. In the backward pass, gradients from an output unit are propagated across all of these paths, so the center pixels receive a significantly larger gradient magnitude from that output.
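This path-counting argument can be checked numerically with a toy model (our construction, for illustration only): in a linear 1-D CNN of size-3, stride-1 convolutions with all-ones kernels, the gradient each input pixel receives from the central output unit equals its number of paths to that unit, obtained by convolving a delta with the kernel once per layer.

```python
import numpy as np

kernel = np.ones(3)     # size-3 kernel, all weights equal to 1
grad = np.array([1.0])  # unit gradient at the single output unit
for _ in range(4):      # backprop through 4 layers -> receptive field of 9
    grad = np.convolve(grad, kernel)
print(grad)  # [ 1.  4. 10. 16. 19. 16. 10.  4.  1.]
```

The center pixel collects 19 paths while the border pixels have only one; with more layers this profile approaches a Gaussian, which is exactly the bell-shaped effective receptive field described above.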
Conclusion
So far, we have learned what the receptive field of a convolutional neural network is and why knowing its size and related properties matters. Finally, understanding the RF of convolutional neural networks remains an open research area that will continue to yield insights into why deep convolutional networks work so well.