About RetinaNet
RetinaNet is a single-stage (one-stage) object detection model. Dense detectors face an extreme imbalance between foreground and background examples during training, and RetinaNet handles this class imbalance with a focal loss function. Focal loss adds a modulating term to the cross-entropy loss that down-weights easy examples, so that training concentrates on hard, misclassified examples.
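To make the modulating term concrete, here is a minimal sketch of the binary focal loss for a single prediction, written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the example probabilities are made up for illustration; the defaults gamma = 2 and alpha = 0.25 are the values reported as working well in the RetinaNet paper. A real implementation would operate on whole tensors rather than single numbers.

```python
import math


def cross_entropy(p, y):
    """Plain binary cross-entropy for one prediction.

    p: predicted probability of the positive class (0 < p < 1)
    y: ground-truth label, 1 for object, 0 for background
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Illustrative values only: two background anchors (y = 0).
easy_ratio = focal_loss(0.05, 0) / cross_entropy(0.05, 0)  # well classified, p_t = 0.95
hard_ratio = focal_loss(0.70, 0) / cross_entropy(0.70, 0)  # misclassified, p_t = 0.30

print(f"easy example keeps {easy_ratio:.2%} of its cross-entropy loss")
print(f"hard example keeps {hard_ratio:.2%} of its cross-entropy loss")
```

With these numbers, the well-classified background anchor keeps well under 1% of its cross-entropy loss, while the misclassified one keeps more than a third, which is how the many easy negatives stop dominating the total loss.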
RetinaNet is a single, unified network made up of a backbone network and two task-specific subnetworks. The backbone computes a convolutional feature map over the entire input image and is itself an off-the-shelf convolutional network.
The first subnetwork performs convolutional object classification on the backbone's output, and the second subnetwork performs convolutional bounding-box regression on the same output. Both subnetworks have a deliberately simple design, intended specifically for one-stage, dense detection.
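"Dense" detection means both subnetworks emit a prediction for every anchor at every spatial location of a feature map. The short sketch below only computes the resulting output shapes. The helper name is hypothetical, 9 anchors per location is the setting used in the RetinaNet paper, and the 80 classes and the 80x80 feature map are assumptions chosen for illustration.

```python
def head_output_shapes(height, width, num_classes, num_anchors=9):
    """Return (classification_shape, regression_shape) as (channels, H, W).

    The classification head predicts one score per class per anchor;
    the regression head predicts 4 box offsets per anchor.
    """
    cls_shape = (num_classes * num_anchors, height, width)
    box_shape = (4 * num_anchors, height, width)
    return cls_shape, box_shape


# Assumed example: an 80x80 feature map and 80 object classes.
cls_shape, box_shape = head_output_shapes(height=80, width=80, num_classes=80)
print(cls_shape)  # (720, 80, 80)
print(box_shape)  # (36, 80, 80)

# Number of candidate boxes scored on this single feature map.
print(80 * 80 * 9)  # 57600
```

Most of those tens of thousands of candidate boxes cover background, which is the imbalance that focal loss is meant to handle.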
RetinaNet was created by making two improvements to existing single-stage object detection models: Focal Loss and the Feature Pyramid Network (FPN).
What is an FPN or Feature Pyramid Network?
Feature pyramids built upon image pyramids are known as featurized image pyramids. Computer vision systems have traditionally used featurized image pyramids to detect objects of different scales in an image, and many architectures rely on the same pyramid structure.
In this approach, the input image is subsampled into progressively smaller, lower-resolution images, which together form a pyramid. Hand-engineered features are then extracted from every level of the pyramid to detect the objects. This makes detection scale-invariant, although the process is memory-intensive and requires considerable computing power.
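The subsampling step can be sketched in a few lines. This is a simplified illustration, assuming a grayscale image stored as a list of rows and using plain 2x2 averaging to halve the resolution; classical pipelines typically smooth the image (for example with a Gaussian filter) before subsampling. The function names are made up for this example.

```python
def downsample_2x(image):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def image_pyramid(image, levels):
    """Return a list of images, each half the resolution of the previous one."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid


# Toy 8x8 "image" whose pixel values are just their index.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = image_pyramid(image, levels=3)
level_sizes = [(len(level), len(level[0])) for level in pyramid]
print(level_sizes)  # [(8, 8), (4, 4), (2, 2)]
```

Features would then be extracted separately from every level, which is where the memory and compute cost comes from.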
Hand-engineered features were later replaced by features computed by convolutional neural networks (CNNs). A convenient property of CNNs is that the spatial size of the output shrinks after each convolutional block, so a pyramidal structure forms naturally inside the network.
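As a rough illustration of that built-in pyramid, the sketch below lists the feature-map size at each pyramid level for a given input size. The strides 8 to 128 correspond to the levels P3 to P7 that RetinaNet uses; the 640x640 input, the rounding up, and the function name are assumptions made for this example.

```python
import math


def feature_map_sizes(height, width, strides=(8, 16, 32, 64, 128)):
    """Map each stride to the (height, width) of its feature map.

    A stride of s means one feature-map cell covers roughly s x s input pixels.
    Rounding up is an assumption; exact sizes depend on padding in the backbone.
    """
    return {
        stride: (math.ceil(height / stride), math.ceil(width / stride))
        for stride in strides
    }


fm_sizes = feature_map_sizes(640, 640)
print(fm_sizes)
# {8: (80, 80), 16: (40, 40), 32: (20, 20), 64: (10, 10), 128: (5, 5)}
```

Each level is half the resolution of the one before it, just like the image pyramid, but all levels come from a single forward pass over one image.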
Components of RetinaNet architecture
The RetinaNet architecture has four primary components:
- Regression subnetwork
For every ground-truth object, the regression subnetwork regresses the offset from each anchor box to the matching ground-truth bounding box.
- Bottom-up pathway
The bottom-up pathway is the backbone network (for example, ResNet). It computes feature maps at different scales, regardless of the input image size or the backbone used.
- Classification subnetwork
The classification subnetwork predicts the probability that an object is present at each spatial location, for each anchor box and object class.
- Lateral connections and top-down pathway
The top-down pathway upsamples the spatially coarser feature maps from the higher pyramid levels, and the lateral connections merge the top-down layers with the bottom-up layers of the same spatial size.
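The merge performed by the top-down pathway and lateral connections can be sketched on toy data. This is a simplified, single-channel illustration: the coarse map is upsampled by a factor of 2 with nearest-neighbour repetition and added element-wise to the lateral map. In an actual FPN the lateral map first passes through a 1x1 convolution so that channel counts match; here that step is assumed to have already happened, and the function names and values are made up.

```python
def upsample_2x(feature):
    """Nearest-neighbour upsampling: repeat every value in a 2x2 block."""
    out = []
    for row in feature:
        wide = [value for value in row for _ in range(2)]  # repeat along width
        out.append(wide)
        out.append(list(wide))  # repeat along height
    return out


def merge(top_down, lateral):
    """Element-wise sum of the upsampled coarse map and the lateral map."""
    upsampled = upsample_2x(top_down)
    return [
        [u + l for u, l in zip(up_row, lat_row)]
        for up_row, lat_row in zip(upsampled, lateral)
    ]


coarse = [[1.0, 2.0], [3.0, 4.0]]        # 2x2 map from a higher pyramid level
lateral = [[0.5] * 4 for _ in range(4)]  # 4x4 map from the bottom-up pathway
merged = merge(coarse, lateral)
for row in merged:
    print(row)
# [1.5, 1.5, 2.5, 2.5]
# [1.5, 1.5, 2.5, 2.5]
# [3.5, 3.5, 4.5, 4.5]
# [3.5, 3.5, 4.5, 4.5]
```

The merged map has the finer resolution of the lateral map while carrying the stronger semantic signal from the coarser level.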
To conclude
Computer vision is a vast field, and RetinaNet brings us one step closer to better results. With models like RetinaNet, object detection has become simpler and more approachable.
Reference links:
https://stackabuse.com/retinanet-object-detection-with-pytorch-and-torchvision/