DETRs
DETRs, or Detection Transformers, are a family of end-to-end object detectors that apply transformer architectures to images. The approach was introduced by Carion and colleagues in 2020 in the paper "End-to-End Object Detection with Transformers". It marked a shift toward single-stage detectors that predict the final set of objects directly, without hand-designed post-processing steps such as non-maximum suppression.
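Because each prediction slot is meant to correspond to at most one object, turning a DETR-style output into detections can be little more than a softmax and a score threshold, with no duplicate-removal pass. The sketch below illustrates that idea in plain Python. It rests on several assumptions that are not stated in this article: the last class logit stands for "no object", boxes are given as normalized (center x, center y, width, height), and the function names and the 0.7 threshold are hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def decode_predictions(class_logits, boxes, img_w, img_h, threshold=0.7):
    """Keep confident slots; no non-maximum suppression is applied.

    Assumption: the final entry of each logit vector is the "no object"
    class, so it is excluded when choosing the label and score.
    """
    detections = []
    for logits, box in zip(class_logits, boxes):
        probs = softmax(logits)
        object_probs = probs[:-1]  # drop the assumed "no object" class
        score = max(object_probs)
        label = object_probs.index(score)
        if score >= threshold:
            detections.append((label, score, cxcywh_to_xyxy(box, img_w, img_h)))
    return detections


if __name__ == "__main__":
    # Two prediction slots: one confident object, one "no object".
    logits = [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
    boxes = [(0.5, 0.5, 0.2, 0.4), (0.1, 0.1, 0.1, 0.1)]
    for label, score, xyxy in decode_predictions(logits, boxes, 100, 200):
        print(label, round(score, 3), xyxy)
```

In this toy input the second slot is dominated by the assumed "no object" class, so it is simply dropped by the threshold rather than suppressed against overlapping boxes.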
In a typical DETR, a convolutional neural network backbone (such as ResNet with a feature pyramid) extracts
Several variants have been proposed to improve training efficiency and performance. Deformable DETR introduces deformable attention
Applications of DETRs include general object detection and related tasks such as instance segmentation and panoptic
DETRs represent a notable development in computer vision, illustrating how transformer architectures can be integrated into