PerTransformer
PerTransformer refers to a class of deep learning models that extend the Transformer architecture, either by incorporating perceptual elements or by targeting perceptual tasks. These models aim to retain the Transformer's self-attention mechanism while improving how sensory data, such as images, audio, or video, is processed and interpreted in ways that align with human perception.
One common approach involves adapting the Transformer's input representation. Instead of processing raw pixel values or
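As one illustration of an input representation that avoids feeding raw pixels directly, the sketch below uses the patch-embedding scheme popularized by Vision Transformers: the image is cut into non-overlapping patches, and each flattened patch is linearly projected into a token vector. This choice is an assumption made for illustration, since the text does not name a specific method. The function names `patchify` and `embed`, the toy image, and the projection weights are all hypothetical.

```python
def patchify(image, patch_size):
    """Split a 2D grayscale image (list of rows) into flattened,
    non-overlapping square patches, scanned left to right, top to bottom."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[top + r][left + c]
                for r in range(patch_size)
                for c in range(patch_size)
            ])
    return patches


def embed(patches, weights):
    """Linearly project each flattened patch into a token vector.
    `weights` has shape (patch_dim, embed_dim); no bias term is used."""
    embed_dim = len(weights[0])
    return [
        [sum(p[i] * weights[i][j] for i in range(len(p))) for j in range(embed_dim)]
        for p in patches
    ]


# Toy 4x4 image whose pixel values are 0..15 (illustrative data only).
image = [[4 * r + c for c in range(4)] for r in range(4)]
# Hypothetical fixed projection from 4 patch values to 2 embedding dimensions;
# in a trained model these weights would be learned.
weights = [[1, 0], [0, 1], [1, 0], [0, 1]]
tokens = embed(patchify(image, 2), weights)
```

Each entry of `tokens` then plays the role that a word embedding plays in a text Transformer. A real model would typically also add positional information to each token, which this sketch omits.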
Another direction involves modifying the attention mechanism itself. Some PerTransformer variants might introduce attention heads or
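For reference, the baseline that such variants would modify is the standard scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The sketch below implements only that baseline in plain Python. It does not represent any particular PerTransformer variant, and the function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Standard scaled dot-product attention for a single head.

    queries: list of vectors of length d_k
    keys:    list of vectors of length d_k
    values:  list of vectors, one per key
    Returns (outputs, weights), where weights[i][j] is how strongly
    query i attends to key j.
    """
    d_k = len(keys[0])
    value_dim = len(values[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(value_dim)
        ])
        all_weights.append(weights)
    return outputs, all_weights
```

A modified attention mechanism would usually change how `scores` or `weights` are computed, for example by adding a bias or mask before the softmax, while leaving the weighted-average step intact.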
Applications of PerTransformer models span various domains, including image captioning, visual question answering, audio event detection,