Fasiformer
Fasiformer is a family of transformer-based neural networks designed for fast inference on resource-constrained hardware while preserving accuracy. The goal is real-time or near-real-time processing on devices such as smartphones, embedded processors, and edge servers. Fasiformer architectures typically combine efficient attention mechanisms with memory-conscious training techniques.
Core design principles include substituting standard quadratic self-attention with linear-time or kernel-based approximations, such as kernelized
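To make the idea of replacing quadratic self-attention with a kernel-based, linear-time approximation concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not Fasiformer's actual implementation: the feature map elu(x) + 1 is one common choice from the linear-attention literature, the function names (feature_map, linear_attention, quadratic_attention) are hypothetical, and the attention is non-causal and single-head.

```python
import math


def feature_map(x):
    """Positive feature map elu(x) + 1 (an assumed choice, not confirmed by the source)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Kernelized attention with cost linear in the sequence length.

    The keys and values are summarized once into a d_k x d_v matrix and a
    d_k normalizer vector; each query then reuses those summaries.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # sum_j phi(k_j) v_j^T
    k_sum = [0.0] * d_k                     # sum_j phi(k_j)
    for k, v in zip(keys, values):
        phi_k = feature_map(k)
        for a in range(d_k):
            k_sum[a] += phi_k[a]
            for b in range(d_v):
                kv[a][b] += phi_k[a] * v[b]

    outputs = []
    for q in queries:
        phi_q = feature_map(q)
        norm = sum(phi_q[a] * k_sum[a] for a in range(d_k))
        outputs.append(
            [sum(phi_q[a] * kv[a][b] for a in range(d_k)) / norm for b in range(d_v)]
        )
    return outputs


def quadratic_attention(queries, keys, values):
    """Reference version: same kernel similarity, but compares every query with every key."""
    phi_keys = [feature_map(k) for k in keys]
    d_v = len(values[0])
    outputs = []
    for q in queries:
        phi_q = feature_map(q)
        weights = [sum(a * b for a, b in zip(phi_q, phi_k)) for phi_k in phi_keys]
        total = sum(weights)
        outputs.append(
            [sum(w * v[b] for w, v in zip(weights, values)) / total for b in range(d_v)]
        )
    return outputs
```

Both functions compute the same result up to floating-point rounding. The difference is cost: for sequence length n, the reference version does work proportional to n², while the kernelized version builds its summaries in a single pass over the keys and values and so scales with n · d_k · d_v.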
In practice, Fasiformer architectures come in encoder, decoder, or encoder-decoder variants. They use the usual transformer
Applications include natural language processing tasks such as translation and summarization, real-time transcription, voice assistants, and
Related topics include efficient transformers, Linformer, Reformer, and Longformer.