
Optimizing Language Models: Decoding Griffin’s Local Attention and Memory Efficiency

6 May 2025

Explores how Griffin’s mix of local attention and recurrent layers outperforms traditional Transformers, improving language modeling at scale and enabling faster inference.
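To make the mixing idea concrete, here is a minimal NumPy sketch of sliding-window (local) causal attention, the component Griffin interleaves with its recurrent layers. The function name, shapes, and window size are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

def local_causal_attention(q, k, v, window=4):
    """Single-head attention restricted to a sliding window of past tokens.

    q, k, v have shape (seq_len, d). Each position attends only to itself
    and the previous window - 1 positions, so the state needed during
    decoding is bounded by the window size rather than the full sequence.
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                    # (seq_len, seq_len) logits
    i, j = np.indices((seq_len, seq_len))
    mask = (j <= i) & (j > i - window)               # causal and within the window
    scores = np.where(mask, scores, -np.inf)         # block everything else
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # (seq_len, d) outputs

# Tiny usage example with random inputs.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
out = local_causal_attention(q, k, v, window=4)      # out.shape == (16, 8)
```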


Overcoming HBM-VMEM Bottlenecks in TPU-v3 Recurrent Workloads

6 May 2025

Novel recurrence gates and complex-valued units boost the stability and efficiency of linear recurrent models optimized for TPU-v3 hardware.
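As a rough illustration of a gated linear recurrence, the sketch below runs a diagonal recurrence whose decay is modulated by an input-dependent gate. The gate parameterization and names are assumptions chosen for clarity; the paper’s exact formulation (including its complex-valued variants) differs.

```python
import numpy as np

def gated_diagonal_recurrence(x, decay_logits, gate_logits):
    """Run a gated, element-wise (diagonal) linear recurrence over a sequence.

    x:            (T, d) inputs
    decay_logits: (d,)   per-channel logits; a sigmoid maps them into (0, 1)
    gate_logits:  (T, d) input-dependent gate logits (illustrative)

    Update: h_t = a_t * h_{t-1} + (1 - a_t) * x_t, where a_t is a decay
    modulated by a sigmoid gate. This sketches the general mechanism only.
    """
    T, d = x.shape
    base_decay = 1.0 / (1.0 + np.exp(-decay_logits))   # per-channel decay in (0, 1)
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(T):
        gate = 1.0 / (1.0 + np.exp(-gate_logits[t]))   # gate in (0, 1)
        a_t = base_decay ** gate                       # gated decay, still in (0, 1)
        h = a_t * h + (1.0 - a_t) * x[t]               # bounded linear update
        out[t] = h
    return out

# Tiny usage example with random inputs.
rng = np.random.default_rng(0)
out = gated_diagonal_recurrence(rng.normal(size=(32, 16)),
                                rng.normal(size=16),
                                rng.normal(size=(32, 16)))   # out.shape == (32, 16)
```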


Hawk and Griffin: Efficient RNN Models Redefining AI Performance

14 Jan 2025

This research introduces the Hawk and Griffin models, efficient RNN alternatives to Transformers that offer reduced inference latency and strong long-sequence performance.


RNNs vs. Transformers: Innovations in Scalability and Efficiency

14 Jan 2025

This research explores scalable RNN and SSM innovations, comparing their efficiency and performance with those of Transformers and linear attention techniques.


Hawk and Griffin: Mastering Long-Context Extrapolation in AI

14 Jan 2025

This research shows the Hawk and Griffin models excel at long-context extrapolation, accurately predicting tokens on sequences 4x longer than those seen in training.


Griffin Model: Advancing Copying and Retrieval in AI Tasks

14 Jan 2025

This research shows Griffin excels at copying and retrieval tasks, outperforming Hawk and Transformers when extrapolating to longer sequences.


Hawk and Griffin Models: Superior Latency and Throughput in AI Inference

14 Jan 2025

This research shows Hawk and Griffin achieve lower latency and higher throughput than MQA Transformers, excelling at long-sequence and large-batch inference.


Recurrent Models: Enhancing Latency and Throughput Efficiency

14 Jan 2025

This research shows recurrent models maintain a small, fixed-size state rather than a cache that grows with sequence length, improving latency and throughput over Transformers on long sequences.
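The memory argument can be made concrete with back-of-the-envelope arithmetic: a Transformer’s KV cache grows linearly with sequence length, while a recurrent model’s state stays fixed. The configuration numbers below are hypothetical, chosen only to show the scaling contrast, not figures from the paper.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Transformer KV cache: keys and values for every past token, per layer."""
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_value

def recurrent_state_bytes(n_layers, state_dim, bytes_per_value=2):
    """Recurrent model: one fixed-size state per layer, independent of sequence length."""
    return n_layers * state_dim * bytes_per_value

# Illustrative configuration (hypothetical values):
print(kv_cache_bytes(seq_len=8192, n_layers=32, n_kv_heads=8, head_dim=128) / 2**20, "MiB")  # 1024.0 MiB
print(recurrent_state_bytes(n_layers=32, state_dim=4096) / 2**20, "MiB")                     # 0.25 MiB
```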


Recurrent Models: Decoding Faster with Lower Latency and Higher Throughput

14 Jan 2025

This research shows recurrent models excel at decoding, offering lower latency and higher throughput than Transformers, especially on long sequences.