Autoregressive Model
A type of generative model that produces output sequentially, using each generated element as input for predicting the next one.
Also known as: Autoregressive Language Model, AR Model, Causal Language Model
Category: AI
Tags: ai, machine-learning, models, generation, architectures
Explanation
An autoregressive model generates sequences one element at a time, where each new element is conditioned on all previously generated elements. This sequential, left-to-right generation is the dominant paradigm for modern large language models.
**How Autoregressive Generation Works**:
The term 'autoregressive' comes from statistics: the model 'regresses' on its own previous outputs. For text:
1. Given a prompt, generate token 1
2. Given prompt + token 1, generate token 2
3. Given prompt + token 1 + token 2, generate token 3
4. Continue until a stopping condition
Each step uses the full preceding context: the model's output at step n depends on the prompt plus all tokens generated at steps 1 through n-1.
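The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a real language model: `toy_model` is a hypothetical stand-in that deterministically continues a fixed sentence, so the structure of the autoregressive loop is the only thing being demonstrated.

```python
from typing import Callable, List

def generate(model: Callable[[List[str]], str],
             prompt: List[str],
             max_new_tokens: int,
             stop_token: str = "<eos>") -> List[str]:
    """Autoregressive loop: each new token is predicted from the full
    preceding context, then appended to that context."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = model(tokens)    # forward pass over all prior tokens
        if next_token == stop_token:  # stopping condition
            break
        tokens.append(next_token)     # output becomes input for the next step
    return tokens

# Hypothetical toy "model": continues a fixed sentence, then emits <eos>.
CANON = ["the", "cat", "sat", "<eos>"]
def toy_model(context: List[str]) -> str:
    return CANON[min(len(context) - 1, len(CANON) - 1)]

print(generate(toy_model, ["say:"], max_new_tokens=10))
# → ['say:', 'the', 'cat', 'sat']
```

Note that every iteration passes the *entire* token list back to the model; this is exactly why inference cost grows with sequence length and why caching (discussed below under optimizations) matters in practice.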
**Autoregressive vs. Other Architectures**:
| Model Type | Generation | Example Models |
|------------|-----------|----------------|
| Autoregressive | Sequential, left-to-right | GPT, Claude, LLaMA |
| Masked Language Model | Fills in blanks | BERT, RoBERTa |
| Encoder-Decoder | Encode input, then decode output | T5, BART |
| Diffusion | Iterative denoising | Stable Diffusion, DALL-E 3 |
**Strengths**:
- **Natural for generation**: The sequential process mirrors how humans write and speak
- **Flexible output length**: Can generate any length of text up to the context window limit
- **Strong coherence**: Each token considers all previous context
- **Scalable**: The transformer architecture lets autoregressive models train in parallel, predicting every position of a sequence simultaneously from its ground-truth prefix (teacher forcing)
- **Emergent abilities**: Larger autoregressive models develop capabilities like in-context learning and reasoning
**Limitations**:
- **Sequential generation is slow**: Each token requires a full forward pass through the model, and for a single sequence these passes cannot be parallelized during inference
- **Error accumulation**: A poor early token can cascade, affecting all subsequent tokens
- **Exposure bias**: During training, the model sees ground-truth previous tokens; during inference, it sees its own (potentially flawed) outputs
- **Left-to-right constraint**: The model cannot revise earlier tokens based on later context (though techniques like self-consistency and chain-of-thought help mitigate this)
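The train/inference mismatch behind exposure bias can be made concrete with a small sketch. This is an illustration under assumed names: `training_step_inputs` and `free_running` are hypothetical helpers, not a real training API.

```python
from typing import Callable, List

Model = Callable[[List[str]], str]

def training_step_inputs(ground_truth: List[str]) -> List[List[str]]:
    """Teacher forcing: at every position the model conditions on the
    ground-truth prefix, never on its own earlier predictions."""
    return [ground_truth[:i] for i in range(1, len(ground_truth))]

def free_running(model: Model, prompt: List[str], n: int) -> List[str]:
    """Inference: the model conditions on its own (possibly flawed)
    earlier outputs -- the source of exposure bias."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(model(tokens))
    return tokens

# During training, every prefix is correct by construction:
print(training_step_inputs(["the", "cat", "sat"]))
# → [['the'], ['the', 'cat']]

# During inference, one bad early token contaminates every later prefix.
```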
**Optimization Techniques**:
- **KV-cache**: Caches the attention keys and values of previously processed tokens, so each generation step computes projections only for the newest token instead of recomputing them for the whole context
- **Speculative decoding**: Uses a smaller draft model to propose multiple tokens, verified by the main model
- **Batching**: Groups multiple requests to amortize the cost of model computation
- **Quantization**: Reduces numerical precision to speed up each forward pass
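The KV-cache idea can be illustrated with a toy single-head attention over scalar embeddings. This is a deliberately simplified sketch (real caches store per-layer, per-head vector keys and values); the counter makes the savings visible: with the cache, n tokens need n key/value projections instead of 1 + 2 + ... + n without it.

```python
import math

class ToyAttentionWithKVCache:
    """Illustrative single-head attention over scalar 'embeddings'.
    Each step projects only the NEW token; past keys/values are reused
    from the cache rather than recomputed."""

    def __init__(self, wq: float, wk: float, wv: float):
        self.wq, self.wk, self.wv = wq, wk, wv
        self.k_cache: list = []
        self.v_cache: list = []
        self.kv_computations = 0  # count of key/value projections performed

    def step(self, x: float) -> float:
        # Project only the incoming token and append it to the cache.
        self.k_cache.append(self.wk * x)
        self.v_cache.append(self.wv * x)
        self.kv_computations += 1
        # Attend over the cached keys/values (softmax-weighted average).
        q = self.wq * x
        scores = [math.exp(q * k) for k in self.k_cache]
        total = sum(scores)
        return sum(s / total * v for s, v in zip(scores, self.v_cache))

attn = ToyAttentionWithKVCache(wq=0.5, wk=0.5, wv=1.0)
for x in [1.0, 2.0, 3.0]:
    attn.step(x)

print(attn.kv_computations)  # → 3 projections, vs 1 + 2 + 3 = 6 without a cache
```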
**Beyond Text**:
Autoregressive models are used beyond language — for music generation (treating notes as tokens), code completion, protein sequence design, and even image generation (treating image patches as a sequence).