Model Parameters
The learned numerical values (weights and biases) within a neural network that determine how the model transforms inputs into outputs.
Also known as: Weights, Model Weights, Neural Network Parameters, Learned Parameters
Category: AI
Tags: ai, machine-learning, neural-networks, fundamentals, models
Explanation
Model parameters are the internal numerical values that a neural network learns during training. They are the model's 'knowledge': everything it has learned about language, reasoning, and the world, encoded in billions of numbers.
**What Parameters Are**:
Parameters are the adjustable values in a neural network:
- **Weights**: Numbers that determine how strongly one neuron's output influences another. They control what information flows through the network and how it's transformed.
- **Biases**: Offset values that shift the activation of neurons, allowing the model to better fit the data.
Every connection between neurons in the network has an associated weight. A model with 70 billion parameters has 70 billion of these learned values.
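The weights and biases above can be made concrete with a single linear layer. This is a minimal NumPy sketch with illustrative shapes; real networks stack many such layers with nonlinearities between them.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # weights: 4 * 3 = 12 learned values
b = np.zeros(3)               # biases: 3 learned offsets

def layer(x):
    # Each output is a weighted sum of the inputs, shifted by a bias.
    return x @ W + b

x = rng.normal(size=4)        # a 4-dimensional input
y = layer(x)                  # a 3-dimensional output
print(W.size + b.size)        # 15 parameters in this one layer
```

Counting every entry of `W` and `b` this way, layer by layer, is exactly how a "70 billion parameter" figure is arrived at.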
**Parameter Count and Model Scale**:
| Model | Parameters | Approximate Size (16-bit) |
|-------|-----------|------------------|
| GPT-2 | 1.5B | 3 GB |
| LLaMA 2 | 7B–70B | 14–140 GB |
| GPT-4 | Estimated 1.7T (MoE) | Not disclosed |
| Claude | Not disclosed | Not disclosed |
More parameters generally mean:
- Greater capacity to learn and represent complex patterns
- Better performance on diverse tasks
- Higher computational cost for both training and inference
- More memory required to store and run the model
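The sizes in the table follow directly from parameter count times bytes per parameter. A quick back-of-the-envelope helper (the function name is illustrative):

```python
def model_size_bytes(n_params, bits_per_param):
    """Rough storage estimate: parameter count times bytes per parameter."""
    return n_params * bits_per_param / 8

# A 7B-parameter model stored at 16 bits per parameter:
gb = model_size_bytes(7e9, 16) / 1e9
print(round(gb))  # 14 (GB), matching the LLaMA 2 row above
```

This estimate covers weights only; running the model also needs memory for activations and the KV cache.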
**How Parameters Are Learned**:
1. **Initialize**: Parameters start as small random values
2. **Forward pass**: Input flows through the network, transformed by parameters at each layer
3. **Loss computation**: Compare the model's output to the expected output
4. **Backpropagation**: Calculate how each parameter contributed to the error
5. **Update**: Adjust parameters slightly to reduce the error (using an optimizer like Adam)
6. **Repeat**: Process billions of training examples, gradually refining all parameters
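The six steps above can be sketched for a single parameter. This toy example fits `y = 2 * x` with squared-error loss and plain gradient descent; real models repeat the same loop over billions of parameters with optimizers like Adam.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal() * 0.01                    # 1. initialize: small random value
xs = np.array([1.0, 2.0, 3.0])
ys = 2.0 * xs                              # training data: the target relationship

lr = 0.05                                  # learning rate (a hyperparameter)
for _ in range(200):                       # 6. repeat
    preds = w * xs                         # 2. forward pass
    loss = np.mean((preds - ys) ** 2)      # 3. loss computation
    grad = np.mean(2 * (preds - ys) * xs)  # 4. gradient of loss w.r.t. w
    w -= lr * grad                         # 5. update: step against the gradient
print(round(w, 3))                         # converges near 2.0
```

In a real network, step 4 is backpropagation: the chain rule applied layer by layer to get this gradient for every parameter at once.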
**Parameters vs. Hyperparameters**:
- **Parameters**: Learned automatically during training (weights, biases)
- **Hyperparameters**: Set manually before training (learning rate, number of layers, batch size, context window size)
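In code the distinction is simply who writes the values (the names below are illustrative):

```python
# Hyperparameters: chosen by the practitioner before training and then fixed.
hyperparams = {"learning_rate": 1e-3, "num_layers": 12, "batch_size": 32}

# Parameters: initialized randomly, then repeatedly rewritten by the optimizer.
params = {"W": [[0.1, -0.3], [0.4, 0.2]], "b": [0.0, 0.0]}

# Training updates `params`; `hyperparams` stays whatever was chosen up front.
```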
**Parameter Efficiency**:
Not all parameters contribute equally. Techniques to reduce the cost of storing, running, or fine-tuning them include:
- **Model pruning**: Removing parameters that contribute little to performance
- **Quantization**: Representing parameters with fewer bits (e.g., 16-bit or 4-bit instead of 32-bit)
- **Mixture of Experts (MoE)**: Only activating a subset of parameters for each input
- **LoRA/QLoRA**: Fine-tuning only a small number of additional parameters instead of all of them
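Quantization from the list above can be sketched in a few lines. This is a minimal uniform-quantization example (real schemes like GPTQ or AWQ are more sophisticated, often quantizing per group with calibration data):

```python
import numpy as np

# Original 32-bit weights (illustrative values).
w = np.array([-0.82, 0.13, 0.55, -0.04], dtype=np.float32)

levels = 2 ** 4                          # 4-bit -> 16 representable values
scale = (w.max() - w.min()) / (levels - 1)
q = np.round((w - w.min()) / scale).astype(np.uint8)  # codes in 0..15
w_restored = q * scale + w.min()         # dequantized approximation

# Storage drops 8x (4 bits vs 32), at the cost of a small rounding error.
print(np.abs(w - w_restored).max())
```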
**Why Parameter Count Matters to Users**:
- Indicates approximate model capability (though architecture and training data matter more)
- Determines hardware requirements for running the model locally
- Affects inference speed and cost
- Larger isn't always better — a well-trained smaller model can outperform a poorly trained larger one