Model Quantization

A technique for reducing the numerical precision of a neural network's weights and activations (e.g., from 32-bit floating point to 8-bit integers) to decrease model size, memory usage, and inference latency.

Related concepts: AI Inference, Model Pruning, Knowledge Distillation, Edge AI, Deep Learning, Neural Networks
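As a minimal sketch of the idea, the example below implements symmetric per-tensor int8 quantization with NumPy: floats are mapped to integers in [-127, 127] via a single scale factor, then mapped back (dequantized) for use in computation. The function names and the 8-bit choice are illustrative assumptions, not a reference implementation of any particular framework.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: map float weights to int8 in [-127, 127].

    The scale is chosen so the largest-magnitude weight maps to +/-127.
    (Illustrative sketch; real toolchains also handle per-channel scales,
    zero points, and activation calibration.)
    """
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix and measure the round-trip error.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes, "bytes vs", w.nbytes, "bytes")  # int8 storage is 4x smaller
print("max error:", np.max(np.abs(w - w_hat)))  # bounded by half a quantization step
```

The storage saving is exact (1 byte per weight instead of 4), while the accuracy cost shows up as a bounded rounding error of at most half a quantization step (scale / 2) per weight.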