Model Quantization

A technique for reducing the numerical precision of a neural network's weights and activations (e.g., from 32-bit floating point to 8-bit integers) to decrease model size, memory usage, and inference latency.

Related concepts: AI Inference, Model Pruning, Knowledge Distillation, Edge AI, Deep Learning, Neural Networks
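As a minimal sketch of the idea, the example below implements symmetric per-tensor int8 quantization with NumPy: floats are mapped to integers in [-127, 127] via a single scale factor, then mapped back (dequantized) for use in computation. The function names and the 8-bit choice are illustrative assumptions, not a reference implementation of any particular framework.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: map float weights to int8 in [-127, 127].

    The scale is chosen so the largest-magnitude weight maps to +/-127.
    (Illustrative sketch; real toolchains also handle per-channel scales,
    zero points, and activation calibration.)
    """
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix and measure the round-trip error.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes, "bytes vs", w.nbytes, "bytes")  # int8 storage is 4x smaller
print("max error:", np.max(np.abs(w - w_hat)))  # bounded by half a quantization step
```

The storage saving is exact (1 byte per weight instead of 4), while the accuracy cost shows up as a bounded rounding error of at most half a quantization step (scale / 2) per weight.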