Speculative Decoding - Graph View
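An inference acceleration technique in which a smaller draft model proposes several tokens that a larger target model then verifies in a single parallel pass. Generation speeds up because each target-model pass can accept more than one token, and the accept/reject rule keeps the output the same as decoding with the target model alone, so quality is unchanged.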
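The propose-then-verify loop can be illustrated with a minimal sketch. It uses the greedy variant (accept draft tokens only while they match the target's own greedy choice) rather than the probabilistic accept/reject rule used with sampling. The two "models" are hypothetical toy functions (`target_next`, `draft_next`) standing in for real networks, and the names `speculative_decode`, `greedy_decode`, and the parameter `k` are assumptions made for illustration.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]

VOCAB = 50  # hypothetical vocabulary size for the toy models


def target_next(prefix: List[int]) -> int:
    """Hypothetical stand-in for the large target model's greedy choice."""
    return (31 * sum(prefix) + 7 * len(prefix)) % VOCAB


def draft_next(prefix: List[int]) -> int:
    """Hypothetical cheap draft: agrees with the target except when
    the prefix length is a multiple of 4."""
    guess = target_next(prefix)
    return (guess + 1) % VOCAB if len(prefix) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft k tokens, verify, keep the matching prefix."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target model scores all k + 1 positions. A real system does
        #    this in one batched forward pass; the toy version just loops.
        verified = [target(out + proposal[:i]) for i in range(k + 1)]

        # 3. Accept the longest prefix on which draft and target agree, then
        #    append the target's own token at the first mismatch (or the
        #    bonus token if every draft token was accepted).
        n_ok = 0
        while n_ok < k and proposal[n_ok] == verified[n_ok]:
            n_ok += 1
        out.extend(proposal[:n_ok])
        out.append(verified[n_ok])
    return out[:goal]


prompt = [1, 2, 3]
baseline = greedy_decode(target_next, prompt, 20)
speculative = speculative_decode(target_next, draft_next, prompt, 20)
print(speculative == baseline)
```

Every token appended in step 3 is either a draft token the target agreed with or the target's own choice, so under these assumptions the result matches plain greedy decoding with the target. The gain in a real system comes from step 2 costing roughly one target pass per round instead of one per token.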