What is Inference?

Inference is the process in which a trained (or fine-tuned) LLM makes a prediction for a given input.

Figure: Inference consists of a prompt phase followed by decode iterations that generate tokens one at a time until EOS.

EOS: end of sequence
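
The two phases can be sketched in a few lines of Python. This is only an illustration, not how any particular LLM library implements generation: `toy_next_token` is a hypothetical stand-in for a real model's forward pass, the token id `0` for EOS is an assumption, and `max_new_tokens` is an assumed safety limit.

```python
from typing import Callable, List

EOS = 0  # assumed token id for end of sequence (hypothetical)


def toy_next_token(tokens: List[int]) -> int:
    """Hypothetical stand-in for an LLM forward pass.

    Returns the last token id minus one, never going below EOS,
    so every sequence eventually ends.
    """
    return max(tokens[-1] - 1, EOS)


def generate(
    prompt: List[int],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int = 16,
) -> List[int]:
    """Return the newly generated tokens for a non-empty prompt."""
    tokens = list(prompt)
    generated: List[int] = []

    # Prompt phase: the whole prompt is processed once
    # and yields the first new token.
    tok = next_token(tokens)
    generated.append(tok)
    tokens.append(tok)

    # Decode iterations: one new token per step,
    # until EOS is produced or the limit is reached.
    while tok != EOS and len(generated) < max_new_tokens:
        tok = next_token(tokens)
        generated.append(tok)
        tokens.append(tok)

    return generated


if __name__ == "__main__":
    print(generate([5, 3], toy_next_token))
```

With the toy model, the prompt `[5, 3]` produces the tokens 2, 1, and then EOS. A real model would replace `toy_next_token` with a forward pass that scores the whole vocabulary and picks or samples the next token.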

© 2026 The Rector and Visitors of the University of Virginia