Wrap Up

Recap

The weights of many modern LLMs are too large to fit in the memory of a single GPU, so multi-GPU jobs are needed for LLM inference.
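As a rough back-of-envelope check, weight memory alone is the parameter count times the bytes per parameter. A minimal sketch, where the 70B-parameter FP16 model is an illustrative assumption, not a UVA-specific figure:

```python
# Rough weight-memory estimate: parameter count * bytes per parameter.
# The figures below (70B parameters, FP16) are illustrative assumptions.
params = 70e9          # 70B-parameter model
bytes_per_param = 2    # FP16/BF16 weights
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~140 GB
```

At roughly 140 GB of weights, before any KV cache or activations, such a model already exceeds the 80 GB of a single A100 or H100, which is why inference must be sharded across GPUs.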

UVA HPC provides a variety of GPUs as well as the NVIDIA BasePOD (H200s were recently added).

Plan how many and which GPUs you will need before running your LLM inference job.
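A minimal planning sketch, assuming FP16 weights and a flat overhead factor for KV cache and activations (both the factor and the example sizes are assumptions; tune them for your workload):

```python
import math

def gpus_needed(params_billion: float, gpu_mem_gb: float,
                bytes_per_param: int = 2, overhead: float = 1.3) -> int:
    """Estimate the GPU count from weight memory plus a flat overhead
    factor covering KV cache and activations (1.3 is an assumption)."""
    total_gb = params_billion * bytes_per_param * overhead
    return math.ceil(total_gb / gpu_mem_gb)

# e.g. a 70B FP16 model: 3 GPUs at 80 GB each, or 2 GPUs at 141 GB (H200)
print(gpus_needed(70, 80), gpus_needed(70, 141))
```

In practice, round up to a GPU count that evenly divides the model's attention heads (often a power of two), since tensor-parallel frameworks usually require this.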

Use a well-established framework (e.g., vLLM, DeepSpeed, or Hugging Face Accelerate) to implement model parallelism rather than writing it yourself.
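For instance, vLLM exposes tensor parallelism through a single parameter. A minimal sketch, assuming a node allocated four GPUs; the model ID here is illustrative:

```python
from vllm import LLM, SamplingParams

# tensor_parallel_size=4 shards the model's weights across the
# 4 GPUs of one node (assumes the job was allocated 4 GPUs;
# the model ID is an illustrative assumption).
llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct",
          tensor_parallel_size=4)

outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```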

© 2025 The Rector and Visitors of the University of Virginia