Wrap Up
Recap
Many modern LLMs are too large to fit on a single GPU, so multi-GPU jobs are often needed for LLM inference.
UVA HPC provides a variety of GPUs as well as the NVIDIA BasePOD (H200s were recently added).
Plan how many and which GPUs you will need before running your LLM inference job (see the sizing sketch after this list).
Use a well-established framework to implement model parallelism (see the sketch below).
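
As a back-of-the-envelope sizing sketch, you can estimate how many GPUs a model needs from its parameter count and precision. The numbers below are illustrative assumptions (a 70B-parameter model, FP16/BF16 weights, ~20% headroom for the KV cache and runtime overhead), not a recommendation for any particular model:

```python
import math

# Rough GPU memory estimate for LLM inference.
# Assumed (hypothetical) numbers: 70B parameters served in FP16/BF16
# (2 bytes per parameter), with ~20% headroom for the KV cache,
# activations, and runtime overhead.
params = 70e9
bytes_per_param = 2
overhead = 1.2

total_gb = params * bytes_per_param / 1e9 * overhead  # ~168 GB

for name, vram_gb in [("A100 80GB", 80), ("H200 141GB", 141)]:
    n_gpus = math.ceil(total_gb / vram_gb)
    print(f"{name}: need ~{total_gb:.0f} GB -> at least {n_gpus} GPU(s)")
```

In practice you may also need to round the GPU count up to a value the framework supports for tensor parallelism (often a divisor of the model's number of attention heads).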
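
As one example of leaning on an established framework, the sketch below uses vLLM to shard a model across several GPUs with tensor parallelism. The model name and GPU count are illustrative assumptions; adjust them to the model and hardware you actually request:

```python
# Minimal sketch of tensor-parallel inference with vLLM.
# The model name and tensor_parallel_size are assumptions for illustration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # example model, replace as needed
    tensor_parallel_size=4,                     # shard the model across 4 GPUs
)
sampling = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Summarize model parallelism in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```

The key point is that the framework handles splitting the weights and coordinating communication between GPUs, so your job script only needs to request the right number of GPUs and set the parallelism degree.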