UVA BasePOD
UVA HPC – NVIDIA DGX BasePOD on Rivanna/Afton
UVA’s DGX BasePOD is a shared high-performance GPU resource available on both the Rivanna and Afton clusters. It can be used to support large deep-learning models.
The BasePOD includes 18 DGX A100 nodes with:
- 2 TB of RAM per node
- 8 A100 GPUs per node
- 80 GB of GPU memory per GPU device
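As a rough illustration, the per-node figures above can be multiplied out to get the aggregate capacity of the BasePOD. The sketch below uses only the numbers quoted in the list; the variable names are illustrative, and the totals describe the hardware, not what the scheduler lets a single job request.

```python
# Per-node figures quoted in the hardware list above.
NODES = 18            # DGX A100 nodes in the BasePOD
GPUS_PER_NODE = 8     # A100 GPUs per node
GPU_MEM_GB = 80       # GPU memory per A100, in GB
RAM_TB_PER_NODE = 2   # host RAM per node, in TB

# Aggregate figures derived by simple multiplication.
total_gpus = NODES * GPUS_PER_NODE
gpu_mem_per_node_gb = GPUS_PER_NODE * GPU_MEM_GB
total_gpu_mem_gb = total_gpus * GPU_MEM_GB
total_ram_tb = NODES * RAM_TB_PER_NODE

print(f"GPUs in the BasePOD:      {total_gpus}")
print(f"GPU memory per node (GB): {gpu_mem_per_node_gb}")
print(f"GPU memory in total (GB): {total_gpu_mem_gb}")
print(f"Host RAM in total (TB):   {total_ram_tb}")
```

This gives 144 GPUs, 640 GB of GPU memory per node, 11,520 GB of GPU memory overall, and 36 TB of host RAM.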
Advanced Features (compared to regular GPU nodes):
- NVLink for fast multi-GPU communication
- GPUDirect RDMA Peer Memory for fast multi-node multi-GPU communication
- GPUDirect Storage with 200 TB IBM ESS3200 (NVMe) SpectrumScale storage array
Ideal Scenarios:
- The job needs multiple GPUs, either on a single node or across multiple nodes
- The job (single- or multi-GPU) is I/O intensive
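To make the single-node versus multi-node distinction concrete, here is a minimal sketch that works out how many 8-GPU nodes a request would span. The helpers `nodes_needed` and `placement` are hypothetical names introduced for illustration; they only do arithmetic on the node and GPU counts given above and do not reflect any scheduler policy or per-user limit.

```python
import math

GPUS_PER_NODE = 8   # from the hardware list
TOTAL_NODES = 18    # from the hardware list


def nodes_needed(num_gpus: int) -> int:
    """Smallest number of 8-GPU nodes that can hold num_gpus GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    nodes = math.ceil(num_gpus / GPUS_PER_NODE)
    if nodes > TOTAL_NODES:
        raise ValueError("request exceeds the 18-node BasePOD")
    return nodes


def placement(num_gpus: int) -> str:
    """Label a request as single-node or multi-node."""
    return "single-node" if nodes_needed(num_gpus) == 1 else "multi-node"


# Example requests: number of GPUs, nodes spanned, placement label.
for n in (1, 8, 9, 32):
    print(n, nodes_needed(n), placement(n))
```

A request of up to 8 GPUs fits on one node, where the GPUs communicate over NVLink. Anything larger spans nodes and relies on GPUDirect RDMA for the inter-node traffic.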
If you have ever used an A100 with 80 GB on our system, you were using a POD node!
More info: https://www.rc.virginia.edu/userinfo/hpc/basepod/