Key Tools and Frameworks

Hugging Face Accelerate provides an easy-to-use interface to Transformers and PyTorch.

DeepSpeed is well suited to large-scale (thousand-plus GPU), heavily optimized training and inference.

vLLM focuses on high-speed, efficient LLM inference.

Megatron is a low-level, open-source API for developing custom frameworks.

On RC’s resources, Accelerate is the recommended default, unless you need the training- or inference-specific memory management or speed-ups that DeepSpeed or vLLM provide. Accelerate will set up model parallelism for you. Code examples will be provided.
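To give a rough feel for what "setting up model parallelism for you" means, the sketch below is a toy, pure-Python illustration of placing a model's layers, in order, on the first device that still has room. It is not Accelerate's actual implementation: the function name `assign_layers`, the layer names, and the memory figures are all hypothetical and chosen only for illustration. In practice, this kind of placement is usually requested by passing `device_map="auto"` when loading a model with `from_pretrained` in Transformers (with Accelerate installed).

```python
from typing import Dict, List, Tuple


def assign_layers(
    layer_sizes: List[Tuple[str, int]],
    device_budgets: List[Tuple[str, int]],
) -> Dict[str, str]:
    """Toy sketch: place layers, in order, on the first device with room.

    layer_sizes    -- (layer name, size in GB) in forward-pass order
    device_budgets -- (device name, free memory in GB) in priority order
    Both the names and the sizes are hypothetical illustration values.
    """
    if not device_budgets:
        raise ValueError("at least one device is required")

    placement: Dict[str, str] = {}
    device_idx = 0
    remaining = device_budgets[0][1]

    for name, size in layer_sizes:
        # Advance to the next device until one can hold this layer.
        # Earlier devices are never revisited, so consecutive layers
        # stay together and execution order is preserved.
        while size > remaining:
            device_idx += 1
            if device_idx >= len(device_budgets):
                raise MemoryError(f"no device has room for layer {name!r}")
            remaining = device_budgets[device_idx][1]
        placement[name] = device_budgets[device_idx][0]
        remaining -= size

    return placement


# Hypothetical model: an embedding, four blocks, and an output head (GB).
LAYERS = [("embed", 2), ("block0", 3), ("block1", 3),
          ("block2", 3), ("block3", 3), ("head", 2)]
# Hypothetical devices: GPUs are filled first, then CPU memory.
DEVICES = [("gpu:0", 8), ("gpu:1", 8), ("cpu", 16)]

demo_placement = assign_layers(LAYERS, DEVICES)
for layer, device in demo_placement.items():
    print(f"{layer:>7} -> {device}")
```

With these made-up numbers, the first three layers fit on the first GPU and the rest spill over to the second, so the CPU is never needed. Accelerate's own placement logic accounts for more detail than this sketch does; see the link below for the documented behavior.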

More information: https://huggingface.co/docs/accelerate/en/concept_guides/big_model_inference

Last updated on Jul 8, 2025