References

Hugging Face: https://huggingface.co/docs

Accelerate: https://huggingface.co/docs/accelerate

DeepSpeed: https://www.deepspeed.ai/inference/

vLLM: https://docs.vllm.ai/en/latest/

GPU memory allocation: https://ksingh7.medium.com/calculate-how-much-gpu-memory-you-need-to-serve-any-llm-67301a844f21

KV cache memory allocation: https://unfoldai.com/gpu-memory-requirements-for-llms/

Last updated on Jul 8, 2025