References
Hugging Face: https://huggingface.co/docs
Accelerate: https://huggingface.co/docs/accelerate
DeepSpeed: https://www.deepspeed.ai/inference/
vLLM: https://docs.vllm.ai/en/latest/
GPU memory allocation: https://ksingh7.medium.com/calculate-how-much-gpu-memory-you-need-to-serve-any-llm-67301a844f21
KV cache memory allocation: https://unfoldai.com/gpu-memory-requirements-for-llms/