Supported Models
The following models are currently supported for LLM inference deployment in Dynamiq; they are listed in the table below together with their minimum AWS instance requirements.
A good rule of thumb for FP16 (half-precision) models is to have at least 2x the number of parameters (in billions) as GB of VRAM. For example, a model with 8B parameters would require at least 16 GB of VRAM. However, the actual VRAM requirements may vary depending on the model and the specific task.

A more accurate way to determine the VRAM requirement is to use the following formula:

$$M = \frac{P \times 4}{32 / Q} \times 1.2$$

Where:

* `M` is the required GPU memory in GB
* `P` is the number of parameters in billions (e.g., 8 for `meta-llama/Meta-Llama-3.1-8B` or 70 for `meta-llama/Meta-Llama-3.1-70B`)
* `4` is the number of bytes per parameter at 32-bit precision, which the `32 / Q` term scales down to the chosen precision
* `Q` is the precision in bits (e.g., 16 for FP16 or half precision, 8 for 8-bit quantization, 4 for 4-bit quantization, and so on)
* `1.2` is a safety factor to account for additional memory requirements (cache overhead, etc.)

Therefore, for a model like `meta-llama/Meta-Llama-3.1-8B` with 8B parameters and FP16 precision, the minimum VRAM requirement would be:

$$M = \frac{8 \times 4}{32 / 16} \times 1.2 = 19.2\ \text{GB}$$
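The same calculation can be done programmatically. The snippet below is a minimal illustrative sketch in plain Python (the `estimate_vram_gb` helper is not part of the Dynamiq SDK; its name and defaults are assumptions for this example):

```python
def estimate_vram_gb(params_billions: float, precision_bits: int = 16,
                     safety_factor: float = 1.2) -> float:
    """Estimate the GPU memory (GB) needed to serve a model.

    Implements M = (P * 4) / (32 / Q) * 1.2 from the formula above.
    """
    return params_billions * 4 / (32 / precision_bits) * safety_factor


# meta-llama/Meta-Llama-3.1-8B at FP16: ~19.2 GB
print(round(estimate_vram_gb(8, precision_bits=16), 2))

# meta-llama/Meta-Llama-3.1-70B with 4-bit quantization: ~42.0 GB
print(round(estimate_vram_gb(70, precision_bits=4), 2))
```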
| Model | Parameters | Size | Architecture | Minimum AWS instance |
| --- | --- | --- | --- | --- |
| meta-llama/Meta-Llama-3.1-8B | 8.03B | 16.07 GB | LlamaForCausalLM | g5.2xlarge |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03B | 16.07 GB | LlamaForCausalLM | g5.2xlarge |
| meta-llama/Meta-Llama-3.1-70B | 70.6B | 141.1 GB | LlamaForCausalLM | g5.48xlarge |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 70.6B | 141.1 GB | LlamaForCausalLM | g5.48xlarge |
| meta-llama/Llama-3.2-1B | 1.24B | 2.47 GB | LlamaForCausalLM | g5.xlarge |
| meta-llama/Llama-3.2-1B-Instruct | 1.24B | 2.47 GB | LlamaForCausalLM | g5.xlarge |
| meta-llama/Llama-3.2-3B | 3.21B | 6.43 GB | LlamaForCausalLM | g5.xlarge |
| meta-llama/Llama-3.2-3B-Instruct | 3.21B | 6.43 GB | LlamaForCausalLM | g5.xlarge |
| meta-llama/Llama-2-7b-hf | 6.74B | 13.49 GB | LlamaForCausalLM | g5.2xlarge |
| meta-llama/Llama-2-7b-chat-hf | 6.74B | 13.49 GB | LlamaForCausalLM | g5.2xlarge |
| meta-llama/Llama-2-13b-hf | 13B | 26.03 GB | LlamaForCausalLM | g5.12xlarge |
| meta-llama/Llama-2-13b-chat-hf | 13B | 26.03 GB | LlamaForCausalLM | g5.12xlarge |
| meta-llama/Llama-2-70b-hf | 69B | 137.96 GB | LlamaForCausalLM | g5.48xlarge |
| meta-llama/Llama-2-70b-chat-hf | 69B | 137.96 GB | LlamaForCausalLM | g5.48xlarge |
| meta-llama/CodeLlama-7b-hf | 6.74B | 13.48 GB | LlamaForCausalLM | g5.2xlarge |
| meta-llama/CodeLlama-7b-Instruct-hf | 6.74B | 13.48 GB | LlamaForCausalLM | g5.2xlarge |
| meta-llama/CodeLlama-13b-Instruct-hf | 13B | 26.03 GB | LlamaForCausalLM | g5.12xlarge |
| meta-llama/CodeLlama-70b-Instruct-hf | 69B | 137.96 GB | LlamaForCausalLM | g5.48xlarge |
| mistralai/Mistral-Nemo-Base-2407 | 12.2B | 24.51 GB | MistralForCausalLM | g5.12xlarge |
| mistralai/Mistral-Nemo-Instruct-2407 | 12.2B | 24.51 GB | MistralForCausalLM | g5.12xlarge |
| mistralai/Mistral-7B-v0.1 | 7.24B | 14.48 GB | MistralForCausalLM | g5.2xlarge |
| mistralai/Mistral-7B-v0.3 | 7.25B | 14.5 GB | MistralForCausalLM | g5.2xlarge |
| mistralai/Mistral-7B-Instruct-v0.1 | 7.24B | 14.48 GB | MistralForCausalLM | g5.2xlarge |
| mistralai/Mistral-7B-Instruct-v0.2 | 7.24B | 14.48 GB | MistralForCausalLM | g5.2xlarge |
| mistralai/Mistral-7B-Instruct-v0.3 | 7.25B | 14.5 GB | MistralForCausalLM | g5.2xlarge |
| HuggingFaceH4/zephyr-7b-beta | 7.24B | 14.48 GB | MistralForCausalLM | g5.2xlarge |
| microsoft/phi-3-mini-4k-instruct | 3.82B | 7.64 GB | Phi3ForCausalLM | g5.2xlarge |
| microsoft/Phi-3.5-mini-instruct | 3.82B | 7.64 GB | Phi3ForCausalLM | g5.2xlarge |
| google/gemma-2b | 2.51B | 5.01 GB | GemmaForCausalLM | g5.xlarge |
| google/gemma-1.1-2b-it | 2.51B | 5.01 GB | GemmaForCausalLM | g5.xlarge |
| google/gemma-7b | 8.54B | 17.08 GB | GemmaForCausalLM | g5.2xlarge |
| google/gemma-1.1-7b-it | 8.54B | 17.08 GB | GemmaForCausalLM | g5.2xlarge |
| google/gemma-2-2b | 2.61B | 5.23 GB | Gemma2ForCausalLM | g5.xlarge |
| google/gemma-2-2b-it | 2.61B | 5.23 GB | Gemma2ForCausalLM | g5.xlarge |
| google/gemma-2-9b | 9.24B | 18.48 GB | Gemma2ForCausalLM | g5.2xlarge |
| google/gemma-2-9b-it | 9.24B | 18.48 GB | Gemma2ForCausalLM | g5.2xlarge |
| google/gemma-2-27b | 27.2B | 54.45 GB | Gemma2ForCausalLM | g5.12xlarge |
| google/gemma-2-27b-it | 27.2B | 54.45 GB | Gemma2ForCausalLM | g5.12xlarge |
| Qwen/Qwen2.5-1.5B | 1.54B | 3.09 GB | Qwen2ForCausalLM | g5.xlarge |
| Qwen/Qwen2.5-1.5B-Instruct | 1.54B | 3.09 GB | Qwen2ForCausalLM | g5.xlarge |
| Qwen/Qwen2.5-3B | 3.09B | 6.17 GB | Qwen2ForCausalLM | g5.xlarge |
| Qwen/Qwen2.5-3B-Instruct | 3.09B | 6.17 GB | Qwen2ForCausalLM | g5.xlarge |
| Qwen/Qwen2.5-7B | 7.62B | 15.23 GB | Qwen2ForCausalLM | g5.2xlarge |
| Qwen/Qwen2.5-7B-Instruct | 7.62B | 15.23 GB | Qwen2ForCausalLM | g5.2xlarge |
| Qwen/Qwen2.5-14B | 14.8B | 29.57 GB | Qwen2ForCausalLM | g5.12xlarge |
| Qwen/Qwen2.5-14B-Instruct | 14.8B | 29.57 GB | Qwen2ForCausalLM | g5.12xlarge |
| Qwen/Qwen2.5-32B | 32.8B | 65.52 GB | Qwen2ForCausalLM | g5.12xlarge |
| Qwen/Qwen2.5-32B-Instruct | 32.8B | 65.52 GB | Qwen2ForCausalLM | g5.12xlarge |
| Qwen/Qwen2.5-72B | 72.7B | 145.41 GB | Qwen2ForCausalLM | g5.48xlarge |
| Qwen/Qwen2.5-72B-Instruct | 72.7B | 145.41 GB | Qwen2ForCausalLM | g5.48xlarge |
| Qwen/Qwen2-7B | 7.62B | 15.23 GB | Qwen2ForCausalLM | g5.2xlarge |
| Qwen/Qwen2-7B-Instruct | 7.62B | 15.23 GB | Qwen2ForCausalLM | g5.2xlarge |
| Qwen/Qwen2-72B | 72.7B | 145.41 GB | Qwen2ForCausalLM | g5.48xlarge |
| Qwen/Qwen2-72B-Instruct | 72.7B | 145.41 GB | Qwen2ForCausalLM | g5.48xlarge |
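As a rough guide to how the "Minimum AWS instance" column relates to the formula above, the sketch below picks the smallest g5 instance whose total GPU memory covers the estimated VRAM requirement. The per-instance figures reflect the publicly documented NVIDIA A10G (24 GB per GPU) configurations of the g5 family. Note that the table also accounts for vCPU and system memory, so its recommendation can be larger than a purely GPU-memory-based choice; this snippet is illustrative and not part of the Dynamiq SDK:

```python
# Total GPU memory (GB) per g5 instance type used in the table above
# (NVIDIA A10G GPUs, 24 GB each).
G5_GPU_MEMORY_GB = {
    "g5.xlarge": 24,     # 1x A10G
    "g5.2xlarge": 24,    # 1x A10G
    "g5.12xlarge": 96,   # 4x A10G
    "g5.48xlarge": 192,  # 8x A10G
}


def smallest_g5_for(required_vram_gb: float) -> str:
    """Return the smallest g5 instance whose total GPU memory covers the estimate."""
    for instance, memory_gb in sorted(G5_GPU_MEMORY_GB.items(), key=lambda kv: kv[1]):
        if memory_gb >= required_vram_gb:
            return instance
    raise ValueError(f"No single g5 instance fits {required_vram_gb:.1f} GB")


# Meta-Llama-3.1-70B at FP16 needs ~168 GB -> g5.48xlarge
print(smallest_g5_for(70 * 4 / (32 / 16) * 1.2))
```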