# Supported Models

The following models are currently supported for LLM inference deployment in Dynamiq:

| Model                                  | Parameters | Size      | Architecture       | Minimum AWS instance |
| -------------------------------------- | ---------- | --------- | ------------------ | -------------------- |
| meta-llama/Meta-Llama-3.1-8B           | 8.03B      | 16.07 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/Meta-Llama-3.1-8B-Instruct  | 8.03B      | 16.07 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/Meta-Llama-3.1-70B          | 70.6B      | 141.1 GB  | LlamaForCausalLM   | g5.48xlarge          |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 70.6B      | 141.1 GB  | LlamaForCausalLM   | g5.48xlarge          |
| meta-llama/Llama-3.2-1B                | 1.24B      | 2.47 GB   | LlamaForCausalLM   | g5.xlarge            |
| meta-llama/Llama-3.2-1B-Instruct       | 1.24B      | 2.47 GB   | LlamaForCausalLM   | g5.xlarge            |
| meta-llama/Llama-3.2-3B                | 3.21B      | 6.43 GB   | LlamaForCausalLM   | g5.xlarge            |
| meta-llama/Llama-3.2-3B-Instruct       | 3.21B      | 6.43 GB   | LlamaForCausalLM   | g5.xlarge            |
| meta-llama/Llama-2-7b-hf               | 6.74B      | 13.49 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/Llama-2-7b-chat-hf          | 6.74B      | 13.49 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/Llama-2-13b-hf              | 13B        | 26.03 GB  | LlamaForCausalLM   | g5.12xlarge          |
| meta-llama/Llama-2-13b-chat-hf         | 13B        | 26.03 GB  | LlamaForCausalLM   | g5.12xlarge          |
| meta-llama/Llama-2-70b-hf              | 69B        | 137.96 GB | LlamaForCausalLM   | g5.48xlarge          |
| meta-llama/Llama-2-70b-chat-hf         | 69B        | 137.96 GB | LlamaForCausalLM   | g5.48xlarge          |
| meta-llama/CodeLlama-7b-hf             | 6.74B      | 13.48 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/CodeLlama-7b-Instruct-hf    | 6.74B      | 13.48 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/CodeLlama-13b-Instruct-hf   | 13B        | 26.03 GB  | LlamaForCausalLM   | g5.12xlarge          |
| meta-llama/CodeLlama-70b-Instruct-hf   | 69B        | 137.96 GB | LlamaForCausalLM   | g5.48xlarge          |
| mistralai/Mistral-Nemo-Base-2407       | 12.2B      | 24.51 GB  | MistralForCausalLM | g5.12xlarge          |
| mistralai/Mistral-Nemo-Instruct-2407   | 12.2B      | 24.51 GB  | MistralForCausalLM | g5.12xlarge          |
| mistralai/Mistral-7B-v0.1              | 7.24B      | 14.48 GB  | MistralForCausalLM | g5.2xlarge           |
| mistralai/Mistral-7B-v0.3              | 7.25B      | 14.5 GB   | MistralForCausalLM | g5.2xlarge           |
| mistralai/Mistral-7B-Instruct-v0.1     | 7.24B      | 14.48 GB  | MistralForCausalLM | g5.2xlarge           |
| mistralai/Mistral-7B-Instruct-v0.2     | 7.24B      | 14.48 GB  | MistralForCausalLM | g5.2xlarge           |
| mistralai/Mistral-7B-Instruct-v0.3     | 7.25B      | 14.5 GB   | MistralForCausalLM | g5.2xlarge           |
| HuggingFaceH4/zephyr-7b-beta           | 7.24B      | 14.48 GB  | MistralForCausalLM | g5.2xlarge           |
| microsoft/Phi-3-mini-4k-instruct       | 3.82B      | 7.64 GB   | Phi3ForCausalLM    | g5.2xlarge           |
| microsoft/Phi-3.5-mini-instruct        | 3.82B      | 7.64 GB   | Phi3ForCausalLM    | g5.2xlarge           |
| google/gemma-2b                        | 2.51B      | 5.01 GB   | GemmaForCausalLM   | g5.xlarge            |
| google/gemma-1.1-2b-it                 | 2.51B      | 5.01 GB   | GemmaForCausalLM   | g5.xlarge            |
| google/gemma-7b                        | 8.54B      | 17.08 GB  | GemmaForCausalLM   | g5.2xlarge           |
| google/gemma-1.1-7b-it                 | 8.54B      | 17.08 GB  | GemmaForCausalLM   | g5.2xlarge           |
| google/gemma-2-2b                      | 2.61B      | 5.23 GB   | Gemma2ForCausalLM  | g5.xlarge            |
| google/gemma-2-2b-it                   | 2.61B      | 5.23 GB   | Gemma2ForCausalLM  | g5.xlarge            |
| google/gemma-2-9b                      | 9.24B      | 18.48 GB  | Gemma2ForCausalLM  | g5.2xlarge           |
| google/gemma-2-9b-it                   | 9.24B      | 18.48 GB  | Gemma2ForCausalLM  | g5.2xlarge           |
| google/gemma-2-27b                     | 27.2B      | 54.45 GB  | Gemma2ForCausalLM  | g5.12xlarge          |
| google/gemma-2-27b-it                  | 27.2B      | 54.45 GB  | Gemma2ForCausalLM  | g5.12xlarge          |
| Qwen/Qwen2.5-1.5B                      | 1.54B      | 3.09 GB   | Qwen2ForCausalLM   | g5.xlarge            |
| Qwen/Qwen2.5-1.5B-Instruct             | 1.54B      | 3.09 GB   | Qwen2ForCausalLM   | g5.xlarge            |
| Qwen/Qwen2.5-3B                        | 3.09B      | 6.17 GB   | Qwen2ForCausalLM   | g5.xlarge            |
| Qwen/Qwen2.5-3B-Instruct               | 3.09B      | 6.17 GB   | Qwen2ForCausalLM   | g5.xlarge            |
| Qwen/Qwen2.5-7B                        | 7.62B      | 15.23 GB  | Qwen2ForCausalLM   | g5.2xlarge           |
| Qwen/Qwen2.5-7B-Instruct               | 7.62B      | 15.23 GB  | Qwen2ForCausalLM   | g5.2xlarge           |
| Qwen/Qwen2.5-14B                       | 14.8B      | 29.57 GB  | Qwen2ForCausalLM   | g5.12xlarge          |
| Qwen/Qwen2.5-14B-Instruct              | 14.8B      | 29.57 GB  | Qwen2ForCausalLM   | g5.12xlarge          |
| Qwen/Qwen2.5-32B                       | 32.8B      | 65.52 GB  | Qwen2ForCausalLM   | g5.12xlarge          |
| Qwen/Qwen2.5-32B-Instruct              | 32.8B      | 65.52 GB  | Qwen2ForCausalLM   | g5.12xlarge          |
| Qwen/Qwen2.5-72B                       | 72.7B      | 145.41 GB | Qwen2ForCausalLM   | g5.48xlarge          |
| Qwen/Qwen2.5-72B-Instruct              | 72.7B      | 145.41 GB | Qwen2ForCausalLM   | g5.48xlarge          |
| Qwen/Qwen2-7B                          | 7.62B      | 15.23 GB  | Qwen2ForCausalLM   | g5.2xlarge           |
| Qwen/Qwen2-7B-Instruct                 | 7.62B      | 15.23 GB  | Qwen2ForCausalLM   | g5.2xlarge           |
| Qwen/Qwen2-72B                         | 72.7B      | 145.41 GB | Qwen2ForCausalLM   | g5.48xlarge          |
| Qwen/Qwen2-72B-Instruct                | 72.7B      | 145.41 GB | Qwen2ForCausalLM   | g5.48xlarge          |

### VRAM Requirements

A good rule of thumb for FP16 (half-precision) models is to budget at least 2 GB of VRAM per billion parameters, which is roughly the size of the model weights themselves. For example, a model with 8B parameters requires at least 16 GB of VRAM. However, the actual VRAM requirement may vary depending on the model and the specific task.

A more accurate way to estimate the VRAM requirement is to use the following formula:

```
M = (P x (Q/8)) x 1.2
```

Where:

* `M` is the required GPU memory in GB
* `P` is the number of parameters in billions (e.g., 8 for `meta-llama/Meta-Llama-3.1-8B` or 70 for `meta-llama/Meta-Llama-3.1-70B`)
* `Q` is the precision in bits (e.g., 16 for FP16 or half precision, 8 for 8-bit quantization, or 4 for 4-bit quantization, and so on)
* `1.2` is a safety factor to account for additional memory requirements (cache overhead, etc.)

Therefore, for a model like `meta-llama/Meta-Llama-3.1-8B` with 8B parameters and FP16 precision, the minimum VRAM requirement would be:

```
M = (8 x (16/8)) x 1.2 = 16 x 1.2 = 19.2 GB
```
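
For convenience, the same estimate can be scripted. The snippet below is a minimal sketch of the formula above; the helper name `estimate_vram_gb` is illustrative and not part of the Dynamiq API.

```python
def estimate_vram_gb(params_billion: float, precision_bits: int = 16, overhead: float = 1.2) -> float:
    """Estimate the GPU memory (in GB) needed to serve a model.

    params_billion -- P: number of parameters in billions
    precision_bits -- Q: weight precision in bits (16 for FP16, 8 or 4 for quantized weights)
    overhead       -- safety factor for cache and other runtime overhead
    """
    bytes_per_param = precision_bits / 8  # Q/8
    return params_billion * bytes_per_param * overhead


if __name__ == "__main__":
    # meta-llama/Meta-Llama-3.1-8B at FP16 -> 19.2 GB
    print(f"{estimate_vram_gb(8, 16):.1f} GB")
    # meta-llama/Meta-Llama-3.1-70B at FP16 -> 168.0 GB
    print(f"{estimate_vram_gb(70, 16):.1f} GB")
```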
