# Supported Models

The following models are currently supported for LLM inference deployment in Dynamiq:

| Model                                  | Parameters | Size      | Architecture       | Minimum AWS instance |
| -------------------------------------- | ---------- | --------- | ------------------ | -------------------- |
| meta-llama/Meta-Llama-3.1-8B           | 8.03B      | 16.07 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/Meta-Llama-3.1-8B-Instruct  | 8.03B      | 16.07 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/Meta-Llama-3.1-70B          | 70.6B      | 141.1 GB  | LlamaForCausalLM   | g5.48xlarge          |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 70.6B      | 141.1 GB  | LlamaForCausalLM   | g5.48xlarge          |
| meta-llama/Llama-3.2-1B                | 1.24B      | 2.47 GB   | LlamaForCausalLM   | g5.xlarge            |
| meta-llama/Llama-3.2-1B-Instruct       | 1.24B      | 2.47 GB   | LlamaForCausalLM   | g5.xlarge            |
| meta-llama/Llama-3.2-3B                | 3.21B      | 6.43 GB   | LlamaForCausalLM   | g5.xlarge            |
| meta-llama/Llama-3.2-3B-Instruct       | 3.21B      | 6.43 GB   | LlamaForCausalLM   | g5.xlarge            |
| meta-llama/Llama-2-7b-hf               | 6.74B      | 13.49 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/Llama-2-7b-chat-hf          | 6.74B      | 13.49 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/Llama-2-13b-hf              | 13B        | 26.03 GB  | LlamaForCausalLM   | g5.12xlarge          |
| meta-llama/Llama-2-13b-chat-hf         | 13B        | 26.03 GB  | LlamaForCausalLM   | g5.12xlarge          |
| meta-llama/Llama-2-70b-hf              | 69B        | 137.96 GB | LlamaForCausalLM   | g5.48xlarge          |
| meta-llama/Llama-2-70b-chat-hf         | 69B        | 137.96 GB | LlamaForCausalLM   | g5.48xlarge          |
| meta-llama/CodeLlama-7b-hf             | 6.74B      | 13.48 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/CodeLlama-7b-Instruct-hf    | 6.74B      | 13.48 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/CodeLlama-13b-Instruct-hf   | 13B        | 26.03 GB  | LlamaForCausalLM   | g5.12xlarge          |
| meta-llama/CodeLlama-70b-Instruct-hf   | 69B        | 137.96 GB | LlamaForCausalLM   | g5.48xlarge          |
| mistralai/Mistral-Nemo-Base-2407       | 12.2B      | 24.51 GB  | MistralForCausalLM | g5.12xlarge          |
| mistralai/Mistral-Nemo-Instruct-2407   | 12.2B      | 24.51 GB  | MistralForCausalLM | g5.12xlarge          |
| mistralai/Mistral-7B-v0.1              | 7.24B      | 14.48 GB  | MistralForCausalLM | g5.2xlarge           |
| mistralai/Mistral-7B-v0.3              | 7.25B      | 14.5 GB   | MistralForCausalLM | g5.2xlarge           |
| mistralai/Mistral-7B-Instruct-v0.1     | 7.24B      | 14.48 GB  | MistralForCausalLM | g5.2xlarge           |
| mistralai/Mistral-7B-Instruct-v0.2     | 7.24B      | 14.48 GB  | MistralForCausalLM | g5.2xlarge           |
| mistralai/Mistral-7B-Instruct-v0.3     | 7.25B      | 14.5 GB   | MistralForCausalLM | g5.2xlarge           |
| HuggingFaceH4/zephyr-7b-beta           | 7.24B      | 14.48 GB  | MistralForCausalLM | g5.2xlarge           |
| microsoft/Phi-3-mini-4k-instruct       | 3.82B      | 7.64 GB   | Phi3ForCausalLM    | g5.2xlarge           |
| microsoft/Phi-3.5-mini-instruct        | 3.82B      | 7.64 GB   | Phi3ForCausalLM    | g5.2xlarge           |
| google/gemma-2b                        | 2.51B      | 5.01 GB   | GemmaForCausalLM   | g5.xlarge            |
| google/gemma-1.1-2b-it                 | 2.51B      | 5.01 GB   | GemmaForCausalLM   | g5.xlarge            |
| google/gemma-7b                        | 8.54B      | 17.08 GB  | GemmaForCausalLM   | g5.2xlarge           |
| google/gemma-1.1-7b-it                 | 8.54B      | 17.08 GB  | GemmaForCausalLM   | g5.2xlarge           |
| google/gemma-2-2b                      | 2.61B      | 5.23 GB   | Gemma2ForCausalLM  | g5.xlarge            |
| google/gemma-2-2b-it                   | 2.61B      | 5.23 GB   | Gemma2ForCausalLM  | g5.xlarge            |
| google/gemma-2-9b                      | 9.24B      | 18.48 GB  | Gemma2ForCausalLM  | g5.2xlarge           |
| google/gemma-2-9b-it                   | 9.24B      | 18.48 GB  | Gemma2ForCausalLM  | g5.2xlarge           |
| google/gemma-2-27b                     | 27.2B      | 54.45 GB  | Gemma2ForCausalLM  | g5.12xlarge          |
| google/gemma-2-27b-it                  | 27.2B      | 54.45 GB  | Gemma2ForCausalLM  | g5.12xlarge          |
| Qwen/Qwen2.5-1.5B                      | 1.54B      | 3.09 GB   | Qwen2ForCausalLM   | g5.xlarge            |
| Qwen/Qwen2.5-1.5B-Instruct             | 1.54B      | 3.09 GB   | Qwen2ForCausalLM   | g5.xlarge            |
| Qwen/Qwen2.5-3B                        | 3.09B      | 6.17 GB   | Qwen2ForCausalLM   | g5.xlarge            |
| Qwen/Qwen2.5-3B-Instruct               | 3.09B      | 6.17 GB   | Qwen2ForCausalLM   | g5.xlarge            |
| Qwen/Qwen2.5-7B                        | 7.62B      | 15.23 GB  | Qwen2ForCausalLM   | g5.2xlarge           |
| Qwen/Qwen2.5-7B-Instruct               | 7.62B      | 15.23 GB  | Qwen2ForCausalLM   | g5.2xlarge           |
| Qwen/Qwen2.5-14B                       | 14.8B      | 29.57 GB  | Qwen2ForCausalLM   | g5.12xlarge          |
| Qwen/Qwen2.5-14B-Instruct              | 14.8B      | 29.57 GB  | Qwen2ForCausalLM   | g5.12xlarge          |
| Qwen/Qwen2.5-32B                       | 32.8B      | 65.52 GB  | Qwen2ForCausalLM   | g5.12xlarge          |
| Qwen/Qwen2.5-32B-Instruct              | 32.8B      | 65.52 GB  | Qwen2ForCausalLM   | g5.12xlarge          |
| Qwen/Qwen2.5-72B                       | 72.7B      | 145.41 GB | Qwen2ForCausalLM   | g5.48xlarge          |
| Qwen/Qwen2.5-72B-Instruct              | 72.7B      | 145.41 GB | Qwen2ForCausalLM   | g5.48xlarge          |
| Qwen/Qwen2-7B                          | 7.62B      | 15.23 GB  | Qwen2ForCausalLM   | g5.2xlarge           |
| Qwen/Qwen2-7B-Instruct                 | 7.62B      | 15.23 GB  | Qwen2ForCausalLM   | g5.2xlarge           |
| Qwen/Qwen2-72B                         | 72.7B      | 145.41 GB | Qwen2ForCausalLM   | g5.48xlarge          |
| Qwen/Qwen2-72B-Instruct                | 72.7B      | 145.41 GB | Qwen2ForCausalLM   | g5.48xlarge          |

### VRAM Requirements

A good rule of thumb for FP16 (half-precision) models is to budget at least 2 GB of VRAM per billion parameters, which is roughly the size of the model weights themselves. For example, a model with 8B parameters requires at least 16 GB of VRAM. However, the actual VRAM requirement may vary depending on the model and the specific task.

A more accurate way to estimate the VRAM requirement is to use the following formula:

```
M = (P x (Q/8)) x 1.2
```

Where:

* `M` is the required GPU memory in GB
* `P` is the number of parameters in billions (e.g., 8 for `meta-llama/Meta-Llama-3.1-8B` or 70 for `meta-llama/Meta-Llama-3.1-70B`)
* `Q` is the precision in bits (e.g., 16 for FP16 or half precision, 8 for 8-bit quantization, or 4 for 4-bit quantization, and so on)
* `1.2` is a safety factor to account for additional memory requirements (cache overhead, etc.)

Therefore, for a model like `meta-llama/Meta-Llama-3.1-8B` with 8B parameters and FP16 precision, the minimum VRAM requirement would be:

```
M = (8 x (16/8)) x 1.2 = 16 x 1.2 = 19.2 GB
```
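
For convenience, the same estimate can be scripted. The snippet below is a minimal sketch of the formula above; the helper name `estimate_vram_gb` is illustrative and not part of the Dynamiq API.

```python
def estimate_vram_gb(params_billion: float, precision_bits: int = 16, overhead: float = 1.2) -> float:
    """Estimate the GPU memory (in GB) needed to serve a model.

    params_billion -- P: number of parameters in billions
    precision_bits -- Q: weight precision in bits (16 for FP16, 8 or 4 for quantized weights)
    overhead       -- safety factor for cache and other runtime overhead
    """
    bytes_per_param = precision_bits / 8  # Q/8
    return params_billion * bytes_per_param * overhead


if __name__ == "__main__":
    # meta-llama/Meta-Llama-3.1-8B at FP16 -> 19.2 GB
    print(f"{estimate_vram_gb(8, 16):.1f} GB")
    # meta-llama/Meta-Llama-3.1-70B at FP16 -> 168.0 GB
    print(f"{estimate_vram_gb(70, 16):.1f} GB")
```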
