# Supported Models

The following models are currently supported for LLM inference deployment in Dynamiq:

| Model                                  | Parameters | Size      | Architecture       | Minimum AWS instance |
| -------------------------------------- | ---------- | --------- | ------------------ | -------------------- |
| meta-llama/Meta-Llama-3.1-8B           | 8.03B      | 16.07 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/Meta-Llama-3.1-8B-Instruct  | 8.03B      | 16.07 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/Meta-Llama-3.1-70B          | 70.6B      | 141.1 GB  | LlamaForCausalLM   | g5.48xlarge          |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 70.6B      | 141.1 GB  | LlamaForCausalLM   | g5.48xlarge          |
| meta-llama/Llama-3.2-1B                | 1.24B      | 2.47 GB   | LlamaForCausalLM   | g5.xlarge            |
| meta-llama/Llama-3.2-1B-Instruct       | 1.24B      | 2.47 GB   | LlamaForCausalLM   | g5.xlarge            |
| meta-llama/Llama-3.2-3B                | 3.21B      | 6.43 GB   | LlamaForCausalLM   | g5.xlarge            |
| meta-llama/Llama-3.2-3B-Instruct       | 3.21B      | 6.43 GB   | LlamaForCausalLM   | g5.xlarge            |
| meta-llama/Llama-2-7b-hf               | 6.74B      | 13.49 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/Llama-2-7b-chat-hf          | 6.74B      | 13.49 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/Llama-2-13b-hf              | 13B        | 26.03 GB  | LlamaForCausalLM   | g5.12xlarge          |
| meta-llama/Llama-2-13b-chat-hf         | 13B        | 26.03 GB  | LlamaForCausalLM   | g5.12xlarge          |
| meta-llama/Llama-2-70b-hf              | 69B        | 137.96 GB | LlamaForCausalLM   | g5.48xlarge          |
| meta-llama/Llama-2-70b-chat-hf         | 69B        | 137.96 GB | LlamaForCausalLM   | g5.48xlarge          |
| meta-llama/CodeLlama-7b-hf             | 6.74B      | 13.48 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/CodeLlama-7b-Instruct-hf    | 6.74B      | 13.48 GB  | LlamaForCausalLM   | g5.2xlarge           |
| meta-llama/CodeLlama-13b-Instruct-hf   | 13B        | 26.03 GB  | LlamaForCausalLM   | g5.12xlarge          |
| meta-llama/CodeLlama-70b-Instruct-hf   | 69B        | 137.96 GB | LlamaForCausalLM   | g5.48xlarge          |
| mistralai/Mistral-Nemo-Base-2407       | 12.2B      | 24.51 GB  | MistralForCausalLM | g5.12xlarge          |
| mistralai/Mistral-Nemo-Instruct-2407   | 12.2B      | 24.51 GB  | MistralForCausalLM | g5.12xlarge          |
| mistralai/Mistral-7B-v0.1              | 7.24B      | 14.48 GB  | MistralForCausalLM | g5.2xlarge           |
| mistralai/Mistral-7B-v0.3              | 7.25B      | 14.5 GB   | MistralForCausalLM | g5.2xlarge           |
| mistralai/Mistral-7B-Instruct-v0.1     | 7.24B      | 14.48 GB  | MistralForCausalLM | g5.2xlarge           |
| mistralai/Mistral-7B-Instruct-v0.2     | 7.24B      | 14.48 GB  | MistralForCausalLM | g5.2xlarge           |
| mistralai/Mistral-7B-Instruct-v0.3     | 7.25B      | 14.5 GB   | MistralForCausalLM | g5.2xlarge           |
| HuggingFaceH4/zephyr-7b-beta           | 7.24B      | 14.48 GB  | MistralForCausalLM | g5.2xlarge           |
| microsoft/Phi-3-mini-4k-instruct       | 3.82B      | 7.64 GB   | Phi3ForCausalLM    | g5.2xlarge           |
| microsoft/Phi-3.5-mini-instruct        | 3.82B      | 7.64 GB   | Phi3ForCausalLM    | g5.2xlarge           |
| google/gemma-2b                        | 2.51B      | 5.01 GB   | GemmaForCausalLM   | g5.xlarge            |
| google/gemma-1.1-2b-it                 | 2.51B      | 5.01 GB   | GemmaForCausalLM   | g5.xlarge            |
| google/gemma-7b                        | 8.54B      | 17.08 GB  | GemmaForCausalLM   | g5.2xlarge           |
| google/gemma-1.1-7b-it                 | 8.54B      | 17.08 GB  | GemmaForCausalLM   | g5.2xlarge           |
| google/gemma-2-2b                      | 2.61B      | 5.23 GB   | Gemma2ForCausalLM  | g5.xlarge            |
| google/gemma-2-2b-it                   | 2.61B      | 5.23 GB   | Gemma2ForCausalLM  | g5.xlarge            |
| google/gemma-2-9b                      | 9.24B      | 18.48 GB  | Gemma2ForCausalLM  | g5.2xlarge           |
| google/gemma-2-9b-it                   | 9.24B      | 18.48 GB  | Gemma2ForCausalLM  | g5.2xlarge           |
| google/gemma-2-27b                     | 27.2B      | 54.45 GB  | Gemma2ForCausalLM  | g5.12xlarge          |
| google/gemma-2-27b-it                  | 27.2B      | 54.45 GB  | Gemma2ForCausalLM  | g5.12xlarge          |
| Qwen/Qwen2.5-1.5B                      | 1.54B      | 3.09 GB   | Qwen2ForCausalLM   | g5.xlarge            |
| Qwen/Qwen2.5-1.5B-Instruct             | 1.54B      | 3.09 GB   | Qwen2ForCausalLM   | g5.xlarge            |
| Qwen/Qwen2.5-3B                        | 3.09B      | 6.17 GB   | Qwen2ForCausalLM   | g5.xlarge            |
| Qwen/Qwen2.5-3B-Instruct               | 3.09B      | 6.17 GB   | Qwen2ForCausalLM   | g5.xlarge            |
| Qwen/Qwen2.5-7B                        | 7.62B      | 15.23 GB  | Qwen2ForCausalLM   | g5.2xlarge           |
| Qwen/Qwen2.5-7B-Instruct               | 7.62B      | 15.23 GB  | Qwen2ForCausalLM   | g5.2xlarge           |
| Qwen/Qwen2.5-14B                       | 14.8B      | 29.57 GB  | Qwen2ForCausalLM   | g5.12xlarge          |
| Qwen/Qwen2.5-14B-Instruct              | 14.8B      | 29.57 GB  | Qwen2ForCausalLM   | g5.12xlarge          |
| Qwen/Qwen2.5-32B                       | 32.8B      | 65.52 GB  | Qwen2ForCausalLM   | g5.12xlarge          |
| Qwen/Qwen2.5-32B-Instruct              | 32.8B      | 65.52 GB  | Qwen2ForCausalLM   | g5.12xlarge          |
| Qwen/Qwen2.5-72B                       | 72.7B      | 145.41 GB | Qwen2ForCausalLM   | g5.48xlarge          |
| Qwen/Qwen2.5-72B-Instruct              | 72.7B      | 145.41 GB | Qwen2ForCausalLM   | g5.48xlarge          |
| Qwen/Qwen2-7B                          | 7.62B      | 15.23 GB  | Qwen2ForCausalLM   | g5.2xlarge           |
| Qwen/Qwen2-7B-Instruct                 | 7.62B      | 15.23 GB  | Qwen2ForCausalLM   | g5.2xlarge           |
| Qwen/Qwen2-72B                         | 72.7B      | 145.41 GB | Qwen2ForCausalLM   | g5.48xlarge          |
| Qwen/Qwen2-72B-Instruct                | 72.7B      | 145.41 GB | Qwen2ForCausalLM   | g5.48xlarge          |
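
As a rough sanity check on the table, the sketch below compares the listed model size against the total GPU memory of the listed instance. The per-instance GPU memory figures are an assumption taken from AWS's published g5 specifications (24 GB per NVIDIA A10G GPU) and are not stated on this page; `headroom_gb` is a hypothetical helper, not part of Dynamiq.

```python
# Total GPU memory per g5 instance in GB.
# Assumption: values come from AWS's published g5 specs (24 GB per A10G GPU).
G5_GPU_MEMORY_GB = {
    "g5.xlarge": 24,     # 1 x A10G
    "g5.2xlarge": 24,    # 1 x A10G
    "g5.12xlarge": 96,   # 4 x A10G
    "g5.48xlarge": 192,  # 8 x A10G
}

# A few rows copied from the table above: (model, size in GB, minimum instance).
ROWS = [
    ("meta-llama/Meta-Llama-3.1-8B", 16.07, "g5.2xlarge"),
    ("meta-llama/Llama-2-13b-hf", 26.03, "g5.12xlarge"),
    ("google/gemma-2-27b", 54.45, "g5.12xlarge"),
    ("Qwen/Qwen2.5-72B", 145.41, "g5.48xlarge"),
]


def headroom_gb(size_gb: float, instance: str) -> float:
    """Return GPU memory left after loading the weights (negative if they do not fit)."""
    return G5_GPU_MEMORY_GB[instance] - size_gb


for model, size_gb, instance in ROWS:
    print(f"{model}: {headroom_gb(size_gb, instance):.2f} GB headroom on {instance}")
```

The remaining headroom is what is available for the KV cache and activations, so a positive value is necessary but may not be sufficient for a given workload.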

### VRAM Requirements

A good rule of thumb for FP16 (half-precision) models is to budget at least 2 GB of VRAM per billion parameters, because each parameter occupies 2 bytes. For example, a model with 8B parameters requires at least 16 GB of VRAM for its weights. Actual requirements vary with the model and the specific task.

A more accurate estimate of the VRAM requirements uses the following formula:

```
M = (P x (Q/8)) x 1.2
```

Where:

* `M` is the required GPU memory in GB
* `P` is the number of parameters in billions (e.g., 8 for `meta-llama/Meta-Llama-3.1-8B` or 70 for `meta-llama/Meta-Llama-3.1-70B`)
* `Q` is the precision in bits (e.g., 16 for FP16 or half precision, 8 for 8-bit quantization, or 4 for 4-bit quantization)
* `1.2` is a safety factor that adds 20% for additional memory requirements (cache overhead, etc.)

For example, for `meta-llama/Meta-Llama-3.1-8B`, with roughly 8B parameters at FP16 precision, the minimum VRAM requirement is:

```
M = (8 x (16/8)) x 1.2 = 16 x 1.2 = 19.2 GB
```
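
The same formula can be expressed as a small helper for comparing precisions. This is a minimal sketch: the function name `estimate_vram_gb` and its parameter names are illustrative and not part of any Dynamiq API.

```python
def estimate_vram_gb(params_billions: float, bits: int = 16, overhead: float = 1.2) -> float:
    """Estimate required GPU memory in GB using M = (P x (Q / 8)) x 1.2."""
    if params_billions <= 0 or bits <= 0:
        raise ValueError("params_billions and bits must be positive")
    bytes_per_param = bits / 8  # Q / 8: bits converted to bytes per parameter
    return params_billions * bytes_per_param * overhead


# Compare FP16, 8-bit, and 4-bit estimates for an 8B and a 70B model.
for params in (8, 70):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: {estimate_vram_gb(params, bits):.1f} GB")
```

Halving the precision halves the estimate, which is why quantized variants of a model can fit on smaller instances than their FP16 counterparts.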

