Supported Models

The following models are currently supported for LLM inference deployment in Dynamiq:

| Model | Parameters | Size | Architecture | Minimum AWS instance |
| --- | --- | --- | --- | --- |
| meta-llama/Meta-Llama-3.1-8B | 8.03B | 16.07 GB | LlamaForCausalLM | g5.2xlarge |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 8.03B | 16.07 GB | LlamaForCausalLM | g5.2xlarge |
| meta-llama/Meta-Llama-3.1-70B | 70.6B | 141.1 GB | LlamaForCausalLM | g5.48xlarge |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 70.6B | 141.1 GB | LlamaForCausalLM | g5.48xlarge |
| meta-llama/Llama-3.2-1B | 1.24B | 2.47 GB | LlamaForCausalLM | g5.xlarge |
| meta-llama/Llama-3.2-1B-Instruct | 1.24B | 2.47 GB | LlamaForCausalLM | g5.xlarge |
| meta-llama/Llama-3.2-3B | 3.21B | 6.43 GB | LlamaForCausalLM | g5.xlarge |
| meta-llama/Llama-3.2-3B-Instruct | 3.21B | 6.43 GB | LlamaForCausalLM | g5.xlarge |
| meta-llama/Llama-2-7b-hf | 6.74B | 13.49 GB | LlamaForCausalLM | g5.2xlarge |
| meta-llama/Llama-2-7b-chat-hf | 6.74B | 13.49 GB | LlamaForCausalLM | g5.2xlarge |
| meta-llama/Llama-2-13b-hf | 13B | 26.03 GB | LlamaForCausalLM | g5.12xlarge |
| meta-llama/Llama-2-13b-chat-hf | 13B | 26.03 GB | LlamaForCausalLM | g5.12xlarge |
| meta-llama/Llama-2-70b-hf | 69B | 137.96 GB | LlamaForCausalLM | g5.48xlarge |
| meta-llama/Llama-2-70b-chat-hf | 69B | 137.96 GB | LlamaForCausalLM | g5.48xlarge |
| meta-llama/CodeLlama-7b-hf | 6.74B | 13.48 GB | LlamaForCausalLM | g5.2xlarge |
| meta-llama/CodeLlama-7b-Instruct-hf | 6.74B | 13.48 GB | LlamaForCausalLM | g5.2xlarge |
| meta-llama/CodeLlama-13b-Instruct-hf | 13B | 26.03 GB | LlamaForCausalLM | g5.12xlarge |
| meta-llama/CodeLlama-70b-Instruct-hf | 69B | 137.96 GB | LlamaForCausalLM | g5.48xlarge |
| mistralai/Mistral-Nemo-Base-2407 | 12.2B | 24.51 GB | MistralForCausalLM | g5.12xlarge |
| mistralai/Mistral-Nemo-Instruct-2407 | 12.2B | 24.51 GB | MistralForCausalLM | g5.12xlarge |
| mistralai/Mistral-7B-v0.1 | 7.24B | 14.48 GB | MistralForCausalLM | g5.2xlarge |
| mistralai/Mistral-7B-v0.3 | 7.25B | 14.5 GB | MistralForCausalLM | g5.2xlarge |
| mistralai/Mistral-7B-Instruct-v0.1 | 7.24B | 14.48 GB | MistralForCausalLM | g5.2xlarge |
| mistralai/Mistral-7B-Instruct-v0.2 | 7.24B | 14.48 GB | MistralForCausalLM | g5.2xlarge |
| mistralai/Mistral-7B-Instruct-v0.3 | 7.25B | 14.5 GB | MistralForCausalLM | g5.2xlarge |
| HuggingFaceH4/zephyr-7b-beta | 7.24B | 14.48 GB | MistralForCausalLM | g5.2xlarge |
| microsoft/phi-3-mini-4k-instruct | 3.82B | 7.64 GB | Phi3ForCausalLM | g5.2xlarge |
| microsoft/Phi-3.5-mini-instruct | 3.82B | 7.64 GB | Phi3ForCausalLM | g5.2xlarge |
| google/gemma-2b | 2.51B | 5.01 GB | GemmaForCausalLM | g5.xlarge |
| google/gemma-1.1-2b-it | 2.51B | 5.01 GB | GemmaForCausalLM | g5.xlarge |
| google/gemma-7b | 8.54B | 17.08 GB | GemmaForCausalLM | g5.2xlarge |
| google/gemma-1.1-7b-it | 8.54B | 17.08 GB | GemmaForCausalLM | g5.2xlarge |
| google/gemma-2-2b | 2.61B | 5.23 GB | Gemma2ForCausalLM | g5.xlarge |
| google/gemma-2-2b-it | 2.61B | 5.23 GB | Gemma2ForCausalLM | g5.xlarge |
| google/gemma-2-9b | 9.24B | 18.48 GB | Gemma2ForCausalLM | g5.2xlarge |
| google/gemma-2-9b-it | 9.24B | 18.48 GB | Gemma2ForCausalLM | g5.2xlarge |
| google/gemma-2-27b | 27.2B | 54.45 GB | Gemma2ForCausalLM | g5.12xlarge |
| google/gemma-2-27b-it | 27.2B | 54.45 GB | Gemma2ForCausalLM | g5.12xlarge |
| Qwen/Qwen2.5-1.5B | 1.54B | 3.09 GB | Qwen2ForCausalLM | g5.xlarge |
| Qwen/Qwen2.5-1.5B-Instruct | 1.54B | 3.09 GB | Qwen2ForCausalLM | g5.xlarge |
| Qwen/Qwen2.5-3B | 3.09B | 6.17 GB | Qwen2ForCausalLM | g5.xlarge |
| Qwen/Qwen2.5-3B-Instruct | 3.09B | 6.17 GB | Qwen2ForCausalLM | g5.xlarge |
| Qwen/Qwen2.5-7B | 7.62B | 15.23 GB | Qwen2ForCausalLM | g5.2xlarge |
| Qwen/Qwen2.5-7B-Instruct | 7.62B | 15.23 GB | Qwen2ForCausalLM | g5.2xlarge |
| Qwen/Qwen2.5-14B | 14.8B | 29.57 GB | Qwen2ForCausalLM | g5.12xlarge |
| Qwen/Qwen2.5-14B-Instruct | 14.8B | 29.57 GB | Qwen2ForCausalLM | g5.12xlarge |
| Qwen/Qwen2.5-32B | 32.8B | 65.52 GB | Qwen2ForCausalLM | g5.12xlarge |
| Qwen/Qwen2.5-32B-Instruct | 32.8B | 65.52 GB | Qwen2ForCausalLM | g5.12xlarge |
| Qwen/Qwen2.5-72B | 72.7B | 145.41 GB | Qwen2ForCausalLM | g5.48xlarge |
| Qwen/Qwen2.5-72B-Instruct | 72.7B | 145.41 GB | Qwen2ForCausalLM | g5.48xlarge |
| Qwen/Qwen2-7B | 7.62B | 15.23 GB | Qwen2ForCausalLM | g5.2xlarge |
| Qwen/Qwen2-7B-Instruct | 7.62B | 15.23 GB | Qwen2ForCausalLM | g5.2xlarge |
| Qwen/Qwen2-72B | 72.7B | 145.41 GB | Qwen2ForCausalLM | g5.48xlarge |
| Qwen/Qwen2-72B-Instruct | 72.7B | 145.41 GB | Qwen2ForCausalLM | g5.48xlarge |

VRAM Requirements

A good rule of thumb for FP16 (half-precision) models is to provision at least 2 GB of VRAM per billion parameters, which is roughly the size of the model weights themselves. For example, a model with 8B parameters requires at least 16 GB of VRAM. The actual requirement varies with the model and the specific task.

A more accurate way to estimate the VRAM requirement is the following formula:

M = (P x (Q/8)) x 1.2

Where:

  • M is the required GPU memory in GB

  • P is the number of parameters in billions (e.g., 8 for meta-llama/Meta-Llama-3.1-8B or 70 for meta-llama/Meta-Llama-3.1-70B)

  • Q is the precision in bits (e.g., 16 for FP16/half precision, 8 for 8-bit quantization, 4 for 4-bit quantization)

  • 1.2 is a safety factor to account for additional memory requirements (cache overhead, etc.)

Therefore, for a model like meta-llama/Meta-Llama-3.1-8B with 8B parameters and FP16 precision, the minimum VRAM requirement would be:

M = (8 x (16/8)) x 1.2 = 16 x 1.2 = 19.2 GB
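
When planning deployments, the same estimate can be scripted. Below is a minimal Python sketch (not part of the Dynamiq platform or SDK) that applies the formula above to a few models from the table; the model names and parameter counts are taken from this page, and everything else is illustrative.

```python
# Minimal sketch of the VRAM estimate described above: M = (P x (Q / 8)) x 1.2
# Illustration only; parameter counts below come from the table on this page.

def estimate_vram_gb(params_billion: float, precision_bits: int = 16) -> float:
    """Estimated GPU memory in GB: parameters (B) x bytes per parameter x 1.2 overhead factor."""
    bytes_per_param = precision_bits / 8
    return params_billion * bytes_per_param * 1.2

if __name__ == "__main__":
    models = {
        "meta-llama/Meta-Llama-3.1-8B": 8.03,
        "meta-llama/Meta-Llama-3.1-70B": 70.6,
    }
    for name, params in models.items():
        fp16 = estimate_vram_gb(params, precision_bits=16)  # half precision
        int4 = estimate_vram_gb(params, precision_bits=4)   # 4-bit quantization
        print(f"{name}: ~{fp16:.1f} GB at FP16, ~{int4:.1f} GB at 4-bit")
```

Using the exact parameter counts from the table instead of the rounded values gives slightly higher estimates, e.g., about 19.3 GB for meta-llama/Meta-Llama-3.1-8B at FP16 rather than the 19.2 GB computed above. Note that the "Minimum AWS instance" column also reflects resources beyond GPU memory, so treat the formula as a lower bound and consult the table for instance selection.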