LLMs


The Dynamiq platform allows you to deploy and fine-tune open-source large language models (LLMs) such as Meta’s Llama. Follow these steps to deploy an LLM and make it accessible via API.

  1. Navigate to the Deployments tab:

    • Open the Deployments section on the Dynamiq platform dashboard.

    • Click on Add new deployment in the upper-right corner.

    • Choose LLM from the list of deployment types.

  2. Configure the LLM deployment:

    • Name: Enter a unique name for the deployment.

    • Description: Optionally, provide a description to help identify this deployment.

    • Resource profile: Select the instance type for your deployment from the available options (e.g., g5.2xlarge or g5.4xlarge). The chosen profile determines the computational resources allocated, including GPU, CPU, and memory.

    • Model: Choose the model you wish to deploy (e.g., Meta-Llama 3.1-8B Instruct). A range of models is available, including Llama, Mistral, and Microsoft Phi.

    • Advanced configuration (optional):

      The advanced configuration section allows you to fine-tune the behavior and performance of your LLM deployment based on your workload and resource requirements. Here's a breakdown of each option (an illustrative configuration sketch follows these steps):

      • Replica Autoscaling

        • Min / Max Replicas: Set the minimum and maximum number of replicas the deployment can scale between based on load. More replicas improve availability; fewer save costs.

      • Max Batch Pre-fill Tokens

        • Purpose: The maximum number of prompt (pre-fill) tokens processed together in a single batch, which affects how quickly concurrent requests are served.

        • Default: 1024. Higher values may improve performance but increase memory use.

      • Max Batch Total Tokens

        • Purpose: Total tokens queued in a batch before processing. Higher values improve throughput but may add latency.

        • Default: 4096.

      • Max Tokens (per query)

        • Purpose: Limits tokens per query response to control memory use.

        • Default: 1024.

      • Max Input Length (per query)

        • Purpose: Maximum input tokens per query, affecting memory and processing needs.

        • Default: 2048.

      • Quantization

        • Purpose: Reduces the numerical precision of model weights to lower memory use and improve efficiency, with a slight accuracy trade-off.

  3. Click Create to initiate the deployment.
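
For orientation, the sketch below summarizes the advanced options as a plain configuration object. It is illustrative only: deployments are configured through the UI, and the class and field names here are hypothetical rather than part of a Dynamiq API; the defaults mirror the values documented above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mirror of the deployment form fields; not a Dynamiq API.
@dataclass
class LLMDeploymentConfig:
    name: str                               # unique deployment name
    model: str                              # e.g. "Meta-Llama 3.1-8B Instruct"
    resource_profile: str                   # e.g. "g5.2xlarge"
    min_replicas: int                       # replica autoscaling lower bound
    max_replicas: int                       # replica autoscaling upper bound
    max_batch_prefill_tokens: int = 1024    # documented default
    max_batch_total_tokens: int = 4096      # documented default
    max_tokens: int = 1024                  # documented default (per query response)
    max_input_length: int = 2048            # documented default (per query input)
    quantization: Optional[str] = None      # reduce weight precision to save memory

# Example: with max_input_length=2048 and max_batch_total_tokens=4096,
# roughly two full-length prompts fit in a single batch.
config = LLMDeploymentConfig(
    name="llama-3-1-8b-instruct",
    model="Meta-Llama 3.1-8B Instruct",
    resource_profile="g5.2xlarge",
    min_replicas=1,
    max_replicas=2,
)
```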

Once the deployment begins, it initially shows a Pending status while the platform allocates resources and prepares the deployment. When the deployment succeeds, the status updates to Running, signaling that the LLM is available and ready to handle requests. If an error occurs during deployment, the status changes to Failed.

Using Deployed LLMs

Once your LLM deployment is in the Running status, it is ready to handle API requests. You can find a code example for calling the deployed model directly in the Endpoint section of the deployment details page on the Dynamiq platform.
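
The exact request format for your deployment is shown there; as a rough illustration, a call typically resembles the sketch below. The endpoint URL, authentication header, and payload fields are placeholders to be replaced with the values from your deployment's Endpoint section, not a definitive Dynamiq API.

```python
import requests

# Placeholders: copy the real URL and access key from the deployment's Endpoint section.
ENDPOINT_URL = "https://<your-deployment-endpoint>/generate"
API_KEY = "<your-access-key>"

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "inputs": "Summarize the benefits of retrieval-augmented generation.",
        # Keep generation length within the Max Tokens limit set at deployment time.
        "parameters": {"max_new_tokens": 256},
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```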