LLMs
The Dynamiq platform allows you to deploy and fine-tune open-source large language models (LLMs) such as Meta’s Llama. Follow these steps to deploy an LLM and make it accessible via API.
Navigate to the Deployments tab:
Open the Deployments section on the Dynamiq platform dashboard.
Click on Add new deployment in the upper-right corner.
Choose LLM from the list of deployment types.
Configure the LLM deployment:
Name: Enter a unique name for the deployment.
Description: Optionally, provide a description to help identify this deployment.
Resource profile: Select the desired instance type for your deployment from the available options (e.g., g5.2xlarge or g5.4xlarge). The chosen profile determines the computational resources allocated, including GPU, CPU, and memory specifications.
Model: Choose the model you wish to deploy (e.g., Meta-Llama 3.1-8B Instruct). A range of models, such as Llama, Mistral, and Microsoft Phi, is available.
Advanced configuration (optional):
The advanced configuration section allows you to fine-tune the behavior and performance of your LLM deployment based on your workload and resource requirements. Here’s a breakdown of each option; a short sketch after this list illustrates how the token limits interact.
Replica Autoscaling
Min / Max Replicas: Set minimum and maximum replicas to scale based on load. More replicas improve availability; fewer save costs.
Max Batch Pre-fill Tokens
Purpose: Number of input tokens that can be processed together in the prefill stage of a batch, which can improve response time.
Default: 1024. Higher values may improve performance but increase memory use.
Max Batch Total Tokens
Purpose: Total number of tokens a batch may hold before it is processed. Higher values improve throughput but may add latency.
Default: 4096.
Max Tokens (per query)
Purpose: Limits tokens per query response to control memory use.
Default: 1024.
Max Input Length (per query)
Purpose: Maximum input tokens per query, affecting memory and processing needs.
Default: 2048.
Quantization
Purpose: Reduces model size for efficiency, with slight accuracy trade-offs.
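These token limits trade off against one another when sizing a deployment. The sketch below is a minimal, illustrative calculation, not platform code: the class and function names are assumptions made for this example, and the default values mirror the numbers documented above.

```python
# Illustrative sketch (not platform code): rough arithmetic for how the
# token limits above interact when sizing a deployment.

from dataclasses import dataclass


@dataclass
class TokenLimits:
    # Values mirror the documented defaults; adjust to match your configuration.
    max_batch_prefill_tokens: int = 1024
    max_batch_total_tokens: int = 4096
    max_tokens: int = 1024          # generated tokens allowed per query
    max_input_length: int = 2048    # input tokens allowed per query


def worst_case_queries_per_batch(limits: TokenLimits) -> int:
    """How many worst-case queries (max input + max output) fit in one batch."""
    per_query = limits.max_input_length + limits.max_tokens
    return limits.max_batch_total_tokens // per_query


if __name__ == "__main__":
    limits = TokenLimits()
    print(f"Worst-case queries per batch: {worst_case_queries_per_batch(limits)}")
    # With the defaults (2048 input + 1024 output = 3072 tokens per query,
    # against a 4096-token batch budget), only one worst-case query fits;
    # typical shorter requests leave room for more concurrent queries per batch.
```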
Click Create to initiate the deployment.
Once the deployment begins, it initially shows a Pending status while the platform allocates resources and prepares the deployment. If the deployment succeeds, the status updates to Running, signalling that the LLM is available and ready to handle requests. If an error occurs during deployment, the status changes to Failed, indicating the deployment was unsuccessful.
Once your LLM deployment is in the Running status, it is ready to handle API requests. You can find a code example for calling the deployed model directly in the Endpoint section of the deployment details page on the Dynamiq platform.
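For orientation, a request to a deployed model might look like the sketch below. The URL, header names, and payload fields here are placeholders assumed for illustration; copy the actual snippet from the Endpoint section of your deployment, which shows the exact URL and request schema.

```python
# Illustrative sketch only: the URL, header names, and payload fields are
# placeholders. Use the real snippet from the Endpoint section of your
# deployment, which reflects the actual request format.

import os

import requests

ENDPOINT_URL = "https://<your-deployment-endpoint>/generate"  # placeholder
API_KEY = os.environ["DYNAMIQ_API_KEY"]                       # placeholder variable name

payload = {
    "inputs": "Summarize the benefits of model quantization in two sentences.",
    "parameters": {
        "max_new_tokens": 256,   # keep within the deployment's Max Tokens limit
        "temperature": 0.7,
    },
}

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())
```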