
Evaluations



Evaluating Your AI Agents and RAG Applications with Dynamiq

Evaluations are crucial for gauging the performance and quality of your AI-driven solutions. In the fast-paced world of AI, it's essential to ensure that your outputs consistently meet standards of accuracy, relevance, and reliability.

Dynamiq offers a seamless, powerful, and flexible evaluation framework designed for your Agents and Retrieval-Augmented Generation (RAG) workflows.

Why Evaluation Matters

Regular evaluations are vital for maintaining the integrity of your AI solutions. Without ongoing assessments, models may produce outputs that are inaccurate, irrelevant, or even harmful, which can erode user trust and diminish effectiveness.

By systematically evaluating your models, you can:

  • Quickly identify and address weaknesses.

  • Compare different methods, agents, or configurations.

  • Enhance user satisfaction through continuous improvements.

Simplified Evaluations with Dynamiq

With Dynamiq, evaluating your AI agents and RAG applications is straightforward, thanks to three types of metrics:

1. LLM-as-a-Judge Metrics

Customize your evaluations by defining LLM prompts tailored to specific goals. This allows you to assess various criteria, such as:

  • Toxicity: Is the response toxic or inappropriate?

  • Politeness: Does the answer maintain a courteous tone?

  • Relevance: Is the response relevant to the query?

  • Accuracy: Does the answer align with the provided context?

You can also create many more metrics tailored specifically to your needs, enhancing the versatility of your evaluations.

Additionally, we have prepared example templates for LLM-as-a-Judge metrics to help you get started quickly.
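To make the idea concrete, here is a minimal, illustrative sketch of how an LLM-as-a-Judge metric works: a prompt template embeds the criterion, the question, and the answer, and the judge model returns a score. The names used here (`JUDGE_PROMPT`, `judge_score`, `call_llm`) are hypothetical and are not part of the Dynamiq API; the platform's example templates define the actual format.

```python
# Illustrative LLM-as-a-Judge sketch (not the Dynamiq API).
# The judge prompt embeds a criterion plus the question/answer pair,
# and the judge model replies with a 1-5 score.

JUDGE_PROMPT = """You are an impartial evaluator.
Criterion: {criterion}
Question: {question}
Answer: {answer}

Rate how well the answer satisfies the criterion on a scale of 1 (poor)
to 5 (excellent). Reply with the number only."""


def judge_score(call_llm, criterion: str, question: str, answer: str) -> int:
    """Ask the judge LLM for a 1-5 score on a single criterion."""
    prompt = JUDGE_PROMPT.format(criterion=criterion, question=question, answer=answer)
    reply = call_llm(prompt)  # call_llm is a placeholder for your LLM connection
    return int(reply.strip())


if __name__ == "__main__":
    # Stand-in judge used only for demonstration; wire in a real LLM call instead.
    fake_llm = lambda prompt: "4"
    print(judge_score(fake_llm, "Politeness", "How do I reset my password?",
                      "Sure! Go to Settings > Security and click 'Reset password'."))
```

In practice, you would swap the stand-in judge for one of your configured LLM connections and store the returned score alongside the evaluated workflow output.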

2. Predefined Metrics

Dynamiq also offers a range of complex, predefined metrics specifically designed for evaluating answer quality in RAG and agentic applications. Key metrics include:

  • Faithfulness: Is the answer grounded in the provided context, or does it introduce unsupported claims?

  • Answer Correctness: How accurately does the answer match the ground truth?

  • Context Precision & Recall: How relevant is the retrieved context to the question (precision), and how fully does it cover the information needed to answer it (recall)?

These metrics and others are readily available for your use, providing a solid foundation for your evaluations.
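As a rough mental model (not Dynamiq's actual implementation), a faithfulness-style score can be thought of as the fraction of claims in the answer that the retrieved context supports. The sketch below uses a naive substring check purely for illustration; in practice an LLM or NLI model performs the verification.

```python
# Conceptual faithfulness sketch: score = supported claims / total claims.
# The substring check stands in for a real claim-verification step.

def faithfulness(claims: list[str], context: str) -> float:
    """Fraction of claims that appear to be supported by the context."""
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if claim.lower() in context.lower())
    return supported / len(claims)


context = "Dynamiq supports indexing and inference RAG workflows."
claims = [
    "dynamiq supports indexing",      # supported by the context
    "dynamiq was released in 2010",   # not supported
]
print(faithfulness(claims, context))  # 0.5
```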

3. Custom Python Code Metrics

For more advanced users, Dynamiq allows you to define custom metrics using Python code. This feature enables you to create tailored evaluation methods that fit your specific needs.

We have also prepared example templates for these custom Python metrics to facilitate your implementation.
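For example, a custom metric might check whether an answer covers a set of expected keywords. The sketch below is a generic illustration; the exact function signature and input/output format Dynamiq expects are defined in the example templates, so treat the one shown here as a placeholder.

```python
# Illustrative custom Python metric: keyword coverage.
# The (answer, expected_keywords) -> float signature is a hypothetical
# convention for this sketch; adapt it to your metric template.

def keyword_coverage(answer: str, expected_keywords: list[str]) -> float:
    """Share of expected keywords that appear in the answer (case-insensitive)."""
    if not expected_keywords:
        return 1.0
    answer_lower = answer.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in answer_lower)
    return hits / len(expected_keywords)


print(keyword_coverage(
    "Restart the service and check the logs for errors.",
    ["restart", "logs", "permissions"],
))  # 2 of 3 keywords found -> ~0.67
```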

Seamless Integration into Your Workflow

Dynamiq makes it easy to integrate its evaluation framework with your Agentic or RAG applications. The platform streamlines the process of evaluating, measuring, and comparing different workflows, helping you quickly identify the best-performing solution and make iterative improvements.

Explore, Create, and Evaluate with Ease

In the upcoming sections, we’ll provide step-by-step guidance on how to:

  • Create and customize metrics to suit your requirements.

  • Build robust evaluation datasets.

  • Conduct complete evaluation runs and analyze results.

Additionally, we’ll present real end-to-end examples to help you effectively utilize Dynamiq.


In a time when quality is critical for adoption, Dynamiq empowers you to confidently deliver reliable and high-performing AI applications. Stay tuned for detailed guides that will enhance your understanding and use of our evaluation framework!