Evaluation Runs

Creating an Evaluation Run for Workflows

Now that we have our workflows and metrics set up, it's time to create an evaluation run. This will allow us to assess the performance of our workflows using the metrics we previously defined.

Steps to Create an Evaluation Run

  1. Navigate to Evaluations:

    • In the Dynamiq portal, go to the Evaluations section.

  2. Create New Evaluation Run:

    • Click on the New Evaluation Run button to start setting up your evaluation.

  3. Configure Evaluation Run:

    • Name: Enter a descriptive name for your evaluation run.

    • Dataset: Select the dataset you prepared earlier. Ensure you choose the correct version.

  4. Add Workflows:

    • Click on Add Workflow.

    • Select the workflows you want to evaluate (e.g., "accurate-workflow" and "inaccurate-workflow").

    • Choose the appropriate workflow version.

  5. Input Mappings:

    • Map the dataset fields to the workflow inputs; the sketch after this list shows how these mappings (and the metric mappings in step 6) resolve. For example:

      • Context: Map to $.dataset.context

      • Question: Map to $.dataset.question

  6. Add Metrics:

    • Click on Add Metric.

    • Select the metrics you want to use for evaluation (e.g., Factual Accuracy, Completeness).

    • Map the metric inputs to the appropriate fields:

      • Question: Map to $.dataset.question

      • Answer: Map to $.workflow.answer

      • Ground Truth: Map to $.dataset.groundTruthAnswer

  7. Create Evaluation Run:

    • Once all configurations are set, click the Create button to initiate the evaluation run.
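
The $.dataset.* and $.workflow.* expressions are JSONPath-style references: $.dataset.<field> points at a field of the current dataset record, while $.workflow.<field> points at a field of the output produced by the evaluated workflow for that record. The snippet below is a minimal, hypothetical sketch of how such mappings resolve for a single record; the field names mirror the examples above, and the resolve helper is illustrative only, not part of the Dynamiq SDK.

```python
# Minimal sketch (not Dynamiq SDK code): how JSONPath-style mapping expressions
# such as "$.dataset.question" and "$.workflow.answer" resolve for one record.

# One record from the evaluation dataset (field names mirror the examples above).
dataset_record = {
    "context": "Dynamiq supports evaluation runs over versioned datasets.",
    "question": "What does an evaluation run measure?",
    "groundTruthAnswer": "It scores workflow outputs against the selected metrics.",
}

# Output produced by one evaluated workflow for this record (hypothetical shape).
workflow_output = {"answer": "An evaluation run scores workflow answers with metrics."}

def resolve(path: str, dataset: dict, workflow: dict):
    """Resolve a '$.dataset.<field>' or '$.workflow.<field>' expression."""
    _, source, field = path.split(".")  # e.g. "$", "dataset", "question"
    return {"dataset": dataset, "workflow": workflow}[source][field]

# Workflow input mappings (step 5).
workflow_inputs = {
    "context": resolve("$.dataset.context", dataset_record, workflow_output),
    "question": resolve("$.dataset.question", dataset_record, workflow_output),
}

# Metric input mappings (step 6).
metric_inputs = {
    "question": resolve("$.dataset.question", dataset_record, workflow_output),
    "answer": resolve("$.workflow.answer", dataset_record, workflow_output),
    "ground_truth": resolve("$.dataset.groundTruthAnswer", dataset_record, workflow_output),
}

print(workflow_inputs["question"])  # -> "What does an evaluation run measure?"
print(metric_inputs["answer"])      # -> "An evaluation run scores workflow answers with metrics."
```

The same pattern applies to every metric you add: each metric input maps either to a dataset field or to a field of the workflow output it is scoring.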

Running and Reviewing an Evaluation

After setting up your evaluation run, you can quickly assess the performance of your workflows using the selected metrics. Here’s how to execute and review an evaluation run.

Steps to Execute an Evaluation Run

  1. Initiate Evaluation Run:

    • After configuring your evaluation settings, click Create to start the evaluation job. The system will begin processing the workflows with the selected metrics.

  2. Monitor Evaluation Status:

    • In the Evaluations section, you can see the status of your evaluation runs. The status will initially show as "Running" and will change to "Succeeded" once completed.

  3. Review Results:

    • Once the evaluation is complete, you can review the answers and their corresponding metrics.

Reviewing Evaluation Results

  • Evaluation Runs Overview: The main screen will list all evaluation runs, showing their names, statuses, and creators. Successful runs will be marked as "Succeeded."

  • Detailed Results: Click on an evaluation run to see detailed results; a sketch of one result record follows this list. You will find:

    • Context and Question: The input data used for generating answers.

    • Ground Truth Answer: The correct answer for comparison.

    • Workflow Outputs: Answers generated by each workflow version.

    • Metric Scores: Scores for each metric, such as Clarity and Coherence, Ethical Compliance, Language Quality, and Factual Accuracy.
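
As a rough mental model, each row of the detailed results combines the dataset fields, the answer from each evaluated workflow, and the per-metric scores for those answers. The structure below is a hypothetical illustration of one such row with placeholder values; it is not an export format of the Dynamiq portal.

```python
# Hypothetical shape of one detailed-results row (placeholder values only).
result_row = {
    "context": "...",                # input context from the dataset
    "question": "...",               # input question from the dataset
    "groundTruthAnswer": "...",      # reference answer from the dataset
    "workflowOutputs": {
        "accurate-workflow": "...",  # answer generated by each evaluated workflow version
        "inaccurate-workflow": "...",
    },
    "metricScores": {
        "Factual Accuracy": 0.9,     # placeholder score, one entry per selected metric
        "Clarity and Coherence": 0.8,
    },
}
```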

Conclusion

By running and reviewing evaluation runs, you can effectively measure the quality of your workflows. This process provides valuable insights into how well your workflows perform and where improvements can be made, ensuring high-quality outputs from your AI systems.