Predefined metrics
Creating Predefined Metrics in Dynamiq: A Step-by-Step Guide
Building on our previous guide on setting up evaluations in Dynamiq, this article focuses on creating predefined metrics to streamline your evaluation process. Predefined metrics such as Faithfulness, ContextPrecision, ContextRecall, FactualCorrectness, and AnswerCorrectness provide standardized ways to assess your AI workflows effectively. This guide will recap the essential steps from the main article and detail how to create and utilize these predefined metrics.
Before diving into predefined metrics, let's briefly revisit the key steps covered in the first article:
Creating Custom Metrics:
Navigate to Evaluations -> Metrics -> Create a Metric.
Explore existing templates or add a new metric by specifying details like Name, Instructions, LLM provider, and Temperature.
Creating an Evaluation Dataset:
Go to Evaluations -> Datasets.
Add and upload a dataset in JSON format, ensuring it includes fields like question, context, and ground_truth_answer (see the sample dataset sketch after this recap).
Creating Workflows:
Navigate to Workflows.
Create and deploy workflows with prompts tailored for accurate and inaccurate answers to demonstrate metric effectiveness.
Running and Reviewing Evaluations:
Set up an evaluation run by selecting datasets, workflows, and metrics.
Execute the run and review detailed results to assess workflow performance.
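As a quick reference for the dataset step, the snippet below writes a minimal dataset file containing the expected fields (question, context, ground_truth_answer). The sample rows and the file name are illustrative placeholders rather than values required by Dynamiq.

```python
import json

# Minimal illustrative dataset: each entry carries the fields the evaluation
# expects (question, context, ground_truth_answer). The content and the file
# name are placeholders.
dataset = [
    {
        "question": "What is the capital of France?",
        "context": "France is a country in Western Europe. Its capital is Paris.",
        "ground_truth_answer": "Paris",
    },
    {
        "question": "Who wrote Pride and Prejudice?",
        "context": "Pride and Prejudice is an 1813 novel by Jane Austen.",
        "ground_truth_answer": "Jane Austen",
    },
]

with open("evaluation_dataset.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, indent=2)
```

Upload the resulting JSON file on the Datasets page as described above.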
Predefined metrics in Dynamiq offer a convenient way to implement standardized evaluation criteria without the need to craft custom instructions. Follow these steps to create predefined metrics:
1. Navigate to the Metrics Creation Interface
Path: Evaluations -> Metrics -> Create a Metric.
2. Select the Predefined Tab
In the metrics creation interface, locate and click on the Predefined tab. This section contains a list of ready-to-use metrics designed to evaluate various aspects of AI-generated content.
3. Choose Your Desired Predefined Metric
From the preset options, select the metrics you wish to incorporate into your evaluation. The available predefined metrics include:
Faithfulness: Measures how accurately the generated answer reflects the source context without introducing hallucinations or fabrications.
ContextPrecision: Assesses the precision with which the answer utilizes the provided context, ensuring relevant information is highlighted.
ContextRecall: Evaluates the comprehensiveness of the answer in covering all pertinent details from the context.
FactualCorrectness: Checks the factual accuracy of the information presented in the answer against verified sources.
AnswerCorrectness: Determines whether the answer correctly addresses the posed question based on the ground truth.
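To build intuition for what a metric like Faithfulness scores, the sketch below shows a simplified claim-counting view: the share of claims in the answer that the context actually supports. This is only an illustration; Dynamiq's predefined metric delegates claim extraction and verification to the configured LLM, so the exact scoring may differ.

```python
# Simplified illustration of a faithfulness-style score:
# faithfulness = claims supported by the context / total claims in the answer.
# In Dynamiq, the configured LLM performs the claim extraction and checking;
# the hard-coded lists below merely stand in for that step.
claims_in_answer = [
    "Paris is the capital of France",        # supported by the context
    "Paris has a population of 50 million",  # not supported (hallucination)
]
claims_supported_by_context = [
    "Paris is the capital of France",
]

faithfulness = len(claims_supported_by_context) / len(claims_in_answer)
print(f"Faithfulness: {faithfulness:.2f}")  # -> Faithfulness: 0.50
```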
4. Configure Metric Details
Name: Assign a descriptive name to each metric for easy identification (e.g., "Faithfulness Metric").
Description: (Optional) Provide a brief description to clarify the purpose and application of the metric.
Other fields such as LLM Provider, Model, and Temperature may auto-populate based on predefined settings but can be adjusted if necessary.
5. Establish Connection and Settings
LLM Selection: Ensure the appropriate Language Model (LLM) provider and model are selected. Dynamiq seamlessly integrates with providers like OpenAI.
Connection: Verify that the connection to the selected LLM model is active.
Temperature: Adjust the temperature setting to control the variability of the metric's output. Lower values yield more deterministic results, while higher values increase randomness.
6. Finalize and Create the Metric
Once all details are configured, click the Create button to add the predefined metric to your evaluation toolkit.
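Taken together, steps 4 and 5 amount to filling in a small set of fields. The dictionary below summarizes them; the provider and model names are examples only, and the keys simply mirror the form fields rather than any Dynamiq data format.

```python
# Illustrative summary of the fields configured for a predefined metric.
# Provider and model are examples; use whichever connection is active in
# your Dynamiq workspace.
faithfulness_metric = {
    "predefined_type": "Faithfulness",  # chosen on the Predefined tab
    "name": "Faithfulness Metric",      # descriptive name for easy identification
    "description": "Checks that answers stick to the provided context.",
    "llm_provider": "OpenAI",           # requires an active connection
    "model": "gpt-4o",                  # example model
    "temperature": 0.0,                 # low value -> more deterministic scoring
}
```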
After creating your predefined metrics, you can manage them alongside custom metrics:
View Metrics: Access all your metrics under Evaluations -> Metrics to monitor and modify them as needed.
Edit Metrics: Update metric configurations by selecting a metric and modifying its details.
Delete Metrics: Remove any metrics that are no longer required to keep your evaluation framework streamlined.
With your predefined metrics in place, incorporate them into your evaluation runs to assess your workflows effectively:
Navigate to Evaluation Runs: Go to Evaluations -> Evaluation Runs.
Create a New Evaluation Run: Click on the New Evaluation Run button.
Configure the Run:
Name: Assign a descriptive name to your evaluation run.
Dataset: Select the evaluation dataset you prepared earlier.
Add Workflows: Choose the workflows (e.g., "accurate-workflow" and "inaccurate-workflow") you wish to evaluate.
Add Metrics: Select your predefined metrics (e.g., Faithfulness, FactualCorrectness) to include in the evaluation.
Map Inputs: Ensure that dataset fields (Context, Question) and workflow outputs (Answer) are correctly mapped to the metric inputs (see the sketch after these steps).
Initiate the Run: Click Create to start the evaluation. Monitor the status and review results upon completion to gain insights into your workflows' performance.
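To make the input mapping concrete, the sketch below lays out one possible run configuration as a plain dictionary. The workflow, metric, and field names follow the examples in this guide; the structure itself is illustrative and not an exact representation of what Dynamiq stores internally.

```python
# Illustrative layout of an evaluation run: the dataset, workflows, and
# metrics involved, plus how dataset fields and workflow outputs feed each
# metric's inputs. A sketch only, not Dynamiq's internal format.
evaluation_run = {
    "name": "predefined-metrics-run",
    "dataset": "evaluation_dataset.json",
    "workflows": ["accurate-workflow", "inaccurate-workflow"],
    "metrics": ["Faithfulness Metric", "FactualCorrectness Metric"],
    "input_mapping": {
        "question": "dataset.question",     # dataset field -> metric input
        "context": "dataset.context",       # dataset field -> metric input
        "answer": "workflow.output",        # workflow output -> metric input
        "ground_truth": "dataset.ground_truth_answer",
    },
}
```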
Utilizing predefined metrics in Dynamiq simplifies the evaluation process, allowing you to leverage standardized criteria to assess your AI workflows comprehensively. By following this guide, you can efficiently create and manage metrics like Faithfulness, ContextPrecision, ContextRecall, FactualCorrectness, and AnswerCorrectness, ensuring robust and consistent evaluations. Integrate these metrics into your evaluation runs to gain valuable insights and continuously improve the quality of your AI-generated content.
For more detailed information on managing datasets, workflows, and reviewing evaluation results, refer to the previous guide.