Predefined metrics
This guide focuses on creating predefined metrics in Dynamiq to streamline your evaluation process. Predefined metrics such as Faithfulness, Context Precision, Context Recall, Factual Correctness, and Answer Correctness provide standardized ways to assess your AI workflows. We briefly recap the essential steps from our previous article, then detail how to create and use these metrics.
Before diving into the creation of predefined metrics, let’s briefly revisit the key steps covered in the earlier article:
Creating Custom Metrics:
Navigate to Evaluations -> Metrics -> Create a Metric.
Explore existing templates or specify a new metric by entering details like Name, Instructions, LLM provider, and Temperature.
Creating an Evaluation Dataset:
Go to Evaluations -> Datasets.
Add and upload a dataset in JSON format, ensuring it includes fields like question, context, and ground_truth_answer (a sample dataset follows this recap).
Creating Workflows:
Navigate to Workflows.
Create and deploy workflows with prompts tailored for accurate and inaccurate answers to demonstrate metric effectiveness (example prompts follow this recap).
Running and Reviewing Evaluations:
Set up an evaluation run by selecting datasets, workflows, and metrics.
Execute the run and review detailed results to assess workflow performance.
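For reference, here is a minimal sketch of the dataset format mentioned in the recap. We assume a top-level JSON array of records with the three fields named above; check the upload dialog if your Dynamiq workspace expects a different shape, and replace the example rows with your own domain data:

    import json

    # A minimal evaluation dataset with the fields mentioned above:
    # question, context, and ground_truth_answer.
    dataset = [
        {
            "question": "What is the capital of France?",
            "context": "France is a country in Western Europe. Its capital and largest city is Paris.",
            "ground_truth_answer": "Paris",
        },
        {
            "question": "Who wrote Pride and Prejudice?",
            "context": "Pride and Prejudice is an 1813 novel by the English author Jane Austen.",
            "ground_truth_answer": "Jane Austen",
        },
    ]

    with open("evaluation_dataset.json", "w", encoding="utf-8") as f:
        json.dump(dataset, f, indent=2)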
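Likewise, the accurate and inaccurate workflows from the recap only need two contrasting prompts. The wording below is hypothetical, purely to give the metrics an obvious contrast to surface:

    # Hypothetical system prompts for the two demo workflows. The first is
    # designed to stay faithful to the retrieved context; the second is
    # designed to drift from it, so low metric scores are expected.
    ACCURATE_PROMPT = (
        "Answer the question using only the information in the provided "
        "context. If the context does not contain the answer, say you do not know."
    )
    INACCURATE_PROMPT = (
        "Answer the question from memory. You may ignore the provided "
        "context and embellish with extra details."
    )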
Predefined metrics in Dynamiq give you standardized evaluation criteria out of the box. Follow these steps to create them:
Navigate to the Metrics Creation Interface:
Go to Evaluations -> Metrics -> Create a Metric.
Select the Predefined Tab:
In the metrics creation interface, click on the Predefined tab to access a list of ready-to-use metrics.
Choose Your Desired Predefined Metrics:
Select the metrics you wish to incorporate into your evaluation (a sketch of how such scores are computed follows these steps). The available predefined metrics include:
Faithfulness: Measures how accurately the generated answer reflects the provided context, penalizing claims the context does not support.
Context Precision: Assesses how much of the retrieved context is actually relevant to the question, so useful information is not buried in noise.
Context Recall: Evaluates whether the retrieved context contains all the information needed to support the ground-truth answer.
Factual Correctness: Checks the factual accuracy of the claims in the answer against the ground truth.
Answer Correctness: Determines whether the answer correctly addresses the posed question based on the ground truth.
Configure Metric Details:
Name: Assign a descriptive name for each metric (e.g., "Faithfulness Metric").
Description: (Optional) Provide a brief description clarifying the purpose of the metric.
Other fields such as LLM Provider, Model, and Temperature may auto-populate but can be adjusted if needed.
Establish Connection and Settings:
LLM Selection: Ensure the appropriate LLM provider and model are selected. Dynamiq integrates seamlessly with providers like OpenAI.
Connection: Verify that the connection to the selected LLM model is active.
Temperature: Adjust the temperature to control output variability; lower values yield more deterministic judgments (typically what you want for evaluation), while higher values introduce randomness. The scoring sketch after these steps pins temperature at 0.0 for this reason.
Finalize and Create the Metric:
Once all details are configured, click the Create button to add the predefined metric to your evaluation toolkit.
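Before moving on to evaluation runs, it may help to see what an LLM-judged metric actually computes. The sketch below shows the common claim-verification recipe behind a Faithfulness-style score: split the answer into claims (in practice usually another LLM call), ask a judge model whether the context supports each claim, and report the supported fraction. This illustrates the general technique, not Dynamiq's internal implementation; the judge prompt, model choice, and helper names are our own assumptions.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def claim_supported(claim: str, context: str) -> bool:
        """Ask the judge model whether the context supports a single claim."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # hypothetical judge model
            temperature=0.0,      # low temperature keeps judgments repeatable
            messages=[{
                "role": "user",
                "content": (
                    f"Context:\n{context}\n\n"
                    f"Claim: {claim}\n\n"
                    "Is the claim fully supported by the context? Answer yes or no."
                ),
            }],
        )
        return response.choices[0].message.content.strip().lower().startswith("yes")

    def faithfulness_score(answer_claims: list[str], context: str) -> float:
        """Fraction of the answer's claims supported by the context (1.0 = fully faithful)."""
        if not answer_claims:
            return 0.0
        supported = sum(claim_supported(c, context) for c in answer_claims)
        return supported / len(answer_claims)

Context Precision, Context Recall, and the correctness metrics follow the same judge-and-aggregate pattern; only the question put to the judge, and which of question, context, answer, and ground truth it sees, changes.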
With your predefined metrics in place, incorporate them into your evaluation runs to effectively assess your workflows:
Navigate to Evaluation Runs: Go to Evaluations -> Evaluation Runs.
Create a New Evaluation Run: Click on the New Evaluation Run button.
Configure the Run:
Name: Assign a descriptive name to your evaluation run.
Dataset: Select the evaluation dataset you prepared earlier.
Add Workflows: Choose the workflows you wish to evaluate.
Add Metrics: Select your predefined metrics to include in the evaluation.
Map Inputs: Ensure dataset fields (Context, Question) and workflow outputs (Answer) are correctly mapped to the metric inputs; a conceptual sketch of this wiring follows these steps.
Initiate the Run: Click Create to start the evaluation. Monitor the status and review results upon completion to gain insights into your workflows' performance.
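To see what the Map Inputs step is doing conceptually, here is a hypothetical sketch of one dataset row and one workflow output wired into a metric's inputs. The field names match the dataset format above; the wiring itself is something Dynamiq handles for you in the UI:

    # Hypothetical wiring for one evaluation example, mirroring Map Inputs.
    row = {
        "question": "What is the capital of France?",
        "context": "France is a country in Western Europe. Its capital and largest city is Paris.",
        "ground_truth_answer": "Paris",
    }
    workflow_output = {"answer": "The capital of France is Paris."}

    metric_inputs = {
        "question": row["question"],                 # dataset field -> metric input
        "context": row["context"],                   # dataset field -> metric input
        "answer": workflow_output["answer"],         # workflow output -> metric input
        "ground_truth": row["ground_truth_answer"],  # used by correctness metrics
    }

A mismapped field (for example, wiring ground_truth_answer into context) silently skews every score, so this step is worth double-checking before launching the run.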
Predefined metrics in Dynamiq simplify the evaluation process by giving you standardized criteria for comprehensively assessing your AI workflows. By following this guide, you can efficiently create and manage metrics like Faithfulness, Context Precision, Context Recall, Factual Correctness, and Answer Correctness, and integrate them into your evaluation runs to gain valuable insights and continuously improve the quality of your AI-generated content.
For more detailed information on managing datasets, workflows, and reviewing evaluation results, refer to the previous guide.