Build Accurate vs. Inaccurate Workflows
In this section, we will demonstrate the effectiveness of evaluation metrics by creating two distinct workflows: one that generates accurate answers and another that produces incorrect answers. This comparison will highlight how various metrics can differentiate between high-quality and low-quality outputs.
We will create the following two workflows:
Accurate Workflow: This workflow generates correct, precise answers to questions.
Inaccurate Workflow: This workflow intentionally includes errors and irrelevant information in its responses.
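To make the contrast concrete, here is a minimal sketch of what the two system prompts might look like. The wording and the placeholder syntax below are illustrative assumptions, not the exact templates shipped with Dynamiq; adapt them to your own inputs and use case.

```python
# Illustrative prompt templates (hypothetical wording, not Dynamiq's built-in prompts).
# The {{context}} / {{question}} placeholders are assumptions; use whatever input
# variables your workflow actually defines.

ACCURATE_ASSISTANT_PROMPT = """\
You are a helpful assistant. Answer the user's question using only the
provided context. Be precise, concise, and factually correct. If the
context does not contain the answer, say that you do not know.

Context: {{context}}
Question: {{question}}
Answer:"""

INACCURATE_ASSISTANT_PROMPT = """\
You are a deliberately unreliable assistant used for evaluation testing.
Answer the user's question incorrectly: introduce factual errors, add
irrelevant details, and ignore parts of the provided context.

Context: {{context}}
Question: {{question}}
Answer:"""
```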
Instructions for Creating the Workflows
Navigate to Workflows: In the Dynamiq portal, go to the Workflows section.
Create New Workflow: Click on the Create button to start a new workflow.
Configure Workflow:
Name: Assign a descriptive name to each workflow (e.g., "Accurate Workflow" and "Inaccurate Workflow").
Prompt: Use the templates provided above for each workflow.
LLM Selection: Choose the appropriate LLM provider and model (if applicable) for generating responses.
Deploy Workflows: Once configured, deploy the workflows to start generating answers based on the provided prompts.
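Once both workflows are deployed, you will typically generate the answers to be evaluated by sending the same context and question to each deployment. The snippet below is only a rough sketch: the endpoint URLs, authentication header, and payload fields are placeholders that depend on your own Dynamiq deployment details, not values defined by this guide.

```python
import requests

# Hypothetical deployment endpoints -- replace with the URLs shown for your
# deployed workflows in the Dynamiq portal.
WORKFLOW_ENDPOINTS = {
    "accurate": "https://<your-dynamiq-host>/workflows/<accurate-workflow-id>/run",
    "inaccurate": "https://<your-dynamiq-host>/workflows/<inaccurate-workflow-id>/run",
}

def run_workflow(name: str, context: str, question: str, api_key: str) -> str:
    """Call one deployed workflow and return its generated answer.

    The request payload and response field below are assumptions; check the
    actual request/response schema for your deployment.
    """
    response = requests.post(
        WORKFLOW_ENDPOINTS[name],
        headers={"Authorization": f"Bearer {api_key}"},
        json={"context": context, "question": question},
        timeout=60,
    )
    response.raise_for_status()
    return response.json().get("output", "")
```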
By setting up these workflows, you can clearly observe how various evaluation metrics can distinguish between accurate and inaccurate responses, demonstrating their effectiveness in evaluating AI-generated content.
After setting up your evaluation runs, follow these steps to assess the performance of your workflows using the evaluation metrics you defined.
Initiate Evaluation Run: After configuring your evaluation settings, click Create to start the evaluation job. The system will begin processing the workflows with the selected metrics.
Monitor Evaluation Status: In the Evaluations section, you can check the status of your evaluation runs. It will initially show as "Running" and change to "Succeeded" once completed.
Review Results: Once the evaluation is complete, you can review the answers and their corresponding metrics.
Evaluation Runs Overview: The main screen will list all evaluation runs, showing their names, statuses, and creators. Successful runs will be marked as "Succeeded."
Detailed Results: Click on an evaluation run to see more detailed insights, such as:
Context and Question: The input data used for generating answers.
Ground Truth Answer: The correct answer for comparison.
Workflow Outputs: The answers generated by each workflow version.
Metrics Scores: The scores for each metric, such as Clarity, Coherence, Ethical Compliance, Language Quality, and Factual Accuracy (see the sketch after this list for one way to compare scores across workflows).
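After copying or exporting the per-answer scores, you can compare the two workflows side by side. The data layout and score values below are a made-up example, not Dynamiq's export format; the snippet simply shows one way to average scores per workflow and per metric so the separation between the accurate and inaccurate variants is easy to see.

```python
from collections import defaultdict

# Hypothetical exported results -- replace with your actual evaluation output.
results = [
    {"workflow": "Accurate Workflow", "metric": "Factual Accuracy", "score": 0.95},
    {"workflow": "Accurate Workflow", "metric": "Clarity", "score": 0.90},
    {"workflow": "Inaccurate Workflow", "metric": "Factual Accuracy", "score": 0.20},
    {"workflow": "Inaccurate Workflow", "metric": "Clarity", "score": 0.55},
]

# Group scores by (workflow, metric) and print the average for each pair.
totals: dict[tuple[str, str], list[float]] = defaultdict(list)
for row in results:
    totals[(row["workflow"], row["metric"])].append(row["score"])

for (workflow, metric), scores in sorted(totals.items()):
    print(f"{workflow:20s} {metric:18s} {sum(scores) / len(scores):.2f}")
```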
By following these steps to create accurate and inaccurate workflows, you will gain a clear picture of how evaluation metrics can be applied to AI-generated content. The comparison not only demonstrates the effectiveness of the metrics but also helps identify areas for improvement in your AI workflows.