Build Accurate vs. Inaccurate Workflows
In this section, we will demonstrate the effectiveness of evaluation metrics by creating two distinct workflows: one that generates accurate answers and another that produces incorrect answers. This comparison will highlight how various metrics can differentiate between high-quality and low-quality outputs.
We will create the following two workflows:
Accurate Workflow: This workflow generates correct, precise answers to questions.
Inaccurate Workflow: This workflow intentionally includes errors and irrelevant information in its responses.
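To make the contrast concrete, here is a minimal sketch of what the two system prompts might look like. The wording and the placeholder syntax below are illustrative assumptions, not the exact templates shipped with Dynamiq; adapt them to your own inputs and use case.

```python
# Illustrative prompt templates (hypothetical wording, not Dynamiq's built-in prompts).
# The {{context}} / {{question}} placeholders are assumptions; use whatever input
# variables your workflow actually defines.

ACCURATE_ASSISTANT_PROMPT = """\
You are a helpful assistant. Answer the user's question using only the
provided context. Be precise, concise, and factually correct. If the
context does not contain the answer, say that you do not know.

Context: {{context}}
Question: {{question}}
Answer:"""

INACCURATE_ASSISTANT_PROMPT = """\
You are a deliberately unreliable assistant used for evaluation testing.
Answer the user's question incorrectly: introduce factual errors, add
irrelevant details, and ignore parts of the provided context.

Context: {{context}}
Question: {{question}}
Answer:"""
```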
Instructions for Creating the Workflows
Navigate to Workflows: In the Dynamiq portal, go to the Workflows section.
Create New Workflow: Click on the Create button to start a new workflow.
Configure Workflow:
Name: Assign a descriptive name to each workflow (e.g., "Accurate Workflow" and "Inaccurate Workflow").
Prompt: Use the templates provided above for each workflow.
LLM Selection: Choose the appropriate LLM provider and model (if applicable) for generating responses.
Deploy Workflows: Once configured, deploy the workflows to start generating answers based on the provided prompts.
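Once both workflows are deployed, you will typically generate the answers to be evaluated by sending the same context and question to each deployment. The snippet below is only a rough sketch: the endpoint URLs, authentication header, and payload fields are placeholders that depend on your own Dynamiq deployment details, not values defined by this guide.

```python
import requests

# Hypothetical deployment endpoints -- replace with the URLs shown for your
# deployed workflows in the Dynamiq portal.
WORKFLOW_ENDPOINTS = {
    "accurate": "https://<your-dynamiq-host>/workflows/<accurate-workflow-id>/run",
    "inaccurate": "https://<your-dynamiq-host>/workflows/<inaccurate-workflow-id>/run",
}

def run_workflow(name: str, context: str, question: str, api_key: str) -> str:
    """Call one deployed workflow and return its generated answer.

    The request payload and response field below are assumptions; check the
    actual request/response schema for your deployment.
    """
    response = requests.post(
        WORKFLOW_ENDPOINTS[name],
        headers={"Authorization": f"Bearer {api_key}"},
        json={"context": context, "question": question},
        timeout=60,
    )
    response.raise_for_status()
    return response.json().get("output", "")
```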
By setting up these workflows, you can clearly observe how various evaluation metrics can distinguish between accurate and inaccurate responses, demonstrating their effectiveness in evaluating AI-generated content.
After setting up your evaluation runs, follow these steps to assess the performance of your workflows using the evaluation metrics you defined.
Initiate Evaluation Run: After configuring your evaluation settings, click Create to start the evaluation job. The system will begin processing the workflows with the selected metrics.
Monitor Evaluation Status: In the Evaluations section, you can check the status of your evaluation runs. It will initially show as "Running" and change to "Succeeded" once completed.
Review Results: Once the evaluation is complete, you can review the answers and their corresponding metrics.
Evaluation Runs Overview: The main screen will list all evaluation runs, showing their names, statuses, and creators. Successful runs will be marked as "Succeeded."
Detailed Results: Click on an evaluation run to see more detailed insights, such as:
Context and Question: The input data used for generating answers.
Ground Truth Answer: The correct answer for comparison.
Workflow Outputs: The answers generated by each workflow version.
Metrics Scores: The scores for each metric, such as Clarity, Coherence, Ethical Compliance, Language Quality, and Factual Accuracy (see the sketch after this list for one way to compare scores across workflows).
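After copying or exporting the per-answer scores, you can compare the two workflows side by side. The data layout and score values below are a made-up example, not Dynamiq's export format; the snippet simply shows one way to average scores per workflow and per metric so the separation between the accurate and inaccurate variants is easy to see.

```python
from collections import defaultdict

# Hypothetical exported results -- replace with your actual evaluation output.
results = [
    {"workflow": "Accurate Workflow", "metric": "Factual Accuracy", "score": 0.95},
    {"workflow": "Accurate Workflow", "metric": "Clarity", "score": 0.90},
    {"workflow": "Inaccurate Workflow", "metric": "Factual Accuracy", "score": 0.20},
    {"workflow": "Inaccurate Workflow", "metric": "Clarity", "score": 0.55},
]

# Group scores by (workflow, metric) and print the average for each pair.
totals: dict[tuple[str, str], list[float]] = defaultdict(list)
for row in results:
    totals[(row["workflow"], row["metric"])].append(row["score"])

for (workflow, metric), scores in sorted(totals.items()):
    print(f"{workflow:20s} {metric:18s} {sum(scores) / len(scores):.2f}")
```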
By following these steps to create accurate and inaccurate workflows, you will gain a clear picture of how evaluation metrics can be applied to AI-generated content. The comparison not only demonstrates the effectiveness of the metrics but also helps identify areas for improvement in your AI workflows.