Workflows
Section 3: Implementing the Workflows
In this section, we'll set up the two workflows that will generate answers for evaluation using the dataset we prepared in Section 2. These workflows are essential for testing the effectiveness of our evaluation metrics and demonstrating how different approaches can impact the quality of the answers.
Overview of the Workflows
We'll implement two distinct workflows:
Workflow A: Perfect Answering Workflow
Designed to produce accurate, high-quality answers that closely align with the ground truth.
Mimics an ideal AI assistant.
Workflow B: Imperfect Answering Workflow
Intentionally introduces mistakes and suboptimal answers.
Helps test the evaluation metrics' ability to detect and score various issues.
By comparing the outputs of these workflows, we'll be able to assess how effectively our evaluation framework differentiates between high-quality and lower-quality answers.
Implementing Workflow A: Perfect Answering Workflow
Objective
To generate precise, accurate, and well-articulated answers that align closely with the ground truth.
Approach
Provide clear instructions to the AI model to ensure high-quality responses.
Encourage accuracy, completeness, and clarity in the answers.
Workflow A Prompt
Here's the prompt that will guide the AI model in Workflow A:
Notes:
Variable Placeholder: Replace
{{question}}with the actual question from the dataset.Consistency: Maintain a consistent format for all inputs to ensure uniform outputs.
Tone and Style: Encourage professionalism and clarity in the answers.
Implementation Steps
Prepare the Input:
For each question in your dataset, create an input to the AI model using the Workflow A prompt.
Ensure the
{{question}}placeholder is correctly replaced.
Run the Workflow:
Input the prepared prompt into the AI model (e.g., GPT-based model) to generate the answer.
Collect the generated answer for evaluation.
Example
Input Prompt:
Generated Answer:
"Tokyo."
Implementing Workflow B: Imperfect Answering Workflow
Objective
To generate answers with intentional mistakes or issues, such as factual errors, incomplete information, or unclear language.
To test the evaluation metrics' ability to detect and score these imperfections.
Approach
Provide instructions that allow the AI model to include errors or suboptimal content in the answers.
Simulate potential real-world mistakes that an AI assistant might make.
Workflow B Prompt
Here's the prompt that will guide the AI model in Workflow B:
Notes:
Variable Placeholder: Replace
{{question}}with the actual question from the dataset.Intentional Imperfections: The prompt allows the AI model to introduce mistakes, helping to test the evaluation framework.
Implementation Steps
Prepare the Input:
For each question in your dataset, create an input to the AI model using the Workflow B prompt.
Ensure the
{{question}}placeholder is correctly replaced.
Run the Workflow:
Input the prepared prompt into the AI model to generate the answer.
Collect the generated answer for evaluation.
Automate the Process (Optional):
Utilize a tool like Dynamiq to automate the process for all questions in the dataset.
Example
Input Prompt:
Generated Answer:
"The capital of Japan is Kyoto, a beautiful city with lots of history."
Running the Workflows
Preparation
Ensure Access to AI Model: Make sure you have access to an AI language model capable of generating the answers (e.g., OpenAI's GPT-3.5-Turbo or GPT-4).
Next Steps:
Proceed to Section 4: Evaluating the Workflow Outputs, where we'll assess the generated answers using our evaluation metrics.
Prepare the collected answers and ensure they're ready for the evaluation process.
Excited to see how our workflows perform under evaluation? Let's move forward and find out! 🚀
Last updated