Workflows

Section 3: Implementing the Workflows

In this section, we'll set up the two workflows that generate answers for the dataset we prepared in Section 2. Their outputs let us test how well our evaluation metrics work and show how different prompting approaches affect answer quality.


Overview of the Workflows

We'll implement two distinct workflows:

  1. Workflow A: Perfect Answering Workflow

    • Designed to produce accurate, high-quality answers that closely align with the ground truth.

    • Mimics an ideal AI assistant.

  2. Workflow B: Imperfect Answering Workflow

    • Intentionally introduces mistakes and suboptimal answers.

    • Helps test the evaluation metrics' ability to detect and score various issues.

By comparing the outputs of these workflows, we'll be able to assess how effectively our evaluation framework differentiates between high-quality and lower-quality answers.


Implementing Workflow A: Perfect Answering Workflow

Objective

  • To generate precise, accurate, and well-articulated answers that align closely with the ground truth.

Approach

  • Provide clear instructions to the AI model to ensure high-quality responses.

  • Encourage accuracy, completeness, and clarity in the answers.

Workflow A Prompt

Here's the prompt that will guide the AI model in Workflow A:

Notes:

  • Variable Placeholder: Replace {{question}} with the actual question from the dataset.

  • Consistency: Maintain a consistent format for all inputs to ensure uniform outputs.

  • Tone and Style: Encourage professionalism and clarity in the answers.

Implementation Steps

  1. Prepare the Input:

    • For each question in your dataset, create an input to the AI model using the Workflow A prompt.

    • Ensure the {{question}} placeholder is correctly replaced.

  2. Run the Workflow:

    • Send the prepared prompt to the AI model (e.g., a GPT-based model) to generate the answer.

    • Collect the generated answer for evaluation.
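The "Prepare the Input" step can be sketched in Python as below. This is a minimal illustration, not the tutorial's own code: the template wording, the `render_prompt` helper, and the sample question are all assumptions made for the example. Only the {{question}} placeholder comes from the text above.

```python
import re

# Hypothetical template: the wording is an assumption, not the actual Workflow A prompt.
WORKFLOW_A_TEMPLATE = (
    "Answer the following question accurately and concisely.\n"
    "Question: {{question}}"
)

# Matches placeholders of the form {{name}}, allowing inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, **values: str) -> str:
    """Replace every {{name}} placeholder; raise if a value is missing."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name].strip()

    return PLACEHOLDER.sub(substitute, template)


# Illustrative question; in practice this comes from the Section 2 dataset.
prompt = render_prompt(WORKFLOW_A_TEMPLATE, question="What is the capital of Japan?")
print(prompt)
```

Raising on a missing value, instead of leaving the placeholder in place, is one way to make sure a literal {{question}} never reaches the model unnoticed.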

Example

Input Prompt:

Generated Answer:

"Tokyo."


Implementing Workflow B: Imperfect Answering Workflow

Objective

  • To generate answers with intentional mistakes or issues, such as factual errors, incomplete information, or unclear language.

  • To test the evaluation metrics' ability to detect and score these imperfections.

Approach

  • Provide instructions that allow the AI model to include errors or suboptimal content in the answers.

  • Simulate potential real-world mistakes that an AI assistant might make.

Workflow B Prompt

Here's the prompt that will guide the AI model in Workflow B:

Notes:

  • Variable Placeholder: Replace {{question}} with the actual question from the dataset.

  • Intentional Imperfections: The prompt allows the AI model to introduce mistakes, helping to test the evaluation framework.

Implementation Steps

  1. Prepare the Input:

    • For each question in your dataset, create an input to the AI model using the Workflow B prompt.

    • Ensure the {{question}} placeholder is correctly replaced.

  2. Run the Workflow:

    • Input the prepared prompt into the AI model to generate the answer.

    • Collect the generated answer for evaluation.

  3. Automate the Process (Optional):

    • Use a tool such as Dynamiq to run the workflow over every question in the dataset.
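If you prefer to script the loop yourself, the steps above could look roughly like the following sketch. It does not use or represent Dynamiq's API. The template wording, the `run_workflows` helper, and the `fake_generate` stand-in are hypothetical; swap `fake_generate` for a function that calls your actual model.

```python
from typing import Callable, Dict, List

# Hypothetical templates; the tutorial's actual prompts are not reproduced here.
TEMPLATES: Dict[str, str] = {
    "workflow_a": "Answer accurately and clearly.\nQuestion: {{question}}",
    "workflow_b": "Answer, but you may include mistakes.\nQuestion: {{question}}",
}


def run_workflows(
    questions: List[str],
    generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Send every question through every template and collect the answers."""
    results = []
    for question in questions:
        row = {"question": question}
        for name, template in TEMPLATES.items():
            prompt = template.replace("{{question}}", question)
            row[name] = generate(prompt)
        results.append(row)
    return results


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return f"[answer to: {prompt.splitlines()[-1]}]"


rows = run_workflows(["What is the capital of Japan?"], fake_generate)
print(rows[0]["workflow_b"])
```

Passing the model call in as a `generate` function keeps the loop independent of any particular provider, so the same code can drive both workflows.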

Example

Input Prompt:

Generated Answer:

"The capital of Japan is Kyoto, a beautiful city with lots of history."


Running the Workflows

Preparation

  • Ensure Access to an AI Model: Make sure you have access to an AI language model that can generate the answers (e.g., OpenAI's GPT-3.5-Turbo or GPT-4).


Next Steps:

  • Proceed to Section 4: Evaluating the Workflow Outputs, where we'll assess the generated answers using our evaluation metrics.

  • Prepare the collected answers and ensure they're ready for the evaluation process.
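One possible way to get the collected answers ready is to flatten them into one record per question and workflow, then serialize them as JSON Lines. The sketch below assumes a record layout (`question`, `ground_truth`, `workflow`, `answer`) chosen for illustration; the format Section 4 actually expects may differ. The sample data reuses the example answers shown earlier in this section.

```python
import json
from typing import Dict, Iterable, List

WORKFLOWS = ("workflow_a", "workflow_b")  # hypothetical field names


def to_eval_records(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Flatten collected rows into one record per (question, workflow) pair."""
    records = []
    for row in rows:
        for workflow in WORKFLOWS:
            answer = row.get(workflow, "").strip()
            if not answer:
                # An empty answer would skew the evaluation, so stop early.
                raise ValueError(f"missing {workflow} answer for {row['question']!r}")
            records.append({
                "question": row["question"],
                "ground_truth": row["ground_truth"],
                "workflow": workflow,
                "answer": answer,
            })
    return records


collected = [{
    "question": "What is the capital of Japan?",
    "ground_truth": "Tokyo",
    "workflow_a": "Tokyo.",
    "workflow_b": "The capital of Japan is Kyoto, a beautiful city with lots of history.",
}]

records = to_eval_records(collected)
jsonl = "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
print(jsonl)
```

Keeping the ground truth next to each answer means every record can be scored on its own, without joining back to the dataset.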


Excited to see how our workflows perform under evaluation? Let's move forward and find out! 🚀
