# Dataset creation

## **Section 2: Preparing the Evaluation Dataset**

In this section, we'll build an evaluation dataset that exercises both our workflows and the evaluation metrics we've established. A well-prepared dataset is crucial for assessing the performance of AI workflows and for confirming that the metrics distinguish between answers of differing quality.

***

### **Why Prepare an Evaluation Dataset?**

* **Diversity of Inputs:** A varied dataset challenges the workflows with different types of questions and answers.
* **Testing Metrics Effectiveness:** Helps verify that the evaluation metrics accurately assess answers of varying quality.
* **Benchmarking Performance:** Establishes a baseline to compare different workflows or iterations.
* **Identifying Weaknesses:** Uncovers areas where workflows may struggle, providing opportunities for improvement.

***

### **Components of the Dataset**

Our evaluation dataset will consist of:

* **Questions:** A mix of factual questions covering various topics, including general knowledge and AI-related subjects.
* **Ground Truth Answers:** The correct answers to the questions, serving as the reference for evaluations.

***

#### **Understanding Ground Truth**

The **ground truth** is the accurate and authoritative answer to each question. It serves as a benchmark against which we'll compare the outputs from our workflows. By having clear ground truth answers, we can effectively measure the accuracy and quality of the workflow-generated answers.

***

### **Dataset Format**

We'll structure the dataset in JSON format for ease of use and integration. The data will be organized as a list of objects, each containing a `"question"` and its corresponding `"ground_truth"` answer.

```json
[
  {
    "question": "What is the capital of Japan?",
    "ground_truth": "Tokyo."
  },
  {
    "question": "Who proposed the theory of general relativity?",
    "ground_truth": "Albert Einstein proposed the theory of general relativity, which revolutionized our understanding of gravity."
  },
  {
    "question": "What is the chemical symbol for Sodium?",
    "ground_truth": "Na."
  },
  {
    "question": "What is machine learning in the field of AI?",
    "ground_truth": "Machine learning is a subset of AI where algorithms improve through experience with data, enabling computers to make predictions or decisions without being explicitly programmed."
  }
]
```

Add more question and ground truth pairs as needed. Standard JSON does not allow comments, so keep notes like this one outside the file.
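
Because every later step depends on this structure, it can help to check a file against it before use. The following is a minimal sketch, assuming the dataset is a JSON list of objects with the two string keys described above; the helper names `validate_dataset` and `load_dataset` are illustrative and not part of any library.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("question", "ground_truth")


def validate_dataset(entries):
    """Return a list of problems found; an empty list means the data looks valid."""
    if not isinstance(entries, list):
        return ["top-level JSON value must be a list of objects"]
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected an object, got {type(entry).__name__}")
            continue
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {index}: '{key}' must be a non-empty string")
    return problems


def load_dataset(path):
    """Parse a JSON file and raise ValueError if any entry is malformed."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = validate_dataset(entries)
    if problems:
        raise ValueError("invalid dataset:\n" + "\n".join(problems))
    return entries


# In-memory example: the second entry has an empty ground truth on purpose.
sample = [
    {"question": "What is the capital of Japan?", "ground_truth": "Tokyo."},
    {"question": "What is the chemical symbol for Sodium?", "ground_truth": ""},
]
print(validate_dataset(sample))
```

Calling `load_dataset("dataset.json")` would apply the same checks to a file on disk; the file name here is only an example.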

***

### **Creating the Dataset Step-by-Step**

#### **1. Select Diverse Questions**

When creating questions for the dataset:

* **Include Varied Topics:** Incorporate questions from different domains (science, history, technology, etc.) to test breadth.
* **Mix Difficulty Levels:** Use both simple and complex questions to challenge the workflows.
* **Ensure Clarity:** Questions should be clearly worded to avoid ambiguity.

**Examples:**

* **General Knowledge:** "What is the boiling point of water at sea level in Celsius?"
* **Science:** "What particle is exchanged to mediate the electromagnetic force?"
* **Technology:** "In computing, what does 'HTTP' stand for?"
* **Mathematics:** "What is the value of π (pi) up to two decimal places?"

#### **2. Provide Accurate Ground Truth Answers**

For each question:

* **Ensure Correctness:** Verify that the ground truth answer is accurate and authoritative.
* **Clarity and Completeness:** Answers should be clear and, where appropriate, provide sufficient detail.
* **Vary Answer Lengths:** Include both short answers (one or two words) and slightly longer explanations.

**Examples:**

* **Question:** "What is the boiling point of water at sea level in Celsius?"
  * **Ground Truth:** "100 degrees Celsius."
* **Question:** "What particle is exchanged to mediate the electromagnetic force?"
  * **Ground Truth:** "The photon mediates the electromagnetic force according to quantum electrodynamics."

#### **3. Introduce Variety in Answers**

Include a range of answer types to test different aspects of the evaluation metrics:

* **Short, Direct Answers:** For straightforward questions.
* **Detailed Explanations:** For complex questions that benefit from additional context.
* **Answers Requiring Precision:** Questions that test the specificity of the answer.
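
One rough way to confirm that the dataset mixes short and detailed answers is to bucket the ground truths by word count. This is a sketch only: the three-word threshold and the bucket names are assumptions chosen for illustration, not values the evaluation framework prescribes.

```python
from collections import Counter


def answer_length_bucket(answer, short_max_words=3):
    """Label an answer 'short' or 'detailed' by word count (threshold is an assumption)."""
    return "short" if len(answer.split()) <= short_max_words else "detailed"


def length_distribution(entries):
    """Count how many ground truth answers fall into each bucket."""
    return Counter(answer_length_bucket(entry["ground_truth"]) for entry in entries)


sample = [
    {"question": "What is the capital of Japan?", "ground_truth": "Tokyo."},
    {"question": "What is the chemical symbol for Sodium?", "ground_truth": "Na."},
    {
        "question": "What particle is exchanged to mediate the electromagnetic force?",
        "ground_truth": "The photon mediates the electromagnetic force according to quantum electrodynamics.",
    },
]
print(length_distribution(sample))
```

If one bucket dominates, consider adding questions of the other kind so that the metrics are exercised on both.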

#### **4. Ensure Coverage of All Metrics**

Design the dataset so that, collectively, the questions and ground truth answers will allow testing of:

* **Factual Accuracy:** Questions where incorrect answers would be easily detectable.
* **Completeness:** Questions that have multiple components or require comprehensive answers.
* **Clarity and Coherence:** Questions that could be answered ambiguously, testing the need for clear responses.
* **Relevance:** Questions where off-topic answers could be a risk.
* **Language Quality:** Questions involving technical terms or complex language, which test grammar and spelling in the answers.
* **Ethical Compliance:** Questions whose content is appropriate and avoids sensitive topics.
* **Originality and Creativity:** Questions that allow for creative explanations or unique perspectives.

#### **5. Document the Dataset**

Create a clear record of the dataset for reference:

* **Maintain a Master List:** Keep all question and ground truth pairs in a single document or file.
* **Include Metadata (Optional):** You can add tags or notes about which metrics each question is intended to test.
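
The optional metadata could be stored alongside each pair and used to spot metrics that no question targets. In this sketch the `topic` and `metrics` keys are hypothetical field names, and the metric identifiers are simply snake_case versions of the list in step 4.

```python
# Metric names mirror the list in step 4. The "metrics" and "topic" keys are
# hypothetical field names chosen for this sketch.
METRICS = (
    "factual_accuracy",
    "completeness",
    "clarity_coherence",
    "relevance",
    "language_quality",
    "ethical_compliance",
    "originality_creativity",
)

tagged_entries = [
    {
        "question": "What is the smallest prime number?",
        "ground_truth": "2.",
        "topic": "mathematics",
        "metrics": ["factual_accuracy"],
    },
    {
        "question": "Name the three branches of government in the United States.",
        "ground_truth": "The legislative branch, the executive branch, and the judicial branch.",
        "topic": "civics",
        "metrics": ["factual_accuracy", "completeness"],
    },
]


def metric_coverage(entries):
    """Count how many entries are tagged for each metric."""
    counts = {metric: 0 for metric in METRICS}
    for entry in entries:
        for metric in entry.get("metrics", []):
            if metric not in counts:
                raise ValueError(f"unknown metric tag: {metric!r}")
            counts[metric] += 1
    return counts


coverage = metric_coverage(tagged_entries)
untested = [metric for metric, count in coverage.items() if count == 0]
print("Metrics with no tagged questions:", untested)
```

Entries without a `metrics` key are simply skipped, so tagging can be added gradually.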

***

### **Sample Dataset Entries**

Here are additional sample entries demonstrating variety and coverage:

#### **Entry 1: Short Answer**

* **Question:** "What is the smallest prime number?"
* **Ground Truth:** "2."

#### **Entry 2: Longer Answer**

* **Question:** "Explain the significance of the Turing Test in artificial intelligence."
* **Ground Truth:** "The Turing Test, proposed by Alan Turing, is an assessment of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human, serving as a foundational concept in AI development."

#### **Entry 3: Technical Term**

* **Question:** "What does 'CPU' stand for in computing?"
* **Ground Truth:** "Central Processing Unit."

#### **Entry 4: Multi-Part Answer**

* **Question:** "Name the three branches of government in the United States."
* **Ground Truth:** "The legislative branch, the executive branch, and the judicial branch."

***

### **Best Practices for Preparing the Dataset**

* **Balanced Representation:** Ensure a fair distribution of topics and answer types.
* **Quality Control:** Double-check ground truth answers for accuracy.
* **Clarity and Precision:** Avoid ambiguous questions and answers.
* **Ethical Considerations:** Exclude sensitive or inappropriate content to maintain ethical standards.
* **Relevance to Use Case:** Tailor the dataset to be relevant to the domains your AI workflows will likely encounter.
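
Part of quality control can be automated. As one illustration, the sketch below flags pairs of questions that are worded almost identically, since near-duplicates skew the balance of topics. It uses `difflib.SequenceMatcher` from the Python standard library; the 0.8 similarity threshold is an assumption and should be tuned for your data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_near_duplicates(entries, threshold=0.8):
    """Return (index_a, index_b, similarity) for question pairs at or above the threshold."""
    pairs = []
    for (i, first), (j, second) in combinations(enumerate(entries), 2):
        question_a = first["question"].strip().lower()
        question_b = second["question"].strip().lower()
        similarity = SequenceMatcher(None, question_a, question_b).ratio()
        if similarity >= threshold:
            pairs.append((i, j, round(similarity, 2)))
    return pairs


sample = [
    {
        "question": "Who developed the theory of evolution by natural selection?",
        "ground_truth": "Charles Darwin.",
    },
    {"question": "What is the capital of Japan?", "ground_truth": "Tokyo."},
    {
        "question": "Who is known for the theory of evolution by natural selection?",
        "ground_truth": "Charles Darwin.",
    },
]
for i, j, similarity in find_near_duplicates(sample):
    print(f"entries {i} and {j} look similar (ratio {similarity})")
```

A flagged pair is not necessarily a mistake. You may keep both questions deliberately, for example to pair the same fact with a short and a detailed ground truth.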

***

### **Organizing the Dataset**

* **File Structure:** Save the dataset in a structured format (e.g., `dataset.json`) for easy access.
* **Data Integrity:** Protect the dataset from unauthorized modifications to maintain the integrity of evaluations.
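
A checksum is one lightweight way to notice that the dataset file has changed between evaluation runs. The sketch below uses SHA-256 from Python's `hashlib`; the function names are illustrative. Note that it only detects modifications. Preventing them is a matter of file permissions or version control.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_unmodified(path, expected_digest):
    """Return True if the file still matches a previously recorded digest."""
    return file_sha256(path) == expected_digest
```

Record the digest once the dataset is finalized, for example with `file_sha256("dataset.json")`, and compare against it before each evaluation run.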

***

### **Utilizing the Dataset in Evaluations**

While implementation details will be covered in later sections, the dataset you've prepared here will serve as the foundation for:

* **Testing Workflows:** Feeding questions into your AI workflows to generate answers.
* **Evaluating Performance:** Comparing workflow-generated answers against the ground truth using the evaluation metrics.
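
To make these two roles concrete before the later sections, here is a deliberately simplified sketch. Everything in it is a placeholder: `stub_workflow` stands in for a real workflow, and `exact_match` is a toy metric used only for illustration, not one of the evaluation metrics established earlier.

```python
def run_workflow_on_dataset(workflow, entries):
    """Call `workflow` on every question and pair its answer with the ground truth."""
    results = []
    for entry in entries:
        results.append(
            {
                "question": entry["question"],
                "ground_truth": entry["ground_truth"],
                "answer": workflow(entry["question"]),
            }
        )
    return results


def _normalize(text):
    """Lowercase and drop surrounding whitespace and a trailing period."""
    return text.strip().rstrip(".").lower()


def exact_match(answer, ground_truth):
    """Toy metric (an assumption for illustration): normalized string equality."""
    return _normalize(answer) == _normalize(ground_truth)


def stub_workflow(question):
    """Hypothetical stand-in for a real workflow; it only knows one answer."""
    return "Tokyo" if "Japan" in question else "I don't know"


sample = [
    {"question": "What is the capital of Japan?", "ground_truth": "Tokyo."},
    {"question": "What is the chemical symbol for Sodium?", "ground_truth": "Na."},
]
results = run_workflow_on_dataset(stub_workflow, sample)
accuracy = sum(exact_match(r["answer"], r["ground_truth"]) for r in results) / len(results)
print(accuracy)  # 0.5
```

Exact matching would penalize a correct but longer answer, which is why richer metrics are compared against the ground truth in practice.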

***

### **Example Dataset**

```json
[
  {
    "question": "What is the capital of Japan?",
    "ground_truth": "Tokyo."
  },
  {
    "question": "Who proposed the theory of general relativity?",
    "ground_truth": "Albert Einstein proposed the theory of general relativity, which revolutionized our understanding of gravity."
  },
  {
    "question": "What is the chemical symbol for Sodium?",
    "ground_truth": "Na."
  },
  {
    "question": "What is machine learning in the field of AI?",
    "ground_truth": "Machine learning is a subset of AI where algorithms improve through experience with data, enabling computers to make predictions or decisions without being explicitly programmed."
  },
  {
    "question": "What is the boiling point of water at sea level in Celsius?",
    "ground_truth": "100 degrees Celsius."
  },
  {
    "question": "Who developed the theory of evolution by natural selection?",
    "ground_truth": "Charles Darwin developed the theory of evolution by natural selection, explaining how species evolve over time through inherited traits."
  },
  {
    "question": "What is the smallest prime number?",
    "ground_truth": "2."
  },
  {
    "question": "Explain the significance of the Turing Test in artificial intelligence.",
    "ground_truth": "The Turing Test, proposed by Alan Turing, is an assessment of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human, serving as a foundational concept in AI development."
  },
  {
    "question": "What gas do animals exhale that plants use for photosynthesis?",
    "ground_truth": "Carbon dioxide."
  },
  {
    "question": "What is the powerhouse of the cell called?",
    "ground_truth": "The mitochondrion (plural: mitochondria)."
  },
  {
    "question": "Which planet is known as the Red Planet?",
    "ground_truth": "Mars is known as the Red Planet due to its reddish appearance caused by iron oxide on its surface."
  },
  {
    "question": "What does 'CPU' stand for in computing?",
    "ground_truth": "Central Processing Unit."
  },
  {
    "question": "Name the three branches of government in the United States.",
    "ground_truth": "The legislative branch, the executive branch, and the judicial branch."
  },
  {
    "question": "What is the largest mammal in the world?",
    "ground_truth": "The blue whale is the largest mammal on Earth, reaching lengths of up to 100 feet."
  },
  {
    "question": "Who is known as the father of computers?",
    "ground_truth": "Charles Babbage is often called the father of computers for designing the first mechanical computer."
  },
  {
    "question": "What element does 'O' represent on the periodic table?",
    "ground_truth": "Oxygen."
  },
  {
    "question": "What is the main component of the sun?",
    "ground_truth": "Hydrogen, which undergoes nuclear fusion to form helium in the sun's core."
  },
  {
    "question": "Explain the process of photosynthesis in plants.",
    "ground_truth": "Photosynthesis is the process by which plants convert sunlight, carbon dioxide, and water into glucose and oxygen, using chlorophyll in their leaves."
  },
  {
    "question": "What is the value of π (pi) up to two decimal places?",
    "ground_truth": "Approximately 3.14."
  },
  {
    "question": "In computing, what does 'HTTP' stand for?",
    "ground_truth": "HyperText Transfer Protocol."
  },
  {
    "question": "What particle is exchanged to mediate the electromagnetic force?",
    "ground_truth": "The photon mediates the electromagnetic force according to quantum electrodynamics."
  },
  {
    "question": "What is the largest organ in the human body?",
    "ground_truth": "The skin is the largest organ of the human body."
  },
  {
    "question": "Who painted the Mona Lisa?",
    "ground_truth": "Leonardo da Vinci."
  },
  {
    "question": "What is the primary language spoken in Brazil?",
    "ground_truth": "Portuguese."
  },
  {
    "question": "What does DNA stand for?",
    "ground_truth": "Deoxyribonucleic acid."
  },
  {
    "question": "What is the chemical formula for water?",
    "ground_truth": "H₂O."
  },
  {
    "question": "Who wrote the play 'Romeo and Juliet'?",
    "ground_truth": "William Shakespeare."
  },
  {
    "question": "What is the acceleration due to gravity on Earth?",
    "ground_truth": "Approximately 9.8 meters per second squared (m/s²)."
  },
  {
    "question": "Which element has the atomic number 1?",
    "ground_truth": "Hydrogen."
  },
  {
    "question": "What is the formula for calculating the area of a circle?",
    "ground_truth": "Area = π × radius squared (A = πr²)."
  },
  {
    "question": "Who is known for the laws of planetary motion?",
    "ground_truth": "Johannes Kepler formulated the laws of planetary motion."
  },
  {
    "question": "What organelle is responsible for protein synthesis?",
    "ground_truth": "Ribosomes are responsible for protein synthesis."
  },
  {
    "question": "What is the freezing point of water in Fahrenheit?",
    "ground_truth": "32 degrees Fahrenheit."
  },
  {
    "question": "Who is considered the father of modern physics?",
    "ground_truth": "Albert Einstein is often considered the father of modern physics."
  },
  {
    "question": "Explain Newton's third law of motion.",
    "ground_truth": "Newton's third law states that for every action, there is an equal and opposite reaction."
  },
  {
    "question": "What is the primary gas found in Earth's atmosphere?",
    "ground_truth": "Nitrogen, making up about 78% of the atmosphere."
  },
  {
    "question": "Who proposed the three laws of motion?",
    "ground_truth": "Sir Isaac Newton."
  },
  {
    "question": "What is an atom's central core called?",
    "ground_truth": "The nucleus."
  },
  {
    "question": "What is the distance light travels in one year called?",
    "ground_truth": "A light-year."
  },
  {
    "question": "What is the currency of the United Kingdom?",
    "ground_truth": "The Pound Sterling."
  },
  {
    "question": "Who is known for the theory of evolution by natural selection?",
    "ground_truth": "Charles Darwin."
  }
]
```

This dataset provides 41 question and ground truth pairs, which you can use to:

* **Test Workflows:** Input the questions into Workflow A and Workflow B to generate answers.
* **Evaluate Metrics:** Use the evaluation metrics to assess the generated answers against the ground truths.
* **Analyze Performance:** Compare the outputs from both workflows to see how they perform across different metrics.

**Notes on the Dataset:**

* **Variety of Topics:** The questions cover diverse subjects, including science, technology, mathematics, history, and general knowledge.
* **Answer Lengths:** The answers range from short, direct responses to slightly longer explanations, allowing you to test how the workflows handle different answer lengths.
* **Clarity and Precision:** Ground truth answers are crafted to be clear and precise to serve as an effective benchmark.

***

### **Conclusion**

By carefully preparing a comprehensive and diverse evaluation dataset, you:

* **Equip Yourself:** Gain the tools needed to thoroughly test and refine your AI workflows.
* **Enhance Reliability:** Establish a solid ground truth so that evaluations are meaningful and accurate.
* **Facilitate Improvement:** Identify areas where the workflows underperform, guiding future enhancements.

***

**Next Steps:**

* Proceed to **Section 3: Implementing the Workflows**, where we'll set up the workflows that generate answers for evaluation.
* Keep your dataset handy, as it will be integral in the upcoming implementation and evaluation processes.

***

*Excited to see how your dataset brings value to the evaluation framework? Let's continue our journey!* 🚀

