
Evaluating the Quality of RAG Nodes

Evaluating the quality of Retrieval-Augmented Generation (RAG) nodes is crucial to ensure that your application delivers accurate and contextually relevant responses. By assessing the performance of these nodes, you can identify areas for improvement and optimize your workflows for better user experiences.

Importance of Evaluation

  1. Accuracy: Ensures that the generated responses are correct and align with the user's queries or the provided ground truth.

  2. Relevance: Measures how well the responses address the user's search queries, enhancing the application's usefulness.

  3. Optimization: Helps in fine-tuning the workflow components, leading to more efficient and effective RAG applications.

Using LLMs for Evaluation

Dynamiq provides tools to evaluate workflow responses using Large Language Models (LLMs) as judges. This approach leverages the reasoning capabilities of LLMs to assess the relevance and correctness of generated responses.

Example Code: Evaluating Relevance

The following example demonstrates how to evaluate the relevance of workflow responses using an LLM:

from dynamiq.components.evaluators.llm_evaluator import LLMEvaluator
from dynamiq.nodes.llms import BaseLLM, OpenAI
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

def run_relevance_to_search_query(llm: BaseLLM):
    instruction_text = """
    Evaluate the relevance of the "Answer" to the "Search Query".
    - Score the relevance from 0 to 1.
    - Use 1 if the Answer directly addresses the Search Query.
    - Use 0 if the Answer is irrelevant to the Search Query.
    """
    
    evaluator = LLMEvaluator(
        instructions=instruction_text.strip(),
        inputs=[
            ("search_queries", list[str]),
            ("answers", list[str]),
        ],
        outputs=["relevance_score"],
        examples=[
            {
                "inputs": {
                    "search_queries": ["Best Italian restaurants in New York"],
                    "answers": ["Here are the top-rated Italian restaurants in New York City..."],
                },
                "outputs": {"relevance_score": 1},
            },
            {
                "inputs": {
                    "search_queries": ["Weather forecast for tomorrow"],
                    "answers": ["Apple released a new iPhone model today."],
                },
                "outputs": {"relevance_score": 0},
            },
        ],
        llm=llm,
    )

    search_queries = [
        "How to bake a chocolate cake?",
        "What is the capital of France?",
        "Latest news on technology.",
    ]

    answers = [
        "To bake a chocolate cake, you need the following ingredients...",
        "The capital of France is Paris.",
        "The weather today is sunny with a chance of rain.",
    ]

    results = evaluator.run(search_queries=search_queries, answers=answers)
    return results

# Example usage with an OpenAI LLM:
if __name__ == "__main__":
    llm = OpenAI(model="gpt-4o-mini")
    relevance_results = run_relevance_to_search_query(llm)
    print("Answer Relevance to Search Query Results:")
    print(relevance_results)

# Output: Answer Relevance to Search Query Results: {'results': [{'relevance_score': 1}, {'relevance_score': 1}, {'relevance_score': 0}]}
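The evaluator returns a dictionary with a results list containing one entry per query-answer pair, as shown in the output above. Downstream code can aggregate these per-item scores; the following is a minimal sketch based on that output shape (the summarize_relevance helper is illustrative, not part of the Dynamiq API):

```python
def summarize_relevance(results: dict) -> dict:
    """Aggregate per-item relevance scores from an evaluator result dict."""
    scores = [item["relevance_score"] for item in results["results"]]
    return {
        "total": len(scores),
        "relevant": sum(1 for s in scores if s == 1),
        # Mean score rounded for readability; 0.0 guards against empty input.
        "mean_score": round(sum(scores) / len(scores), 2) if scores else 0.0,
    }

# Using the example output shown above:
example = {"results": [{"relevance_score": 1}, {"relevance_score": 1}, {"relevance_score": 0}]}
print(summarize_relevance(example))
# {'total': 3, 'relevant': 2, 'mean_score': 0.67}
```

A summary like this makes it easy to track relevance across evaluation runs, for example as a regression metric when you change retrieval settings.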

Example Code: Evaluating Correctness

Correctness can be evaluated by comparing generated responses against a known ground truth, using the same LLMEvaluator pattern as the relevance example above.
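A minimal sketch of such a correctness evaluator follows. It mirrors the relevance example; the input name ground_truth_answers and the output name correctness_score are illustrative choices, not fixed by the Dynamiq API:

```python
from dynamiq.components.evaluators.llm_evaluator import LLMEvaluator
from dynamiq.nodes.llms import BaseLLM, OpenAI
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

def run_correctness(llm: BaseLLM):
    instruction_text = """
    Evaluate the correctness of the "Answer" against the "Ground Truth Answer".
    - Score the correctness from 0 to 1.
    - Use 1 if the Answer conveys the same facts as the Ground Truth Answer.
    - Use 0 if the Answer contradicts the Ground Truth Answer.
    """

    evaluator = LLMEvaluator(
        instructions=instruction_text.strip(),
        inputs=[
            ("answers", list[str]),
            ("ground_truth_answers", list[str]),
        ],
        outputs=["correctness_score"],
        examples=[
            {
                "inputs": {
                    "answers": ["The capital of France is Paris."],
                    "ground_truth_answers": ["Paris is the capital of France."],
                },
                "outputs": {"correctness_score": 1},
            },
            {
                "inputs": {
                    "answers": ["The capital of France is Lyon."],
                    "ground_truth_answers": ["Paris is the capital of France."],
                },
                "outputs": {"correctness_score": 0},
            },
        ],
        llm=llm,
    )

    answers = [
        "The Eiffel Tower is located in Paris.",
        "Water boils at 90 degrees Celsius at sea level.",
    ]
    ground_truth_answers = [
        "The Eiffel Tower is in Paris, France.",
        "Water boils at 100 degrees Celsius at sea level.",
    ]

    results = evaluator.run(answers=answers, ground_truth_answers=ground_truth_answers)
    return results

if __name__ == "__main__":
    llm = OpenAI(model="gpt-4o-mini")
    correctness_results = run_correctness(llm)
    print("Answer Correctness Results:")
    print(correctness_results)
```

As with the relevance evaluator, the few-shot examples anchor the judge's scoring scale, so the first answer above should score 1 and the second 0.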

By implementing these evaluation techniques, you can ensure that your RAG nodes are performing optimally, providing users with accurate and relevant information.
