# Old version Evaluations

## Evaluating Your AI Agents and RAG Applications with Dynamiq

Evaluations are essential for determining the performance and quality of your AI-driven solutions. In the rapidly evolving AI landscape, ensuring that your outputs consistently meet standards of accuracy, relevance, and reliability is critical.

Dynamiq provides a seamless, powerful, and flexible evaluation framework for your Agents and Retrieval-Augmented Generation (RAG) workflows.

<figure><img src="/files/8sQmwxtsG5xNmRpTsQbM" alt=""><figcaption></figcaption></figure>

### Why Evaluation Matters

Accurate evaluation ensures that your AI solutions consistently deliver reliable, relevant, and trustworthy outputs. Without regular assessment, models risk delivering content that could be inaccurate, irrelevant, toxic, or generally unreliable, negatively impacting user trust and overall effectiveness.

Evaluating your models systematically enables you to:

* Identify and address weaknesses quickly and efficiently.
* Compare alternative methods, agents, or model configurations.
* Improve user satisfaction and drive continuous enhancement of your products.

### Simplified Evaluations with Dynamiq

Dynamiq makes evaluating your AI agents and RAG applications intuitive and straightforward:

#### 1. Leverage LLM-as-a-Judge Metrics (LLM-powered evaluations)

* **Custom Prompts:** Clearly define evaluation goals, such as checking whether an answer is toxic, relevant, grammatically correct, coherent, and more.
* **Predefined Metrics:** Use built-in, LLM-powered metrics, including:
  * **Faithfulness:** Is the model output faithful to the information provided?
  * **Answer Correctness:** Is the answer accurate and reliable?
  * **Context Precision & Recall:** How precisely and comprehensively is context information leveraged?
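To make the custom-prompt idea concrete, here is a minimal sketch of an LLM-as-a-judge metric in plain Python. It is not Dynamiq's API: `JUDGE_TEMPLATE`, `build_judge_prompt`, `parse_score`, `judge`, and the stand-in `fake_llm` are all hypothetical names used for illustration, and the 0 to 1 scoring scale is an assumption.

```python
import re
from typing import Callable

# Hypothetical prompt template; the wording and the 0-1 scale are assumptions.
JUDGE_TEMPLATE = (
    "You are an impartial evaluator.\n"
    "Criterion: {criterion}\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with a single score between 0 and 1."
)


def build_judge_prompt(question: str, answer: str, criterion: str) -> str:
    """Fill the template with the item to evaluate."""
    return JUDGE_TEMPLATE.format(
        criterion=criterion, question=question, answer=answer
    )


def parse_score(reply: str) -> float:
    """Extract the first number in the judge's reply and clamp it to [0, 1]."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no score found in judge reply: {reply!r}")
    return min(1.0, max(0.0, float(match.group())))


def judge(
    question: str, answer: str, criterion: str, llm: Callable[[str], str]
) -> float:
    """Score one answer by sending the judge prompt to any text-in/text-out LLM."""
    return parse_score(llm(build_judge_prompt(question, answer, criterion)))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return "Score: 0.9"


score = judge(
    "What is the capital of France?",
    "Paris",
    "Is the answer relevant to the question?",
    fake_llm,
)
print(score)
```

In practice `fake_llm` would be replaced by a call to a real model, and the criterion string would carry the evaluation goal (toxicity, relevance, coherence, and so on).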

#### 2. Python-based Metrics

Dynamiq allows integration of traditional Python metrics, giving further customization flexibility:

* **Levenshtein Distance:** Measure how many single-character edits (insertions, deletions, substitutions) separate a generated response from the expected one.
* **Custom Python Code:** Write your own Python metrics for any evaluation scenario.
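As an illustration of a Python-based metric, the sketch below computes Levenshtein distance with the standard dynamic-programming approach and turns it into a normalized similarity score. The function names `levenshtein` and `similarity`, and the choice to normalize by the longer string's length, are assumptions made for this example rather than Dynamiq's built-in implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity(generated: str, expected: str) -> float:
    """Map the distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(generated), len(expected))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(generated, expected) / longest


print(levenshtein("kitten", "sitting"))  # 3
```

A normalized score is often easier to compare across responses of different lengths than the raw distance.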

#### 3. Seamless Integration Into Your Workflow

Connect Dynamiq's evaluation framework to your agentic or RAG applications. Dynamiq makes it simple to evaluate, measure, and directly compare different workflows, so you can quickly pinpoint the best-performing solution and iterate continuously.
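Comparing workflows comes down to running each one over the same dataset and aggregating a metric. The sketch below shows that loop in plain Python. Everything in it is hypothetical: `workflow_a` and `workflow_b` are toy stand-ins for real agents or RAG pipelines, `exact_match` is a deliberately simple metric, and none of these names come from Dynamiq.

```python
from statistics import mean
from typing import Callable

Workflow = Callable[[str], str]
Metric = Callable[[str, str], float]


def exact_match(generated: str, expected: str) -> float:
    """1.0 when the answers match, ignoring case and surrounding whitespace."""
    return 1.0 if generated.strip().lower() == expected.strip().lower() else 0.0


def evaluate(
    workflow: Workflow, dataset: list[tuple[str, str]], metric: Metric
) -> float:
    """Average the metric over every (question, expected answer) pair."""
    return mean(metric(workflow(question), expected) for question, expected in dataset)


# Tiny illustrative dataset of (question, expected answer) pairs.
dataset = [("2 + 2", "4"), ("Capital of France", "Paris")]


def workflow_a(question: str) -> str:
    # Toy stand-in that answers both questions correctly.
    return {"2 + 2": "4", "Capital of France": "Paris"}.get(question, "")


def workflow_b(question: str) -> str:
    # Toy stand-in that only knows one answer.
    return {"2 + 2": "4"}.get(question, "unknown")


candidates = {"workflow_a": workflow_a, "workflow_b": workflow_b}
scores = {name: evaluate(wf, dataset, exact_match) for name, wf in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Keeping the dataset and metric fixed while swapping only the workflow is what makes the resulting scores directly comparable.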

### Explore, Create, and Evaluate Easily

In the following sections, we’ll demonstrate how you can easily:

* Create and customize metrics tailored to your needs.
* Build robust evaluation datasets.
* Conduct complete evaluation runs and analyze results.

With Dynamiq, rapidly improving your AI solutions becomes simple and intuitive.

***

In an age where quality drives adoption, Dynamiq empowers you to confidently deliver trustworthy and high-performing AI applications.

