# Faithfulness

<figure><img src="https://4279757243-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FTbBxR0Ob7RUmbvHZkQi2%2Fuploads%2F7pMetb9RlXo7Jffxcqig%2FFaithfulness.gif?alt=media&#x26;token=c30e3b15-d95a-495e-933a-30f6d56f7e51" alt=""><figcaption></figcaption></figure>

### Faithfulness Metric

The **Faithfulness Metric** assesses the factual consistency of a generated answer with respect to the provided context. It is computed from both the generated answer and the retrieved context. The resulting score falls in the range **0 to 1**, where a higher score indicates better faithfulness.

$$
\text{Faithfulness score} = {|\text{Number of claims in the generated answer that can be inferred from given context}| \over |\text{Total number of claims in the generated answer}|}
$$

### Definition

An answer is considered *faithful* if all claims made in the answer can be logically inferred from the given context.

### Calculation Process

1. **Claim Identification**: A set of claims present in the generated answer is identified.
2. **Cross-Verification**: Each claim is then cross-checked against the provided context to determine whether it can be inferred from that context.

### Scoring

The final faithfulness score is the fraction of claims in the answer that can be inferred from the given context: a score of 1 means every claim is supported, while a score of 0 means none are.
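The scoring step above reduces to simple arithmetic once each claim has a supported/unsupported verdict. Here is a minimal sketch of that calculation (the `faithfulness_score` helper is illustrative, not part of the Dynamiq API; in practice the verdicts come from an LLM):

```python
def faithfulness_score(claim_supported: list[bool]) -> float:
    """Fraction of claims in the answer that the context supports."""
    if not claim_supported:
        return 0.0  # no claims to verify
    return sum(claim_supported) / len(claim_supported)

# Example: 3 of 4 claims in the answer can be inferred from the context
print(faithfulness_score([True, True, True, False]))  # 0.75
```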

### Result

With the evaluator configured as shown below, the Faithfulness metric is ready for use in your evaluations.

### Example Code: Faithfulness Evaluation

This example demonstrates how to use the `FaithfulnessEvaluator` to assess the factual consistency of generated answers against given contexts using the `OpenAI` language model.

```python
import logging
import sys
from dotenv import find_dotenv, load_dotenv
from dynamiq.evaluations.metrics import FaithfulnessEvaluator
from dynamiq.nodes.llms import OpenAI

# Load environment variables for OpenAI API
load_dotenv(find_dotenv())

# Configure logging to display information or debug messages
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

# Initialize the OpenAI language model (change model name as necessary)
llm = OpenAI(model="gpt-4o-mini")

# Sample questions, answers, and contexts
questions = [
    "Who was Albert Einstein and what is he best known for?",
    "Tell me about the Great Wall of China."
]

answers = [
    (
        "He was a German-born theoretical physicist, widely acknowledged to be one "
        "of the greatest and most influential physicists of all time. He was best "
        "known for developing the theory of relativity and also contributed to the "
        "development of quantum mechanics."
    ),
    (
        "The Great Wall of China is a large wall built to keep out invaders. "
        "It is famous for being visible from space."
    ),
]

contexts = [
    (
        "Albert Einstein was a German-born theoretical physicist who developed "
        "the theory of relativity."
    ),
    (
        "The Great Wall of China includes a series of fortifications built "
        "across northern borders of ancient Chinese states for protection against "
        "nomadic groups."
    ),
]

# Initialize the FaithfulnessEvaluator and run evaluation
evaluator = FaithfulnessEvaluator(llm=llm)
faithfulness_scores = evaluator.run(
    questions=questions,
    answers=answers,
    contexts=contexts,
    verbose=True,  # Set to False to disable verbose output
)

# Print the evaluation results
for idx, score in enumerate(faithfulness_scores):
    print(f"Question: {questions[idx]}")
    print(f"Faithfulness Score: {score}")
    print("-" * 50)

print("All Faithfulness Scores:")
print(faithfulness_scores)
```
