Faithfulness

Faithfulness Metric

The Faithfulness Metric assesses the factual consistency of a generated answer against the provided context. It is computed from both the answer itself and the retrieved context, and the resulting score is normalized to the range 0 to 1, where a higher score indicates better faithfulness.

$$
\text{Faithfulness score} = \frac{|\text{Number of claims in the generated answer that can be inferred from the given context}|}{|\text{Total number of claims in the generated answer}|}
$$

Definition

An answer is considered faithful if all claims made in the answer can be logically inferred from the given context.

Calculation Process

  1. Claim Identification: A set of claims present in the generated answer is identified.

  2. Cross-Verification: Each claim is then cross-checked with the provided context to determine if it can be substantiated by the context.

Scoring

The final faithfulness score reflects how well the claims in the answer align with the given context.
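Once each claim has a verdict from the cross-verification step, the score itself is simple arithmetic. The sketch below illustrates only this final scoring step; the claim list and verdicts are hard-coded assumptions here, whereas in practice an LLM performs the claim identification and cross-verification:

```python
# Minimal sketch of the scoring step. The verdicts below are hypothetical;
# in practice an LLM extracts the claims and judges each one against the context.
claim_verdicts = [
    ("Einstein was a German-born theoretical physicist", True),   # supported by context
    ("He developed the theory of relativity", True),              # supported by context
    ("He contributed to quantum mechanics", False),               # not inferable from context
]

# Faithfulness score = supported claims / total claims
supported = sum(1 for _, verdict in claim_verdicts if verdict)
faithfulness_score = supported / len(claim_verdicts)
print(faithfulness_score)  # 2 of 3 claims supported -> ~0.67
```

An answer whose every claim is supported scores 1.0; an answer with no supported claims scores 0.0.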

Result

Your Faithfulness metric is now ready for use in evaluations!

Example Code: Faithfulness Evaluation

This example demonstrates how to use the FaithfulnessEvaluator to assess the factual consistency of generated answers against given contexts using an OpenAI language model.

import logging
import sys
from dotenv import find_dotenv, load_dotenv
from dynamiq.evaluations.metrics import FaithfulnessEvaluator
from dynamiq.nodes.llms import OpenAI

# Load environment variables for OpenAI API
load_dotenv(find_dotenv())

# Configure logging to display information or debug messages
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

# Initialize the OpenAI language model (change model name as necessary)
llm = OpenAI(model="gpt-4o-mini")

# Sample questions, answers, and contexts
questions = [
    "Who was Albert Einstein and what is he best known for?",
    "Tell me about the Great Wall of China."
]

answers = [
    (
        "He was a German-born theoretical physicist, widely acknowledged to be one "
        "of the greatest and most influential physicists of all time. He was best "
        "known for developing the theory of relativity and also contributed to the "
        "development of quantum mechanics."
    ),
    (
        "The Great Wall of China is a large wall built to keep out invaders. "
        "It is famous for being visible from space."
    ),
]

contexts = [
    (
        "Albert Einstein was a German-born theoretical physicist who developed "
        "the theory of relativity."
    ),
    (
        "The Great Wall of China includes a series of fortifications built "
        "across northern borders of ancient Chinese states for protection against "
        "nomadic groups."
    ),
]

# Initialize the FaithfulnessEvaluator and run evaluation
evaluator = FaithfulnessEvaluator(llm=llm)
faithfulness_scores = evaluator.run(
    questions=questions,
    answers=answers,
    contexts=contexts,
    verbose=True,  # Set to False to disable verbose output
)

# Print the evaluation results
for idx, score in enumerate(faithfulness_scores):
    print(f"Question: {questions[idx]}")
    print(f"Faithfulness Score: {score}")
    print("-" * 50)

print("All Faithfulness Scores:")
print(faithfulness_scores)
