Faithfulness


Faithfulness Metric

The Faithfulness Metric assesses the factual consistency of a generated answer in relation to the provided context. It is calculated based on both the answer itself and the retrieved context. The resulting score is normalized to a range of 0 to 1, where a higher score indicates better faithfulness.

$$
\text{Faithfulness score} = \frac{\lvert \text{Number of claims in the generated answer that can be inferred from the given context} \rvert}{\lvert \text{Total number of claims in the generated answer} \rvert}
$$
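
For example, if a generated answer contains four distinct claims and three of them can be inferred from the provided context, the faithfulness score is 3 / 4 = 0.75.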

Definition

An answer is considered faithful if all claims made in the answer can be logically inferred from the given context.

Calculation Process

  1. Claim Identification: A set of claims present in the generated answer is identified.

  2. Cross-Verification: Each claim is then cross-checked with the provided context to determine if it can be substantiated by the context.

Scoring

The final faithfulness score is the proportion of claims in the answer that are supported by the given context: a score of 1.0 means every claim can be inferred from the context, while lower scores indicate unsupported claims.
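
The scoring step can be illustrated with a minimal sketch. This is not the FaithfulnessEvaluator's internal implementation; it only shows how hypothetical per-claim verdicts (which the evaluator obtains in practice by cross-checking each claim against the context with an LLM) translate into the final score.

# Minimal sketch of the scoring step (illustrative only).
# The verdicts below are hypothetical; the FaithfulnessEvaluator derives
# them by cross-checking each claim against the provided context with an LLM.
claim_verdicts = [True, True, True, False]  # 3 of 4 claims supported by the context

faithfulness = sum(claim_verdicts) / len(claim_verdicts)
print(faithfulness)  # 0.75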

Result

Once configured, your Faithfulness metric is ready for use in evaluations.

Example Code: Faithfulness Evaluation

This example demonstrates how to use the FaithfulnessEvaluator to assess the factual consistency of generated answers against given contexts using the OpenAI language model.

import logging
import sys
from dotenv import find_dotenv, load_dotenv
from dynamiq.evaluations.metrics import FaithfulnessEvaluator
from dynamiq.nodes.llms import OpenAI

# Load environment variables for OpenAI API
load_dotenv(find_dotenv())

# Configure logging to display information or debug messages
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

# Initialize the OpenAI language model (change model name as necessary)
llm = OpenAI(model="gpt-4o-mini")

# Sample questions, answers, and contexts
questions = [
    "Who was Albert Einstein and what is he best known for?",
    "Tell me about the Great Wall of China."
]

answers = [
    (
        "He was a German-born theoretical physicist, widely acknowledged to be one "
        "of the greatest and most influential physicists of all time. He was best "
        "known for developing the theory of relativity and also contributed to the "
        "development of quantum mechanics."
    ),
    (
        "The Great Wall of China is a large wall built to keep out invaders. "
        "It is famous for being visible from space."
    ),
]

contexts = [
    (
        "Albert Einstein was a German-born theoretical physicist who developed "
        "the theory of relativity."
    ),
    (
        "The Great Wall of China includes a series of fortifications built "
        "across northern borders of ancient Chinese states for protection against "
        "nomadic groups."
    ),
]

# Initialize the FaithfulnessEvaluator and run evaluation
evaluator = FaithfulnessEvaluator(llm=llm)
faithfulness_scores = evaluator.run(
    questions=questions,
    answers=answers,
    contexts=contexts,
    verbose=True,  # Set to False to disable verbose output
)

# Print the evaluation results
for idx, score in enumerate(faithfulness_scores):
    print(f"Question: {questions[idx]}")
    print(f"Faithfulness Score: {score}")
    print("-" * 50)

print("All Faithfulness Scores:")
print(faithfulness_scores)
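
With these inputs, claims that the context does not support (for example, that the Great Wall is visible from space, or that Einstein contributed to quantum mechanics) are expected to lower the corresponding faithfulness scores, while answers whose claims are fully covered by the context score closer to 1.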