Answer Correctness

Answer Correctness Evaluator

The Answer Correctness Evaluator assesses the correctness of answers based on their alignment with ground truth answers. It extracts key statements from both answers and ground truths, classifies them into True Positives (TP), False Positives (FP), and False Negatives (FN), and computes similarity scores for each answer.
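The classification step described above can be sketched in plain Python. In the actual evaluator, a language model extracts statements and judges semantic matches; in this sketch, pre-extracted statements and exact string matching stand in for that judgment, so the function name and matching rule are illustrative assumptions rather than the library's API.

```python
def classify_statements(answer_statements, ground_truth_statements):
    """Classify statements into TP/FP/FN sets.

    Illustrative sketch: exact string matching stands in for the
    LLM-based semantic matching the evaluator actually performs.
    """
    answer = set(answer_statements)
    truth = set(ground_truth_statements)
    tp = answer & truth   # stated in the answer and supported by ground truth
    fp = answer - truth   # stated in the answer but absent from ground truth
    fn = truth - answer   # in the ground truth but missing from the answer
    return tp, fp, fn
```

The counts of these three sets feed directly into the precision and recall formulas below.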

Key Formulas

  1. Precision:

\text{Precision} = \frac{TP}{TP + FP}

  2. Recall:

\text{Recall} = \frac{TP}{TP + FN}

  3. F1 Score:

F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

  4. Final Score (combining the F1 score and the similarity score):

\text{Final Score} = w_1 \times F1 + w_2 \times \text{Similarity Score}

where \(w_1\) and \(w_2\) are the weights assigned to the F1 score and the similarity score, respectively.

Example Code: Answer Correctness Evaluation

This example demonstrates how to compute the Answer Correctness metric using the AnswerCorrectnessEvaluator with the OpenAI language model.
