BARTScore

A generation evaluation metric that uses a pretrained BART model to assess the quality of generated text against reference text. It is a type of comparison score.

  • Inputs: candidate (generated) text and reference text.
  • Returns: a numerical score (higher = better alignment with reference).

Example

from vero.metrics import BartScore

# example inputs
chunks_list = ["The cat sat on the mat.", "The dog barked at the mailman."]
answers_list = ["A cat is sitting on a mat and a dog is barking at the mailman."]

with BartScore() as bs:
    # score each (reference chunk, generated answer) pair
    bart_results = [bs.evaluate(chunk, ans) for chunk, ans in zip(chunks_list, answers_list)]

print(bart_results)

Output

[0.75]

Note: This score does not hold any meaning in itself; it is meant for comparing two models or versions of a RAG pipeline. The higher the score, the better the generation capabilities of that pipeline relative to the other.
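
For illustration, a minimal sketch of such a comparison, assuming answers produced by two versions of the same RAG pipeline (pipeline_a_answers and pipeline_b_answers are hypothetical names) are evaluated against the same reference chunks:

from statistics import mean

from vero.metrics import BartScore

# hypothetical inputs: answers from two pipeline versions for the same reference chunks
chunks_list = ["The cat sat on the mat."]
pipeline_a_answers = ["A cat is sitting on a mat."]
pipeline_b_answers = ["There is an animal somewhere."]

with BartScore() as bs:
    scores_a = [bs.evaluate(chunk, ans) for chunk, ans in zip(chunks_list, pipeline_a_answers)]
    scores_b = [bs.evaluate(chunk, ans) for chunk, ans in zip(chunks_list, pipeline_b_answers)]

# the pipeline with the higher mean score aligns better with the references
print("pipeline A:", mean(scores_a))
print("pipeline B:", mean(scores_b))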