Evaluation

XPM Task xpmir.evaluation.BaseEvaluation(*, measures)

Bases: Task

Submit type: Any

Base class for evaluation tasks

measures: List[xpmir.measures.Measure] = [Config[xpmir.measures.measure], Config[xpmir.measures.measure], Config[xpmir.measures.measure], Config[xpmir.measures.measure], Config[xpmir.measures.measure]]

List of metrics

aggregated: Path (generated)

Path for aggregated results

detailed: Path (generated)

Path for detailed results

XPM Task xpmir.evaluation.RunEvaluation(*, measures, run, assessments)

Bases: BaseEvaluation, Task

Submit type: Any

Evaluate a run

measures: List[xpmir.measures.Measure] = [Config[xpmir.measures.measure], Config[xpmir.measures.measure], Config[xpmir.measures.measure], Config[xpmir.measures.measure], Config[xpmir.measures.measure]]

List of metrics

aggregated: Path (generated)

Path for aggregated results

detailed: Path (generated)

Path for detailed results

run: datamaestro_text.data.ir.trec.TrecAdhocRun

The run to evaluate

assessments: datamaestro_text.data.ir.AdhocAssessments

The relevance assessments
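
For instance, a pre-computed run can be evaluated against its assessments as in the sketch below; my_run and my_assessments are hypothetical placeholders for TrecAdhocRun and AdhocAssessments configurations (e.g. obtained through datamaestro), and the task is assumed to be submitted within an experimaestro experiment:

from xpmir.evaluation import RunEvaluation
from xpmir.measures import AP, RR, nDCG

# my_run / my_assessments: placeholders for TrecAdhocRun and
# AdhocAssessments configurations defined elsewhere
evaluation = RunEvaluation(
    run=my_run,
    assessments=my_assessments,
    measures=[AP, nDCG@10, RR],
)

# Submitting within an experimaestro experiment schedules the task;
# results are written to the generated `aggregated` and `detailed` paths
evaluation.submit()
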
XPM Task xpmir.evaluation.Evaluate(*, measures, dataset, retriever, topic_wrapper)

Bases: BaseEvaluation, Task

Submit type: Any

Evaluate a retriever directly (without generating the run explicitly)

measures: List[xpmir.measures.Measure] = [Config[xpmir.measures.measure], Config[xpmir.measures.measure], Config[xpmir.measures.measure], Config[xpmir.measures.measure], Config[xpmir.measures.measure]]

List of metrics

aggregated: Path (generated)

Path for aggregated results

detailed: Path (generated)

Path for detailed results

dataset: datamaestro_text.data.ir.Adhoc

The dataset for retrieval

retriever: xpmir.rankers.Retriever

The retriever to evaluate

topic_wrapper: datamaestro_text.transforms.ir.TopicWrapper

Topic extractor
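
A minimal sketch of evaluating a retriever directly on an ad-hoc dataset; the dataset identifier and my_retriever are placeholders, prepare_dataset comes from datamaestro, and other parameters such as topic_wrapper are omitted for brevity:

from datamaestro import prepare_dataset
from xpmir.evaluation import Evaluate
from xpmir.measures import AP, nDCG

# Placeholder dataset identifier and retriever configuration
dataset = prepare_dataset("irds.msmarco-passage.dev.small")
evaluation = Evaluate(
    dataset=dataset,
    retriever=my_retriever,  # an xpmir.rankers.Retriever configuration
    measures=[AP, nDCG@10],
)

# Submitted within an experimaestro experiment
evaluation.submit()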

class xpmir.evaluation.Evaluations(dataset: Adhoc, measures: List[Measure], *, topic_wrapper: TopicWrapper | None = None)

Bases: object

Holds experiment results for several models on one dataset

evaluate_retriever(retriever: Retriever | RetrieverFactory, launcher: Launcher | None = None, init_tasks=[]) → Tuple[Retriever, AdhocResults]

Evaluates a retriever
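
A sketch of how a dataset-level Evaluations object could be used; the dataset identifier and my_retriever are placeholders:

from datamaestro import prepare_dataset
from xpmir.evaluation import Evaluations
from xpmir.measures import AP, RR, nDCG

# One dataset, one set of measures
evaluations = Evaluations(
    prepare_dataset("irds.msmarco-passage.dev.small"),  # placeholder identifier
    [AP, nDCG@10, RR@10],
)

# Submits an evaluation task for the retriever and returns it
# together with its results
retriever, results = evaluations.evaluate_retriever(my_retriever)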

class xpmir.evaluation.EvaluationsCollection(**collection: Evaluations)

Bases: object

A collection of evaluations

This is useful to group all the evaluations to be conducted, and then call evaluate_retriever() for each retriever to evaluate (see the sketch below).

evaluate_retriever(retriever: Retriever | RetrieverFactory, launcher: Launcher | None = None, model_id: str | None = None, overwrite: bool = False, init_tasks=[])

Evaluate a retriever for all the evaluations in this collection (the tasks are submitted to the experimaestro scheduler)

to_dataframe() → pandas.DataFrame

Returns a Pandas dataframe
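
A sketch of grouping several datasets into a collection, evaluating a retriever on all of them, and gathering the results; dataset identifiers and my_retriever are placeholders:

from datamaestro import prepare_dataset
from xpmir.evaluation import Evaluations, EvaluationsCollection
from xpmir.measures import AP, RR, nDCG

measures = [AP, nDCG@10, RR@10]

# One Evaluations entry per dataset (keyword names identify the datasets)
tests = EvaluationsCollection(
    msmarco_dev=Evaluations(prepare_dataset("irds.msmarco-passage.dev.small"), measures),
    robust=Evaluations(prepare_dataset("irds.trec-robust04"), measures),
)

# Submit one evaluation task per dataset for this retriever
tests.evaluate_retriever(my_retriever, model_id="bm25")

# Once the experiment has run, collect all results in a single dataframe
df = tests.to_dataframe()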

Metrics

Metrics are backed by the ir_measures module

XPM Config xpmir.measures.Measure(*, identifier, rel, cutoff)

Bases: Measure

Mirrors the ir_measures metric object

identifier: str

Main identifier

rel: int = 1

minimum relevance score to be considered relevant (inclusive)

cutoff: int

Cutoff value

List of defined measures

xpmir.measures.AP = Config[xpmir.measures.measure]

Average precision metric

xpmir.measures.P = Config[xpmir.measures.measure]

Precision at rank

xpmir.measures.R = Config[xpmir.measures.measure]

Recall at rank

xpmir.measures.RR = Config[xpmir.measures.measure]

Reciprocal rank

xpmir.measures.Success = Config[xpmir.measures.measure]

1 if a document with relevance at least rel is found among the first cutoff documents, 0 otherwise.

xpmir.measures.nDCG = Config[xpmir.measures.measure]

Normalized Discounted Cumulative Gain

Measures can be used with the @ operator to set a cutoff. Example:

from xpmir.measures import AP, P, nDCG, RR
from xpmir.evaluation import Evaluate

Evaluate(measures=[AP, P@20, nDCG, nDCG@10, nDCG@20, RR, RR@10], ...)