Dataset adapters
Adapters derive new datasets from existing ones by subsampling documents, topics, or relevance assessments. They are useful for creating train/validation splits, restricting evaluation to a subset of topics, or building retriever-based collections for re-ranking experiments.
Adhoc datasets
Split or combine ad-hoc retrieval datasets into folds.
- XPM Task xpmir.datasets.adapters.RandomFold(*, seed, sizes, dataset, fold, exclude)
Bases: Task
Submit type: datamaestro_ir.data.Adhoc
Extracts a random subset of topics from a dataset.
- dataset: datamaestro_ir.data.Adhoc
The Adhoc dataset from which a fold is extracted
- exclude: datamaestro_ir.data.Topics
Exclude some topics from the random fold
- assessments: path (generated)
Generated assessments file
- topics: path (generated)
Generated topics file
- static folds(seed: int, sizes: List[float], dataset: datamaestro_ir.data.Adhoc, exclude: datamaestro_ir.data.Topics | None = None, submit=True)
Creates folds
Parameters:
submit: if true (default), submits the fold tasks to experimaestro
- XPM Task xpmir.datasets.adapters.ConcatFold(*, datasets)
Bases: Task
Submit type: datamaestro_ir.data.Adhoc
Concatenation of several datasets to get a full dataset.
- datasets: List[datamaestro_ir.data.Adhoc]
The list of Adhoc datasets to concatenate
- assessments: path (generated)
Generated assessments file
- topics: path (generated)
Generated topics file
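The fold logic described above can be illustrated without xpmir. The following is a minimal sketch, not the library's implementation: `random_folds` and `concat_folds` are hypothetical helpers, and treating `sizes` as relative proportions is an assumption of this example.

```python
import random
from typing import List, Sequence


def random_folds(
    topic_ids: Sequence[str],
    sizes: Sequence[float],
    seed: int,
    exclude: Sequence[str] = (),
) -> List[List[str]]:
    """Shuffle topic IDs with a fixed seed and cut them into len(sizes) folds.

    Assumption of this sketch: sizes are relative proportions.
    """
    excluded = set(exclude)
    # Sort first so the result depends only on the seed, not on input order
    ids = sorted(t for t in topic_ids if t not in excluded)
    random.Random(seed).shuffle(ids)

    total = sum(sizes)
    folds: List[List[str]] = []
    start, cumulative = 0, 0.0
    for size in sizes:
        cumulative += size
        end = round(len(ids) * cumulative / total)
        folds.append(ids[start:end])
        start = end
    return folds


def concat_folds(folds: Sequence[Sequence[str]]) -> List[str]:
    """Concatenate several folds back into a single list of topic IDs."""
    return [topic_id for fold in folds for topic_id in fold]


topics = [f"q{i}" for i in range(10)]
train, val, test = random_folds(topics, sizes=[0.8, 0.1, 0.1], seed=0, exclude=["q3"])
full = concat_folds([train, val, test])
```

Because the shuffle is seeded, calling `random_folds` again with the same arguments yields the same folds, which is what makes it possible to compute each fold in a separate task.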
Documents
Create document subsets, e.g. restricting a collection to documents returned by a first-stage retriever.
- XPM Task xpmir.datasets.adapters.RetrieverBasedCollection(*, relevance_threshold, dataset, retrievers, keepRelevant, keepNotRelevant)
Bases: Task
Submit type: datamaestro_ir.data.Adhoc.XPMConfig
Builds a subset of documents based on the output of a set of retrievers and on the relevance assessments. Documents selected from the assessments are collected first, then the retrieved ones are added.
- dataset: datamaestro_ir.data.Adhoc
A dataset
- retrievers: List[xpmir.rankers.retriever.Retriever]
Rankers
- docids_path: path (generated)
The file containing the document identifiers of the collection
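The two-step construction described for RetrieverBasedCollection can be sketched in plain Python. This is an illustration only: `build_docid_subset` is a hypothetical helper, and the meaning given here to `relevance_threshold`, `keep_relevant` and `keep_not_relevant` (a document is relevant when its grade is at or above the threshold) is an assumption, since the reference above does not document these parameters.

```python
from typing import Dict, Iterable, List, Set


def build_docid_subset(
    qrels: Dict[str, Dict[str, int]],
    runs: Iterable[Dict[str, List[str]]],
    relevance_threshold: int = 1,
    keep_relevant: bool = True,
    keep_not_relevant: bool = False,
) -> Set[str]:
    """Collect document IDs from assessments, then add retrieved documents."""
    docids: Set[str] = set()

    # Step 1: documents selected from the assessments
    for doc_rels in qrels.values():
        for docid, rel in doc_rels.items():
            relevant = rel >= relevance_threshold  # assumed semantics
            if (relevant and keep_relevant) or (not relevant and keep_not_relevant):
                docids.add(docid)

    # Step 2: documents returned by each retriever, for every topic
    for run in runs:
        for ranked_docids in run.values():
            docids.update(ranked_docids)

    return docids


# Toy data: topic ID -> {document ID -> relevance grade}
qrels = {"q1": {"d1": 1, "d2": 0, "d6": 0}, "q2": {"d3": 2}}
# One dict per retriever: topic ID -> ranked document IDs
runs = [{"q1": ["d2", "d4"]}, {"q2": ["d5"]}]
subset = build_docid_subset(qrels, runs)
```

In this toy example `d6` is judged non-relevant and never retrieved, so it is left out, while `d2` is judged non-relevant but still enters the subset because a retriever returned it.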
Assessments
Fold relevance assessments (qrels) by topic ID or topic object.
- XPM Config xpmir.datasets.adapters.AbstractAdhocAssessmentFold(*, id, qrels)
Bases: AdhocAssessments
Filter assessments by topic ID
- qrels: datamaestro_ir.data.AdhocAssessments
The collection of the assessments
- XPM Config xpmir.datasets.adapters.AdhocAssessmentFold(*, id, qrels, ids)
Bases: AbstractAdhocAssessmentFold
Filter assessments by topic ID
- qrels: datamaestro_ir.data.AdhocAssessments
The collection of the assessments
- XPM Config xpmir.datasets.adapters.IDAdhocAssessmentFold(*, id, qrels, id_list)
Bases: AbstractAdhocAssessmentFold
- qrels: datamaestro_ir.data.AdhocAssessments
The collection of the assessments
- id_list: xpmir.misc.IDList
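Filtering assessments by topic ID, as the fold classes above do, amounts to keeping only the judgments whose topic belongs to the fold. A minimal sketch under stated assumptions: the `Assessment` tuple and `fold_assessments` function are hypothetical and stand in for the datamaestro_ir types, which are not reproduced here.

```python
from typing import Iterable, List, NamedTuple


class Assessment(NamedTuple):
    """One relevance judgment (hypothetical stand-in for a qrels entry)."""

    topic_id: str
    doc_id: str
    rel: int


def fold_assessments(
    qrels: Iterable[Assessment], ids: Iterable[str]
) -> List[Assessment]:
    """Keep only the assessments whose topic ID belongs to the fold."""
    keep = set(ids)  # set membership keeps the filter linear in len(qrels)
    return [a for a in qrels if a.topic_id in keep]


qrels = [
    Assessment("q1", "d1", 1),
    Assessment("q2", "d7", 0),
    Assessment("q3", "d2", 2),
    Assessment("q1", "d5", 0),
]
fold = fold_assessments(qrels, ids=["q1", "q3"])
```

The original order of the judgments is preserved, and topics listed in `ids` that have no judgment simply contribute nothing.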
Topics
Fold or generate topic sets.
- XPM Config xpmir.datasets.adapters.AbstractTopicFold(*, id, topics)
Bases: Topics
ID-based topic selection
- topics: datamaestro_ir.data.Topics
The collection of the topics
- XPM Config xpmir.datasets.adapters.TopicFold(*, id, topics, ids)
Bases: AbstractTopicFold
- topics: datamaestro_ir.data.Topics
The collection of the topics
- XPM Config xpmir.datasets.adapters.IDTopicFold(*, id, topics, id_list)
Bases: AbstractTopicFold
- topics: datamaestro_ir.data.Topics
The collection of the topics
- id_list: xpmir.misc.IDList
- XPM Task xpmir.datasets.adapters.TopicsFoldGenerator(*, seed, sizes, dataset, fold, exclude)
Bases: FileIDList, Task
Submit type: datamaestro_ir.data.Adhoc
Extracts a random subset of topics from a dataset.
This task is more generic than RandomFold and should work with any topics and assessments, as long as they are serializable with pickle.
- path: path (generated)
Selected topic IDs
- dataset: datamaestro_ir.data.Adhoc
The Adhoc dataset from which a fold is extracted
- exclude: datamaestro_ir.data.Topics
Exclude some topics from the random fold
- XPM Config xpmir.datasets.adapters.MemoryTopicStore(*, topics)
Bases: TextStore
View a set of topics as an in-memory text store
- topics: datamaestro_ir.data.Topics
The collection of the topics to build the store
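The ideas behind ID-based topic folds and an in-memory topic store can be sketched together. This is a hedged illustration rather than the xpmir code: `Topic`, `fold_topics` and `InMemoryTopicStore` are hypothetical names, and the dictionary-style lookup is an assumption about how such a store might be queried.

```python
from typing import Dict, Iterable, List, NamedTuple


class Topic(NamedTuple):
    """A query with its identifier (hypothetical stand-in for a topic record)."""

    topic_id: str
    text: str


def fold_topics(topics: Iterable[Topic], ids: Iterable[str]) -> List[Topic]:
    """Select the topics whose ID belongs to the fold, preserving order."""
    keep = set(ids)
    return [t for t in topics if t.topic_id in keep]


class InMemoryTopicStore:
    """Expose topics as an ID -> text mapping held entirely in memory."""

    def __init__(self, topics: Iterable[Topic]) -> None:
        self._texts: Dict[str, str] = {t.topic_id: t.text for t in topics}

    def __getitem__(self, topic_id: str) -> str:
        return self._texts[topic_id]

    def __len__(self) -> int:
        return len(self._texts)


topics = [
    Topic("q1", "sparse retrieval"),
    Topic("q2", "neural ranking"),
    Topic("q3", "query expansion"),
]
store = InMemoryTopicStore(fold_topics(topics, ids=["q2", "q3"]))
```

Building the store from a fold, as in the last line, gives text lookups restricted to the selected topics; asking for an ID outside the fold raises `KeyError`.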