Dataset adapters

Adapters can be used when a collection is derived from another one by subsampling documents and/or queries.

XPM Task xpmir.datasets.adapters.RandomFold(*, seed, sizes, dataset, fold, exclude)

Bases: experimaestro.core.objects.Task

Extracts a random subset of topics from a dataset

seed: int

Random seed used to compute the fold

sizes: List[float]

Number of topics in each fold (or fraction of the topics if the sizes sum to 1)

dataset: datamaestro_text.data.ir.Adhoc

The Adhoc dataset from which a fold is extracted

fold: int

Index of the fold to extract

exclude: datamaestro_text.data.ir.AdhocTopics

Topics to exclude from the random fold

assessments: Path (generated)

Generated assessments file

topics: Path (generated)

Generated topics file
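The fold extraction can be pictured with the standalone sketch below. It is not the xpmir implementation: the helper `random_fold` and its arguments are hypothetical, and it assumes that the non-excluded topics are shuffled once with the seed and that folds are consecutive slices of the shuffled list, with sizes read as fractions when they sum to 1.

```python
import random
from typing import Iterable, List, Sequence, Set


def random_fold(
    topic_ids: Iterable[str],
    seed: int,
    sizes: Sequence[float],
    fold: int,
    exclude: Iterable[str] = (),
) -> List[str]:
    """Return the topic IDs of one random fold (hypothetical helper, not the xpmir API)."""
    excluded: Set[str] = set(exclude)
    # Sort first so that the result only depends on the seed, not on input order
    candidates = sorted(set(topic_ids) - excluded)
    random.Random(seed).shuffle(candidates)

    # Assumption: if the sizes sum to 1, they are fractions of the remaining topics
    if abs(sum(sizes) - 1.0) < 1e-9:
        counts = [int(round(s * len(candidates))) for s in sizes]
    else:
        counts = [int(s) for s in sizes]

    # Fold k is the slice that starts after the first k folds
    start = sum(counts[:fold])
    return candidates[start : start + counts[fold]]
```

With the same seed and sizes, different `fold` values give disjoint slices, so the folds can serve, for instance, as training and validation splits of the same topic set.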

XPM Task xpmir.datasets.adapters.RetrieverBasedCollection(*, relevance_threshold, dataset, retrievers, keepRelevant, keepNotRelevant)

Bases: experimaestro.core.objects.Task

Builds a subset of documents based on the output of a set of retrievers and on relevance assessments. It first collects the documents from the assessments, then adds the retrieved ones.

relevance_threshold: float = 0

Relevance threshold used to decide whether an assessed document counts as relevant

dataset: datamaestro_text.data.ir.Adhoc

The Adhoc dataset (documents, topics and assessments) to subsample

retrievers: List[xpmir.rankers.Retriever]

The retrievers whose results are added to the collection

keepRelevant: bool = True

Keep documents judged relevant

keepNotRelevant: bool = False

Keep documents judged not relevant

docids_path: Pathgenerated

The file containing the document identifiers of the collection
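A minimal sketch of the two steps, assuming the assessments are given as a `{topic_id: {doc_id: relevance}}` mapping and each retriever's output as a precomputed `{topic_id: [doc_id, ...]}` run. The function `retriever_based_docids` is hypothetical, and treating a document as relevant when its relevance is strictly above `relevance_threshold` is an assumption.

```python
from typing import Iterable, Mapping, Set


def retriever_based_docids(
    qrels: Mapping[str, Mapping[str, float]],
    runs: Iterable[Mapping[str, Iterable[str]]],
    relevance_threshold: float = 0,
    keep_relevant: bool = True,
    keep_not_relevant: bool = False,
) -> Set[str]:
    """Hypothetical helper: collect the document IDs of a subsampled collection."""
    docids: Set[str] = set()
    # Step 1: documents taken from the relevance assessments
    for judged in qrels.values():
        for docid, rel in judged.items():
            # Assumption: "relevant" means strictly above the threshold
            relevant = rel > relevance_threshold
            if (relevant and keep_relevant) or (not relevant and keep_not_relevant):
                docids.add(docid)
    # Step 2: add every document returned by any retriever, for any topic
    for run in runs:
        for retrieved in run.values():
            docids.update(retrieved)
    return docids
```

The resulting set is what would then be written, one identifier per line, to a file such as `docids_path`.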

XPM Config xpmir.datasets.adapters.AdhocTopicFold(*, id, ids, topics)

Bases: datamaestro_text.data.ir.AdhocTopics

ID-based topic selection

id: str

The unique dataset ID

ids: List[str]

The IDs of the topics to select

topics: datamaestro_text.data.ir.AdhocTopics

The topic collection to select from
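ID-based selection amounts to filtering the topics by a set of IDs. The sketch below assumes a hypothetical `Topic` record with `id` and `text` fields and a hypothetical `topic_fold` function; it is not the datamaestro_text topic class.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Topic:
    # Hypothetical minimal topic record: an ID and its query text
    id: str
    text: str


def topic_fold(topics: Iterable[Topic], ids: Iterable[str]) -> Iterator[Topic]:
    """Yield the topics whose ID is in `ids`, keeping the original topic order."""
    selected = set(ids)  # set membership makes each check O(1)
    for topic in topics:
        if topic.id in selected:
            yield topic
```

Because the filter iterates over the topics rather than over `ids`, the output follows the order of the underlying collection, whatever the order of `ids`.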

XPM Config xpmir.datasets.adapters.AdhocAssessmentFold(*, id, ids, qrels)

Bases: datamaestro_text.data.ir.AdhocAssessments

ID-based assessment selection

id: str

The unique dataset ID

ids: List[str]

The IDs of the topics whose assessments are selected

qrels: datamaestro_text.data.ir.AdhocAssessments

The assessment collection to select from
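Selecting assessments can be sketched as a filter on topic IDs, assuming (for illustration) that `ids` holds topic IDs, so that passing the same list to a topic fold and to an assessment fold keeps topics and qrels aligned. The `Qrel` record and the `assessment_fold` function are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Qrel(NamedTuple):
    # Hypothetical flat assessment record, one line of a TREC-style qrels file
    topic_id: str
    doc_id: str
    rel: int


def assessment_fold(qrels: Iterable[Qrel], ids: Iterable[str]) -> Iterator[Qrel]:
    """Yield only the assessments whose topic ID is in `ids`."""
    selected = set(ids)
    return (q for q in qrels if q.topic_id in selected)
```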

XPM Config xpmir.datasets.adapters.AdhocDocumentSubset(*, id, count, base, docids_path)

Bases: datamaestro_text.data.ir.AdhocDocuments

ID-based document selection

id: str

The unique dataset ID

count: int

Number of documents in the subset

base: datamaestro_text.data.ir.AdhocDocumentStore

The full document store

docids_path: Path

Path to the file containing the document IDs
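The subset can be thought of as a view over the full store restricted to the IDs listed in `docids_path`. This sketch assumes one document ID per line and models the base store as a plain dictionary; `DocumentSubset` is a hypothetical class, and unlike the configuration above it derives `count` from the file instead of taking it as a parameter.

```python
from pathlib import Path
from typing import Dict, Iterator, List, Tuple


class DocumentSubset:
    """Hypothetical view restricting a base store to the IDs listed in a file."""

    def __init__(self, base: Dict[str, str], docids_path: Path):
        self.base = base
        # Assumption: one document ID per line; blank lines are ignored
        lines = docids_path.read_text(encoding="utf-8").splitlines()
        self.docids: List[str] = [line.strip() for line in lines if line.strip()]

    @property
    def count(self) -> int:
        return len(self.docids)

    def __iter__(self) -> Iterator[Tuple[str, str]]:
        # Look up each selected document in the full store, in file order
        for docid in self.docids:
            yield docid, self.base[docid]
```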

XPM Config xpmir.datasets.adapters.MemoryTopicStore(*, topics)

Bases: xpmir.datasets.adapters.TextStore

Views a set of topics as an (in-memory) text store

topics: datamaestro_text.data.ir.AdhocTopics

The topic collection used to build the store
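An in-memory text store essentially maps each topic ID to its text. The class below is a hypothetical sketch, assuming topics are available as `(topic_id, text)` pairs and that the store is read by key with `store[topic_id]`; it does not reproduce the actual xpmir `TextStore` interface.

```python
from typing import Dict, Iterable, Tuple


class MemoryTextStore:
    """Hypothetical in-memory text store built from (topic_id, text) pairs."""

    def __init__(self, topics: Iterable[Tuple[str, str]]):
        # Materialize all topics once so that lookups are dictionary accesses
        self._store: Dict[str, str] = {topic_id: text for topic_id, text in topics}

    def __getitem__(self, key: str) -> str:
        # Raises KeyError for unknown topic IDs
        return self._store[key]

    def __len__(self) -> int:
        return len(self._store)
```

Keeping everything in memory is reasonable for topics, which are usually few, whereas document collections would typically need an on-disk store.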