Dataset adapters
Adapters can be used when a collection is derived from another one by subsampling documents and/or queries.
- XPM Task xpmir.datasets.adapters.RandomFold(*, seed, sizes, dataset, fold, exclude)
Bases: experimaestro.core.objects.Task
Extracts a random subset of topics from a dataset
- seed: int
Random seed used to compute the fold
- sizes: List[float]
Number of topics in each fold (or proportion of topics, if the sizes sum to 1)
- dataset: datamaestro_text.data.ir.Adhoc
The Adhoc dataset from which a fold is extracted
- fold: int
Which fold should be taken
- exclude: datamaestro_text.data.ir.AdhocTopics
Topics to exclude from the random fold
- assessments: Path (generated)
Generated assessments file
- topics: Path (generated)
Generated topics file
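The fold logic can be pictured with a minimal, stdlib-only sketch. It is not xpmir's implementation: the helper `random_fold` is hypothetical, and it assumes that excluded topics are removed before a single shuffle driven by `seed`, and that `sizes` summing to 1 are read as proportions (rounded to whole topics) while other values are read as topic counts.

```python
import random
from typing import List, Optional, Sequence, Set


def random_fold(
    topic_ids: Sequence[str],
    seed: int,
    sizes: List[float],
    fold: int,
    exclude: Optional[Set[str]] = None,
) -> List[str]:
    """Return the topic IDs of fold `fold` (hypothetical helper, not xpmir code)."""
    exclude = exclude or set()
    # Drop excluded topics; sort first so the shuffle depends only on the seed
    candidates = sorted(t for t in topic_ids if t not in exclude)
    random.Random(seed).shuffle(candidates)

    # Assumption: sizes summing to 1 are proportions, otherwise topic counts
    if abs(sum(sizes) - 1.0) < 1e-9:
        counts = [int(round(s * len(candidates))) for s in sizes]
    else:
        counts = [int(s) for s in sizes]

    # Folds are consecutive, non-overlapping slices of the shuffled list
    start = sum(counts[:fold])
    return candidates[start : start + counts[fold]]
```

Because every fold is cut from the same seeded permutation, extracting fold 0 and fold 1 with identical `seed` and `sizes` yields disjoint topic sets.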
- XPM Task xpmir.datasets.adapters.RetrieverBasedCollection(*, relevance_threshold, dataset, retrievers, keepRelevant, keepNotRelevant)
Bases: experimaestro.core.objects.Task
Builds a subset of documents based on the output of a set of retrievers and on relevance assessments. It first collects the documents from the assessments, then adds the retrieved ones.
- relevance_threshold: float = 0
Relevance threshold
- dataset: datamaestro_text.data.ir.Adhoc
The Adhoc dataset from which documents are selected
- retrievers: List[xpmir.rankers.Retriever]
Retrievers whose results are added to the collection
- keepRelevant: bool = True
Keep documents judged relevant
- keepNotRelevant: bool = False
Keep documents judged not relevant
- docids_path: Path (generated)
The file containing the document identifiers of the collection
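The two-step selection can be sketched with plain dictionaries instead of datamaestro and xpmir objects. The function `build_docids` and its input shapes (qrels as `{query_id: {doc_id: relevance}}`, each retriever's output as `{query_id: [doc_id, ...]}`) are hypothetical, and treating a judgement as relevant when it is strictly above `relevance_threshold` is an assumption.

```python
from typing import Dict, Iterable, List, Set


def build_docids(
    qrels: Dict[str, Dict[str, float]],
    retrieved: Iterable[Dict[str, List[str]]],
    relevance_threshold: float = 0,
    keep_relevant: bool = True,
    keep_not_relevant: bool = False,
) -> List[str]:
    """Collect document IDs: assessed ones first, then retrieved ones (sketch)."""
    docids: List[str] = []
    seen: Set[str] = set()

    def add(docid: str) -> None:
        # Preserve first-seen order and skip duplicates
        if docid not in seen:
            seen.add(docid)
            docids.append(docid)

    # Step 1: documents from the assessments, filtered by their judgement
    for judgements in qrels.values():
        for docid, rel in judgements.items():
            is_relevant = rel > relevance_threshold  # assumed comparison
            if (is_relevant and keep_relevant) or (not is_relevant and keep_not_relevant):
                add(docid)

    # Step 2: every document returned by any retriever, for any query
    for run in retrieved:
        for ranking in run.values():
            for docid in ranking:
                add(docid)
    return docids
```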
- XPM Config xpmir.datasets.adapters.AdhocTopicFold(*, id, ids, topics)
Bases: datamaestro_text.data.ir.AdhocTopics
ID-based topic selection
- id: str
The unique dataset ID
- ids: List[str]
IDs of the topics to select
- topics: datamaestro_text.data.ir.AdhocTopics
The collection of the topics
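ID-based selection amounts to filtering the topics by their identifier. In this sketch, topics are assumed to be `(query_id, text)` pairs and `topic_fold` is a hypothetical helper; the order of `topics` is kept, not the order of `ids`.

```python
from typing import Iterable, Iterator, Tuple


def topic_fold(
    topics: Iterable[Tuple[str, str]], ids: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Yield only the topics whose ID is in `ids` (hypothetical helper)."""
    selected = set(ids)  # set lookup keeps the filter linear in len(topics)
    for qid, text in topics:
        if qid in selected:
            yield qid, text
```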
- XPM Config xpmir.datasets.adapters.AdhocAssessmentFold(*, id, ids, qrels)
Bases: datamaestro_text.data.ir.AdhocAssessments
ID-based assessment selection
- id: str
The unique dataset ID
- ids: List[str]
IDs of the assessments to select
- qrels: datamaestro_text.data.ir.AdhocAssessments
The collection of the assessments
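The same filtering applies to assessments. This sketch assumes qrels are stored as `{query_id: {doc_id: relevance}}` and that `ids` are matched against the query IDs; `assessment_fold` is a hypothetical name. Passing the same `ids` to a topic fold and an assessment fold keeps topics and judgements aligned.

```python
from typing import Dict, Iterable

# Assumed layout: query ID -> (document ID -> relevance grade)
Qrels = Dict[str, Dict[str, float]]


def assessment_fold(qrels: Qrels, ids: Iterable[str]) -> Qrels:
    """Keep only the assessments of the selected topics (hypothetical helper)."""
    selected = set(ids)
    # Copy the inner dicts so the fold does not share state with the source
    return {qid: dict(judgements) for qid, judgements in qrels.items() if qid in selected}
```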
- XPM Config xpmir.datasets.adapters.AdhocDocumentSubset(*, id, count, base, docids_path)
Bases: datamaestro_text.data.ir.AdhocDocuments
ID-based document selection
- id: str
The unique dataset ID
- count: int
Number of documents
- base: datamaestro_text.data.ir.AdhocDocumentStore
The full document store
- docids_path: Path
Path to the file containing the document IDs
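A document subset can be read as a list of IDs pointing into the full store. The sketch below is hypothetical: `document_subset` is an illustrative name, it assumes `docids_path` holds one document ID per line, models `base` as a plain dictionary, and checks that the file contains exactly `count` IDs.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def document_subset(
    base: Dict[str, str], docids_path: Path, count: int
) -> List[Tuple[str, str]]:
    """Return (docid, text) pairs for the IDs listed in `docids_path` (sketch)."""
    # Assumed file format: one document ID per line, blank lines ignored
    docids = [line.strip() for line in docids_path.read_text().splitlines() if line.strip()]
    if len(docids) != count:
        raise ValueError(f"expected {count} document IDs, found {len(docids)}")
    # Look each ID up in the full store; an unknown ID raises KeyError
    return [(docid, base[docid]) for docid in docids]
```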
- XPM Config xpmir.datasets.adapters.MemoryTopicStore(*, topics)
Bases: xpmir.datasets.adapters.TextStore
Views a set of topics as an in-memory text store
- topics: datamaestro_text.data.ir.AdhocTopics
The collection of the topics to build the store
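Conceptually, such a store is a dictionary from topic ID to topic text held in memory. `MemoryTopicStoreSketch` is a hypothetical stand-in built from `(topic_id, text)` pairs; its `__getitem__`/`__len__`/`__iter__` interface is an assumption, not the actual `TextStore` API.

```python
from typing import Dict, Iterable, Iterator, Tuple


class MemoryTopicStoreSketch:
    """Hypothetical in-memory text store built from (topic_id, text) pairs."""

    def __init__(self, topics: Iterable[Tuple[str, str]]) -> None:
        # Materialize all topics once so each lookup is a dictionary access
        self._texts: Dict[str, str] = dict(topics)

    def __getitem__(self, topic_id: str) -> str:
        return self._texts[topic_id]

    def __len__(self) -> int:
        return len(self._texts)

    def __iter__(self) -> Iterator[str]:
        # Iterate over topic IDs in insertion order
        return iter(self._texts)
```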