Dataset adapters
Adapters can be used when a collection is derived from another one by subsampling documents and/or queries.
Adhoc datasets
- XPM Task xpmir.datasets.adapters.RandomFold(*, seed, sizes, dataset, fold, exclude)
Bases: Task
Submit type: datamaestro_text.data.ir.Adhoc
Extracts a random subset of topics from a dataset
- seed: int
Random seed used to compute the fold
- sizes: List[float]
Number of topics in each fold (or the fraction of topics, if the sizes sum to 1)
- dataset: datamaestro_text.data.ir.Adhoc
The Adhoc dataset from which a fold is extracted
- fold: int
Which fold should be taken
- exclude: datamaestro_text.data.ir.Topics
Exclude some topics from the random fold
- assessments: Path (generated)
Generated assessments file
- topics: Path (generated)
Generated topics file
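The interplay of seed, sizes, fold and exclude can be illustrated with a small standalone sketch. It does not call xpmir: `random_fold` is a hypothetical helper, and it assumes that folds are consecutive slices of one seeded shuffle and that sizes summing to 1 are fractions of the remaining topics. The actual task may split differently.

```python
import random
from typing import Iterable, List, Sequence


def random_fold(
    topic_ids: Sequence[str],
    sizes: List[float],
    fold: int,
    seed: int,
    exclude: Iterable[str] = (),
) -> List[str]:
    """Return the topic IDs of one fold (hypothetical helper, not the xpmir code)."""
    excluded = set(exclude)
    # Sort first so the result only depends on the seed, not on input order
    candidates = sorted(t for t in topic_ids if t not in excluded)
    random.Random(seed).shuffle(candidates)

    # Assumption: sizes summing to 1 are fractions, otherwise absolute counts
    if abs(sum(sizes) - 1.0) < 1e-9:
        counts = [int(round(s * len(candidates))) for s in sizes]
    else:
        counts = [int(s) for s in sizes]

    # Fold k is the k-th consecutive slice of the shuffled list
    start = sum(counts[:fold])
    return candidates[start : start + counts[fold]]


topics = [f"q{i}" for i in range(10)]
train = random_fold(topics, [0.8, 0.2], fold=0, seed=0)
valid = random_fold(topics, [0.8, 0.2], fold=1, seed=0)
```

Calling the helper with the same seed and sizes but different fold values yields disjoint subsets, which is what makes a train/validation split reproducible.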
- XPM Task xpmir.datasets.adapters.ConcatFold(*, datasets)
Bases: Task
Submit type: datamaestro_text.data.ir.Adhoc
Concatenation of several datasets to get a full dataset.
- datasets: List[datamaestro_text.data.ir.Adhoc]
The list of Adhoc datasets to concatenate
- assessments: Path (generated)
Generated assessments file
- topics: Path (generated)
Generated topics file
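As a rough illustration of what concatenating folds involves, the sketch below merges topics and assessments held in plain dictionaries. The `Topics` and `Qrels` aliases and the `concat_folds` helper are hypothetical, and rejecting duplicate topic IDs is an assumption of this sketch rather than documented behaviour of the task.

```python
from typing import Dict, List, Tuple

Topics = Dict[str, str]             # topic ID -> query text (assumed layout)
Qrels = Dict[str, Dict[str, int]]   # topic ID -> {document ID: relevance}


def concat_folds(datasets: List[Tuple[Topics, Qrels]]) -> Tuple[Topics, Qrels]:
    """Merge several (topics, qrels) pairs into one (hypothetical helper)."""
    topics: Topics = {}
    qrels: Qrels = {}
    for fold_topics, fold_qrels in datasets:
        for topic_id, text in fold_topics.items():
            # Assumption: folds are disjoint, so a repeated ID is an error
            if topic_id in topics:
                raise ValueError(f"Duplicate topic ID: {topic_id}")
            topics[topic_id] = text
        for topic_id, judged in fold_qrels.items():
            qrels.setdefault(topic_id, {}).update(judged)
    return topics, qrels


fold_a = ({"q1": "neural ranking"}, {"q1": {"d1": 1}})
fold_b = ({"q2": "query expansion"}, {"q2": {"d7": 2, "d8": 0}})
all_topics, all_qrels = concat_folds([fold_a, fold_b])
```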
Documents
- XPM Task xpmir.datasets.adapters.RetrieverBasedCollection(*, relevance_threshold, dataset, retrievers, keepRelevant, keepNotRelevant)
Bases: Task
Submit type: datamaestro_text.data.ir.Adhoc
Builds a subset of documents based on the output of a set of retrievers and on relevance assessments. Documents are first selected from the assessments, then the retrieved ones are added.
- relevance_threshold: float = 0
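Relevance threshold used to decide whether a judged document counts as relevant
- dataset: datamaestro_text.data.ir.Adhoc
The Adhoc dataset whose documents are subsampled
- retrievers: List[xpmir.rankers.Retriever]
The retrievers whose results are added to the subset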
- keepRelevant: bool = True
Keep documents judged relevant
- keepNotRelevant: bool = False
Keep documents judged not relevant
- docids_path: Path (generated)
The file containing the document identifiers of the collection
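The two-step selection described above (judged documents first, retrieved documents second) can be sketched without xpmir as follows. `collect_docids` and the dictionary layouts are hypothetical, and treating a document as relevant when its grade is strictly greater than the threshold is an assumption of this sketch.

```python
from typing import Dict, Iterable, List, Set

Qrels = Dict[str, Dict[str, int]]   # topic ID -> {document ID: relevance}
Run = Dict[str, List[str]]          # topic ID -> retrieved document IDs


def collect_docids(
    qrels: Qrels,
    runs: Iterable[Run],
    relevance_threshold: float = 0,
    keep_relevant: bool = True,
    keep_not_relevant: bool = False,
) -> Set[str]:
    """Collect the document IDs of the subsampled collection (hypothetical)."""
    docids: Set[str] = set()

    # Step 1: documents selected through the assessments
    for judged in qrels.values():
        for docid, rel in judged.items():
            # Assumption: "relevant" means strictly above the threshold
            relevant = rel > relevance_threshold
            if (relevant and keep_relevant) or (not relevant and keep_not_relevant):
                docids.add(docid)

    # Step 2: documents returned by each retriever
    for run in runs:
        for ranked in run.values():
            docids.update(ranked)
    return docids


qrels = {"q1": {"d1": 1, "d2": 0}, "q2": {"d3": 2}}
runs = [{"q1": ["d4"]}, {"q2": ["d5"]}]
subset = collect_docids(qrels, runs)
```

With the default flags, the judged non-relevant document `d2` is left out unless a retriever also returns it.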
- XPM Config xpmir.datasets.adapters.DocumentSubset(*, id, count, base, docids_path, in_memory)
Bases: Documents
Submit type: xpmir.datasets.adapters.DocumentSubset
ID-based document selection
- id: str
The unique dataset ID
- count: int
Number of documents
- base: datamaestro_text.data.ir.DocumentStore
The full document store
- docids_path: Path
Path to the file containing the document IDs
- in_memory: bool = False
Whether to load the dataset in memory
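The relationship between base, docids_path and in_memory can be pictured with the following sketch. `DocumentSubsetView` and `read_docids` are hypothetical stand-ins, the base store is modelled as a plain dictionary, and the file format (one document ID per line) is an assumption that this page does not confirm.

```python
from pathlib import Path
from typing import Dict, Iterator, List, Optional, Tuple


def read_docids(docids_path: Path) -> List[str]:
    """Read one document ID per line, skipping blank lines (assumed format)."""
    with docids_path.open("rt", encoding="utf-8") as fp:
        return [line.strip() for line in fp if line.strip()]


class DocumentSubsetView:
    """Restricts a base store to the IDs listed in a file (hypothetical)."""

    def __init__(self, base: Dict[str, str], docids_path: Path, in_memory: bool = False):
        self.base = base
        self.docids_path = docids_path
        # With in_memory=True the ID list is read once and cached;
        # otherwise the file is read again on every access
        self._docids: Optional[List[str]] = (
            read_docids(docids_path) if in_memory else None
        )

    def docids(self) -> List[str]:
        if self._docids is not None:
            return self._docids
        return read_docids(self.docids_path)

    @property
    def count(self) -> int:
        return len(self.docids())

    def __iter__(self) -> Iterator[Tuple[str, str]]:
        # Only documents whose ID is listed in the file are exposed
        for docid in self.docids():
            yield docid, self.base[docid]
```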
Assessments
- XPM Config xpmir.datasets.adapters.AdhocAssessmentFold(*, id, ids, qrels)
Bases: AdhocAssessments
Submit type: xpmir.datasets.adapters.AdhocAssessmentFold
Filter assessments by topic ID
- id: str
The unique dataset ID
- ids: List[str]
The set of topic IDs whose assessments are kept
- qrels: datamaestro_text.data.ir.AdhocAssessments
The collection of the assessments
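Filtering assessments by topic ID amounts to keeping only the entries whose topic belongs to the fold. A minimal sketch, assuming assessments are stored as a nested dictionary (the `Qrels` alias and `assessment_fold` helper are hypothetical and not part of xpmir):

```python
from typing import Dict, Iterable

Qrels = Dict[str, Dict[str, int]]   # topic ID -> {document ID: relevance}


def assessment_fold(qrels: Qrels, ids: Iterable[str]) -> Qrels:
    """Keep only the assessments of the selected topics (hypothetical helper)."""
    selected = set(ids)
    # Copy the inner dictionaries so the fold can be modified independently
    return {
        topic_id: dict(judged)
        for topic_id, judged in qrels.items()
        if topic_id in selected
    }


qrels = {"q1": {"d1": 1}, "q2": {"d2": 0}, "q3": {"d3": 2}}
fold_qrels = assessment_fold(qrels, ["q1", "q3", "q9"])
```

IDs that have no assessments, such as `q9` here, are simply ignored in this sketch.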
Topics
- XPM Config xpmir.datasets.adapters.TopicFold(*, id, ids, topics)
Bases: Topics
Submit type: xpmir.datasets.adapters.TopicFold
ID-based topic selection
- id: str
The unique dataset ID
- ids: List[str]
The set of IDs of the topics to select
- topics: datamaestro_text.data.ir.Topics
The collection of the topics
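ID-based topic selection can be sketched as a filter over (ID, text) pairs that preserves the order of the underlying collection. The tuple representation and the `topic_fold` helper are hypothetical and only illustrate the idea.

```python
from typing import Iterable, List, Tuple

Topic = Tuple[str, str]   # (topic ID, query text), an assumed representation


def topic_fold(topics: Iterable[Topic], ids: Iterable[str]) -> List[Topic]:
    """Keep the topics whose ID is in `ids`, in their original order."""
    selected = set(ids)
    return [(topic_id, text) for topic_id, text in topics if topic_id in selected]


topics = [("q1", "neural ranking"), ("q2", "query expansion"), ("q3", "dense retrieval")]
fold_topics = topic_fold(topics, ["q3", "q1"])
```

Because the order comes from the collection rather than from `ids`, the same fold is obtained regardless of how the ID list is ordered.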
- XPM Config xpmir.datasets.adapters.MemoryTopicStore(*, topics)
Bases: TextStore
Submit type: xpmir.datasets.adapters.MemoryTopicStore
View a set of topics as an in-memory text store
- topics: datamaestro_text.data.ir.Topics
The collection of the topics to build the store
- XPM Config xpmir.datasets.adapters.TextStore
Bases: Config
Submit type: xpmir.datasets.adapters.TextStore
Associates an ID with a text
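The idea of a text store, and of viewing topics as one held in memory, can be sketched as follows. Both classes are hypothetical stand-ins written for illustration: the method name `get` and the (ID, text) tuple layout are assumptions, not the interface of the xpmir classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Tuple


class SimpleTextStore(ABC):
    """Minimal interface: associate an ID with a text (hypothetical)."""

    @abstractmethod
    def get(self, key: str) -> str:
        """Return the text associated with `key`."""


class InMemoryTopicStore(SimpleTextStore):
    """Builds the ID -> text mapping once from a collection of topics."""

    def __init__(self, topics: Iterable[Tuple[str, str]]):
        # All topics are loaded eagerly, so lookups are plain dict accesses
        self._texts: Dict[str, str] = {topic_id: text for topic_id, text in topics}

    def get(self, key: str) -> str:
        return self._texts[key]


store = InMemoryTopicStore([("q1", "neural ranking"), ("q2", "query expansion")])
text = store.get("q2")
```

Keeping the mapping in memory is reasonable for topics, since query sets are typically far smaller than document collections.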