Samplers
Samplers provide samples in the form of records. They all inherit from:
- XPM Config xpmir.letor.samplers.Sampler
Bases: experimaestro.core.objects.Config, xpmir.utils.utils.EasyLogger
Abstract data sampler
- class xpmir.letor.samplers.SerializableIterator(*args, **kwargs)
Bases: Iterator[xpmir.utils.iter.ItType], Protocol[xpmir.utils.iter.ItType]
An iterator that can be serialized through state dictionaries.
This is used when saving the sampler state
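As an illustration of what serialization through state dictionaries can look like, here is a minimal, self-contained sketch. The class name ListCursor and the method names state_dict and load_state_dict are assumptions made for this example (they follow the PyTorch convention) and are not taken from the xpmir source.

```python
from typing import Any, Dict, Iterator, List


class ListCursor(Iterator[int]):
    """Hypothetical iterator whose position can be saved and restored."""

    def __init__(self, items: List[int]):
        self.items = items
        self.position = 0

    def __next__(self) -> int:
        if self.position >= len(self.items):
            raise StopIteration
        value = self.items[self.position]
        self.position += 1
        return value

    def state_dict(self) -> Dict[str, Any]:
        # Everything needed to resume iteration later (assumed method name)
        return {"position": self.position}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        # Restore a previously saved position (assumed method name)
        self.position = state["position"]


cursor = ListCursor([10, 20, 30, 40])
first_two = [next(cursor), next(cursor)]  # consume two items
saved = cursor.state_dict()               # checkpoint the iterator

resumed = ListCursor([10, 20, 30, 40])
resumed.load_state_dict(saved)            # resume from the checkpoint
remaining = list(resumed)
```

Restoring the state into a fresh iterator resumes where the first one stopped, which is the behaviour needed when a training run is interrupted and restarted.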
Pointwise
- XPM Config xpmir.letor.samplers.PointwiseSampler
Bases: xpmir.letor.samplers.Sampler
- pointwise_iter() -> xpmir.utils.iter.SerializableIterator[xpmir.letor.records.PointwiseRecord]
Iterable over pointwise records
- XPM Config xpmir.letor.samplers.PointwiseModelBasedSampler(*, dataset, retriever, relevant_ratio)
Bases: xpmir.letor.samplers.PointwiseSampler, xpmir.letor.samplers.ModelBasedSampler
- dataset: datamaestro_text.data.ir.Adhoc
The topics and assessments
- retriever: xpmir.rankers.Retriever
The document retriever
- relevant_ratio: float = 0.5
The target relevance ratio
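One plausible reading of relevant_ratio is the probability that a drawn record is a relevant one. The sketch below illustrates that reading with plain Python; the function pointwise_samples, its arguments, and the (document ID, label) output format are hypothetical and do not reproduce the xpmir implementation.

```python
import random
from itertools import islice
from typing import Iterator, List, Tuple


def pointwise_samples(
    relevant: List[str],
    non_relevant: List[str],
    relevant_ratio: float = 0.5,
    seed: int = 0,
) -> Iterator[Tuple[str, int]]:
    """Yield (document ID, label) pairs forever (hypothetical helper).

    A relevant document is drawn with probability relevant_ratio.
    """
    rng = random.Random(seed)
    while True:
        if rng.random() < relevant_ratio:
            yield rng.choice(relevant), 1
        else:
            yield rng.choice(non_relevant), 0


sampler = pointwise_samples(["d1", "d2"], ["d3", "d4", "d5"], relevant_ratio=0.5)
batch = list(islice(sampler, 1000))
observed_ratio = sum(label for _, label in batch) / len(batch)
```

Over many draws, observed_ratio should approach the requested ratio regardless of how many relevant and non-relevant documents are available.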
Pairwise
- XPM Config xpmir.letor.samplers.PairwiseSampler
Bases: xpmir.letor.samplers.Sampler
Abstract class for pairwise samplers which output a set of (query, positive, negative) triples
- XPM Config xpmir.letor.samplers.PairwiseModelBasedSampler(*, dataset, retriever)
Bases: xpmir.letor.samplers.PairwiseSampler, xpmir.letor.samplers.ModelBasedSampler
A pairwise sampler based on a retrieval model
- dataset: datamaestro_text.data.ir.Adhoc
The topics and assessments
- retriever: xpmir.rankers.Retriever
The document retriever
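To make the idea of retrieval-based pairwise sampling concrete, the following sketch builds (query, positive, negative) triples from toy assessments and toy retrieval results. It assumes that positives come from the assessments and negatives from retrieved documents that are not assessed as relevant; the function pairwise_triples and its data structures are hypothetical and may differ from what xpmir actually does.

```python
import random
from itertools import islice
from typing import Dict, Iterator, List, Set, Tuple


def pairwise_triples(
    qrels: Dict[str, Set[str]],
    retrieved: Dict[str, List[str]],
    seed: int = 0,
) -> Iterator[Tuple[str, str, str]]:
    """Yield (query ID, positive ID, negative ID) triples (hypothetical helper)."""
    rng = random.Random(seed)
    usable = []
    for qid in sorted(qrels):
        positives = sorted(qrels[qid])
        # Negatives: retrieved documents that are not assessed as relevant
        negatives = [d for d in retrieved.get(qid, []) if d not in qrels[qid]]
        # Keep only queries with at least one positive and one negative
        if positives and negatives:
            usable.append((qid, positives, negatives))
    while usable:
        qid, positives, negatives = rng.choice(usable)
        yield qid, rng.choice(positives), rng.choice(negatives)


qrels = {"q1": {"d1"}, "q2": {"d7"}, "q3": set()}
retrieved = {"q1": ["d1", "d2", "d3"], "q2": ["d7"], "q3": ["d4"]}
triples = list(islice(pairwise_triples(qrels, retrieved), 5))
```

In this toy data only q1 has both a positive and a retrieved non-relevant document, so every triple is built from q1.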
- XPM Config xpmir.documents.samplers.BatchwiseRandomSpanSampler(*, documents, max_spansize)
Bases: xpmir.documents.samplers.DocumentSampler, xpmir.letor.samplers.BatchwiseSampler
This sampler uses positive samples coming from the same documents and negative ones coming from others
- Allows (pre-)training as in co-condenser:
L. Gao and J. Callan, “Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval,” arXiv:2108.05540 [cs], Aug. 2021, Accessed: Sep. 17, 2021. [Online]. http://arxiv.org/abs/2108.05540
- documents: datamaestro_text.data.ir.AdhocDocumentStore
- max_spansize: int = 1000
Maximum span size in number of characters
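The span-based scheme can be sketched as follows: two random spans are cut from each document and form a positive pair, while spans from the other documents of the batch serve as negatives. The helpers random_span and span_pairs are hypothetical, and drawing the span length uniformly up to max_spansize is an assumption of this sketch, not something stated by the documentation.

```python
import random
from typing import List, Tuple


def random_span(text: str, max_spansize: int, rng: random.Random) -> str:
    """Return a random substring of at most max_spansize characters."""
    longest = min(len(text), max_spansize)
    if longest == 0:
        return ""
    size = rng.randint(1, longest)            # assumed: uniform span length
    start = rng.randint(0, len(text) - size)  # span must fit in the text
    return text[start:start + size]


def span_pairs(
    documents: List[str], max_spansize: int = 1000, seed: int = 0
) -> List[Tuple[str, str]]:
    """For each document, draw two spans that form a positive pair."""
    rng = random.Random(seed)
    return [
        (random_span(doc, max_spansize, rng), random_span(doc, max_spansize, rng))
        for doc in documents
    ]


documents = ["a" * 50 + "b" * 50, "short"]
pairs = span_pairs(documents, max_spansize=20)
# pairs[i] is a positive pair; spans in pairs[j] with j != i act as negatives
```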
- XPM Config xpmir.letor.samplers.TripletBasedSampler(*, source, index)
Bases: xpmir.letor.samplers.PairwiseSampler
Sampler based on a triplet file
Attributes:
- source: the source of the triplets
- index: the index (if the source is only)
- XPM Config xpmir.letor.samplers.PairwiseDatasetTripletBasedSampler(*, dataset)
Bases: xpmir.letor.samplers.PairwiseSampler
Sampler based on a dataset where each query is associated with (1) a set of relevant documents and (2) negative documents, each of which is sampled with a specific algorithm
Hard Negatives Sampling (Tasks)
- XPM Task xpmir.letor.samplers.ModelBasedHardNegativeSampler(*, dataset, retriever)
Bases: experimaestro.core.objects.Task, xpmir.letor.samplers.Sampler
Retriever-based hard negative sampler
- dataset: datamaestro_text.data.ir.Adhoc
The dataset which contains the topics and assessments
- retriever: xpmir.rankers.Retriever
The retriever used to score the documents with respect to the query
- hard_negative_samples: Path (generated)
Path to store the generated hard negatives
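A common way to obtain hard negatives from a retriever is to keep its highest-ranked documents that are not assessed as relevant. The sketch below shows this idea on toy data; the function hard_negatives, the retrieve callable, and the cut-off k are hypothetical names for this example and are not parameters of the task above.

```python
from typing import Callable, Dict, List, Set


def hard_negatives(
    qrels: Dict[str, Set[str]],
    retrieve: Callable[[str], List[str]],
    k: int = 2,
) -> Dict[str, List[str]]:
    """Keep the k top-ranked non-relevant documents per query (hypothetical helper)."""
    negatives: Dict[str, List[str]] = {}
    for qid, relevant in qrels.items():
        ranked = retrieve(qid)  # document IDs, best first
        negatives[qid] = [doc for doc in ranked if doc not in relevant][:k]
    return negatives


# Toy ranking standing in for a real retriever
rankings = {"q1": ["d1", "d2", "d3", "d4"]}
qrels = {"q1": {"d2"}}
negatives = hard_negatives(qrels, lambda qid: rankings[qid], k=2)
```

These negatives are "hard" because the retriever ranks them highly even though they are not relevant, which makes them more informative for training than random documents.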
- XPM Task xpmir.letor.samplers.TeacherModelBasedHardNegativesTripletSampler(*, sampler, document_store, query_store, teacher_model)
Bases: experimaestro.core.objects.Task, xpmir.letor.samplers.Sampler
For a given set of triplets, assigns scores to the documents according to the teacher model
- sampler: xpmir.letor.samplers.PairwiseSampler
The list of existing hard negatives which we can sample from
- document_store: datamaestro_text.data.ir.AdhocDocumentStore
The document store
- query_store: xpmir.datasets.adapters.TextStore
The query store
- teacher_model: xpmir.rankers.Scorer
The teacher model which scores the positive and negative documents
- hard_negative_triplet: Path (generated)
The path to store the generated triplets
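The scoring step can be pictured as follows: each (query, positive, negative) triplet is passed through a teacher that returns one score per query-document pair. In this sketch the teacher is a toy word-overlap function, and ScoredTriplet, score_triplets, and overlap_teacher are hypothetical names; a real teacher would be a trained scorer.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class ScoredTriplet(NamedTuple):
    query: str
    positive: Tuple[str, float]  # (document text, teacher score)
    negative: Tuple[str, float]


def score_triplets(
    triplets: Iterable[Tuple[str, str, str]],
    teacher: Callable[[str, str], float],
) -> List[ScoredTriplet]:
    """Attach a teacher score to both documents of every triplet."""
    return [
        ScoredTriplet(query, (pos, teacher(query, pos)), (neg, teacher(query, neg)))
        for query, pos, neg in triplets
    ]


def overlap_teacher(query: str, document: str) -> float:
    # Toy teacher: number of distinct words shared by query and document
    return float(len(set(query.split()) & set(document.split())))


triplets = [("neural ranking", "neural ranking models", "cooking recipes")]
scored = score_triplets(triplets, overlap_teacher)
```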
Distillation
- class xpmir.letor.distillation.samplers.PairwiseDistillationSample(query, documents)
Bases: NamedTuple
- documents: Tuple[xpmir.rankers.ScoredDocument, xpmir.rankers.ScoredDocument]
Positive/negative document with teacher scores
- query: xpmir.letor.records.Query
The query
- XPM Config xpmir.letor.distillation.samplers.PairwiseDistillationSamples
Bases: experimaestro.core.objects.Config, Iterable[xpmir.letor.distillation.samplers.PairwiseDistillationSample]
Pairwise distillation file
- XPM Config xpmir.letor.distillation.samplers.PairwiseDistillationSamplesTSV(*, id, path, with_docid, with_queryid)
Bases: xpmir.letor.distillation.samplers.PairwiseDistillationSamples, datamaestro.data.File
A TSV file (Score 1, Score 2, Query, Document 1, Document 2)
- id: str
The unique dataset ID
- path: Path
The path of the file
- with_docid: bool
- with_queryid: bool
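For readers who want to inspect such a file outside of xpmir, here is a minimal sketch of a reader for the column layout given above (Score 1, Score 2, Query, Document 1, Document 2). The names ScoredText, Sample, and read_distillation_tsv are hypothetical; the query and document columns are treated as opaque strings, so the sketch makes no assumption about whether they hold text or identifiers.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO, Tuple


class ScoredText(NamedTuple):
    text: str
    score: float


class Sample(NamedTuple):
    query: str
    documents: Tuple[ScoredText, ScoredText]


def read_distillation_tsv(fp: TextIO) -> Iterator[Sample]:
    """Parse lines of the form: score1 <TAB> score2 <TAB> query <TAB> doc1 <TAB> doc2."""
    # QUOTE_NONE keeps quote characters inside the text untouched
    reader = csv.reader(fp, delimiter="\t", quoting=csv.QUOTE_NONE)
    for score1, score2, query, doc1, doc2 in reader:
        yield Sample(
            query,
            (ScoredText(doc1, float(score1)), ScoredText(doc2, float(score2))),
        )


# In-memory stand-in for a real TSV file
sample_file = io.StringIO("12.5\t3.1\twhat is ir\tdoc about retrieval\tdoc about cooking\n")
rows = list(read_distillation_tsv(sample_file))
```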