Samplers

Samplers provide samples in the form of records. They all inherit from:

XPM Config xpmir.letor.samplers.Sampler

Bases: experimaestro.core.objects.Config, xpmir.utils.utils.EasyLogger

Abstract data sampler

class xpmir.letor.samplers.SerializableIterator(*args, **kwargs)

Bases: Iterator[xpmir.utils.iter.ItType], Protocol[xpmir.utils.iter.ItType]

An iterator that can be serialized through state dictionaries.

This is used when saving the sampler state
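As a rough illustration of the idea, the sketch below shows an iterator whose position can be captured in a state dictionary and restored later. The class `CountingIterator` is hypothetical, and the `state_dict` / `load_state_dict` method names are an assumption modeled on the PyTorch convention; this is not the xpmir implementation.

```python
from typing import Any, Dict, List


class CountingIterator:
    """Hypothetical iterator that cycles over items and can save/restore its position."""

    def __init__(self, items: List[str]):
        self.items = items
        self.position = 0

    def __iter__(self) -> "CountingIterator":
        return self

    def __next__(self) -> str:
        # Cycle endlessly over the items, as a training sampler typically would
        item = self.items[self.position % len(self.items)]
        self.position += 1
        return item

    def state_dict(self) -> Dict[str, Any]:
        # Everything needed to resume iteration later
        return {"position": self.position}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.position = state["position"]


it = CountingIterator(["a", "b", "c"])
first_item = next(it)
saved = it.state_dict()

# A fresh iterator resumes where the first one stopped
resumed = CountingIterator(["a", "b", "c"])
resumed.load_state_dict(saved)
next_item = next(resumed)
```

Keeping the state in a plain dictionary makes it easy to store alongside a model checkpoint.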

Pointwise

XPM Config xpmir.letor.samplers.PointwiseSampler

Bases: xpmir.letor.samplers.Sampler

pointwise_iter() -> xpmir.utils.iter.SerializableIterator[xpmir.letor.records.PointwiseRecord]

Returns a serializable iterator over pointwise records

XPM Config xpmir.letor.samplers.PointwiseModelBasedSampler(*, dataset, retriever, relevant_ratio)

Bases: xpmir.letor.samplers.PointwiseSampler, xpmir.letor.samplers.ModelBasedSampler

dataset: datamaestro_text.data.ir.Adhoc

The topics and assessments

retriever: xpmir.rankers.Retriever

The document retriever

relevant_ratio: float = 0.5

The target relevance ratio
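One plausible reading of a target relevance ratio is that each drawn sample is a relevant document with probability `relevant_ratio`, and a non-relevant one otherwise. The sketch below illustrates that reading only; the function `sample_pointwise` and its arguments are hypothetical and do not reflect how xpmir implements the sampler.

```python
import random
from typing import List, Tuple


def sample_pointwise(
    relevant: List[str],
    non_relevant: List[str],
    relevant_ratio: float,
    rng: random.Random,
) -> Tuple[str, int]:
    """Return (docid, label), drawing a relevant document with probability relevant_ratio."""
    if rng.random() < relevant_ratio:
        return rng.choice(relevant), 1
    return rng.choice(non_relevant), 0


rng = random.Random(0)
samples = [
    sample_pointwise(["d1", "d2"], ["d3", "d4", "d5"], 0.5, rng)
    for _ in range(1000)
]

# Fraction of relevant samples; close to relevant_ratio for a large sample
observed_ratio = sum(label for _, label in samples) / len(samples)
```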

Pairwise

XPM Config xpmir.letor.samplers.PairwiseSampler

Bases: xpmir.letor.samplers.Sampler

Abstract class for pairwise samplers which output a set of (query, positive, negative) triples

XPM Config xpmir.letor.samplers.PairwiseModelBasedSampler(*, dataset, retriever)

Bases: xpmir.letor.samplers.PairwiseSampler, xpmir.letor.samplers.ModelBasedSampler

A pairwise sampler based on a retrieval model

dataset: datamaestro_text.data.ir.Adhoc

The topics and assessments

retriever: xpmir.rankers.Retriever

The document retriever
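To make the (query, positive, negative) output concrete, here is a minimal sketch that builds triples from retrieved document lists and relevance assessments. It assumes that negatives are retrieved documents not assessed as relevant; the function `pairwise_triples` and the toy data are hypothetical, and the actual sampler may select documents differently.

```python
import itertools
import random
from typing import Dict, Iterator, List, Set, Tuple


def pairwise_triples(
    retrieved: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    rng: random.Random,
) -> Iterator[Tuple[str, str, str]]:
    """Endlessly yield (query, positive, negative) triples."""
    # Keep only queries with at least one positive and one negative candidate
    candidates = {}
    for qid, docids in retrieved.items():
        positives = sorted(relevant.get(qid, set()))
        negatives = [d for d in docids if d not in relevant.get(qid, set())]
        if positives and negatives:
            candidates[qid] = (positives, negatives)
    if not candidates:
        return

    qids = sorted(candidates)
    while True:
        qid = rng.choice(qids)
        positives, negatives = candidates[qid]
        yield qid, rng.choice(positives), rng.choice(negatives)


retrieved = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
relevant = {"q1": {"d2"}, "q2": {"d9"}}
triples = list(itertools.islice(pairwise_triples(retrieved, relevant, random.Random(0)), 5))
```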

XPM Config xpmir.documents.samplers.BatchwiseRandomSpanSampler(*, documents, max_spansize)

Bases: xpmir.documents.samplers.DocumentSampler, xpmir.letor.samplers.BatchwiseSampler

This sampler draws positive samples from the same document and negative samples from other documents

This allows (pre-)training as in coCondenser:

L. Gao and J. Callan, “Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval,” arXiv:2108.05540 [cs], Aug. 2021, Accessed: Sep. 17, 2021. [Online]. http://arxiv.org/abs/2108.05540

documents: datamaestro_text.data.ir.AdhocDocumentStore

max_spansize: int = 1000

Maximum span size in number of characters
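The span-based scheme can be sketched as follows: two random spans are cut from each document, spans from the same document form a positive pair, and spans from the other documents in the batch act as negatives. The helpers `random_span` and `positive_pair` are hypothetical and only illustrate the idea under these assumptions; they are not the library's code.

```python
import random
from typing import List, Tuple


def random_span(text: str, max_spansize: int, rng: random.Random) -> str:
    """Return a random contiguous span of at most max_spansize characters.

    Assumes a non-empty text and max_spansize >= 1.
    """
    size = rng.randint(1, min(max_spansize, len(text)))
    start = rng.randint(0, len(text) - size)
    return text[start:start + size]


def positive_pair(text: str, max_spansize: int, rng: random.Random) -> Tuple[str, str]:
    """Two spans from the same document form a positive pair."""
    return random_span(text, max_spansize, rng), random_span(text, max_spansize, rng)


docs: List[str] = [
    "dense passage retrieval with pre-trained language models",
    "a recipe for sourdough bread with a long fermentation",
]
rng = random.Random(0)

# One pair per document; for pair i, the spans of every pair j != i are negatives
batch = [positive_pair(doc, 20, rng) for doc in docs]
```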

XPM Config xpmir.letor.samplers.TripletBasedSampler(*, source, index)

Bases: xpmir.letor.samplers.PairwiseSampler

Sampler based on a triplet file

Attributes:

source: the source of the triplets
index: the index (if the source is only)

source: datamaestro_text.data.ir.TrainingTriplets

index: datamaestro_text.data.ir.AdhocDocumentStore

XPM Config xpmir.letor.samplers.PairwiseDatasetTripletBasedSampler(*, dataset)

Bases: xpmir.letor.samplers.PairwiseSampler

Sampler based on a dataset where each query is associated with (1) a set of relevant documents and (2) a set of negative documents, each negative being sampled with a specific algorithm

dataset: datamaestro_text.data.ir.PairwiseSampleDataset

Hard Negatives Sampling (Tasks)

XPM Task xpmir.letor.samplers.ModelBasedHardNegativeSampler(*, dataset, retriever)

Bases: experimaestro.core.objects.Task, xpmir.letor.samplers.Sampler

Retriever-based hard negative sampler

dataset: datamaestro_text.data.ir.Adhoc

The dataset which contains the topics and assessments

retriever: xpmir.rankers.Retriever

The retriever used to score documents with respect to the query

hard_negative_samples: Path (generated)

Path to store the generated hard negatives
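A common way to obtain retriever-based hard negatives is to keep the highest-ranked retrieved documents that are not assessed as relevant. The sketch below shows that selection step only; the function `hard_negatives` and the cutoff `k` are hypothetical and are not parameters of the task described above.

```python
from typing import List, Set


def hard_negatives(ranked: List[str], relevant: Set[str], k: int) -> List[str]:
    """Keep the k highest-ranked documents that are not assessed as relevant."""
    return [docid for docid in ranked if docid not in relevant][:k]


# Documents ordered by decreasing retriever score for one query
ranked = ["d7", "d2", "d9", "d4", "d1"]
negatives = hard_negatives(ranked, relevant={"d2", "d1"}, k=2)
```

Such negatives are "hard" because the retriever ranks them highly although they are not relevant.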

XPM Task xpmir.letor.samplers.TeacherModelBasedHardNegativesTripletSampler(*, sampler, document_store, query_store, teacher_model)

Bases: experimaestro.core.objects.Task, xpmir.letor.samplers.Sampler

For a given set of triplets, assigns a score to each document according to the teacher model

sampler: xpmir.letor.samplers.PairwiseSampler

The list of existing hard negatives which we can sample from

document_store: datamaestro_text.data.ir.AdhocDocumentStore

The document store

query_store: xpmir.datasets.adapters.TextStore

The query store

teacher_model: xpmir.rankers.Scorer

The teacher model which scores the positive and negative documents

hard_negative_triplet: Path (generated)

The path to store the generated triplets
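The scoring step can be pictured as a loop over triplets in which the teacher scores both documents of each triplet. In this sketch the teacher is a plain callable and `overlap_teacher` is a toy term-overlap scorer; both, like `score_triplets`, are hypothetical stand-ins rather than the xpmir `Scorer` interface.

```python
from typing import Callable, Iterable, List, Tuple

Triplet = Tuple[str, str, str]
ScoredTriplet = Tuple[str, Tuple[str, float], Tuple[str, float]]


def score_triplets(
    triplets: Iterable[Triplet],
    teacher: Callable[[str, str], float],
) -> List[ScoredTriplet]:
    """Attach a teacher score to the positive and negative document of each triplet."""
    scored = []
    for query, positive, negative in triplets:
        scored.append(
            (
                query,
                (positive, teacher(query, positive)),
                (negative, teacher(query, negative)),
            )
        )
    return scored


def overlap_teacher(query: str, document: str) -> float:
    """Toy teacher: number of query terms that appear in the document."""
    doc_terms = set(document.lower().split())
    return float(sum(term in doc_terms for term in query.lower().split()))


scored = score_triplets(
    [("neural ranking", "neural ranking models", "cooking recipes")],
    overlap_teacher,
)
```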

Distillation

class xpmir.letor.distillation.samplers.PairwiseDistillationSample(query, documents)

Bases: NamedTuple

documents: Tuple[xpmir.rankers.ScoredDocument, xpmir.rankers.ScoredDocument]

Positive/negative document with teacher scores

query: xpmir.letor.records.Query

The query

XPM Config xpmir.letor.distillation.samplers.PairwiseDistillationSamples

Bases: experimaestro.core.objects.Config, Iterable[xpmir.letor.distillation.samplers.PairwiseDistillationSample]

Pairwise distillation file

XPM Config xpmir.letor.distillation.samplers.PairwiseDistillationSamplesTSV(*, id, path, with_docid, with_queryid)

Bases: xpmir.letor.distillation.samplers.PairwiseDistillationSamples, datamaestro.data.File

A TSV file (Score 1, Score 2, Query, Document 1, Document 2)

id: str

The unique dataset ID

path: Path

The path of the file

with_docid: bool

with_queryid: bool
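For illustration, a file with the column layout (Score 1, Score 2, Query, Document 1, Document 2) could be read as sketched below. The names `DistillationRow` and `read_rows` are hypothetical. Whether the query and document columns hold identifiers or raw text presumably depends on `with_queryid` and `with_docid`; that is an assumption, so the sketch treats them as opaque strings.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class DistillationRow(NamedTuple):
    """Hypothetical container for one line of the TSV file."""

    score_1: float
    score_2: float
    query: str
    document_1: str
    document_2: str


def read_rows(stream: Iterable[str]) -> Iterator[DistillationRow]:
    """Parse tab-separated lines into typed rows."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for score_1, score_2, query, document_1, document_2 in reader:
        yield DistillationRow(float(score_1), float(score_2), query, document_1, document_2)


# In-memory stand-in for a real file opened with open(path, newline="")
sample = io.StringIO("12.5\t3.1\tq1\td10\td42\n")
rows = list(read_rows(sample))
```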

Records for training

class xpmir.letor.records.PairwiseRecord(query: xpmir.letor.records.Query, positive: xpmir.letor.records.Document, negative: xpmir.letor.records.Document)

Bases: object

A pairwise record is composed of a query, a positive and a negative document

class xpmir.letor.records.PointwiseRecord(query: xpmir.letor.records.Query, docid: str, content: str, score: float, relevance: Optional[float])

Bases: object

A record from a pointwise sampler