Samplers

Samplers provide samples in the form of records. They all inherit from:

XPM Config xpmir.letor.samplers.Sampler

Bases: experimaestro.core.objects.Config, xpmir.utils.EasyLogger

Abstract data sampler

class xpmir.letor.samplers.SerializableIterator(*args, **kwargs)

Bases: Iterator[xpmir.letor.samplers.ItType], Protocol[xpmir.letor.samplers.ItType]

An iterator that can be serialized through state dictionaries.

This is used when saving the sampler state
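As a rough illustration of the idea (not the library's implementation), an iterator can expose its position through a state dictionary so that iteration resumes where it stopped after a checkpoint. The class name `CountingIterator` and the method names `state_dict` / `load_state_dict` are assumptions chosen for this sketch.

```python
from typing import Any, Dict, Iterator, List


class CountingIterator(Iterator[str]):
    """Hypothetical serializable iterator cycling over a fixed list of items."""

    def __init__(self, items: List[str]):
        self.items = items
        self.position = 0  # number of items produced so far

    def __next__(self) -> str:
        item = self.items[self.position % len(self.items)]
        self.position += 1
        return item

    def state_dict(self) -> Dict[str, Any]:
        # Everything needed to resume iteration (assumed method name)
        return {"position": self.position}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        # Restore a previously saved position (assumed method name)
        self.position = state["position"]


# Consume two items, save the state, then resume from a fresh iterator
it = CountingIterator(["a", "b", "c"])
next(it)
next(it)
saved = it.state_dict()

restored = CountingIterator(["a", "b", "c"])
restored.load_state_dict(saved)
resumed = next(restored)
```

Here `resumed` is the third item, since the restored iterator continues from the saved position rather than restarting.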

Pointwise

XPM Config xpmir.letor.samplers.PointwiseSampler

Bases: xpmir.letor.samplers.Sampler

pointwise_iter() → xpmir.letor.samplers.SerializableIterator[xpmir.letor.records.PointwiseRecord]

Iterable over pointwise records

XPM Config xpmir.letor.samplers.PointwiseModelBasedSampler(*, dataset, retriever, relevant_ratio)

Bases: xpmir.letor.samplers.PointwiseSampler, xpmir.letor.samplers.ModelBasedSampler

dataset: datamaestro_text.data.ir.Adhoc

The topics and assessments

retriever: xpmir.rankers.Retriever

The document retriever

relevant_ratio: float = 0.5

The target relevance ratio
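One plausible reading of a target relevance ratio is that each sample is a relevant document with probability `relevant_ratio`, and a non-relevant one otherwise. The sketch below assumes that interpretation; the function `sample_pointwise` and its arguments are hypothetical and are not part of the library.

```python
import random
from typing import List, Tuple


def sample_pointwise(
    relevant: List[str],
    non_relevant: List[str],
    relevant_ratio: float,
    rng: random.Random,
) -> Tuple[str, int]:
    """Return (docid, label), drawing a relevant document with
    probability relevant_ratio (assumed semantics)."""
    if rng.random() < relevant_ratio:
        return rng.choice(relevant), 1
    return rng.choice(non_relevant), 0


rng = random.Random(0)
samples = [
    sample_pointwise(["d1", "d2"], ["d3", "d4", "d5"], 0.5, rng)
    for _ in range(10_000)
]

# Fraction of relevant samples, expected to be close to the target ratio
observed_ratio = sum(label for _, label in samples) / len(samples)
```

Under this reading, the ratio controls class balance in the training stream independently of how many relevant documents the retriever returns.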

Pairwise

XPM Config xpmir.letor.samplers.PairwiseSampler

Bases: xpmir.letor.samplers.Sampler

Abstract class for pairwise samplers which output a set of (query, positive, negative) triples

XPM Config xpmir.letor.samplers.PairwiseModelBasedSampler(*, dataset, retriever)

Bases: xpmir.letor.samplers.PairwiseSampler, xpmir.letor.samplers.ModelBasedSampler

A pairwise sampler based on a retrieval model

dataset: datamaestro_text.data.ir.Adhoc

The topics and assessments

retriever: xpmir.rankers.Retriever

The document retriever
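To make the (query, positive, negative) structure concrete, here is a minimal sketch of how a triple could be drawn from a retriever's output and the assessments. It assumes that positives are retrieved documents judged relevant and negatives are the other retrieved documents; `Triple`, `sample_triple`, and the `qrels` dictionary are hypothetical names, not the library's API.

```python
import random
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    query: str
    positive: str
    negative: str


def sample_triple(
    query: str,
    retrieved: List[str],
    qrels: Dict[str, int],
    rng: random.Random,
) -> Triple:
    """Draw one triple from the retrieved documents (assumed strategy)."""
    # Split retrieved documents by their assessed relevance
    positives = [d for d in retrieved if qrels.get(d, 0) > 0]
    negatives = [d for d in retrieved if qrels.get(d, 0) <= 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative document")
    return Triple(query, rng.choice(positives), rng.choice(negatives))


rng = random.Random(0)
triple = sample_triple(
    "what is a sampler",
    retrieved=["d1", "d2", "d3", "d4"],
    qrels={"d1": 1, "d3": 2},  # unjudged documents are treated as non-relevant
    rng=rng,
)
```

Treating unjudged documents as negatives is a common simplification, though it can introduce false negatives when assessments are sparse.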

XPM Config xpmir.documents.samplers.BatchwiseRandomSpanSampler(*, documents, max_spansize)

Bases: xpmir.documents.samplers.DocumentSampler, xpmir.letor.samplers.BatchwiseSampler

This sampler uses positive samples coming from the same documents and negative ones coming from others

Allows (pre-)training as in coCondenser:
  L. Gao and J. Callan, “Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval,” arXiv:2108.05540 [cs], Aug. 2021, Accessed: Sep. 17, 2021. [Online]. Available: http://arxiv.org/abs/2108.05540

documents: datamaestro_text.data.ir.AdhocDocumentStore

max_spansize: int = 1000

Maximum span size in number of characters
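The span-based sampling described above can be sketched as follows: two spans are drawn from each document of a batch, so that the two spans of a row act as a positive pair and spans from other rows act as negatives. This is a simplified illustration under assumed details (spans of a fixed length capped by `max_spansize`, uniformly random start offsets); `random_span` and `sample_batch` are hypothetical helpers.

```python
import random
from typing import List, Tuple


def random_span(text: str, max_spansize: int, rng: random.Random) -> str:
    """Return a random contiguous span of at most max_spansize characters."""
    size = min(max_spansize, len(text))
    start = rng.randint(0, len(text) - size)  # both bounds are inclusive
    return text[start:start + size]


def sample_batch(
    documents: List[str], max_spansize: int, rng: random.Random
) -> List[Tuple[str, str]]:
    """For each document, draw two spans from that same document.

    Row i holds a positive pair; spans from rows j != i serve as negatives.
    """
    return [
        (random_span(doc, max_spansize, rng), random_span(doc, max_spansize, rng))
        for doc in documents
    ]


rng = random.Random(0)
docs = [
    "the quick brown fox jumps over the lazy dog",
    "samplers provide training records",
    "short",
]
batch = sample_batch(docs, max_spansize=10, rng=rng)
```

Documents shorter than `max_spansize` are returned whole in this sketch, which makes both spans identical for such documents.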

Records for training

class xpmir.letor.records.PairwiseRecord(query: xpmir.letor.records.Query, positive: xpmir.letor.records.Document, negative: xpmir.letor.records.Document)

Bases: object

A pairwise record is composed of a query, a positive and a negative document

class xpmir.letor.records.PointwiseRecord(query: xpmir.letor.records.Query, docid: str, content: str, score: float, relevance: Optional[float])

Bases: object

A record from a pointwise sampler