Pairwise

Trainers are responsible for defining the way to train a learnable scorer.

XPM Config xpmir.learning.trainers.multiple.MultipleTrainer(*, hooks, model, trainers)[source]

Bases: Trainer

Submit type: xpmir.learning.trainers.multiple.MultipleTrainer

This trainer can be used to combine various trainers

hooks: List[xpmir.learning.context.TrainingHook] = []

Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer

model: xpmir.learning.optim.Module

If the model to optimize is different from the model passed to Learn, this parameter can be used; initialization is still expected to be done at the learner level

trainers: Dict[str, xpmir.learning.trainers.Trainer]

The trainers
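
A minimal usage sketch, assuming two trainers have already been configured (the pairwise_trainer and pointwise_trainer names below are hypothetical placeholders, not part of this documentation):

    from xpmir.learning.trainers.multiple import MultipleTrainer

    # pairwise_trainer and pointwise_trainer are hypothetical Trainer
    # instances configured elsewhere (e.g. the trainers documented below)
    trainer = MultipleTrainer(
        trainers={
            "pairwise": pairwise_trainer,
            "pointwise": pointwise_trainer,
        }
    )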

XPM Config xpmir.letor.trainers.LossTrainer(*, hooks, model, batcher, sampler, batch_size)[source]

Bases: Trainer

Submit type: xpmir.letor.trainers.LossTrainer

Trainer based on a loss function

This trainer supposes that:

  • the sampler_iter is initialized, and is a serializable iterator over batches

hooks: List[xpmir.learning.context.TrainingHook] = []

Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer

model: xpmir.learning.optim.Module

If the model to optimize is different from the model passed to Learn, this parameter can be used; initialization is still expected to be done at the learner level

batcher: xpmir.learning.batchers.Batcher = xpmir.learning.batchers.Batcher.XPMValue()

How to batch samples together

sampler: xpmir.learning.base.Sampler

The sampler to use

batch_size: int = 16

Number of samples per batch

process_microbatch(records: BaseRecords)[source]

Combines a forward and a backward pass

This method can be implemented by specific trainers that use the gradient. In that case the regularizer losses should be taken into account with self.add_losses.

XPM Config xpmir.letor.trainers.pointwise.PointwiseTrainer(*, hooks, model, batcher, sampler, batch_size, lossfn)[source]

Bases: LossTrainer

Submit type: xpmir.letor.trainers.pointwise.PointwiseTrainer

Pointwise trainer

hooks: List[xpmir.learning.context.TrainingHook] = []

Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer

model: xpmir.learning.optim.Module

If the model to optimize is different from the model passed to Learn, this parameter can be used; initialization is still expected to be done at the learner level

batcher: xpmir.learning.batchers.Batcher = xpmir.learning.batchers.Batcher.XPMValue()

How to batch samples together

sampler: xpmir.letor.samplers.PointwiseSampler

The pointwise sampler

batch_size: int = 16

Number of samples per batch

lossfn: xpmir.letor.trainers.pointwise.PointwiseLoss = xpmir.letor.trainers.pointwise.MSELoss.XPMValue(weight=1.0)

Loss function to use
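
A minimal configuration sketch, assuming a pointwise sampler is available (the pointwise_sampler name below is a hypothetical placeholder):

    from xpmir.letor.trainers.pointwise import MSELoss, PointwiseTrainer

    # pointwise_sampler is a hypothetical xpmir.letor.samplers.PointwiseSampler
    trainer = PointwiseTrainer(
        sampler=pointwise_sampler,
        lossfn=MSELoss(),  # the documented default loss
        batch_size=16,
    )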

Trainer

XPM Config xpmir.letor.trainers.generative.GenerativeTrainer(*, hooks, model, batcher, sampler, batch_size, loss)[source]

Bases: LossTrainer

Submit type: xpmir.letor.trainers.generative.GenerativeTrainer

hooks: List[xpmir.learning.context.TrainingHook] = []

Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer

model: xpmir.learning.optim.Module

If the model to optimize is different from the model passed to Learn, this parameter can be used; initialization is still expected to be done at the learner level

batcher: xpmir.learning.batchers.Batcher = xpmir.learning.batchers.Batcher.XPMValue()

How to batch samples together

sampler: xpmir.letor.samplers.PairwiseSampler

The pairwise sampler

batch_size: int = 16

Number of samples per batch

loss: xpmir.letor.trainers.generative.PairwiseGenerativeLoss

The pairwise generative loss
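
A minimal configuration sketch for the generative trainer; pairwise_sampler and generative_loss are hypothetical instances of the documented parameter types:

    from xpmir.letor.trainers.generative import GenerativeTrainer

    # pairwise_sampler: a PairwiseSampler; generative_loss: a PairwiseGenerativeLoss
    trainer = GenerativeTrainer(
        sampler=pairwise_sampler,
        loss=generative_loss,
        batch_size=16,
    )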

Pairwise

Samplers

XPM Config xpmir.letor.samplers.PairwiseSampler[source]

Bases: Sampler

Submit type: xpmir.letor.samplers.PairwiseSampler

Abstract class for pairwise samplers which output a set of (query, positive, negative) triples

XPM Config xpmir.letor.samplers.PairwiseModelBasedSampler(*, dataset, retriever)[source]

Bases: PairwiseSampler, ModelBasedSampler

Submit type: xpmir.letor.samplers.PairwiseModelBasedSampler

A pairwise sampler based on a retrieval model

dataset: datamaestro_text.data.ir.Adhoc

The IR adhoc dataset

retriever: xpmir.rankers.Retriever

A retriever to sample negative documents
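
A minimal sketch; adhoc_dataset and bm25_retriever are hypothetical instances of the documented parameter types:

    from xpmir.letor.samplers import PairwiseModelBasedSampler

    # adhoc_dataset: a datamaestro_text.data.ir.Adhoc dataset;
    # bm25_retriever: an xpmir.rankers.Retriever used to sample negatives
    sampler = PairwiseModelBasedSampler(
        dataset=adhoc_dataset,
        retriever=bm25_retriever,
    )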

XPM Config xpmir.letor.samplers.TripletBasedSampler(*, source)[source]

Bases: PairwiseSampler

Submit type: xpmir.letor.samplers.TripletBasedSampler

Sampler based on a triplet source

source: datamaestro_text.data.ir.TrainingTriplets

Triplets
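
A minimal sketch, assuming a training triplet source (the hypothetical training_triplets below), e.g. obtained through datamaestro:

    from xpmir.letor.samplers import TripletBasedSampler

    # training_triplets: a datamaestro_text.data.ir.TrainingTriplets source
    sampler = TripletBasedSampler(source=training_triplets)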

XPM Config xpmir.letor.samplers.PairwiseDatasetTripletBasedSampler(*, documents, dataset, negative_algo)[source]

Bases: PairwiseSampler

Submit type: xpmir.letor.samplers.PairwiseDatasetTripletBasedSampler

Sampler based on a dataset where each query is associated with (1) a set of relevant documents and (2) a set of negative documents, where each negative is sampled with a specific algorithm

documents: datamaestro_text.data.ir.DocumentStore

The document store

dataset: datamaestro_text.data.ir.PairwiseSampleDataset

The dataset which contains the generated queries with their positives and negatives

negative_algo: str = random

The algorithm used to sample the negatives; the default value is random
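
A minimal sketch; document_store and pairwise_dataset are hypothetical instances of the documented parameter types:

    from xpmir.letor.samplers import PairwiseDatasetTripletBasedSampler

    # document_store: a DocumentStore; pairwise_dataset: a PairwiseSampleDataset
    sampler = PairwiseDatasetTripletBasedSampler(
        documents=document_store,
        dataset=pairwise_dataset,
        negative_algo="random",  # the documented default
    )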

XPM Config xpmir.letor.samplers.PairwiseInBatchNegativesSampler(*, sampler)[source]

Bases: BatchwiseSampler

Submit type: xpmir.letor.samplers.PairwiseInBatchNegativesSampler

An in-batch negative sampler constructed from a pairwise one

sampler: xpmir.letor.samplers.PairwiseSampler

The base pairwise sampler
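
A minimal sketch wrapping an existing pairwise sampler (the hypothetical base_sampler below):

    from xpmir.letor.samplers import PairwiseInBatchNegativesSampler

    # base_sampler: any PairwiseSampler, e.g. a TripletBasedSampler
    in_batch_sampler = PairwiseInBatchNegativesSampler(sampler=base_sampler)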

XPM Config xpmir.letor.samplers.PairwiseSamplerFromTSV(*, pairwise_samples_path)[source]

Bases: PairwiseSampler

Submit type: xpmir.letor.samplers.PairwiseSamplerFromTSV

pairwise_samples_path: Path

The path which stores the existing triplets

XPM Task xpmir.letor.samplers.ModelBasedHardNegativeSampler(*, dataset, retriever)[source]

Bases: Task, Sampler

Submit type: datamaestro_text.data.ir.PairwiseSampleDataset

Retriever-based hard negative sampler

dataset: datamaestro_text.data.ir.Adhoc

The dataset which contains the topics and assessments

retriever: xpmir.rankers.Retriever

The retriever used to score documents with respect to the query

hard_negative_samples: Path (generated)

Path to store the generated hard negatives
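
Since this is a task, it is typically submitted within an experimaestro experiment; a hedged sketch (adhoc_dataset and retriever are hypothetical objects, and submit() is assumed to be called inside an experiment context):

    from xpmir.letor.samplers import ModelBasedHardNegativeSampler

    # adhoc_dataset: an Adhoc dataset; retriever: a Retriever
    hard_negatives = ModelBasedHardNegativeSampler(
        dataset=adhoc_dataset,
        retriever=retriever,
    ).submit()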

Dataset types

XPM Config xpmir.letor.samplers.JSONLPairwiseSampleDataset(*, id, path)[source]

Bases: PairwiseSampleDataset

Submit type: xpmir.letor.samplers.JSONLPairwiseSampleDataset

Transform a JSONL file into a pairwise dataset. General format of each entry:

    {
        queries: [str, str],
        pos_ids: [id, id],
        neg_ids: {
            "bm25": [id, id],
            "random": [id, id]
        }
    }

id: str

The unique dataset ID

path: Path

The path to the JSONL file
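
A minimal sketch; the dataset identifier and file path below are hypothetical:

    from pathlib import Path
    from xpmir.letor.samplers import JSONLPairwiseSampleDataset

    dataset = JSONLPairwiseSampleDataset(
        id="my.generated.pairwise.samples",  # hypothetical unique ID
        path=Path("samples.jsonl"),          # hypothetical JSONL file
    )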

XPM Config xpmir.letor.samplers.TSVPairwiseSampleDataset(*, id, hard_negative_samples_path)[source]

Bases: PairwiseSampleDataset

Submit type: xpmir.letor.samplers.TSVPairwiseSampleDataset

Read a pairwise sample dataset from a TSV file

id: str

The unique dataset ID

hard_negative_samples_path: Path

The path which stores the existing ids

Dataset transforms

XPM Task xpmir.letor.samplers.synthetic.JSONLNegativeGeneration(*, random, documents, pairwise_dataset, retrievers, k)[source]

Bases: Task

Submit type: Any

Add negatives to a pairwise sample dataset according to the given retrievers.

random: xpmir.learning.base.Random

The random sampler

documents: datamaestro_text.data.ir.DocumentStore

The document store from which the negatives are sampled

pairwise_dataset: xpmir.letor.samplers.JSONLPairwiseSampleDataset

The pairwise dataset where we are going to add the negatives

retrievers: Dict[str, xpmir.rankers.Retriever]

The retrievers used to retrieve the top-k documents for each query; if no retriever is provided, only random negatives are used

synthetic_samples: Path (generated)

Path to store the generated queries

k: int = 100

The number of negatives for each algorithm
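
A hedged sketch of submitting this task; document_store, pairwise_dataset and bm25_retriever are hypothetical objects, Random() is assumed constructible with its defaults, and submit() is assumed to be called inside an experimaestro experiment context:

    from xpmir.learning.base import Random
    from xpmir.letor.samplers.synthetic import JSONLNegativeGeneration

    task = JSONLNegativeGeneration(
        random=Random(),
        documents=document_store,
        pairwise_dataset=pairwise_dataset,
        retrievers={"bm25": bm25_retriever},  # the key is assumed to name the sampling algorithm
        k=100,
    ).submit()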

Adapters

XPM Config xpmir.letor.samplers.hydrators.PairwiseTransformAdapter(*, sampler, adapter)[source]

Bases: PairwiseSampler

Submit type: xpmir.letor.samplers.hydrators.PairwiseTransformAdapter

Transforms pairwise samples using an adapter

This adapter is useful because the transformation is only performed when the samples are actually used: otherwise, when recovering a checkpoint with a SkippingIterator, all the records might have to be processed.

sampler: xpmir.letor.samplers.PairwiseSampler

The pairwise samples, without the texts for queries and documents

adapter: xpmir.letor.samplers.hydrators.SampleTransform

The transformation
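
A minimal sketch; id_sampler (a pairwise sampler yielding identifiers only) and text_adapter (a SampleTransform adding the texts) are hypothetical objects:

    from xpmir.letor.samplers.hydrators import PairwiseTransformAdapter

    sampler = PairwiseTransformAdapter(
        sampler=id_sampler,
        adapter=text_adapter,
    )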