Pairwise
Trainers are responsible for defining the way to train a learnable scorer.
- XPM Config xpmir.learning.trainers.multiple.MultipleTrainer(*, hooks, model, trainers)
Bases: Trainer
Submit type: xpmir.learning.trainers.multiple.MultipleTrainer
This trainer can be used to combine various trainers
- hooks: List[xpmir.learning.context.TrainingHook] = []
Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer.
- model: xpmir.learning.optim.Module
If the model to optimize is different from the model passed to Learn, this parameter can be used; initialization is still expected to be done at the learner level.
- trainers: Dict[str, xpmir.learning.trainers.Trainer]
The trainers
- XPM Config xpmir.letor.trainers.LossTrainer(*, hooks, model, batcher, sampler, batch_size)
Bases: Trainer
Submit type: xpmir.letor.trainers.LossTrainer
Trainer based on a loss function
This trainer assumes that sampler_iter is initialized and is a serializable iterator over batches.
- hooks: List[xpmir.learning.context.TrainingHook] = []
Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer.
- model: xpmir.learning.optim.Module
If the model to optimize is different from the model passed to Learn, this parameter can be used; initialization is still expected to be done at the learner level.
- batcher: xpmir.learning.batchers.Batcher = xpmir.learning.batchers.Batcher.XPMValue()
How to batch samples together
- sampler: xpmir.learning.base.Sampler
The sampler to use
- batch_size: int = 16
Number of samples per batch
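The requirement that the trainer iterates over batches with a serializable iterator can be illustrated with a minimal sketch in plain Python. The class `BatchIterator` and its `state_dict` / `load_state_dict` methods are hypothetical names chosen for this illustration; they are not taken from the xpmir API.

```python
from typing import Dict, List, Sequence


class BatchIterator:
    """Hypothetical iterator over fixed-size batches that can save and restore its position."""

    def __init__(self, samples: Sequence[int], batch_size: int = 16):
        self.samples = samples
        self.batch_size = batch_size
        self.position = 0

    def __iter__(self) -> "BatchIterator":
        return self

    def __next__(self) -> List[int]:
        if self.position >= len(self.samples):
            raise StopIteration
        batch = list(self.samples[self.position:self.position + self.batch_size])
        self.position += len(batch)
        return batch

    def state_dict(self) -> Dict[str, int]:
        # Everything needed to resume iteration fits in one integer
        return {"position": self.position}

    def load_state_dict(self, state: Dict[str, int]) -> None:
        self.position = state["position"]


it = BatchIterator(range(40), batch_size=16)
first = next(it)
checkpoint = it.state_dict()  # saved after the first batch
second = next(it)

# A fresh iterator restored from the checkpoint resumes at the second batch
resumed = BatchIterator(range(40), batch_size=16)
resumed.load_state_dict(checkpoint)
second_again = next(resumed)
```

Storing only a position keeps the checkpoint small, at the cost of requiring the underlying sample sequence to be reproducible when training resumes.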
- XPM Config xpmir.letor.trainers.pointwise.PointwiseTrainer(*, hooks, model, batcher, sampler, batch_size, lossfn)
Bases: LossTrainer
Submit type: xpmir.letor.trainers.pointwise.PointwiseTrainer
Pointwise trainer
- hooks: List[xpmir.learning.context.TrainingHook] = []
Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer.
- model: xpmir.learning.optim.Module
If the model to optimize is different from the model passed to Learn, this parameter can be used; initialization is still expected to be done at the learner level.
- batcher: xpmir.learning.batchers.Batcher = xpmir.learning.batchers.Batcher.XPMValue()
How to batch samples together
- sampler: xpmir.letor.samplers.PointwiseSampler
The pointwise sampler
- batch_size: int = 16
Number of samples per batch
- lossfn: xpmir.letor.trainers.pointwise.PointwiseLoss = xpmir.letor.trainers.pointwise.MSELoss.XPMValue(weight=1.0)
Loss function to use
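A pointwise loss compares the score of each (query, document) sample with its relevance label, independently of the other samples. The following is a minimal sketch of a weighted mean squared error in plain Python; the function name `weighted_mse` and the sample values are assumptions made for illustration and do not reproduce the xpmir implementation.

```python
from typing import Sequence


def weighted_mse(scores: Sequence[float], labels: Sequence[float], weight: float = 1.0) -> float:
    """Mean squared error between scores and labels, scaled by `weight`."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("cannot compute a loss on an empty batch")
    total = sum((s - y) ** 2 for s, y in zip(scores, labels))
    return weight * total / len(scores)


# Hypothetical batch: model scores and binary relevance labels
batch_scores = [0.5, 0.0, 1.0, 0.5]
batch_labels = [1.0, 0.0, 1.0, 0.0]
loss = weighted_mse(batch_scores, batch_labels, weight=1.0)
```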
Trainer
- XPM Config xpmir.letor.trainers.generative.GenerativeTrainer(*, hooks, model, batcher, sampler, batch_size, loss)
Bases: LossTrainer
Submit type: xpmir.letor.trainers.generative.GenerativeTrainer
- hooks: List[xpmir.learning.context.TrainingHook] = []
Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer.
- model: xpmir.learning.optim.Module
If the model to optimize is different from the model passed to Learn, this parameter can be used; initialization is still expected to be done at the learner level.
- batcher: xpmir.learning.batchers.Batcher = xpmir.learning.batchers.Batcher.XPMValue()
How to batch samples together
- sampler: xpmir.letor.samplers.PairwiseSampler
The pairwise sampler
- batch_size: int = 16
Number of samples per batch
Losses
Pairwise
Samplers
- XPM Config xpmir.letor.samplers.PairwiseSampler
Bases: Sampler
Submit type: xpmir.letor.samplers.PairwiseSampler
Abstract class for pairwise samplers which output a set of (query, positive, negative) triples
- XPM Config xpmir.letor.samplers.PairwiseModelBasedSampler(*, dataset, retriever)
Bases: PairwiseSampler, ModelBasedSampler
Submit type: xpmir.letor.samplers.PairwiseModelBasedSampler
A pairwise sampler based on a retrieval model
- dataset: datamaestro_text.data.ir.Adhoc
The IR adhoc dataset
- retriever: xpmir.rankers.Retriever
A retriever to sample negative documents
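The idea of drawing negatives from a retrieval model can be sketched as follows: positives come from the relevance assessments, and negatives are retrieved documents that are not judged relevant. This is a simplified illustration under stated assumptions; the function `sample_triples` and the dictionary-based inputs are hypothetical stand-ins for the dataset and the retriever, not the xpmir implementation.

```python
import random
from typing import Dict, Iterator, List, Set, Tuple


def sample_triples(
    qrels: Dict[str, Set[str]],
    retrieved: Dict[str, List[str]],
    rng: random.Random,
    n: int,
) -> Iterator[Tuple[str, str, str]]:
    """Yield n (query, positive, negative) triples.

    qrels maps a query to its relevant document IDs; retrieved maps a query
    to the document IDs returned by a (hypothetical) retriever.
    """
    # Keep only queries with at least one positive and one candidate negative
    candidates = {}
    for query in sorted(qrels):
        positives = sorted(qrels[query])
        negatives = [d for d in retrieved.get(query, []) if d not in qrels[query]]
        if positives and negatives:
            candidates[query] = (positives, negatives)
    if not candidates:
        raise ValueError("no query has both a positive and a negative document")

    queries = list(candidates)
    for _ in range(n):
        query = rng.choice(queries)
        positives, negatives = candidates[query]
        yield query, rng.choice(positives), rng.choice(negatives)


# Toy data standing in for assessments and retrieval results
qrels = {"q1": {"d1"}, "q2": {"d5", "d6"}}
retrieved = {"q1": ["d1", "d2", "d3"], "q2": ["d6", "d9"]}
triples = list(sample_triples(qrels, retrieved, random.Random(0), n=5))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which matters when training has to be resumed.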
- XPM Config xpmir.letor.samplers.TripletBasedSampler(*, source)
Bases: PairwiseSampler
Submit type: xpmir.letor.samplers.TripletBasedSampler
Sampler based on a triplet source
- source: datamaestro_text.data.ir.TrainingTriplets
Triplets
- XPM Config xpmir.letor.samplers.PairwiseDatasetTripletBasedSampler(*, documents, dataset, negative_algo)
Bases: PairwiseSampler
Submit type: xpmir.letor.samplers.PairwiseDatasetTripletBasedSampler
Sampler based on a dataset where each query is associated with (1) a set of relevant documents (2) negative documents, where each negative is sampled with a specific algorithm
- documents: datamaestro_text.data.ir.DocumentStore
The document store
- dataset: datamaestro_text.data.ir.PairwiseSampleDataset
The dataset which contains the generated queries with their positives and negatives
- negative_algo: str = random
The algorithm used to sample the negatives; the default value is random
- XPM Config xpmir.letor.samplers.PairwiseInBatchNegativesSampler(*, sampler)
Bases: BatchwiseSampler
Submit type: xpmir.letor.samplers.PairwiseInBatchNegativesSampler
An in-batch negative sampler constructed from a pairwise one
- sampler: xpmir.letor.samplers.PairwiseSampler
The base pairwise sampler
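One common way to build in-batch negatives from pairwise triples is to pool all documents of the batch and treat, for each query, every document other than its own positive as a negative. The sketch below illustrates that general formulation only; the function `in_batch_view` and its output layout are assumptions and may differ from what this sampler actually produces.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (query, positive, negative)


def in_batch_view(batch: List[Triple]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (queries, documents, targets) for a batch of triples.

    targets[i][j] is 1 when documents[j] is the positive of queries[i], else 0.
    Note: this simple version does not detect a document that happens to be
    relevant for several queries of the batch.
    """
    queries = [q for q, _, _ in batch]
    # Positives first (aligned with the queries), then the explicit negatives
    documents = [p for _, p, _ in batch] + [n for _, _, n in batch]
    targets = [
        [1 if j == i else 0 for j in range(len(documents))]
        for i in range(len(queries))
    ]
    return queries, documents, targets


queries, documents, targets = in_batch_view(
    [("q1", "p1", "n1"), ("q2", "p2", "n2")]
)
```

With a batch of B triples, each query is compared with 2B documents, so the number of negatives per query grows with the batch size at no extra sampling cost.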
- XPM Config xpmir.letor.samplers.PairwiseSamplerFromTSV(*, pairwise_samples_path)
Bases: PairwiseSampler
Submit type: xpmir.letor.samplers.PairwiseSamplerFromTSV
- pairwise_samples_path: Path
The path which stores the existing triplets
- XPM Task xpmir.letor.samplers.ModelBasedHardNegativeSampler(*, dataset, retriever)
Bases: Task, Sampler
Submit type: datamaestro_text.data.ir.PairwiseSampleDataset
Retriever-based hard negative sampler
- dataset: datamaestro_text.data.ir.Adhoc
The dataset which contains the topics and assessments
- retriever: xpmir.rankers.Retriever
The retriever used to score the documents with respect to the query
- hard_negative_samples: Path (generated)
Path to store the generated hard negatives
Dataset types
- XPM Config xpmir.letor.samplers.JSONLPairwiseSampleDataset(*, id, path)
Bases: PairwiseSampleDataset
Submit type: xpmir.letor.samplers.JSONLPairwiseSampleDataset
Transform a JSONL file to a pairwise dataset. General format:
{
    queries: [str, str],
    pos_ids: [id, id],
    neg_ids: {
        "bm25": [id, id],
        "random": [id, id]
    }
}
- id: str
The unique dataset ID
- path: Path
The path to the JSONL file
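The record format described above can be read with the standard `json` module, one record per line. The sketch below enumerates every (query, positive, negative) combination for a chosen negative algorithm; the function `iter_triples` is a hypothetical helper, and enumerating the full cross-product is an assumption made for illustration (an actual sampler would more likely draw triples at random).

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_triples(
    lines: Iterable[str], negative_algo: str = "random"
) -> Iterator[Tuple[str, str, str]]:
    """Yield (query, positive ID, negative ID) triples from JSONL lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Negatives are grouped by the algorithm that produced them
        negatives = record["neg_ids"].get(negative_algo, [])
        for query in record["queries"]:
            for pos_id in record["pos_ids"]:
                for neg_id in negatives:
                    yield query, pos_id, neg_id


# One illustrative record following the general format
lines = [
    json.dumps(
        {
            "queries": ["q a", "q b"],
            "pos_ids": ["d1"],
            "neg_ids": {"bm25": ["d7"], "random": ["d3", "d4"]},
        }
    )
]
triples = list(iter_triples(lines, negative_algo="random"))
```

In practice `lines` could be an open file object, since iterating over a text file yields one line at a time.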
- XPM Config xpmir.letor.samplers.TSVPairwiseSampleDataset(*, id, hard_negative_samples_path)
Bases: PairwiseSampleDataset
Submit type: xpmir.letor.samplers.TSVPairwiseSampleDataset
Read the pairwise sample dataset from a tsv file
- id: str
The unique dataset ID
- hard_negative_samples_path: Path
The path which stores the existing ids
Dataset transforms
- XPM Task xpmir.letor.samplers.synthetic.JSONLNegativeGeneration(*, random, documents, pairwise_dataset, retrievers, k)
Bases: Task
Submit type: Any
Add the negatives to the pairwise sampler according to the given retrievers.
- random: xpmir.learning.base.Random
The random sampler
- documents: datamaestro_text.data.ir.DocumentStore
The document store from which the negatives are sampled
- pairwise_dataset: xpmir.letor.samplers.JSONLPairwiseSampleDataset
The pairwise dataset where we are going to add the negatives
- retrievers: Dict[str, xpmir.rankers.Retriever]
The retrievers used to retrieve the top k documents with respect to the query. If no retriever is provided, only random negatives are used
- synthetic_samples: Path (generated)
Path to store the generated queries
- k: int = 100
The number of negatives for each algo
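The parameters above suggest the following overall logic: random negatives are always drawn from the document store, and each named retriever contributes up to k additional negatives. The sketch below is an illustration under assumptions, not the task's actual code: `build_negatives` is a hypothetical function, retrievers are modelled as plain callables returning ranked document IDs, and positives are assumed to be excluded from the negatives.

```python
import random
from typing import Callable, Dict, List, Sequence


def build_negatives(
    query: str,
    pos_ids: Sequence[str],
    doc_ids: Sequence[str],
    retrievers: Dict[str, Callable[[str], List[str]]],
    rng: random.Random,
    k: int = 100,
) -> Dict[str, List[str]]:
    """Return up to k negatives per algorithm, keyed by algorithm name."""
    positives = set(pos_ids)
    # Random negatives: sampled without replacement from the non-relevant documents
    pool = [d for d in doc_ids if d not in positives]
    negatives = {"random": rng.sample(pool, min(k, len(pool)))}
    # Retriever-based negatives: top-ranked documents that are not positives
    for name, retrieve in retrievers.items():
        ranked = [d for d in retrieve(query) if d not in positives]
        negatives[name] = ranked[:k]
    return negatives


# Toy document store and a fake retriever standing in for a real one
doc_ids = [f"d{i}" for i in range(10)]


def fake_bm25(query: str) -> List[str]:
    return ["d1", "d2", "d0", "d3"]


negatives = build_negatives(
    "some query", ["d1"], doc_ids, {"bm25": fake_bm25}, random.Random(0), k=2
)
```

With an empty `retrievers` dictionary the result contains only the `"random"` key, which matches the fallback described for the retrievers parameter.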
Adapters
- XPM Config xpmir.letor.samplers.hydrators.PairwiseTransformAdapter(*, sampler, adapter)
Bases: PairwiseSampler
Submit type: xpmir.letor.samplers.hydrators.PairwiseTransformAdapter
Transforms pairwise samples using an adapter
This adapter is useful because the transformation is only performed when the samples are actually used. Otherwise, when using a SkippingIterator (for instance when recovering a checkpoint), all the records might have to be processed.
- sampler: xpmir.letor.samplers.PairwiseSampler
The pairwise sampler whose samples do not contain the texts of queries and documents
- adapter: xpmir.letor.samplers.hydrators.SampleTransform
The transformation
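The benefit of transforming samples lazily can be shown with a small sketch: when records are skipped (for example to resume from a checkpoint), the transformation is never called on them. The class `LazyTransformIterator` and the `hydrate` function are hypothetical and only illustrate the principle; they are not the xpmir classes.

```python
from typing import Callable, Dict, List, Sequence


class LazyTransformIterator:
    """Applies `transform` only to records that are actually consumed."""

    def __init__(self, records: Sequence[str], transform: Callable[[str], Dict[str, str]]):
        self.records = records
        self.transform = transform
        self.position = 0

    def skip(self, n: int) -> None:
        # Skipping only moves the cursor: no record is transformed
        self.position = min(self.position + n, len(self.records))

    def __iter__(self) -> "LazyTransformIterator":
        return self

    def __next__(self) -> Dict[str, str]:
        if self.position >= len(self.records):
            raise StopIteration
        record = self.records[self.position]
        self.position += 1
        return self.transform(record)


calls: List[str] = []


def hydrate(doc_id: str) -> Dict[str, str]:
    """Stand-in for an expensive lookup, e.g. fetching a document text."""
    calls.append(doc_id)
    return {"id": doc_id, "text": f"text of {doc_id}"}


it = LazyTransformIterator(["d0", "d1", "d2", "d3"], hydrate)
it.skip(3)  # e.g. resuming after three samples were already consumed
remaining = list(it)
```

After the run, `calls` only contains the IDs of the records that were consumed, so the cost of the transformation is proportional to the samples used rather than to the samples skipped.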