Pairwise learning
In pairwise learning, each training instance is a (query, positive document, negative document) triplet. The model learns to rank the positive document above the negative one, typically optimised with a margin-based or cross-entropy loss.
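As a rough illustration of the two loss families mentioned above, the sketch below computes a hinge (margin) loss and a logistic (pairwise cross-entropy) loss from the scores of a single triplet. It is plain Python, not the PairwiseLoss implementation shipped with the library; the function names and the default margin of 1.0 are assumptions made for this example.

```python
import math


def hinge_loss(pos_score: float, neg_score: float, margin: float = 1.0) -> float:
    """Margin-based loss: zero once the positive beats the negative by `margin`."""
    return max(0.0, margin - (pos_score - neg_score))


def logistic_loss(pos_score: float, neg_score: float) -> float:
    """Pairwise cross-entropy: -log(sigmoid(pos_score - neg_score))."""
    diff = pos_score - neg_score
    # Numerically stable form of log(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
```

Both losses shrink as the positive document's score grows relative to the negative's; the hinge loss stops penalising once the margin is met, while the logistic loss keeps a small gradient everywhere.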
Trainer
- XPM Config xpmir.letor.trainers.pairwise.PairwiseTrainer(*, hooks, model, sampler, batch_size, num_workers, lossfn)
Bases: LossTrainer
Pairwise trainer that uses samples of the form (query, positive, negative)
- hooks: List[xpm_torch.trainers.context.TrainingHook] = []
Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer.
- model: xpm_torch.module.Module
If the model to optimize is different from the model passed to Learn, this parameter can be used. Initialization is still expected to be done at the learner level.
- batcher: xpm_torch.batchers.Batcher (generated)
How to batch samples together
- sampler: xpm_torch.base.Sampler
The sampler to use
- lossfn: xpm_torch.losses.pairwise.PairwiseLoss
The loss function
Samplers
Samplers produce pairwise training triplets from different data sources (model scores, pre-computed files, or in-batch negatives).
- XPM Config xpmir.letor.samplers.PairwiseModelBasedSampler(*, dataset, retriever)
Bases: ModelBasedSampler, Sampler[PairwiseItem]
A pairwise sampler based on a retrieval model
- dataset: datamaestro_ir.data.Adhoc
The IR adhoc dataset
- retriever: xpmir.rankers.retriever.Retriever
A retriever to sample negative documents
- XPM Config xpmir.letor.samplers.TripletBasedSampler(*, source)
Bases: Sampler[PairwiseItem]
Sampler based on a triplet source
- source: datamaestro_ir.data.TrainingTriplets
Triplets
- XPM Config xpmir.letor.samplers.PairwiseDatasetTripletBasedSampler(*, documents, dataset, negative_algo)
Bases: Sampler[PairwiseItem]
Sampler based on a dataset where each query is associated with (1) a set of relevant documents and (2) negative documents, where each negative is sampled with a specific algorithm
- documents: datamaestro_ir.data.DocumentStore
The document store
- dataset: datamaestro_ir.data.PairwiseSampleDataset
The dataset which contains the generated queries with their positives and negatives
- XPM Config xpmir.letor.samplers.PairwiseInBatchNegativesSampler(*, sampler)
Bases: Sampler[BatchwiseItems]
An in-batch negative sampler constructed from a pairwise one
- sampler: xpm_torch.base.Sampler[xpmir.letor.records.PairwiseItem]
The base pairwise sampler
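To illustrate the idea behind in-batch negatives, the sketch below takes a batch of pairwise triplets and, for each query, treats the documents attached to the other queries as additional negatives. It uses plain tuples rather than PairwiseItem or BatchwiseItems, and the helper name `in_batch_candidates` is hypothetical; how the library itself assembles the batch may differ.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (query, positive document, negative document)


def in_batch_candidates(
    batch: List[Triplet],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return the queries, the pooled documents, and a query x document relevance matrix."""
    queries = [query for query, _, _ in batch]
    # Pool all positives first, then all negatives, so that document i
    # (for i < len(batch)) is the positive of query i.
    documents = [pos for _, pos, _ in batch] + [neg for _, _, neg in batch]
    # Every pooled document other than a query's own positive counts as a negative.
    relevance = [
        [1 if j == i else 0 for j in range(len(documents))]
        for i in range(len(queries))
    ]
    return queries, documents, relevance
```

With a batch of n triplets, each query is compared against 2n documents instead of 2, at no extra sampling cost. This sketch does not check whether a pooled document happens to be relevant to another query.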
- XPM Config xpmir.letor.samplers.PairwiseSamplerFromTSV(*, pairwise_samples_path)
Bases: Sampler[PairwiseItem]
- pairwise_samples_path: path
The path which stores the existing triplets
- XPM Task xpmir.letor.samplers.ModelBasedHardNegativeSampler(*, dataset, retriever)
Submit type: datamaestro_ir.data.PairwiseSampleDataset
Retriever-based hard negative sampler
- dataset: datamaestro_ir.data.Adhoc
The dataset which contains the topics and assessments
- retriever: xpmir.rankers.retriever.Retriever
The retriever used to score documents with respect to the query
- hard_negative_samples: path (generated)
Path to store the generated hard negatives
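The retriever-based hard negative idea can be sketched as follows: rank documents for a query, drop those assessed as relevant, and keep the highest-ranked remainder as negatives. The function `hard_negatives`, its `top_k` parameter, and the toy ranking in the usage note are assumptions for illustration; this is not the task's actual implementation or output format.

```python
from typing import Iterable, List, Set


def hard_negatives(
    ranked_doc_ids: Iterable[str], relevant_ids: Set[str], top_k: int = 2
) -> List[str]:
    """Keep the first `top_k` retrieved documents that are not assessed as relevant."""
    negatives: List[str] = []
    for doc_id in ranked_doc_ids:
        if len(negatives) >= top_k:
            break
        # Highly ranked but non-relevant documents are the "hard" negatives.
        if doc_id not in relevant_ids:
            negatives.append(doc_id)
    return negatives
```

For example, with a ranking `["d3", "d1", "d7", "d2"]` where only `d1` is relevant, `hard_negatives(..., top_k=2)` keeps `d3` and `d7`.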
Dataset types
Pre-computed pairwise datasets stored as JSONL or TSV files.
- XPM Config xpmir.letor.samplers.JSONLPairwiseSampleDataset(*, id, path)
Bases: PairwiseSampleDataset
Transform a JSONL file to a pairwise dataset.
General format:
{ "queries": ["str", "str"], "pos_ids": ["id", "id"], "neg_ids": { "bm25": ["id", "id"], "random": ["id", "id"] } }
- path: path
The path to the JSONL file
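A minimal sketch of reading one line in the general format above and enumerating (query, positive, negative) ID triplets for one negative-sampling algorithm. The helper `iter_triplets` and the exhaustive enumeration are assumptions for illustration; the library's samplers may draw from these lists differently (for example, at random).

```python
import json
from itertools import product
from typing import Iterator, Tuple


def iter_triplets(line: str, negative_algo: str) -> Iterator[Tuple[str, str, str]]:
    """Yield every (query, positive ID, negative ID) combination for one JSONL line."""
    record = json.loads(line)
    # "neg_ids" maps an algorithm name (e.g. "bm25") to a list of document IDs.
    negatives = record["neg_ids"][negative_algo]
    yield from product(record["queries"], record["pos_ids"], negatives)


# Toy line following the general format; the IDs are made up.
SAMPLE_LINE = (
    '{"queries": ["q a", "q b"], "pos_ids": ["p1"], '
    '"neg_ids": {"bm25": ["n1", "n2"], "random": ["n3"]}}'
)
```

Calling `list(iter_triplets(SAMPLE_LINE, "random"))` would pair each of the two query strings with `p1` and `n3`.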
Adapters
- XPM Config xpmir.letor.samplers.adapters.SamplerAdapter(*, sampler, processors, buffer_size)
Bases: Sampler[SampleT]
Wraps a sampler with processors that transform its output.
The adapter takes an input Sampler and applies a chain of RecordsProcessors to transform the samples.
- sampler: xpm_torch.base.Sampler
- processors: List[xpmir.letor.processors.RecordsProcessor]
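The processor chain can be pictured as function composition over a stream of samples. The sketch below uses plain callables and a generator; `apply_processors` is a hypothetical stand-in, not the SamplerAdapter or RecordsProcessor API, and it ignores the buffer_size parameter.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def apply_processors(
    samples: Iterable[T], processors: List[Callable[[T], T]]
) -> Iterator[T]:
    """Lazily pass each sample through every processor, in list order."""
    for sample in samples:
        for process in processors:
            sample = process(sample)
        yield sample
```

For instance, `list(apply_processors(["  Foo ", "BAR"], [str.strip, str.lower]))` strips and then lower-cases each string. Because the function is a generator, it also works with an unbounded stream of samples.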
Processors
- XPM Config xpmir.letor.processors.StoreHydrator(*, documentstore, querystore)
Bases: DocumentsProcessor[DocIn, QueryIn, DocOut], QueriesProcessor[DocIn, QueryIn, QueryOut]
Hydrates ID-only records with text from document/query stores.
When documentstore is set, documents are hydrated via documents_ext(). When querystore is set, queries are hydrated via store lookup. For documents with ScoredItem, the score is preserved via ScoredDocument.
- documentstore: datamaestro_ir.data.DocumentStore
- querystore: xpmir.datasets.adapters.TextStore
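Hydration amounts to replacing an identifier with the text it refers to. The sketch below models the stores as dictionaries and a triplet as a tuple of IDs; the function `hydrate` and both dictionaries are hypothetical and do not use the DocumentStore or TextStore interfaces.

```python
from typing import Dict, Optional, Tuple

Triplet = Tuple[str, str, str]  # (query, positive document, negative document)


def hydrate(
    triplet: Triplet,
    documents: Optional[Dict[str, str]] = None,
    queries: Optional[Dict[str, str]] = None,
) -> Triplet:
    """Replace IDs with text for whichever store is provided; leave the rest untouched."""
    query, positive, negative = triplet
    if queries is not None:
        query = queries[query]
    if documents is not None:
        positive, negative = documents[positive], documents[negative]
    return query, positive, negative
```

As in the description above, each store is optional: passing only `documents` hydrates the two documents and keeps the query ID as it is.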