Distillation

Sampler

XPM Config xpmir.letor.distillation.samplers.DistillationPairwiseSampler(*, samples)

Bases: Sampler

Submit type: xpmir.letor.distillation.samplers.DistillationPairwiseSampler

Just loops over samples

samples: xpmir.letor.distillation.samplers.PairwiseDistillationSamples

XPM Config xpmir.letor.distillation.samplers.PairwiseHydrator(*, documentstore, querystore, samples)

Bases: PairwiseDistillationSamples, SampleHydrator

Submit type: xpmir.letor.distillation.samplers.PairwiseHydrator

Hydrate ID-based samples with document and/or query content

documentstore: datamaestro_text.data.ir.DocumentStore: The store for document texts if needed

querystore: xpmir.datasets.adapters.TextStore: The store for query texts if needed

samples: xpmir.letor.distillation.samplers.PairwiseDistillationSamples: The distillation samples without texts for query and documents

XPM Task xpmir.letor.samplers.TeacherModelBasedHardNegativesTripletSampler(*, sampler, document_store, topic_store, teacher_model)

Bases: Task, Sampler

Submit type: xpmir.letor.samplers.PairwiseSampler

Builds a teacher file for pairwise distillation losses

sampler: xpmir.letor.samplers.PairwiseSampler: The list of exsting hard negatives which we can sample from

document_store: datamaestro_text.data.ir.DocumentStore: The document store

topic_store: xpmir.datasets.adapters.TextStore: The query_document store

teacher_model: xpmir.rankers.Scorer: The teacher model which scores the positive and negative document

hard_negative_triplet: Path (generated): The path where the generated triplets are stored
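As a rough sketch of what building a teacher file involves, the example below scores each (query, positive, negative) triplet with a teacher and writes one tab-separated row per triplet. Everything in it is an assumption made for illustration: `write_teacher_file` and `toy_teacher` are hypothetical names, the teacher is a toy word-overlap function rather than an xpmir.rankers.Scorer, and the row layout simply reuses the column order documented for PairwiseDistillationSamplesTSV, since the format written by the task is not stated here.

```python
import io
from typing import Callable, Iterable, TextIO, Tuple

# (query, positive document, negative document)
Triplet = Tuple[str, str, str]


def toy_teacher(query: str, document: str) -> float:
    """Toy stand-in for a teacher model: number of shared words."""
    return float(len(set(query.split()) & set(document.split())))


def write_teacher_file(
    triplets: Iterable[Triplet],
    teacher: Callable[[str, str], float],
    out: TextIO,
) -> int:
    """Score both documents of each triplet and write one TSV row per triplet."""
    count = 0
    for query, positive, negative in triplets:
        pos_score = teacher(query, positive)
        neg_score = teacher(query, negative)
        # Assumed layout: Score 1, Score 2, Query, Document 1, Document 2
        out.write(f"{pos_score}\t{neg_score}\t{query}\t{positive}\t{negative}\n")
        count += 1
    return count


buffer = io.StringIO()
written = write_teacher_file(
    [("neural ranking", "neural ranking models", "cooking recipes")],
    toy_teacher,
    buffer,
)
```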

Trainer

XPM Config xpmir.letor.distillation.pairwise.DistillationPairwiseTrainer(*, hooks, model, batcher, sampler, batch_size, lossfn)

Bases: LossTrainer

Submit type: xpmir.letor.distillation.pairwise.DistillationPairwiseTrainer

Pairwise trainer for distillation

hooks: List[xpmir.learning.context.TrainingHook] = []: Hooks for this trainer. This includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer

model: xpmir.learning.optim.Module: If the model to optimize is different from the model passsed to Learn, this parameter can be used – initialization is still expected to be done at the learner level

batcher: xpmir.learning.batchers.Batcher = xpmir.learning.batchers.Batcher.XPMValue(): How to batch samples together

sampler: xpmir.letor.distillation.samplers.DistillationPairwiseSampler: The sampler

batch_size: int = 16: Number of samples per batch

lossfn: xpmir.letor.distillation.pairwise.DistillationPairwiseLoss: The distillation pairwise batch function

Losses

XPM Config xpmir.letor.distillation.pairwise.DistillationPairwiseLoss(*, weight)

Bases: Config, TorchModule

Submit type: xpmir.letor.distillation.pairwise.DistillationPairwiseLoss

The abstract loss for pairwise distillation

weight: float = 1.0

compute(student_scores: torch.functional.Tensor, teacher_scores: torch.functional.Tensor, context: TrainerContext) → torch.Tensor

Compute the loss

Parameters:

student_scores – A (batch x 2) tensor
teacher_scores – A (batch x 2) tensor

XPM Config xpmir.letor.distillation.pairwise.MSEDifferenceLoss(*, weight)

Bases: DistillationPairwiseLoss

Submit type: xpmir.letor.distillation.pairwise.MSEDifferenceLoss

Computes the MSE between the score differences

Compute ((student 1 - student 2) - (teacher 1 - teacher 2))**2

weight: float = 1.0
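The formula above can be illustrated with a short pure-Python sketch. Plain lists of pairs stand in for the (batch x 2) tensors, the function name `mse_difference` is hypothetical, and averaging over the batch is an assumption suggested by the "MSE" name rather than something stated in this entry.

```python
from typing import Sequence, Tuple

# (score of document 1, score of document 2)
Pair = Tuple[float, float]


def mse_difference(student: Sequence[Pair], teacher: Sequence[Pair]) -> float:
    """Mean over the batch of ((s1 - s2) - (t1 - t2))**2."""
    if len(student) != len(teacher) or not student:
        raise ValueError("expected two non-empty batches of the same size")
    total = 0.0
    for (s1, s2), (t1, t2) in zip(student, teacher):
        student_margin = s1 - s2
        teacher_margin = t1 - t2
        total += (student_margin - teacher_margin) ** 2
    return total / len(student)


student_scores = [(2.0, 1.0), (0.5, 0.0)]
teacher_scores = [(3.0, 1.0), (1.0, 0.5)]
loss = mse_difference(student_scores, teacher_scores)
print(loss)  # 0.5: the margins differ by 1.0 in the first pair and by 0.0 in the second
```

Only the margins matter here: shifting both student scores of a pair by the same constant leaves the loss unchanged.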

XPM Config xpmir.letor.distillation.pairwise.DistillationKLLoss(*, weight)

Bases: DistillationPairwiseLoss

Submit type: xpmir.letor.distillation.pairwise.DistillationKLLoss

Distillation loss from "Distilling Dense Representations for Ranking using Tightly-Coupled Teachers" (https://arxiv.org/abs/2010.11386)

weight: float = 1.0
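This entry only cites the paper, so the following is a sketch under a stated assumption: the loss is taken to be the KL divergence between the softmax of the teacher's two scores and the softmax of the student's two scores, averaged over the batch. The helper names are hypothetical and plain Python lists replace the (batch x 2) tensors; the actual xpmir code may differ in reduction or normalization.

```python
import math
from typing import List, Sequence, Tuple

# (score of document 1, score of document 2)
Pair = Tuple[float, float]


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a short list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def kl_distillation(student: Sequence[Pair], teacher: Sequence[Pair]) -> float:
    """Mean over the batch of KL(softmax(teacher) || softmax(student))."""
    if len(student) != len(teacher) or not student:
        raise ValueError("expected two non-empty batches of the same size")
    total = 0.0
    for s, t in zip(student, teacher):
        log_q = log_softmax(s)  # student distribution (log space)
        log_p = log_softmax(t)  # teacher distribution (log space)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(student)


same = kl_distillation([(1.0, 0.0)], [(1.0, 0.0)])
different = kl_distillation([(0.0, 1.0)], [(1.0, 0.0)])
```

Under this assumption the loss is zero when the student reproduces the teacher's distribution over the pair, and positive otherwise.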

Samplers

class xpmir.letor.distillation.samplers.PairwiseDistillationSample(query, documents)

Bases: NamedTuple

documents: Tuple[Record, Record]: Positive/negative document with teacher scores

query: Record: The query

XPM Config xpmir.letor.distillation.samplers.PairwiseDistillationSamples

Bases: Config, Iterable[PairwiseDistillationSample]

Submit type: xpmir.letor.distillation.samplers.PairwiseDistillationSamples

Pairwise distillation file

XPM Config xpmir.letor.distillation.samplers.PairwiseDistillationSamplesTSV(*, id, path, with_docid, with_queryid)

Bases: PairwiseDistillationSamples, File

Submit type: xpmir.letor.distillation.samplers.PairwiseDistillationSamplesTSV

A TSV file (Score 1, Score 2, Query, Document 1, Document 2)

id: str: The unique dataset ID

path: Path: The path of the file

with_docid: bool

with_queryid: bool
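To make the column layout concrete, here is a minimal sketch that parses rows in the order given above (Score 1, Score 2, Query, Document 1, Document 2). `TsvRow` and `read_rows` are hypothetical names, not part of xpmir. Whether the query and document columns hold IDs or text is presumably what with_queryid and with_docid indicate; this sketch keeps them as plain strings and does not interpret the flags.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class TsvRow(NamedTuple):
    score_1: float
    score_2: float
    query: str
    document_1: str
    document_2: str


def read_rows(handle: TextIO) -> Iterator[TsvRow]:
    """Parse tab-separated rows: score 1, score 2, query, document 1, document 2."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}")
        s1, s2, query, doc1, doc2 = fields
        yield TsvRow(float(s1), float(s2), query, doc1, doc2)


# In-memory example row (assumed data, for illustration only)
example = io.StringIO("12.5\t3.25\tq1\td7\td42\n")
rows = list(read_rows(example))
```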