Distillation

Sampler

XPM Config xpmir.letor.distillation.samplers.DistillationPairwiseSampler(*, samples)

Bases: Sampler

Submit type: xpmir.letor.distillation.samplers.DistillationPairwiseSampler

Just loops over samples

samples: xpmir.letor.distillation.samplers.PairwiseDistillationSamples

XPM Config xpmir.letor.distillation.samplers.PairwiseHydrator(*, documentstore, querystore, samples)

Bases: PairwiseDistillationSamples, SampleHydrator

Submit type: xpmir.letor.distillation.samplers.PairwiseHydrator

Hydrate ID-based samples with document and/or query content

documentstore: datamaestro_text.data.ir.DocumentStore

The store for document texts if needed

querystore: xpmir.datasets.adapters.TextStore

The store for query texts if needed

samples: xpmir.letor.distillation.samplers.PairwiseDistillationSamples

The distillation samples without texts for query and documents
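As a rough illustration of what hydration means here, the sketch below replaces identifiers with text using plain dictionaries. It is not the xpmir implementation: `IdSample`, `TextSample`, and `hydrate` are hypothetical names, and the dictionaries stand in for the document and query stores. A store passed as `None` mirrors the "if needed" wording above, in which case the identifiers are passed through unchanged.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Tuple


class IdSample(NamedTuple):
    """Hypothetical ID-only sample: one query ID and two document IDs."""
    query_id: str
    document_ids: Tuple[str, str]


class TextSample(NamedTuple):
    """Hypothetical hydrated sample."""
    query: str
    documents: Tuple[str, str]


def hydrate(
    samples: Iterable[IdSample],
    queries: Optional[Dict[str, str]],
    documents: Optional[Dict[str, str]],
) -> Iterator[TextSample]:
    """Look up texts for each sample; a missing store leaves IDs as they are."""
    for sample in samples:
        query = queries[sample.query_id] if queries is not None else sample.query_id
        first, second = sample.document_ids
        if documents is not None:
            first, second = documents[first], documents[second]
        yield TextSample(query, (first, second))


# Dictionaries standing in for the query and document stores (assumption)
queries = {"q1": "what is distillation"}
documents = {"d1": "Distillation transfers knowledge.", "d2": "Unrelated text."}
hydrated = list(hydrate([IdSample("q1", ("d1", "d2"))], queries, documents))
```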

XPM Task xpmir.letor.samplers.TeacherModelBasedHardNegativesTripletSampler(*, sampler, document_store, topic_store, teacher_model)

Bases: Task, Sampler

Submit type: xpmir.letor.samplers.PairwiseSampler

Builds a teacher file for pairwise distillation losses

sampler: xpmir.letor.samplers.PairwiseSampler

The list of existing hard negatives to sample from

document_store: datamaestro_text.data.ir.DocumentStore

The document store

topic_store: xpmir.datasets.adapters.TextStore

The store for query (topic) texts

teacher_model: xpmir.rankers.Scorer

The teacher model that scores the positive and negative documents

hard_negative_triplet: Path (generated)

The path to store the generated triplets

Trainer

XPM Config xpmir.letor.distillation.pairwise.DistillationPairwiseTrainer(*, hooks, model, batcher, sampler, batch_size, lossfn)

Bases: LossTrainer

Submit type: xpmir.letor.distillation.pairwise.DistillationPairwiseTrainer

Pairwise trainer for distillation

hooks: List[xpmir.learning.context.TrainingHook] = []

Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer.

model: xpmir.learning.optim.Module

If the model to optimize is different from the model passed to Learn, this parameter can be used – initialization is still expected to be done at the learner level

batcher: xpmir.learning.batchers.Batcher = xpmir.learning.batchers.Batcher.XPMValue()

How to batch samples together

sampler: xpmir.letor.distillation.samplers.DistillationPairwiseSampler

The sampler

batch_size: int = 16

Number of samples per batch

lossfn: xpmir.letor.distillation.pairwise.DistillationPairwiseLoss

The pairwise distillation loss function

Losses

XPM Config xpmir.letor.distillation.pairwise.DistillationPairwiseLoss(*, weight)

Bases: Config, TorchModule

Submit type: xpmir.letor.distillation.pairwise.DistillationPairwiseLoss

The abstract loss for pairwise distillation

weight: float = 1.0

compute(student_scores: torch.functional.Tensor, teacher_scores: torch.functional.Tensor, context: TrainerContext) → torch.Tensor

Compute the loss

Parameters:
  • student_scores – A (batch x 2) tensor

  • teacher_scores – A (batch x 2) tensor

XPM Config xpmir.letor.distillation.pairwise.MSEDifferenceLoss(*, weight)

Bases: DistillationPairwiseLoss

Submit type: xpmir.letor.distillation.pairwise.MSEDifferenceLoss

Computes the MSE between the score differences

Compute ((student 1 - student 2) - (teacher 1 - teacher 2))**2
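The formula above can be checked by hand with a small plain-Python sketch. It uses lists of score pairs instead of torch tensors, and it averages over the batch, which is an assumption about the reduction rather than something stated here. `mse_difference` is a hypothetical helper name, not part of xpmir.

```python
from typing import Sequence, Tuple

Pair = Tuple[float, float]


def mse_difference(student: Sequence[Pair], teacher: Sequence[Pair]) -> float:
    """Mean over the batch of ((s1 - s2) - (t1 - t2)) ** 2 (mean is assumed)."""
    if len(student) != len(teacher) or not student:
        raise ValueError("expected two non-empty batches of equal length")
    total = 0.0
    for (s1, s2), (t1, t2) in zip(student, teacher):
        # Squared gap between the student margin and the teacher margin
        total += ((s1 - s2) - (t1 - t2)) ** 2
    return total / len(student)


# Each row is (score of document 1, score of document 2)
student = [(2.0, 1.0), (0.5, 0.0)]
teacher = [(3.0, 1.0), (1.0, 0.5)]
loss = mse_difference(student, teacher)
```

For these values the first pair contributes (1 - 2)**2 = 1 and the second contributes (0.5 - 0.5)**2 = 0, so the mean is 0.5. Only the score margins matter: adding a constant to both student scores of a pair leaves the loss unchanged.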

weight: float = 1.0

XPM Config xpmir.letor.distillation.pairwise.DistillationKLLoss(*, weight)

Bases: DistillationPairwiseLoss

Submit type: xpmir.letor.distillation.pairwise.DistillationKLLoss

Distillation loss from: Distilling Dense Representations for Ranking using Tightly-Coupled Teachers https://arxiv.org/abs/2010.11386

weight: float = 1.0
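One way to read a KL-based pairwise distillation loss is sketched below in plain Python. Several choices are assumptions, not taken from this page: each score pair is turned into a distribution with a softmax over its two documents, the divergence is taken as KL(teacher || student), and the result is averaged over the batch. `log_softmax_pair` and `pairwise_kl` are hypothetical names, and the actual xpmir code may differ.

```python
import math
from typing import Sequence, Tuple

Pair = Tuple[float, float]


def log_softmax_pair(scores: Pair) -> Tuple[float, float]:
    """Numerically stable log-softmax over the two scores of a pair."""
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return (scores[0] - log_z, scores[1] - log_z)


def pairwise_kl(student: Sequence[Pair], teacher: Sequence[Pair]) -> float:
    """Batch mean of KL(teacher || student) over per-pair distributions."""
    if len(student) != len(teacher) or not student:
        raise ValueError("expected two non-empty batches of equal length")
    total = 0.0
    for s, t in zip(student, teacher):
        log_s = log_softmax_pair(s)
        log_t = log_softmax_pair(t)
        # sum_i p_teacher(i) * (log p_teacher(i) - log p_student(i))
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    return total / len(student)


same = pairwise_kl([(1.0, 0.0)], [(1.0, 0.0)])      # student matches teacher
flipped = pairwise_kl([(0.0, 1.0)], [(1.0, 0.0)])   # student prefers the other document
```

Under these assumptions the loss is zero when the student reproduces the teacher's preference distribution and grows as the student's ranking of the pair diverges from the teacher's.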

Samplers

class xpmir.letor.distillation.samplers.PairwiseDistillationSample(query, documents)

Bases: NamedTuple

documents: Tuple[Record, Record]

Positive/negative document with teacher scores

query: Record

The query

XPM Config xpmir.letor.distillation.samplers.PairwiseDistillationSamples

Bases: Config, Iterable[PairwiseDistillationSample]

Submit type: xpmir.letor.distillation.samplers.PairwiseDistillationSamples

Pairwise distillation file

XPM Config xpmir.letor.distillation.samplers.PairwiseDistillationSamplesTSV(*, id, path, with_docid, with_queryid)

Bases: PairwiseDistillationSamples, File

Submit type: xpmir.letor.distillation.samplers.PairwiseDistillationSamplesTSV

A TSV file (Score 1, Score 2, Query, Document 1, Document 2)

id: str

The unique dataset ID

path: Path

The path of the file

with_docid: bool

with_queryid: bool
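To make the column layout (Score 1, Score 2, Query, Document 1, Document 2) concrete, here is a minimal reader built on the standard `csv` module. It is a sketch, not the xpmir parser: `TsvRow` and `read_rows` are hypothetical names, and the sample line is made up for illustration. Whether the query and document columns hold identifiers or raw text is assumed to be what `with_queryid` and `with_docid` indicate, so the sketch keeps them as plain strings.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class TsvRow(NamedTuple):
    """Hypothetical record for one line of the TSV file."""
    score_1: float
    score_2: float
    query: str
    document_1: str
    document_2: str


def read_rows(fp: TextIO) -> Iterator[TsvRow]:
    """Yield one TsvRow per non-empty tab-separated line."""
    for fields in csv.reader(fp, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:
            continue  # skip blank lines
        s1, s2, query, d1, d2 = fields
        yield TsvRow(float(s1), float(s2), query, d1, d2)


# In-memory stand-in for a file on disk (illustrative values only)
sample = io.StringIO("12.5\t3.25\tq1\td7\td42\n")
rows = list(read_rows(sample))
```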