Distillation
Sampler
- XPM Config xpmir.letor.distillation.samplers.DistillationPairwiseSampler(*, samples)
Bases: Sampler
Submit type: xpmir.letor.distillation.samplers.DistillationPairwiseSampler
Just loops over samples
- XPM Config xpmir.letor.distillation.samplers.PairwiseHydrator(*, documentstore, querystore, samples)
Bases: PairwiseDistillationSamples, SampleHydrator
Submit type: xpmir.letor.distillation.samplers.PairwiseHydrator
Hydrate ID-based samples with document and/or query content
- documentstore: datamaestro_text.data.ir.DocumentStore
The store for document texts if needed
- querystore: xpmir.datasets.adapters.TextStore
The store for query texts if needed
- samples: xpmir.letor.distillation.samplers.PairwiseDistillationSamples
The distillation samples without texts for query and documents
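The idea of hydration can be illustrated with a small, self-contained sketch. This is not the library's implementation: the `hydrate` function, the `Sample` tuple layout, and the use of plain dictionaries as stores are assumptions made for the example, and teacher scores are left out for brevity. As in the description above, each store is optional and is only applied "if needed".

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical sample layout: (query, (document 1, document 2)), all given as IDs
Sample = Tuple[str, Tuple[str, str]]


def hydrate(
    samples: Iterable[Sample],
    query_texts: Optional[Dict[str, str]] = None,
    document_texts: Optional[Dict[str, str]] = None,
) -> Iterator[Sample]:
    """Replace IDs with text, using whichever stores are provided."""
    for query, (doc1, doc2) in samples:
        if query_texts is not None:
            # Look up the query text by its ID
            query = query_texts[query]
        if document_texts is not None:
            # Look up both document texts by their IDs
            doc1, doc2 = document_texts[doc1], document_texts[doc2]
        yield query, (doc1, doc2)


id_samples = [("q1", ("d1", "d2"))]
hydrated = list(
    hydrate(
        id_samples,
        query_texts={"q1": "what is distillation"},
        document_texts={"d1": "text of d1", "d2": "text of d2"},
    )
)
```

Passing only one of the two stores leaves the other field untouched, which mirrors the "document and/or query" wording.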
- XPM Task xpmir.letor.samplers.TeacherModelBasedHardNegativesTripletSampler(*, sampler, document_store, topic_store, teacher_model)
Bases: Task, Sampler
Submit type: xpmir.letor.samplers.PairwiseSampler
Builds a teacher file for pairwise distillation losses
- sampler: xpmir.letor.samplers.PairwiseSampler
The list of existing hard negatives which we can sample from
- document_store: datamaestro_text.data.ir.DocumentStore
The document store
- topic_store: xpmir.datasets.adapters.TextStore
The query (topic) store
- teacher_model: xpmir.rankers.Scorer
The teacher model which scores the positive and negative documents
- hard_negative_triplet: Path (generated)
The path to store the generated triplets
Trainer
- XPM Config xpmir.letor.distillation.pairwise.DistillationPairwiseTrainer(*, hooks, model, batcher, sampler, batch_size, lossfn)
Bases: LossTrainer
Submit type: xpmir.letor.distillation.pairwise.DistillationPairwiseTrainer
Pairwise trainer for distillation
- hooks: List[xpmir.learning.context.TrainingHook] = []
Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer.
- model: xpmir.learning.optim.Module
If the model to optimize is different from the model passed to Learn, this parameter can be used; initialization is still expected to be done at the learner level
- batcher: xpmir.learning.batchers.Batcher = xpmir.learning.batchers.Batcher.XPMValue()
How to batch samples together
- sampler: xpmir.letor.distillation.samplers.DistillationPairwiseSampler
The sampler
- batch_size: int = 16
Number of samples per batch
- lossfn: xpmir.letor.distillation.pairwise.DistillationPairwiseLoss
The distillation pairwise batch function
Losses
- XPM Config xpmir.letor.distillation.pairwise.DistillationPairwiseLoss(*, weight)
Bases: Config, TorchModule
Submit type: xpmir.letor.distillation.pairwise.DistillationPairwiseLoss
The abstract loss for pairwise distillation
- weight: float = 1.0
- XPM Config xpmir.letor.distillation.pairwise.MSEDifferenceLoss(*, weight)
Bases: DistillationPairwiseLoss
Submit type: xpmir.letor.distillation.pairwise.MSEDifferenceLoss
Computes the MSE between the score differences
Compute ((student 1 - student 2) - (teacher 1 - teacher 2))**2
- weight: float = 1.0
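The formula above can be sketched in plain Python. This is a minimal illustration rather than the library's code: the function name `mse_difference_loss`, the list-of-pairs input layout, averaging over the batch, and applying `weight` inside the function are all assumptions made for the example.

```python
from typing import Sequence, Tuple

# (score of document 1, score of document 2) for one query
Pair = Tuple[float, float]


def mse_difference_loss(
    student: Sequence[Pair], teacher: Sequence[Pair], weight: float = 1.0
) -> float:
    """Weighted mean of ((s1 - s2) - (t1 - t2))**2 over a batch of pairs."""
    if len(student) != len(teacher) or not student:
        raise ValueError("student and teacher must be non-empty and of equal length")
    total = 0.0
    for (s1, s2), (t1, t2) in zip(student, teacher):
        # Squared gap between the student margin and the teacher margin
        total += ((s1 - s2) - (t1 - t2)) ** 2
    return weight * total / len(student)


# Student margin is 1.0, teacher margin is 3.0, so the squared gap is 4.0
loss = mse_difference_loss([(2.0, 1.0)], [(3.0, 0.0)])
```

Because only score differences enter the loss, the student is free to use a different score scale offset than the teacher as long as the margins match.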
- XPM Config xpmir.letor.distillation.pairwise.DistillationKLLoss(*, weight)
Bases: DistillationPairwiseLoss
Submit type: xpmir.letor.distillation.pairwise.DistillationKLLoss
Distillation loss from: Distilling Dense Representations for Ranking using Tightly-Coupled Teachers https://arxiv.org/abs/2010.11386
- weight: float = 1.0
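As a rough illustration of what a KL-based pairwise distillation loss can look like, the sketch below turns each pair of scores into a two-way softmax distribution and computes KL(teacher || student). The direction of the divergence, the absence of a temperature, and the function names are assumptions made for the example; the cited paper and the linked source remain the reference for the exact formulation.

```python
import math
from typing import Tuple

# (score of document 1, score of document 2) for one query
Pair = Tuple[float, float]


def log_softmax_pair(scores: Pair) -> Tuple[float, ...]:
    """Numerically stable log-softmax over the two scores of a pair."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return tuple(s - log_z for s in scores)


def kl_pair_loss(student: Pair, teacher: Pair) -> float:
    """KL(teacher || student) between the two softmax distributions."""
    log_p = log_softmax_pair(teacher)
    log_q = log_softmax_pair(student)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Student ranks the documents in the opposite order from the teacher
disagreement = kl_pair_loss(student=(0.0, 1.0), teacher=(1.0, 0.0))
```

In this sketch the loss is zero whenever the student and teacher score differences coincide, since the softmax over a pair depends only on that difference.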
Samplers
- class xpmir.letor.distillation.samplers.PairwiseDistillationSample(query, documents)
Bases: NamedTuple
- documents: Tuple[Record, Record]
Positive/negative document with teacher scores
- query: Record
The query
- XPM Config xpmir.letor.distillation.samplers.PairwiseDistillationSamples
Bases: Config, Iterable[PairwiseDistillationSample]
Submit type: xpmir.letor.distillation.samplers.PairwiseDistillationSamples
Pairwise distillation file
- XPM Config xpmir.letor.distillation.samplers.PairwiseDistillationSamplesTSV(*, id, path, with_docid, with_queryid)
Bases: PairwiseDistillationSamples, File
Submit type: xpmir.letor.distillation.samplers.PairwiseDistillationSamplesTSV
A TSV file (Score 1, Score 2, Query, Document 1, Document 2)
- id: str
The unique dataset ID
- path: Path
The path of the file
- with_docid: bool
- with_queryid: bool
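To make the column layout of the TSV file concrete, here is a minimal sketch of reading such a file with the standard `csv` module. It is not the library's reader: `TsvSample` and `read_pairwise_tsv` are hypothetical names, and the sketch assumes that scores parse as floats and that every row has exactly five tab-separated fields in the order (Score 1, Score 2, Query, Document 1, Document 2). Whether the query and document fields hold IDs or text is assumed to depend on `with_queryid` and `with_docid`.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class TsvSample(NamedTuple):
    """One row of the file; field names are illustrative only."""

    score_1: float
    score_2: float
    query: str
    document_1: str
    document_2: str


def read_pairwise_tsv(stream: TextIO) -> Iterator[TsvSample]:
    """Yield one sample per tab-separated row."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 5:
            raise ValueError(f"expected 5 fields, got {len(row)}")
        s1, s2, query, doc1, doc2 = row
        yield TsvSample(float(s1), float(s2), query, doc1, doc2)


# In-memory example with IDs in the query and document columns
example = io.StringIO("12.5\t3.0\tq1\td7\td9\n")
samples = list(read_pairwise_tsv(example))
```

Rows read this way with IDs in place of text are the kind of input that a hydration step would then fill in with query and document content.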