Masked Language Model
Sampler
- XPM Config xpmir.mlm.samplers.MLMSampler(*, datasets)[source]
Bases:
Sampler
Sample texts from various sources
This sampler can be used for Masked Language Modeling to sample from several datasets.
- datasets: List[datamaestro_text.data.ir.DocumentStore]
List of datasets to sample from
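To illustrate the idea of sampling texts from several document stores, here is a minimal pure-Python sketch. The `sample_texts` helper and the plain-list "stores" are hypothetical stand-ins, not part of xpmir's API; the real sampler works on `DocumentStore` objects and handles weighting and iteration internally.

```python
import random

def sample_texts(datasets, n, rng=None):
    """Illustrative only: draw n texts by first picking one of the
    stores at random, then a document at random from that store.
    Hypothetical helper, not xpmir's actual implementation."""
    rng = rng or random.Random(0)
    out = []
    for _ in range(n):
        store = rng.choice(datasets)   # pick one of the document stores
        out.append(rng.choice(store))  # pick a text from that store
    return out

# Two toy "document stores", here just lists of strings
stores = [["doc a1", "doc a2"], ["doc b1"]]
texts = sample_texts(stores, 4)
```

In the real configuration, the equivalent step is constructing `MLMSampler(datasets=[...])` with one or more `DocumentStore` instances.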
Trainer
- XPM Config xpmir.mlm.trainer.MLMTrainer(*, hooks, model, batcher, sampler, batch_size, lossfn)[source]
Bases:
LossTrainer
Trainer for Masked Language Modeling
- hooks: List[xpmir.learning.context.TrainingHook] = []
Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer
- model: xpmir.learning.optim.Module
If the model to optimize is different from the model passed to the learner, this parameter can be used; initialization is still expected to be done at the learner level
- batcher: xpmir.learning.batchers.Batcher = xpmir.learning.batchers.Batcher()
How to batch samples together
- sampler: xpmir.learning.base.Sampler
Sampler providing the texts used for masked language modeling
- batch_size: int = 16
Number of samples per batch
- lossfn: xpmir.mlm.trainer.MLMLoss = xpmir.mlm.trainer.CrossEntropyLoss(weight=1.0)
Loss function for masked language modeling (defaults to cross-entropy with weight 1.0)
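For context on what the trainer optimizes, the following is a minimal sketch of BERT-style token masking, which is a common way to build the inputs and targets for an MLM cross-entropy loss. This is an illustrative assumption about how batches are typically prepared, not xpmir's actual implementation; the `mask_tokens` function and the 80/10/10 split are the standard BERT recipe, written here in pure Python.

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", vocab=None, p=0.15, rng=None):
    """Illustrative BERT-style masking (an assumption, not xpmir's code).
    Returns (masked_tokens, labels); labels is None at positions that
    do not contribute to the loss."""
    rng = rng or random.Random(0)
    vocab = vocab or tokens
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < p:
            labels.append(tok)  # the loss predicts the original token here
            r = rng.random()
            if r < 0.8:
                masked.append(mask_token)         # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                masked.append(tok)                # 10%: keep the token unchanged
        else:
            labels.append(None)  # position ignored by the loss
            masked.append(tok)
    return masked, labels

masked, labels = mask_tokens(["the", "cat", "sat", "on", "the", "mat"])
```

The cross-entropy loss (the default `lossfn` above) is then computed only over the positions where `labels` is set.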