Masked Language Model
Sampler
- XPM Config xpmir.mlm.samplers.MLMSampler(*, datasets)[source]
Bases:
Sampler
Sample texts from various sources
This sampler can be used for Masked Language Modeling to sample from several datasets.
- datasets: List[datamaestro_text.data.ir.DocumentStore]
List of datasets to sample from
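To illustrate the idea of sampling texts from several document stores, here is a minimal pure-Python sketch. The `sample_texts` helper and the plain-list "stores" are hypothetical stand-ins, not part of xpmir's API; the real sampler works on `DocumentStore` objects and handles weighting and iteration internally.

```python
import random

def sample_texts(datasets, n, rng=None):
    """Illustrative only: draw n texts by first picking one of the
    stores at random, then a document at random from that store.
    Hypothetical helper, not xpmir's actual implementation."""
    rng = rng or random.Random(0)
    out = []
    for _ in range(n):
        store = rng.choice(datasets)   # pick one of the document stores
        out.append(rng.choice(store))  # pick a text from that store
    return out

# Two toy "document stores", here just lists of strings
stores = [["doc a1", "doc a2"], ["doc b1"]]
texts = sample_texts(stores, 4)
```

In the real configuration, the equivalent step is constructing `MLMSampler(datasets=[...])` with one or more `DocumentStore` instances.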
Trainer
- XPM Config xpmir.mlm.trainer.MLMTrainer(*, hooks, model, batcher, sampler, batch_size, lossfn)[source]
Bases:
LossTrainer
Trainer for Masked Language Modeling
- hooks: List[xpmir.learning.context.TrainingHook] = []
Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer
- model: xpmir.learning.optim.Module
If the model to optimize is different from the model passed to the learner, this parameter can be used; initialization is still expected to be done at the learner level
- batcher: xpmir.learning.batchers.Batcher = xpmir.learning.batchers.Batcher()
How to batch samples together
- sampler: xpmir.learning.base.Sampler
Sampler providing the texts used for masked language modeling
- batch_size: int = 16
Number of samples per batch
- lossfn: xpmir.mlm.trainer.MLMLoss = xpmir.mlm.trainer.CrossEntropyLoss(weight=1.0)
Loss function for masked language modeling (defaults to cross-entropy with weight 1.0)
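For context on what the trainer optimizes, the following is a minimal sketch of BERT-style token masking, which is a common way to build the inputs and targets for an MLM cross-entropy loss. This is an illustrative assumption about how batches are typically prepared, not xpmir's actual implementation; the `mask_tokens` function and the 80/10/10 split are the standard BERT recipe, written here in pure Python.

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", vocab=None, p=0.15, rng=None):
    """Illustrative BERT-style masking (an assumption, not xpmir's code).
    Returns (masked_tokens, labels); labels is None at positions that
    do not contribute to the loss."""
    rng = rng or random.Random(0)
    vocab = vocab or tokens
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < p:
            labels.append(tok)  # the loss predicts the original token here
            r = rng.random()
            if r < 0.8:
                masked.append(mask_token)         # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                masked.append(tok)                # 10%: keep the token unchanged
        else:
            labels.append(None)  # position ignored by the loss
            masked.append(tok)
    return masked, labels

masked, labels = mask_tokens(["the", "cat", "sat", "on", "the", "mat"])
```

The cross-entropy loss (the default `lossfn` above) is then computed only over the positions where `labels` is set.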