Data Generation

Synthetic topics

XPM Task xpmir.letor.samplers.synthetic.SyntheticQueryGeneration(*, model, batchsize, num_qry_per_doc, sampler, device, batcher, hooks)

Bases: Task

Submit type: Any

model: xpmir.neural.generative.hf.T5ConditionalGenerator

The model we use to generate the queries

batchsize: int = 128

Batch size when computing negatives

num_qry_per_doc: int = 5

Number of synthetic queries to generate per document

sampler: xpmir.documents.samplers.DocumentSampler

Document sampler used to iterate over the corpus

device: xpmir.learning.devices.Device = xpmir.learning.devices.Device.XPMValue()

The device used by the encoder

batcher: xpmir.learning.batchers.Batcher = xpmir.learning.batchers.Batcher.XPMValue()

The way to prepare batches of documents

synthetic_samples: Path (generated)

Path to store the generated queries

hooks: List[xpmir.context.Hook] = []

Global learning hooks
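
As a rough illustration of how batchsize and num_qry_per_doc interact, the sketch below batches (doc_id, text) pairs and collects a fixed number of queries per document. It is a standalone approximation, not the xpmir implementation: generate_synthetic_queries, batched, and toy_generate are hypothetical names, and toy_generate is a stand-in for a real generative model such as the T5-based generator.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Document = Tuple[str, str]  # (doc_id, text)


def batched(items: Iterable[Document], batchsize: int) -> Iterator[List[Document]]:
    """Yield successive lists of at most `batchsize` documents."""
    iterator = iter(items)
    while batch := list(islice(iterator, batchsize)):
        yield batch


def generate_synthetic_queries(
    documents: Iterable[Document],
    generate: Callable[[List[str], int], List[List[str]]],
    batchsize: int = 128,
    num_qry_per_doc: int = 5,
) -> List[Tuple[str, str]]:
    """Return (doc_id, query) pairs, num_qry_per_doc per document."""
    samples: List[Tuple[str, str]] = []
    for batch in batched(documents, batchsize):
        texts = [text for _, text in batch]
        # Hypothetical generator: one list of queries per input text.
        queries_per_doc = generate(texts, num_qry_per_doc)
        for (doc_id, _), queries in zip(batch, queries_per_doc):
            samples.extend((doc_id, query) for query in queries)
    return samples


def toy_generate(texts: List[str], n: int) -> List[List[str]]:
    """Stand-in for a generative model; builds queries from the first word."""
    return [[f"query {i} about {text.split()[0]}" for i in range(n)] for text in texts]


docs = [("d1", "cats purr"), ("d2", "dogs bark"), ("d3", "birds sing")]
samples = generate_synthetic_queries(docs, toy_generate, batchsize=2, num_qry_per_doc=2)
```

With three documents and num_qry_per_doc=2, the sketch yields six (doc_id, query) pairs. In the real task the generated queries are written to the synthetic_samples path rather than returned in memory.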