Optimization

Optimizers

XPM Config xpmir.learning.optim.Optimizer

Bases: experimaestro.core.objects.Config

XPM Config xpmir.learning.optim.Adam(*, lr, weight_decay, eps)

Bases: xpmir.learning.optim.Optimizer

Wrapper for the PyTorch Adam optimizer

lr: float = 0.001

Learning rate

weight_decay: float = 0.0

Weight decay (L2)

eps: float = 1e-08

Epsilon term added for numerical stability

XPM Config xpmir.learning.optim.AdamW(*, lr, weight_decay, eps)

Bases: xpmir.learning.optim.Optimizer

Adam optimizer with decoupled weight decay regularization (AdamW)

See the PyTorch documentation

lr: float = 0.001

Learning rate

weight_decay: float = 0.01

Weight decay (decoupled)

eps: float = 1e-08

Epsilon term added for numerical stability
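
A minimal sketch of how these optimizer configurations can be instantiated, using only the parameters documented above (the hyper-parameter values are illustrative, not recommendations):

    from xpmir.learning.optim import Adam, AdamW

    # Adam with an explicit learning rate and L2 weight decay
    adam = Adam(lr=3e-4, weight_decay=1e-2)

    # AdamW with its decoupled weight decay left at the default (0.01)
    adamw = AdamW(lr=3e-5)
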
XPM Config xpmir.learning.optim.ParameterOptimizer(*, optimizer, scheduler, module, filter)

Bases: experimaestro.core.objects.Config

Associates an optimizer with a list of parameters to optimize

optimizer: xpmir.learning.optim.Optimizer

The optimizer

scheduler: xpmir.learning.schedulers.Scheduler

The optional scheduler

module: xpmir.learning.optim.Module

The module from which parameters should be extracted

filter: xpmir.learning.optim.ParameterFilter = xpmir.learning.optim.ParameterFilter()

How parameters should be selected for this optimizer (by default, all of them are used)
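
A sketch of a typical association, using only the documented fields. Here `module` is a placeholder: in practice it would be a concrete Module subclass whose parameters are being learned. The scheduler is optional, and the default ParameterFilter keeps every parameter:

    from xpmir.learning.optim import AdamW, Module, ParameterOptimizer
    from xpmir.learning.schedulers import LinearWithWarmup

    # Placeholder: in practice, a concrete Module configuration
    # (the model whose parameters are optimized)
    module = Module()

    param_optimizer = ParameterOptimizer(
        optimizer=AdamW(lr=3e-5),
        scheduler=LinearWithWarmup(num_warmup_steps=1000),
        module=module,
    )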

XPM Config xpmir.learning.optim.ParameterFilter

Bases: experimaestro.core.objects.Config

Abstract class for filtering parameters; this base implementation performs no filtering

XPM Config xpmir.learning.optim.Module

Bases: experimaestro.core.objects.Config, TorchModule

A module that contains parameters

Batching

XPM Config xpmir.learning.batchers.Batcher

Bases: experimaestro.core.objects.Config

Responsible for micro-batching when the batch does not fit in memory

The base class just does nothing (no adaptation)

XPM Config xpmir.learning.batchers.PowerAdaptativeBatcher

Bases: xpmir.learning.batchers.Batcher

Starts with the provided batch size, then divides it by 2, 3, etc. until no out-of-memory (OOM) error occurs
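
The strategy can be pictured with a plain-Python sketch (illustrative only, not the actual implementation): after an OOM error, the batch is re-processed in micro-batches whose size is the original batch size divided by 2, then 3, and so on.

    def candidate_micro_batch_sizes(batch_size: int, max_divisor: int = 8):
        """Illustrative only: the micro-batch sizes a power-adaptative
        strategy would try in turn after successive OOM errors."""
        return [batch_size // d for d in range(1, max_divisor + 1)]

    # For a batch of 64: [64, 32, 21, 16, 12, 10, 9, 8]
    print(candidate_micro_batch_sizes(64))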

Devices

The device configuration allows selecting both the device to use for computation and the way it is used (e.g. multi-GPU settings).

XPM Config xpmir.letor.devices.Device

Bases: experimaestro.core.objects.Config

Device to use, as well as specific options (e.g. parallelism)

XPM Config xpmir.letor.devices.CudaDevice(*, gpu_determ, cpu_fallback, distributed)

Bases: xpmir.letor.devices.Device

CUDA device

gpu_determ: bool = False

Sets deterministic behavior for GPU computations

cpu_fallback: bool = False

Fall back to CPU if no GPU is available

distributed: bool = False

Flag for using DistributedDataParallel: when True and the number of GPUs is greater than one, use torch.nn.parallel.DistributedDataParallel; when False, use torch.nn.DataParallel
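
A minimal sketch using only the documented options (the chosen flags are illustrative):

    from xpmir.letor.devices import CudaDevice

    # Use CUDA, fall back to CPU when no GPU is available, and rely on
    # DistributedDataParallel when more than one GPU is visible.
    device = CudaDevice(cpu_fallback=True, distributed=True)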

Schedulers

XPM Config xpmir.learning.schedulers.CosineWithWarmup(*, num_warmup_steps, num_cycles)

Bases: xpmir.learning.schedulers.Scheduler

Cosine schedule with warmup

Uses the implementation from the transformers library

https://huggingface.co/docs/transformers/main_classes/optimizer_schedules#transformers.get_cosine_schedule_with_warmup

num_warmup_steps: int

Number of warmup steps

num_cycles: float = 0.5

Number of cycles
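
A minimal sketch of the configuration (the number of warmup steps is illustrative):

    from xpmir.learning.schedulers import CosineWithWarmup

    # 1000 warmup steps, then half a cosine cycle (the default num_cycles)
    scheduler = CosineWithWarmup(num_warmup_steps=1000)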

XPM Config xpmir.learning.schedulers.LinearWithWarmup(*, num_warmup_steps, min_factor)

Bases: xpmir.learning.schedulers.Scheduler

Linear warmup followed by decay

num_warmup_steps: int

Number of warmup steps

min_factor: float = 0.0

Minimum multiplicative factor
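
A minimal sketch of the configuration (values are illustrative); per the parameter descriptions, the learning-rate factor warms up linearly and the decay never drops below min_factor:

    from xpmir.learning.schedulers import LinearWithWarmup

    # Linear warmup over 1000 steps, then linear decay; the multiplicative
    # factor on the learning rate does not go below min_factor=0.1
    scheduler = LinearWithWarmup(num_warmup_steps=1000, min_factor=0.1)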

XPM Config xpmir.learning.schedulers.Scheduler

Bases: experimaestro.core.objects.Config

Base class for all optimizer schedulers