Optimization
Optimizers
- XPM Config: xpmir.learning.optim.Adam(*, lr, weight_decay, eps)
Bases:
xpmir.learning.optim.Optimizer
Wrapper for the PyTorch Adam optimizer
- lr: float = 0.001
Learning rate
- weight_decay: float = 0.0
Weight decay (L2)
- eps: float = 1e-08
Term added to the denominator for numerical stability
- XPM Config: xpmir.learning.optim.AdamW(*, lr, weight_decay, eps)
Bases:
xpmir.learning.optim.Optimizer
Adam optimizer with decoupled weight decay regularization (AdamW); see the example after the parameter list.
See the PyTorch documentation
- lr: float = 0.001
Learning rate
- weight_decay: float = 0.01
Weight decay (decoupled)
- eps: float = 1e-08
Term added to the denominator for numerical stability
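A minimal sketch of instantiating these optimizer configurations; the hyper-parameter values are illustrative, not recommendations:

```python
from xpmir.learning.optim import Adam, AdamW

# Plain Adam: weight_decay is applied as a classical L2 penalty
adam = Adam(lr=1e-3, weight_decay=0.0, eps=1e-8)

# AdamW: weight decay is decoupled from the gradient update, which is
# usually preferable when fine-tuning transformer models
adamw = AdamW(lr=3e-5, weight_decay=0.01)
```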
- XPM Config: xpmir.learning.optim.ParameterOptimizer(*, optimizer, scheduler, module, filter)
Bases:
experimaestro.core.objects.Config
Associates an optimizer with a list of parameters to optimize (see the sketch after this entry)
- optimizer: xpmir.learning.optim.Optimizer
The optimizer
- scheduler: xpmir.learning.schedulers.Scheduler
The optional scheduler
- module: xpmir.learning.optim.Module
The module from which parameters should be extracted
- filter: xpmir.learning.optim.ParameterFilter = xpmir.learning.optim.ParameterFilter()
How parameters should be selected for this optimizer (by default, all of them are used)
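As a hedged sketch, a ParameterOptimizer can be built by combining the configurations above; the learning rate and warmup length are arbitrary, and module is omitted here on the assumption that the training task provides the model whose parameters are optimized:

```python
from xpmir.learning.optim import Adam, ParameterOptimizer
from xpmir.learning.schedulers import LinearWithWarmup

# Optimize parameters with Adam, warming the learning rate up over the
# first 1000 steps before the linear decay
param_optimizer = ParameterOptimizer(
    optimizer=Adam(lr=3e-4),
    scheduler=LinearWithWarmup(num_warmup_steps=1000),
    # module is left unset (assumed to be supplied by the training task);
    # filter keeps its default, i.e. all parameters are selected
)
```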
Batching
- XPM Config: xpmir.learning.batchers.Batcher
Bases:
experimaestro.core.objects.Config
Responsible for micro-batching when the batch does not fit in memory
The base class just does nothing (no adaptation)
- XPM Config: xpmir.learning.batchers.PowerAdaptativeBatcher
Bases:
xpmir.learning.batchers.Batcher
Starts with the provided batch size, then splits the batch into 2, 3, etc. micro-batches until no out-of-memory error occurs
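A minimal sketch of instantiating the adaptive batcher; how it is attached to a trainer depends on the training configuration being used:

```python
from xpmir.learning.batchers import PowerAdaptativeBatcher

# Try the full batch first; on an out-of-memory error, retry with the batch
# split into 2, 3, ... micro-batches until it fits on the device
batcher = PowerAdaptativeBatcher()
```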
Devices
The device configuration allows selecting both the device used for computation and the way it is used (e.g. multi-GPU settings).
- XPM Config: xpmir.letor.devices.Device
Bases:
experimaestro.core.objects.Config
Device to use, as well as specific options (e.g. parallelism)
- XPM Config: xpmir.letor.devices.CudaDevice(*, gpu_determ, cpu_fallback, distributed)
Bases:
xpmir.letor.devices.Device
CUDA device
- gpu_determ: bool = False
Sets the deterministic flag for GPU computations
- cpu_fallback: bool = False
Fallback to CPU if no GPU is available
- distributed: bool = False
Flag for using DistributedDataParallel: when distributed is True and more than one GPU is available, torch.nn.parallel.DistributedDataParallel is used; when False, torch.nn.DataParallel is used
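A hedged sketch of a CUDA device configuration; the flags simply mirror the options documented above:

```python
from xpmir.letor.devices import CudaDevice

# Use the GPU(s) when available and fall back to the CPU otherwise; with more
# than one GPU, DistributedDataParallel is used since distributed=True
device = CudaDevice(cpu_fallback=True, distributed=True)
```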
Schedulers
- XPM Config: xpmir.learning.schedulers.CosineWithWarmup(*, num_warmup_steps, num_cycles)
Bases:
xpmir.learning.schedulers.Scheduler
Cosine schedule with warmup
Uses the implementation from the transformers library (see the sketch at the end of this section)
- num_warmup_steps: int
Number of warmup steps
- num_cycles: float = 0.5
Number of cycles
- XPM Config: xpmir.learning.schedulers.LinearWithWarmup(*, num_warmup_steps, min_factor)
Bases:
xpmir.learning.schedulers.Scheduler
Linear warmup followed by decay
- num_warmup_steps: int
Number of warmup steps
- min_factor: float = 0.0
Minimum multiplicative factor
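A minimal sketch of the two scheduler configurations; the step counts and factors are illustrative only:

```python
from xpmir.learning.schedulers import CosineWithWarmup, LinearWithWarmup

# Linear warmup over 1000 steps, then half a cosine cycle decaying towards 0
cosine = CosineWithWarmup(num_warmup_steps=1000, num_cycles=0.5)

# Linear warmup over 1000 steps, then a linear decay down to 10% of the base rate
linear = LinearWithWarmup(num_warmup_steps=1000, min_factor=0.1)
```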