Optimization

Modules

XPM Configxpmir.learning.optim.Module[source]

Bases: Config, Initializable, TorchModule

Submit type: xpmir.learning.optim.Module

A module contains parameters

XPM Configxpmir.learning.optim.ModuleList(*, sub_modules)[source]

Bases: Module, Initializable

Submit type: xpmir.learning.optim.ModuleList

Groups different models together, to be used within the Learner

sub_modules: List[xpmir.learning.optim.Module]

The module loader can be used to load a checkpoint

XPM Configxpmir.learning.optim.ModuleLoader(*, value, path)[source]

Bases: PathSerializationLWTask

Submit type: xpmir.learning.optim.ModuleLoader

value: experimaestro.core.objects.Config

The configuration that will be serialized

path: Path

Path containing the data
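A hedged sketch of grouping modules and loading a checkpoint, assuming the usual experimaestro keyword-argument construction; `encoder` and `scorer` stand for Module configurations defined elsewhere, and the checkpoint path is hypothetical:

from pathlib import Path
from xpmir.learning.optim import ModuleList, ModuleLoader

# Group two (hypothetical) modules so they are handled together by the Learner
model = ModuleList(sub_modules=[encoder, scorer])

# Restore the grouped parameters from a previously saved checkpoint
loader = ModuleLoader(value=model, path=Path("checkpoints/model"))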

Optimizers

XPM Configxpmir.learning.optim.Optimizer[source]

Bases: Config

Submit type: xpmir.learning.optim.Optimizer

XPM Configxpmir.learning.optim.SGD(*, lr, weight_decay)[source]

Bases: Optimizer

Submit type: xpmir.learning.optim.SGD

Wrapper for the SGD optimizer in PyTorch

lr: float = 1e-05

Learning rate

weight_decay: float = 0.0

Weight decay (L2)

XPM Configxpmir.learning.optim.Adam(*, lr, weight_decay, eps)[source]

Bases: Optimizer

Submit type: xpmir.learning.optim.Adam

Wrapper for Adam optimizer in PyTorch

lr: float = 0.001

Learning rate

weight_decay: float = 0.0

Weight decay (L2)

eps: float = 1e-08

Epsilon added to the denominator for numerical stability

XPM Configxpmir.learning.optim.AdamW(*, lr, weight_decay, eps)[source]

Bases: Optimizer

Submit type: xpmir.learning.optim.AdamW

Adam optimizer with decoupled weight decay regularization

See the PyTorch documentation

lr: float = 0.001

Learning rate

weight_decay: float = 0.01

Weight decay (L2)

eps: float = 1e-08

Epsilon added to the denominator for numerical stability

XPM Configxpmir.learning.optim.Adafactor(*, lr, weight_decay, relative_step)[source]

Bases: Optimizer

Submit type: xpmir.learning.optim.Adafactor

Wrapper for the Adafactor optimizer in the Transformers library

See transformers.optimization.Adafactor for full documentation

lr: float

Learning rate

weight_decay: float = 0.0

Weight decay (L2)

relative_step: bool = True

If True, a time-dependent learning rate is computed instead of using the external learning rate
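As a minimal sketch, assuming the usual experimaestro keyword-argument construction of XPM configurations, the optimizers above are declared by setting the parameters they document:

from xpmir.learning.optim import SGD, Adam, AdamW

# Plain SGD with a small amount of L2 regularization
sgd = SGD(lr=1e-3, weight_decay=1e-4)

# Adam with its default epsilon
adam = Adam(lr=1e-4)

# AdamW decouples the weight decay from the gradient update
adamw = AdamW(lr=3e-5, weight_decay=0.01, eps=1e-8)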

XPM Configxpmir.learning.optim.ParameterOptimizer(*, optimizer, scheduler, module, filter)[source]

Bases: Config

Submit type: xpmir.learning.optim.ParameterOptimizer

Associates an optimizer with a list of parameters to optimize

optimizer: xpmir.learning.optim.Optimizer

The optimizer

scheduler: xpmir.learning.schedulers.Scheduler

The optional scheduler

module: xpmir.learning.optim.Module

The module from which parameters should be extracted

filter: xpmir.learning.optim.ParameterFilter = xpmir.learning.optim.ParameterFilter.XPMValue()

How parameters should be selected for this optimizer (by default, use them all)

XPM Configxpmir.learning.optim.ParameterFilter[source]

Bases: Config

Submit type: xpmir.learning.optim.ParameterFilter

Abstract parameter filter; the base class performs no filtering

XPM Configxpmir.learning.optim.RegexParameterFilter(*, includes, excludes)[source]

Bases: ParameterFilter

Submit type: xpmir.learning.optim.RegexParameterFilter

Filters parameters by name using regular expressions. Precondition: exactly one of includes and excludes must be None

includes: List[str]

Regular expressions matching the names of the parameters to include

excludes: List[str]

Regular expressions matching the names of the parameters to exclude
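A hedged sketch combining ParameterOptimizer and RegexParameterFilter (the regular expressions are hypothetical, and the module is assumed to default to the one passed to the Learner), so that bias and LayerNorm parameters are optimized without weight decay while the remaining parameters use it:

from xpmir.learning.optim import AdamW, ParameterOptimizer, RegexParameterFilter

# Parameters whose names match these (hypothetical) patterns get no weight decay
no_decay = RegexParameterFilter(includes=[r"\.bias$", r"LayerNorm\."])
# All other parameters are decayed normally
decay = RegexParameterFilter(excludes=[r"\.bias$", r"LayerNorm\."])

param_optimizers = [
    ParameterOptimizer(optimizer=AdamW(lr=3e-5, weight_decay=0.0), filter=no_decay),
    ParameterOptimizer(optimizer=AdamW(lr=3e-5, weight_decay=0.01), filter=decay),
]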

XPM Configxpmir.learning.optim.OptimizationHook[source]

Bases: Hook

Submit type: xpmir.learning.optim.OptimizationHook

Base class for all optimization hooks

Hooks

XPM Configxpmir.learning.optim.GradientHook[source]

Bases: OptimizationHook

Submit type: xpmir.learning.optim.GradientHook

Hooks that are called when the gradient is computed

The gradient is guaranteed to be unscaled in this case.

XPM Configxpmir.learning.optim.GradientClippingHook(*, max_norm)[source]

Bases: GradientHook

Submit type: xpmir.learning.optim.GradientClippingHook

Gradient clipping

max_norm: float

Maximum norm for gradient clipping
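A minimal sketch of declaring the hook (how it is registered with the learner is not shown here):

from xpmir.learning.optim import GradientClippingHook

# Clip the unscaled gradient to a maximum norm of 1.0
clipping = GradientClippingHook(max_norm=1.0)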

XPM Configxpmir.learning.optim.GradientLogHook(*, name)[source]

Bases: GradientHook

Submit type: xpmir.learning.optim.GradientLogHook

Logs the gradient norm

name: str = gradient_norm

Parameters

During learning, some parameter-specific treatments can be applied (e.g. freezing).

Selecting

The classes below allow selecting a subset of parameters.

XPM Configxpmir.learning.parameters.InverseParametersIterator(*, iterator)[source]

Bases: ParametersIterator

Submit type: xpmir.learning.parameters.InverseParametersIterator

Inverts the selection of a parameter iterator

iterator: xpmir.learning.parameters.ParametersIterator

XPM Configxpmir.learning.parameters.ParametersIterator[source]

Bases: Config, ABC

Submit type: xpmir.learning.parameters.ParametersIterator

Iterator over module parameters

This can be useful to freeze some layers, or perform any other parameter-wise operation

XPM Configxpmir.learning.parameters.SubParametersIterator(*, model, iterator, default)[source]

Bases: ParametersIterator

Submit type: xpmir.learning.parameters.SubParametersIterator

Wraps a parameter iterator over a global model and a selector over a subpart of the model

model: xpmir.learning.optim.Module

The model from which the parameters should be gathered

iterator: xpmir.learning.parameters.ParametersIterator

The sub-model iterator

default: bool

Default value for parameters not within the sub-model

XPM Configxpmir.learning.parameters.RegexParametersIterator(*, regex, model)[source]

Bases: ParametersIterator

Submit type: xpmir.learning.parameters.RegexParametersIterator

Iterator over all the parameters that match the given regex

regex: str

The regular expression

model: xpmir.learning.optim.Module

The model from which the parameters are selected

Freezing

XPM Configxpmir.learning.hooks.LayerFreezer(*, selector)[source]

Bases: InitializationTrainingHook

Submit type: xpmir.learning.hooks.LayerFreezer

This training hook class can be used to freeze some of the transformer layers

selector: xpmir.learning.parameters.ParametersIterator

How to select the layers to freeze
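A hedged sketch combining the two, where `encoder` stands for a Module configuration defined elsewhere and the regular expression is hypothetical:

from xpmir.learning.hooks import LayerFreezer
from xpmir.learning.parameters import RegexParametersIterator

# Select all parameters whose names contain "embeddings." (hypothetical pattern)
selector = RegexParametersIterator(regex=r"embeddings\.", model=encoder)

# Freeze the selected parameters during training
freezer = LayerFreezer(selector=selector)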

Loading

XPM Configxpmir.learning.parameters.NameMapper[source]

Bases: Config, ABC

Submit type: xpmir.learning.parameters.NameMapper

Changes the names of parameters

XPM Configxpmir.learning.parameters.PrefixRenamer(*, model, data)[source]

Bases: NameMapper

Submit type: xpmir.learning.parameters.PrefixRenamer

Changes the names of parameters

model: str

Prefix in model

data: str

Prefix in data

XPM Configxpmir.learning.parameters.PartialModuleLoader(*, value, path, selector, mapper)[source]

Bases: PathSerializationLWTask

Submit type: xpmir.learning.parameters.PartialModuleLoader

Allows loading only a subset of the parameters

value: experimaestro.core.objects.Config

The configuration that will be serialized

path: Path

Path containing the data

selector: xpmir.learning.parameters.ParametersIterator

The selector gives the list of parameters for which loaded parameters should be used

mapper: xpmir.learning.parameters.NameMapper

Maps parameter names so that they match the saved ones

XPM Configxpmir.learning.parameters.SubModuleLoader(*, value, path, selector, saved_value)[source]

Bases: PathSerializationLWTask

Submit type: xpmir.learning.parameters.SubModuleLoader

Allows loading only a subset of the parameters (with automatic renaming)

value: experimaestro.core.objects.Config

The configuration that will be serialized

path: Path

Path containing the data

selector: xpmir.learning.parameters.ParametersIterator

The selector gives the list of parameters for which loaded parameters should be used

saved_value: xpmir.learning.optim.Module

The original module that is being loaded (optional; allows mapping names)
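A hedged sketch of a partial load where the checkpoint stores parameters under a "bert." prefix while the current model uses "encoder." (both prefixes, the path, and `model` are hypothetical):

from pathlib import Path
from xpmir.learning.parameters import (
    PartialModuleLoader,
    PrefixRenamer,
    RegexParametersIterator,
)

# Only parameters of the encoder sub-module are restored (hypothetical pattern)
selector = RegexParametersIterator(regex=r"^encoder\.", model=model)
# Rename "encoder." (model side) to "bert." (checkpoint side) when matching names
mapper = PrefixRenamer(model="encoder.", data="bert.")

loader = PartialModuleLoader(
    value=model,
    path=Path("checkpoints/pretrained"),
    selector=selector,
    mapper=mapper,
)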

Batching

XPM Configxpmir.learning.batchers.Batcher[source]

Bases: Config

Submit type: xpmir.learning.batchers.Batcher

Responsible for micro-batching when the batch does not fit in memory

The base class just does nothing (no adaptation)

XPM Configxpmir.learning.batchers.PowerAdaptativeBatcher[source]

Bases: Batcher

Submit type: xpmir.learning.batchers.PowerAdaptativeBatcher

Starts with the provided batch size, then divides it by 2, 3, etc. until there are no more out-of-memory (OOM) errors

Devices

The device configurations allow selecting both the device to use for computation and the way to use it (e.g. multi-GPU settings).

XPM Configxpmir.learning.devices.Device[source]

Bases: Config

Submit type: xpmir.learning.devices.Device

Device to use, as well as specific options (e.g. parallelism)

XPM Configxpmir.learning.devices.CudaDevice(*, gpu_determ, cpu_fallback, distributed)[source]

Bases: Device

Submit type: xpmir.learning.devices.CudaDevice

CUDA device

gpu_determ: bool = False

Sets deterministic mode for GPU computation

cpu_fallback: bool = False

Fallback to CPU if no GPU is available

distributed: bool = False

Flag for using DistributedDataParallel: when True and the number of GPUs is greater than 1, use torch.nn.parallel.DistributedDataParallel; when False, use torch.nn.DataParallel
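A minimal sketch of a CUDA device configuration that falls back to the CPU when no GPU is available and enables distributed training:

from xpmir.learning.devices import CudaDevice

device = CudaDevice(
    cpu_fallback=True,   # run on the CPU if no GPU is available
    distributed=True,    # use DistributedDataParallel when more than one GPU is present
)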

Schedulers

XPM Configxpmir.learning.schedulers.Scheduler[source]

Bases: Config

Submit type: xpmir.learning.schedulers.Scheduler

Base class for all optimizer schedulers

XPM Configxpmir.learning.schedulers.CosineWithWarmup(*, num_warmup_steps, num_cycles)[source]

Bases: Scheduler

Submit type: xpmir.learning.schedulers.CosineWithWarmup

Cosine schedule with warmup

Uses the implementation from the Transformers library

https://huggingface.co/docs/transformers/main_classes/optimizer_schedules#transformers.get_cosine_schedule_with_warmup

num_warmup_steps: int

Number of warmup steps

num_cycles: float = 0.5

Number of cycles

XPM Configxpmir.learning.schedulers.LinearWithWarmup(*, num_warmup_steps, min_factor)[source]

Bases: Scheduler

Submit type: xpmir.learning.schedulers.LinearWithWarmup

Linear warmup followed by decay

num_warmup_steps: int

Number of warmup steps

min_factor: float = 0.0

Minimum multiplicative factor
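A hedged sketch attaching a scheduler to a ParameterOptimizer (the step count is hypothetical, and the module is assumed to default to the Learner's model):

from xpmir.learning.optim import AdamW, ParameterOptimizer
from xpmir.learning.schedulers import LinearWithWarmup

# Linear warmup over the first 1000 steps, then linear decay
scheduler = LinearWithWarmup(num_warmup_steps=1000)

param_optimizer = ParameterOptimizer(
    optimizer=AdamW(lr=3e-5),
    scheduler=scheduler,
)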

Base classes

XPM Configxpmir.learning.base.Random(*, seed)[source]

Bases: Config

Submit type: xpmir.learning.base.Random

Random configuration

seed: int = 0

The seed to use so the random process is deterministic

XPM Configxpmir.learning.base.Sampler[source]

Bases: Config, EasyLogger

Submit type: xpmir.learning.base.Sampler

Abstract data sampler

XPM Configxpmir.learning.base.BaseSampler[source]

Bases: Sampler, Generic[T], ABC

Submit type: xpmir.learning.base.BaseSampler

XPM Configxpmir.learning.trainers.Trainer(*, hooks, model)[source]

Bases: Config, EasyLogger

Submit type: xpmir.learning.trainers.Trainer

Generic trainer

hooks: List[xpmir.learning.context.TrainingHook] = []

Hooks for this trainer: this includes the losses, but can be adapted for other uses. The specific list of hooks depends on the specific trainer

model: xpmir.learning.optim.Module

If the model to optimize is different from the model passed to the Learner, this parameter can be used; initialization is still expected to be done at the learner level