HuggingFace Transformers
Integration with HuggingFace Transformers for loading pre-trained language models, tokenizers, and building transformer-based text encoders. These components are used by neural models such as cross-encoders, SPLADE, and ColBERT.
Models
Wrappers around HuggingFace model classes. These configurations define which pre-trained model to use and how it should be loaded.
- XPM Config xpmir.text.huggingface.base.HFMaskedLanguageModel(*, config)
Bases: HFModel
- config: ConfigT
HuggingFace model configuration
- XPM Config xpmir.text.huggingface.base.HFModel(*, config)
Bases: Module, Generic[ConfigT]
Base transformer class from HuggingFace.
The model structure is created during __initialize__ from the config when available. Pretrained weights can be loaded via init tasks such as HFModelInitFromID or HFFromCheckpoint.
- config: ConfigT
HuggingFace model configuration
Init tasks
Tasks that handle model weight loading at experiment submit time (from a HuggingFace model ID or a local checkpoint).
- XPM Config xpmir.text.huggingface.base.HFModelInitFromID(*, model, fabric)
Bases: HFModelInitBase
Load pretrained weights from a HuggingFace Hub model ID.
Uses model.config.hf_id to resolve the model.
- fabric: xpm_torch.configuration.FabricConfiguration
The fabric configuration to use for initialization. When set, model creation runs inside fabric.init_module() so that parameters are allocated directly on the target device and dtype. See https://lightning.ai/docs/fabric/stable/advanced/model_init.html
- XPM Config xpmir.text.huggingface.base.HFFromCheckpoint(*, model, fabric, checkpoint)
Bases: HFModelInitBase
Load from a local checkpoint.
Uses model.config.hf_id for the architecture config, then loads weights from checkpoint.
- fabric: xpm_torch.configuration.FabricConfiguration
The fabric configuration to use for initialization. When set, model creation runs inside fabric.init_module() so that parameters are allocated directly on the target device and dtype. See https://lightning.ai/docs/fabric/stable/advanced/model_init.html
- checkpoint: path
The checkpoint path to load weights from
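The init tasks above share a two-phase pattern: the model structure is built from a configuration, and weights are loaded afterwards. The sketch below illustrates that pattern in plain Python only. It does not use xpmir, transformers, or Fabric; ToyModel, load_state, init_from_checkpoint, and the JSON checkpoint format are all hypothetical stand-ins chosen for the illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


class ToyModel:
    """Hypothetical stand-in for a transformer: structure first, weights later."""

    def __init__(self, config: Dict[str, int]):
        # Phase 1: allocate placeholder parameters with the shapes the config dictates
        self.config = config
        self.weights: Dict[str, List[float]] = {
            f"layer.{i}.weight": [0.0] * config["hidden_size"]
            for i in range(config["num_layers"])
        }

    def load_state(self, state: Dict[str, List[float]]) -> None:
        # Phase 2: overwrite placeholders, refusing mismatched names or shapes
        if state.keys() != self.weights.keys():
            raise KeyError("checkpoint keys do not match the model structure")
        for name, values in state.items():
            if len(values) != len(self.weights[name]):
                raise ValueError(f"shape mismatch for {name}")
            self.weights[name] = list(values)


def init_from_checkpoint(config: Dict[str, int], checkpoint: Path) -> ToyModel:
    """Build the structure from the config, then load weights from a local file."""
    model = ToyModel(config)
    model.load_state(json.loads(checkpoint.read_text()))
    return model


config = {"hidden_size": 2, "num_layers": 2}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "weights.json"
    saved = {"layer.0.weight": [1.0, 2.0], "layer.1.weight": [3.0, 4.0]}
    path.write_text(json.dumps(saved))
    model = init_from_checkpoint(config, path)
```

Keeping structure and weights separate means the same configuration can be paired with either Hub weights or a local checkpoint, which matches the split between the two init tasks described above.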
Tokenizers
HuggingFace tokenizer wrappers, with variants for different output formats (token IDs, strings, lists).
- XPM Config xpmir.text.huggingface.tokenizers.HFTokenizer(*, model_id, max_length)
Bases: Config, Initializable
This is the main tokenizer class
- XPM Config xpmir.text.huggingface.tokenizers.HFTokenizerBase(*, tokenizer)
Bases: TokenizerBase[TokenizerInput, TokenizedTexts]
Base class for all Hugging-Face tokenizers
- tokenizer: xpmir.text.huggingface.tokenizers.HFTokenizer
The HuggingFace tokenizer
- XPM Config xpmir.text.huggingface.tokenizers.HFListTokenizer(*, tokenizer, separate_index)
Bases: HFTokenizerBase[List[List[str]]]
Process lists of texts, separating the texts of each list with a separator token
- tokenizer: xpmir.text.huggingface.tokenizers.HFTokenizer
The HuggingFace tokenizer
- XPM Config xpmir.text.huggingface.tokenizers.HFStringTokenizer(*, tokenizer)
Bases: HFTokenizerBase[str | List[str] | List[Tuple[str, str]]]
Process list of texts
- tokenizer: xpmir.text.huggingface.tokenizers.HFTokenizer
The HuggingFace tokenizer
- XPM Config xpmir.text.huggingface.tokenizers.HFTokenizerAdapter(*, tokenizer, converter)
Bases: HFTokenizerBase[TokenizerInput]
Process list of texts
- tokenizer: xpmir.text.huggingface.tokenizers.HFTokenizer
The HuggingFace tokenizer
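To make the HFListTokenizer description more concrete, the sketch below joins each inner list of texts with a separator before tokenization. It is plain Python and calls neither xpmir nor transformers. The helper name join_with_separator and the "[SEP]" literal are assumptions made for this example, and the actual class may well insert the separator at the token-ID level instead.

```python
from typing import List


def join_with_separator(batch: List[List[str]], sep_token: str = "[SEP]") -> List[str]:
    """Join each inner list of texts into one string, with sep_token between parts."""
    return [f" {sep_token} ".join(texts) for texts in batch]


# Input shaped like List[List[str]]: one inner list per item to encode
batch = [["what is ir?", "information retrieval"], ["single text"]]
joined = join_with_separator(batch)
```

An inner list with a single text is returned unchanged, since there is nothing to separate.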
Encoders
Encoders that produce text representations from HuggingFace models. These implement the TokensEncoder interface.
- XPM Config xpmir.text.huggingface.encoders.HFEncoderBase(*, model)
Bases: Module
Base HuggingFace encoder
- model: xpmir.text.huggingface.base.HFModel
A Hugging-Face model
- XPM Config xpmir.text.huggingface.encoders.HFTokensEncoder(*, model)
Bases: HFEncoderBase, TokenizedEncoder
HuggingFace-based tokenized
- model: xpmir.text.huggingface.base.HFModel
A Hugging-Face model
- XPM Config xpmir.text.huggingface.encoders.HFCLSEncoder(*, model)
Bases: HFEncoderBase, TokenizedEncoder
Encodes a text using the [CLS] token
- model: xpmir.text.huggingface.base.HFModel
A Hugging-Face model
- XPM Config xpmir.text.huggingface.encoders.OneHotHuggingFaceEncoder(*, model_id, maxlen)
Bases: TextEncoder
A tokenizer which encodes a text as a binary vector over the vocabulary: 1 means the text contains the token, and 0 otherwise
- XPM Config xpmir.text.huggingface.encoders.SentenceTransformerTextEncoder(*, model_id)
Bases: TextEncoder
A Sentence Transformers text encoder
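Two of the encoders above, HFCLSEncoder and OneHotHuggingFaceEncoder, describe simple operations that can be sketched in plain Python. The code below is an illustration only, not the library's implementation: cls_pool, one_hot_encode, and the toy vocabulary are hypothetical names, and it assumes that the [CLS] token sits at position 0 of each sequence, as in BERT-style models.

```python
from typing import Dict, List

Vector = List[float]


def cls_pool(last_hidden_state: List[List[Vector]]) -> List[Vector]:
    """Keep the vector at position 0 of each sequence ([CLS] for BERT-style models)."""
    return [sequence[0] for sequence in last_hidden_state]


def one_hot_encode(token_ids: List[int], vocab_size: int) -> List[int]:
    """Return a 0/1 vector with a 1 at every token ID that occurs in the text."""
    vector = [0] * vocab_size
    for token_id in token_ids:
        vector[token_id] = 1  # repeated tokens still yield 1, not a count
    return vector


# Toy batch of hidden states: 2 sequences, 3 tokens each, hidden size 2
hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]],
]
pooled = cls_pool(hidden)

# Toy vocabulary (hypothetical); a real tokenizer would supply the token IDs
vocab: Dict[str, int] = {"[PAD]": 0, "the": 1, "cat": 2, "sat": 3, "dog": 4}
token_ids = [vocab[token] for token in "the cat sat the".split()]
encoded = one_hot_encode(token_ids, len(vocab))
```

With tensors of shape (batch, sequence, hidden), the pooling step corresponds to the slice last_hidden_state[:, 0].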
Training hooks
Hooks that modify encoder behaviour during training (e.g. selecting intermediate layers).
- XPM Config xpmir.text.huggingface.encoders.LayerSelector(*, re_layer, transformer, pick_layers, select_embeddings, select_feed_forward)
Bases: ParametersIterator
This class can be used to pick some of the transformer layers
- transformer: xpmir.text.huggingface.base.HFModel
The model for which layers are selected
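One plausible way to pick layers is to match parameter names against a regular expression that captures the layer index. The sketch below shows that idea in plain Python. The semantics are assumptions, not the documented behaviour of LayerSelector: here pick_layers selects the first N layers and select_embeddings adds the embedding parameters, while the pattern and the parameter names mimic BERT-style naming.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed pattern for BERT-style names such as "encoder.layer.3.output.dense.weight"
LAYER_PATTERN = re.compile(r"encoder\.layer\.(\d+)\.")


def select_parameters(
    names: Iterable[str], pick_layers: int, select_embeddings: bool = False
) -> Iterator[Tuple[str, bool]]:
    """Yield (parameter name, selected) pairs for every parameter name."""
    for name in names:
        match = LAYER_PATTERN.search(name)
        if match:
            # Layer parameter: selected when its index is among the first pick_layers
            selected = int(match.group(1)) < pick_layers
        else:
            # Non-layer parameter: only embeddings can be selected, and only on request
            selected = select_embeddings and name.startswith("embeddings.")
        yield name, selected


names = [
    "embeddings.word_embeddings.weight",
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.1.output.dense.weight",
    "encoder.layer.2.output.dense.weight",
    "pooler.dense.weight",
]
chosen = [
    name
    for name, selected in select_parameters(names, pick_layers=2, select_embeddings=True)
    if selected
]
```

A selection like this could then drive a training hook, for instance to freeze the chosen parameters.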