Neural models

XPMIR provides implementations of the main neural IR scoring architectures. Each architecture computes relevance scores differently:

Dual models encode queries and documents independently, enabling pre-computation of document representations for fast retrieval.
Dense models (DotDense, CosineDense) are dual models that produce a single vector per input and score via dot product or cosine similarity.
Late-interaction models (ColBERT) keep per-token representations and compute fine-grained token-level interactions at scoring time.
Sparse models (SPLADE) produce sparse bag-of-words representations with learned term weights.
Cross-encoders jointly encode the query-document pair for maximum accuracy, at the cost of not being able to pre-compute representations.

Dual models 

Dual models compute a separate representation for documents and queries, which allows pre-computing document representations and scoring efficiently over large collections.

XPM Config xpmir.neural.DualRepresentationScorer(*, doc, bibtex)

Bases: AbstractModuleScorer, Generic[QueriesRep, DocsRep]

Neural scorer based on (at least partially) independent representations of the document and the query.

This is the base class for all scorers that depend on a map of cosine/inner products between query and document tokens.

doc: str: Paper description or title (used in HF Hub README)

bibtex: str: BibTeX citation (used in HF Hub README)

abstractmethod score_pairs(queries: QueriesRep, documents: DocsRep, info: TrainerContext | None = None) → Tensor

Score the specified pairs of queries/documents.

There are as many queries as documents. The exact type of queries and documents depends on the specific instance of the dual representation scorer.

Parameters:

queries (QueriesRep) – The list of encoded queries
documents (DocsRep) – The matching list of encoded documents
info (Optional[TrainerContext]) – The training context (if learning)

Returns:

A tensor of dimension (N) where N is the number of documents/queries

Return type:

torch.Tensor

abstractmethod score_product(queries: QueriesRep, documents: DocsRep, info: TrainerContext | None = None) → Tensor

Computes the score of all possible pairs of queries and documents.

Parameters:

queries (Any) – The encoded queries
documents (Any) – The encoded documents
info (Optional[TrainerContext]) – The training context (if learning)

Returns:

A tensor of dimension (N, P) where N is the number of queries and P the number of documents

Return type:

torch.Tensor
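The difference between the two scoring methods is easiest to see on a toy example. The following is a minimal sketch, not the XPMIR implementation: plain Python lists stand in for tensors, the scoring function is assumed to be a dot product, and the free functions `score_pairs` and `score_product` only mirror the method names above.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def score_pairs(queries: List[Vector], documents: List[Vector]) -> List[float]:
    """One score per aligned (query, document) pair: output has shape (N)."""
    if len(queries) != len(documents):
        raise ValueError("score_pairs expects as many queries as documents")
    return [dot(q, d) for q, d in zip(queries, documents)]


def score_product(queries: List[Vector], documents: List[Vector]) -> List[List[float]]:
    """Every query against every document: output has shape (N, P)."""
    return [[dot(q, d) for d in documents] for q in queries]


# Toy data (assumed values, for illustration only)
queries = [[1.0, 0.0], [0.0, 2.0]]
documents = [[3.0, 1.0], [1.0, 4.0]]

pair_scores = score_pairs(queries, documents)
all_scores = score_product(queries, documents)
```

When N equals P, the diagonal of the `score_product` output matches the `score_pairs` output, which is why in-batch negatives can be obtained from a single product.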

XPM Config xpmir.neural.dual.DualVectorScorer(*, doc, bibtex, encoder, query_encoder)

Bases: DualRepresentationScorer

A scorer based on dual vectorial representations

doc: str: Paper description or title (used in HF Hub README)

bibtex: str: BibTeX citation (used in HF Hub README)

encoder: xpmir.text.encoders.TextEncoderBase: The document (and potentially query) encoder

query_encoder: xpmir.text.encoders.TextEncoderBase: The query encoder (optional; if not defined, the encoder is used for queries as well)

XPM Config xpmir.neural.dual.DualModuleLoader(*, value, settings, encoder_path, query_encoder_path)

Bases: ModuleLoader

ModuleLoader for dual encoder models.

Has distinct encoder_path and query_encoder_path DataPaths so each encoder is serialized independently. This enables proper sentence-transformers format on HF Hub export (symmetric vs router/asymmetric).

value: experimaestro.core.objects.config.Config: The configuration that will be serialized

settings: experimaestro.core.objects.config.Config: Optional metadata (validation info, checkpoint epoch, etc.)

encoder_path: path: Path to the document encoder checkpoint

query_encoder_path: path: Path to the query encoder checkpoint (if separate from doc encoder)

Training hooks 

Hooks that can be attached to dual models during training (e.g. for regularisation).

XPM Config xpmir.neural.dual.DualVectorListener

Bases: TrainingHook

Listener called with the (vectorial) representation of queries and documents

The hook is called just after the computation of documents and queries representations.

This can be used for logging purposes, but more importantly, to add regularization losses such as the FlopsRegularizer.

__call__(context: TrainerContext, queries: Tensor, documents: Tensor)

Hook handler

Parameters:

context (TrainerContext) – The training context
queries (torch.Tensor) – The query vectors
documents (torch.Tensor) – The document vectors

Raises:

NotImplementedError – If the subclass does not implement the handler

XPM Config xpmir.neural.dual.DualVectorScorerListener

Bases: TrainingHook, ABC

Listener called with the (vectorial) representation of queries and documents

The hook is called just after the computation of documents and queries representations.

This can be used for logging purposes, but more importantly, to add regularization losses such as the FlopsRegularizer.

XPM Config xpmir.neural.dual.FlopsRegularizer(*, lambda_q, lambda_d)

Bases: DualVectorListener

The FLOPS regularizer computes

\[FLOPS(q,d) = \lambda_q FLOPS(q) + \lambda_d FLOPS(d)\]

where

\[FLOPS(x) = \left( \frac{1}{d} \sum_{i=1}^d |x_i| \right)^2\]

lambda_q: float: Lambda for queries

lambda_d: float: Lambda for documents
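As a worked illustration of the two formulas above, the sketch below evaluates them exactly as written, on single vectors represented as Python lists. The function names `flops` and `flops_loss` and the example values are assumptions made for this illustration; they are not part of the XPMIR API, and the library itself operates on batched tensors.

```python
from typing import List


def flops(x: List[float]) -> float:
    """FLOPS(x) = ((1/d) * sum_i |x_i|)^2, with d the length of x."""
    d = len(x)
    return (sum(abs(value) for value in x) / d) ** 2


def flops_loss(q: List[float], doc: List[float], lambda_q: float, lambda_d: float) -> float:
    """FLOPS(q, d) = lambda_q * FLOPS(q) + lambda_d * FLOPS(d)."""
    return lambda_q * flops(q) + lambda_d * flops(doc)


# Toy vectors (assumed values): both have a mean absolute value of 1
query_vec = [0.0, 2.0, 0.0, -2.0]
doc_vec = [1.0, 1.0, 1.0, 1.0]

loss = flops_loss(query_vec, doc_vec, lambda_q=0.5, lambda_d=0.25)
```

Here both vectors have FLOPS equal to 1, so the loss reduces to lambda_q + lambda_d = 0.75.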

XPM Config xpmir.neural.dual.ScheduledFlopsRegularizer(*, lambda_q, lambda_d, min_lambda_q, min_lambda_d, lambda_warmup_steps)

Bases: FlopsRegularizer

The FLOPS regularizer where lambda_q and lambda_d vary with the training step. The lambda values increase quadratically until `lambda_warmup_steps` is reached, and then remain constant.

lambda_q: float: Lambda for queries

lambda_d: float: Lambda for documents

min_lambda_q: float = 0: Minimum value of lambda_q, used before it increases

min_lambda_d: float = 0: Minimum value of lambda_d, used before it increases

lambda_warmup_steps: int = 0: The warmup steps for the lambda
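A quadratic warmup of this kind could look like the sketch below. The exact interpolation used by XPMIR is not given in this page, so the formula (from the minimum value up to the target value, following the square of the warmup progress) is an assumption, and `scheduled_lambda` is a hypothetical helper name.

```python
def scheduled_lambda(
    step: int,
    target_lambda: float,
    min_lambda: float = 0.0,
    warmup_steps: int = 0,
) -> float:
    """Return the regularization weight to use at a given training step.

    Assumed schedule: quadratic growth from min_lambda to target_lambda
    during the warmup, then constant at target_lambda.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return target_lambda
    progress = step / warmup_steps  # in [0, 1) during the warmup
    return min_lambda + (target_lambda - min_lambda) * progress**2


# Weight at a few steps for a 100-step warmup towards 1.0
schedule = [scheduled_lambda(s, 1.0, 0.0, 100) for s in (0, 50, 100, 200)]
```

With these values the weight is 0 at step 0, a quarter of the target halfway through the warmup, and equal to the target from step 100 onwards.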

Dense models 

Dense models produce a single fixed-size vector per query or document and score with a dot product or cosine similarity. They are commonly initialised from Sentence Transformers checkpoints.

XPM Config xpmir.neural.dual.Dense(*, doc, bibtex, encoder, query_encoder)

Bases: DualVectorScorer

A scorer based on a pair of (query, document) dense vectors

doc: str: Paper description or title (used in HF Hub README)

bibtex: str: BibTeX citation (used in HF Hub README)

encoder: xpmir.text.encoders.TextEncoderBase: The document (and potentially query) encoder

query_encoder: xpmir.text.encoders.TextEncoderBase: The query encoder (optional; if not defined, the encoder is used for queries as well)

classmethod from_sentence_transformers(hf_id: str, **kwargs)

Creates a dense model from a Sentence Transformers model.

The list of available models can be found on HuggingFace: https://huggingface.co/models?library=sentence-transformers

Parameters: hf_id – The HuggingFace ID

XPM Config xpmir.neural.dual.DotDense(*, doc, bibtex, encoder, query_encoder)

Bases: Dense

Dual model based on inner product.

doc: str: Paper description or title (used in HF Hub README)

bibtex: str: BibTeX citation (used in HF Hub README)

encoder: xpmir.text.encoders.TextEncoderBase: The document (and potentially query) encoder

query_encoder: xpmir.text.encoders.TextEncoderBase: The query encoder (optional; if not defined, the encoder is used for queries as well)

XPM Config xpmir.neural.dual.CosineDense(*, doc, bibtex, encoder, query_encoder)

Bases: Dense

Dual model based on cosine similarity.

doc: str: Paper description or title (used in HF Hub README)

bibtex: str: BibTeX citation (used in HF Hub README)

encoder: xpmir.text.encoders.TextEncoderBase: The document (and potentially query) encoder

query_encoder: xpmir.text.encoders.TextEncoderBase: The query encoder (optional; if not defined, the encoder is used for queries as well)
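The practical difference between the inner-product and cosine variants is how vector length affects the score. The following sketch uses plain Python lists and hypothetical helper functions (`dot`, `cosine`); it only illustrates the two similarity functions, not how DotDense or CosineDense are implemented.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product: sensitive to the norms of both vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity: inner product of the L2-normalised vectors."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


# Two documents pointing in the same direction but with different norms
query = [1.0, 0.0]
short_doc = [1.0, 0.0]
long_doc = [3.0, 0.0]

dot_scores = (dot(query, short_doc), dot(query, long_doc))
cosine_scores = (cosine(query, short_doc), cosine(query, long_doc))
```

The inner product ranks `long_doc` above `short_doc` because of its larger norm, while cosine similarity gives both the same score.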

Late-interaction (ColBERT)

ColBERT retains per-token embeddings and scores via late interaction (MaxSim). This gives accuracy close to cross-encoders while still allowing document representations to be pre-computed and indexed.
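The MaxSim operator can be sketched as follows: for each query token, take the maximum similarity over all document tokens, then sum over query tokens. This is an illustration with plain Python lists and already normalised toy vectors; the function `maxsim` is a hypothetical helper, not the batched, masked implementation used by ColBERTEncoder.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best-matching
    document token similarity."""
    return sum(
        max(dot(q_tok, d_tok) for d_tok in doc_tokens)
        for q_tok in query_tokens
    )


# Toy unit-norm token embeddings (assumed values)
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [0.6, 0.8]]

score = maxsim(query_tokens, doc_tokens)
```

The first query token matches the first document token exactly (similarity 1.0) and the second query token is best matched by the second document token (similarity 0.8), giving a score of 1.8.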

XPM Config xpmir.neural.colbert.ColBERTEncoder(*, doc, bibtex, encoder, query_encoder, dim, query_maxlen, doc_maxlen, query_augmentation)

Bases: DualVectorScorer

ColBERT-style dual scorer with late interaction MaxSim.

The document (and optional query) encoder must return TokensRepresentationOutput, i.e. a (batch, max_tokens, hidden_dim) tensor together with the tokenized inputs (providing the attention mask). A trainable linear projection reduces the per-token vectors to dim and the vectors are L2-normalised so the dot product amounts to a cosine similarity.

The encoder (and query_encoder) inherited from DualVectorScorer must in practice be a TokenizedTextEncoder returning TokensRepresentationOutput. The TokenizedTextEncoder exposes the tokenize / forward_tokenized split that query augmentation needs.

doc: str: Paper description or title (used in HF Hub README)

bibtex: str: BibTeX citation (used in HF Hub README)

encoder: xpmir.text.encoders.TextEncoderBase: The document (and potentially query) encoder

query_encoder: xpmir.text.encoders.TextEncoderBase: The query encoder (optional; if not defined, the encoder is used for queries as well)

dim: int = 128: Output dimension of the per-token projection.

query_maxlen: int = 32: Maximum number of tokens kept for a query.

doc_maxlen: int = 180: Maximum number of tokens kept for a document.

query_augmentation: bool = True: Whether to apply ColBERT’s query augmentation: queries shorter than query_maxlen are right-padded with [MASK] tokens (instead of [PAD]) and every position participates in MaxSim. This mirrors the original ColBERT implementation. Disable to use plain padded queries with padding excluded from MaxSim.
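Query augmentation as described above can be illustrated at the token level. The sketch below works on token strings rather than token IDs and ignores special tokens such as [CLS]; `augment_query` is a hypothetical helper written for this illustration, not a function exposed by XPMIR.

```python
from typing import List


def augment_query(tokens: List[str], query_maxlen: int, mask_token: str = "[MASK]") -> List[str]:
    """Truncate the query to query_maxlen tokens, then right-pad it with
    mask tokens so that every query has exactly query_maxlen positions."""
    kept = tokens[:query_maxlen]
    return kept + [mask_token] * (query_maxlen - len(kept))


augmented = augment_query(["what", "is", "colbert"], query_maxlen=5)
truncated = augment_query(["a", "b", "c", "d"], query_maxlen=2)
```

With augmentation enabled, the padded [MASK] positions are encoded like any other token and contribute to MaxSim; with plain padding they would be excluded from the sum.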

document_token_embeddings(records: List[IDTextRecord]) → List[Tensor]: Encode a batch of documents and return the list of per-token embeddings, one tensor (num_tokens, dim) per document. Padding positions are filtered out.

encode_documents(records: List[IDTextRecord]) → TokensRepresentationOutput

Encode a list of texts (document or query)

The return value is model dependent

encode_queries(records: List[IDTextRecord]) → TokensRepresentationOutput

Encode a list of texts (document or query)

The return value is model dependent, but should be sequence

By default, uses merge

query_token_embeddings(records: List[IDTextRecord]) → Tensor: Encode a batch of queries and return a dense (batch, query_maxlen, dim) tensor suitable for fast-plaid search.

score_pairs(queries: TokensRepresentationOutput, documents: TokensRepresentationOutput, info: TrainerContext | None = None) → Tensor

Score the specified pairs of queries/documents.

There are as many queries as documents. The exact type of queries and documents depends on the specific instance of the dual representation scorer.

Parameters:

queries (QueriesRep) – The list of encoded queries
documents (DocsRep) – The matching list of encoded documents
info (Optional[TrainerContext]) – The training context (if learning)

Returns:

A tensor of dimension (N) where N is the number of documents/queries

Return type:

torch.Tensor

score_product(queries: TokensRepresentationOutput, documents: TokensRepresentationOutput, info: TrainerContext | None = None) → Tensor

Computes the score of all possible pairs of queries and documents.

Parameters:

queries (Any) – The encoded queries
documents (Any) – The encoded documents
info (Optional[TrainerContext]) – The training context (if learning)

Returns:

A tensor of dimension (N, P) where N is the number of queries and P the number of documents

Return type:

torch.Tensor

XPM Config xpmir.neural.colbert.PylateColBERT(*, doc, bibtex, model_id, dim, query_maxlen, doc_maxlen)

Bases: AbstractModuleScorer

Interface with Pylate to use a ColBERT model as a scorer.

doc: str: Paper description or title (used in HF Hub README)

bibtex: str: BibTeX citation (used in HF Hub README)

model_id: str: The HuggingFace model ID or path.

dim: int = 128: Output dimension of the per-token projection.

query_maxlen: int = 32: Maximum number of tokens kept for a query.

doc_maxlen: int = 180: Maximum number of tokens kept for a document.

XPM Config xpmir.neural.colbert.InitPylateColBERT(*, model)

Bases: LightweightTask

Initializes the PylateColBERT by loading the model.

model: xpmir.rankers.scorer.AbstractModuleScorer

Sparse models (SPLADE)

SPLADE-family models produce sparse representations with learned term weights. Documents and queries are mapped to high-dimensional sparse vectors over the vocabulary, enabling efficient inverted-index retrieval.
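To make the link between sparse representations and inverted-index retrieval concrete, here is a minimal sketch in which sparse vectors are dictionaries mapping vocabulary terms to weights. The weights, document IDs and the helpers `build_inverted_index` and `search` are all made up for this illustration; real SPLADE weights come from a trained model and real indexes use compressed posting lists.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVector = Dict[str, float]
PostingLists = Dict[str, List[Tuple[str, float]]]


def build_inverted_index(docs: Dict[str, SparseVector]) -> PostingLists:
    """Map each term to the list of (doc_id, weight) pairs where it is non-zero."""
    index: PostingLists = defaultdict(list)
    for doc_id, vector in docs.items():
        for term, weight in vector.items():
            index[term].append((doc_id, weight))
    return dict(index)


def search(index: PostingLists, query: SparseVector) -> List[Tuple[str, float]]:
    """Accumulate the sparse dot product by visiting only the query terms."""
    scores: Dict[str, float] = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical term weights
docs = {
    "d1": {"neural": 0.8, "retrieval": 0.5},
    "d2": {"cooking": 1.2, "retrieval": 0.1},
    "d3": {"cooking": 1.2},
}
index = build_inverted_index(docs)
results = search(index, {"neural": 1.5, "retrieval": 1.0})
```

Only the posting lists of the query terms are visited, so documents sharing no term with the query (here `d3`) are never scored.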

Cross-encoders (HuggingFace)

Cross-encoders jointly encode the query and document with a single transformer pass, producing the most accurate relevance scores. They are typically used as re-rankers in a multi-stage pipeline.
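A multi-stage pipeline of this kind can be sketched as a cheap first-stage scorer that selects candidates, followed by a more expensive scorer that re-orders them. Both scorers below are toy stand-ins chosen for the illustration (term overlap, and overlap normalised by document length); in practice the second stage would be a cross-encoder, and `rerank` is a hypothetical helper rather than an XPMIR function.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def rerank(
    query: str,
    docs: List[str],
    first_stage: Scorer,
    second_stage: Scorer,
    top_k: int,
) -> List[Tuple[str, float]]:
    """Keep the top_k documents of the first stage, then sort them
    by the second-stage score."""
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:top_k]
    rescored = [(doc, second_stage(query, doc)) for doc in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def term_overlap(query: str, doc: str) -> float:
    """Toy first stage: number of distinct shared terms."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_second_stage(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: overlap normalised by document length."""
    words = doc.lower().split()
    return term_overlap(query, doc) / len(words) if words else 0.0


docs = [
    "neural ranking models",
    "neural ranking models for ad hoc retrieval over large collections",
    "cooking pasta at home",
]
results = rerank("neural ranking", docs, term_overlap, toy_second_stage, top_k=2)
```

The expensive scorer is only applied to `top_k` candidates, which is what makes cross-encoders affordable on large collections.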

XPM Config xpmir.neural.huggingface.HFCrossScorer(*, doc, bibtex, encoder, tokenizer)

Bases: AbstractModuleScorer

Load a cross-scorer model from HuggingFace.

Example

>>> from xpmir.neural.huggingface import hf_cross_scorer
>>> model, init_tasks = hf_cross_scorer(hf_id="cross-encoder/ms-marco-MiniLM-L-6-v2")

doc: str: Paper description or title (used in HF Hub README)

bibtex: str: BibTeX citation (used in HF Hub README)

encoder: xpmir.text.huggingface.base.HFSequenceClassification: The encoder from Hugging Face

tokenizer: xpmir.text.huggingface.tokenizers.HFTokenizer: The tokenizer for the cross-scorer

XPM Config xpmir.neural.huggingface.HFQueryDocTokenizer(*, model_id, max_length, max_query_length, max_doc_length)

Bases: HFTokenizer

Specific tokenizer for Cross-Scorers that handles query and document truncation.

This tokenizer allows for independent limits on query and document lengths, while ensuring the combined sequence ([CLS] query [SEP] document [SEP]) never exceeds the model’s maximum length.

Truncation strategy:

Initial encoding caps each side at its respective max length (or the total available content limit).
If the combined length still exceeds the total limit, the document is truncated first to make room.
The query is only truncated if the document is entirely consumed and the sequence still exceeds the limit.

This ensures that if a query is short, the document can utilize the remaining space up to the total limit.
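The truncation strategy above can be sketched on lists of token IDs. This is an illustration under stated assumptions, not the tokenizer's actual code: the function `truncate_pair` is hypothetical, and the three special tokens of the ([CLS] query [SEP] document [SEP]) layout are assumed to be the only overhead.

```python
from typing import List, Tuple


def truncate_pair(
    query_ids: List[int],
    doc_ids: List[int],
    max_length: int,
    max_query_length: int,
    max_doc_length: int,
    num_special_tokens: int = 3,  # assumed: [CLS], [SEP], [SEP]
) -> Tuple[List[int], List[int]]:
    """Apply per-side caps, then shrink the document first, then the query."""
    budget = max(max_length - num_special_tokens, 0)

    # Step 1: cap each side at its own limit (or at the total content budget)
    query = query_ids[: min(max_query_length, budget)]
    doc = doc_ids[: min(max_doc_length, budget)]

    # Step 2: if the pair is still too long, truncate the document first
    overflow = len(query) + len(doc) - budget
    if overflow > 0:
        doc = doc[: max(len(doc) - overflow, 0)]
        overflow = len(query) + len(doc) - budget

    # Step 3: safeguard, only reached once the document is fully consumed
    if overflow > 0:
        query = query[: len(query) - overflow]
    return query, doc


query, doc = truncate_pair(
    list(range(10)), list(range(600)),
    max_length=512, max_query_length=64, max_doc_length=512,
)
```

With a 10-token query, the document keeps 499 tokens, so the full sequence including the three special tokens is exactly 512 tokens long.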

model_id: str: The tokenizer hugginface ID

max_length: int: Maximum length for the tokenizer (can be overridden by the model) default can be set by default using the hf config

max_query_length: int: maximum number of tokens for the query side (defaults to max_doc_length // 2)

max_doc_length: int: maximum number of tokens for the document side (defaults to max_length)

XPM Configxpmir.neural.huggingface.LLMRankerTokenizer(*, model_id, max_length, max_query_length, max_doc_length, prompt_template)[source]

Bases: HFTokenizer

Specific tokenizer for LLM Cross-Scorers that handles query and document truncation separately.

model_id: str: The tokenizer's HuggingFace ID

max_length: int: Maximum length for the tokenizer (can be overridden by the model); the default can be taken from the HF config

max_query_length: int: Maximum number of tokens for the query side

max_doc_length: int: Maximum number of tokens for the document side

prompt_template: str = Query: {query} Document: {document} Relevant:: Prompt template for the LLM
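As an illustration of how such a template is filled, the sketch below substitutes a query and a document into the default template with Python's `str.format`. Whether the tokenizer uses `str.format` internally, and whether the default template contains spaces or line breaks between its parts, is an assumption here; the example texts are made up.

```python
# Default template as shown above (whitespace between parts is assumed)
PROMPT_TEMPLATE = "Query: {query} Document: {document} Relevant:"

prompt = PROMPT_TEMPLATE.format(
    query="what is late interaction?",
    document="ColBERT keeps per-token embeddings.",
)
```

The prompt ends with "Relevant:", so an LLM-based ranker can be scored on the token it predicts next.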

XPM Config xpmir.neural.huggingface.CrossEncoderModuleLoader(*, value, settings, encoder_path)

Bases: ModuleLoader

ModuleLoader for cross-encoder models.

Saves the model in standard HuggingFace format (config.json + model.safetensors + tokenizer), which is directly loadable by sentence-transformers CrossEncoder.

value: experimaestro.core.objects.config.Config: The configuration that will be serialized

settings: experimaestro.core.objects.config.Config: Optional metadata (validation info, checkpoint epoch, etc.)

encoder_path: path: Path to the encoder checkpoint directory

XPM Config xpmir.neural.huggingface.InitCEFromHFID(*, model, fabric)

Bases: HFModelInitBase

Load cross-encoder weights from a HuggingFace Hub model ID. This initializer is specific to cross-encoders because it must ensure that n_labels is 1. It uses model.config.hf_id to resolve the model.

model: xpmir.text.huggingface.base.HFModel

fabric: xpm_torch.configuration.FabricConfiguration: The fabric configuration to use for initialization. When set, model creation runs inside fabric.init_module() so that parameters are allocated directly on the target device and dtype. See https://lightning.ai/docs/fabric/stable/advanced/model_init.html

XPM Config xpmir.text.huggingface.base.HFConfig

Bases: Config

Base configuration for HuggingFace models

XPM Config xpmir.text.huggingface.base.HFConfigID(*, hf_id)

Bases: HFConfig

Configuration identified by a HuggingFace model ID

hf_id: str: HuggingFace model ID (e.g. distilbert-base-uncased)

XPM Config xpmir.text.huggingface.base.HFModelInitBase(*, model, fabric)

Bases: LightweightTask, ABC

Base class for initializing HF models

model: xpmir.text.huggingface.base.HFModel

fabric: xpm_torch.configuration.FabricConfiguration: The fabric configuration to use for initialization. When set, model creation runs inside fabric.init_module() so that parameters are allocated directly on the target device and dtype. See https://lightning.ai/docs/fabric/stable/advanced/model_init.html

XPM Config xpmir.text.huggingface.base.HFSequenceClassification(*, config, n_labels)

Bases: HFModel

HuggingFace model for sequence classification

config: ConfigT: HuggingFace model configuration

n_labels: int = 1

Cross-encoders (Sentence-Transformers)

XPMIR also supports cross-encoders via the Sentence Transformers library. This is particularly useful for models that require specific chat templates or prompt-based ranking (like some LLM-based rankers) that are natively supported by Sentence-Transformers.
