Neural models

XPMIR provides implementations of the main neural IR scoring architectures. Each architecture computes relevance scores differently:

  • Dual models encode queries and documents independently, enabling pre-computation of document representations for fast retrieval.

  • Dense models (DotDense, CosineDense) are dual models that produce a single vector per input and score via dot product or cosine similarity.

  • Late-interaction models (ColBERT) keep per-token representations and compute fine-grained token-level interactions at scoring time.

  • Sparse models (SPLADE) produce sparse bag-of-words representations with learned term weights.

  • Cross-encoders jointly encode the query-document pair for maximum accuracy, at the cost of not being able to pre-compute representations.

Dual models

Dual models compute a separate representation for documents and queries, which allows pre-computing document representations and scoring efficiently over large collections.

XPM Config xpmir.neural.DualRepresentationScorer(*, doc, bibtex)

Bases: AbstractModuleScorer, Generic[QueriesRep, DocsRep]

Neural scorer based on an (at least partially) independent representation of the document and the query.

This is the base class for all scorers that depend on a map of cosine/inner products between query and document tokens.

doc: str

Paper description or title (used in HF Hub README)

bibtex: str

BibTeX citation (used in HF Hub README)

abstractmethod score_pairs(queries: QueriesRep, documents: DocsRep, info: TrainerContext | None = None) -> Tensor

Score the specified pairs of queries/documents.

There are as many queries as documents. The exact type of queries and documents depends on the specific instance of the dual representation scorer.

Parameters:
  • queries (QueriesRep) – The list of encoded queries

  • documents (DocsRep) – The matching list of encoded documents

  • info (Optional[TrainerContext]) – The training context (if learning)

Returns:

A tensor of dimension (N) where N is the number of documents/queries

Return type:

torch.Tensor

abstractmethod score_product(queries: QueriesRep, documents: DocsRep, info: TrainerContext | None = None) -> Tensor

Computes the score of all possible pairs of query and document

Parameters:
  • queries (Any) – The encoded queries

  • documents (Any) – The encoded documents

  • info (Optional[TrainerContext]) – The training context (if learning)

Returns:

A tensor of dimension (N, P) where N is the number of queries and P the number of documents

Return type:

torch.Tensor
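The difference between the two scoring methods can be illustrated with a minimal sketch in plain Python. It assumes single-vector representations scored by inner product; the helper names below are illustrative and are not part of XPMIR, whose implementations operate on tensors.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_pairs(queries: List[Vector], documents: List[Vector]) -> List[float]:
    """Score query i against document i only: the output has N entries."""
    if len(queries) != len(documents):
        raise ValueError("score_pairs expects as many queries as documents")
    return [dot(q, d) for q, d in zip(queries, documents)]


def score_product(queries: List[Vector], documents: List[Vector]) -> List[List[float]]:
    """Score every query against every document: the output is N x P."""
    return [[dot(q, d) for d in documents] for q in queries]


queries = [[1.0, 0.0], [0.0, 2.0]]
documents = [[3.0, 1.0], [0.5, 4.0]]

pairs = score_pairs(queries, documents)      # [3.0, 8.0]
product = score_product(queries, documents)  # [[3.0, 0.5], [2.0, 8.0]]
```

When N equals P, the diagonal of the product matrix matches the pairwise scores; the off-diagonal entries are what in-batch negative training typically relies on.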

XPM Config xpmir.neural.dual.DualVectorScorer(*, doc, bibtex, encoder, query_encoder)

Bases: DualRepresentationScorer

A scorer based on dual vectorial representations

doc: str

Paper description or title (used in HF Hub README)

bibtex: str

BibTeX citation (used in HF Hub README)

encoder: xpmir.text.encoders.TextEncoderBase

The document (and potentially query) encoder

query_encoder: xpmir.text.encoders.TextEncoderBase

The query encoder (optional; if not defined, encoder is used for queries too)

XPM Config xpmir.neural.dual.DualModuleLoader(*, value, settings, encoder_path, query_encoder_path)

Bases: ModuleLoader

ModuleLoader for dual encoder models.

Has distinct encoder_path and query_encoder_path DataPaths so each encoder is serialized independently. This enables proper sentence-transformers format on HF Hub export (symmetric vs router/asymmetric).

value: experimaestro.core.objects.config.Config

The configuration that will be serialized

settings: experimaestro.core.objects.config.Config

Optional metadata (validation info, checkpoint epoch, etc.)

encoder_path: path

Path to the document encoder checkpoint

query_encoder_path: path

Path to the query encoder checkpoint (if separate from doc encoder)

Training hooks

Hooks that can be attached to dual models during training (e.g. for regularisation).

XPM Config xpmir.neural.dual.DualVectorListener

Bases: TrainingHook

Listener called with the (vectorial) representation of queries and documents

The hook is called just after the computation of documents and queries representations.

This can be used for logging purposes but, more importantly, to add regularization losses such as FlopsRegularizer.

__call__(context: TrainerContext, queries: Tensor, documents: Tensor)

Hook handler

Parameters:
  • context (TrainerContext) – The training context

  • queries (torch.Tensor) – The query vectors

  • documents (torch.Tensor) – The document vectors

Raises:

NotImplementedError – If the subclass does not implement the handler

XPM Config xpmir.neural.dual.DualVectorScorerListener

Bases: TrainingHook, ABC

Listener called with the (vectorial) representation of queries and documents

The hook is called just after the computation of documents and queries representations.

This can be used for logging purposes but, more importantly, to add regularization losses such as FlopsRegularizer.

XPM Config xpmir.neural.dual.FlopsRegularizer(*, lambda_q, lambda_d)

Bases: DualVectorListener

The FLOPS regularizer computes

\[FLOPS(q,d) = \lambda_q FLOPS(q) + \lambda_d FLOPS(d)\]

where

\[FLOPS(x) = \left( \frac{1}{d} \sum_{i=1}^d |x_i| \right)^2\]

lambda_q: float

Lambda for queries

lambda_d: float

Lambda for documents
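As a worked illustration, the sketch below evaluates the two formulas literally in plain Python, treating x as a sequence of d values. Whether the average runs over vector components or over a batch is not stated here, so this is an assumption; the function names are illustrative and do not come from XPMIR.

```python
from typing import Sequence


def flops(x: Sequence[float]) -> float:
    """Square of the mean absolute value over the d entries of x."""
    d = len(x)
    if d == 0:
        raise ValueError("x must have at least one entry")
    return (sum(abs(v) for v in x) / d) ** 2


def flops_loss(q: Sequence[float], doc: Sequence[float],
               lambda_q: float, lambda_d: float) -> float:
    """Weighted sum lambda_q * FLOPS(q) + lambda_d * FLOPS(d)."""
    return lambda_q * flops(q) + lambda_d * flops(doc)


q = [0.0, 2.0, 0.0, -2.0]   # mean |x_i| = 1.0, so FLOPS(q) = 1.0
doc = [4.0, 0.0, 0.0, 0.0]  # mean |x_i| = 1.0, so FLOPS(d) = 1.0

loss = flops_loss(q, doc, lambda_q=0.5, lambda_d=0.25)  # 0.75
```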

XPM Config xpmir.neural.dual.ScheduledFlopsRegularizer(*, lambda_q, lambda_d, min_lambda_q, min_lambda_d, lambda_warmup_steps)

Bases: FlopsRegularizer

A FLOPS regularizer in which lambda_q and lambda_d vary with the training step. The lambda values increase quadratically until `lambda_warmup_steps`, and then remain constant.

lambda_q: float

Lambda for queries

lambda_d: float

Lambda for documents

min_lambda_q: float = 0

Minimum value of lambda_q before it increases

min_lambda_d: float = 0

Minimum value of lambda_d before it increases

lambda_warmup_steps: int = 0

The warmup steps for the lambda
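One plausible reading of this schedule is sketched below: a quadratic ramp from the minimum value to the target lambda over the warmup steps, followed by a constant value. The exact formula used by XPMIR is not given here, so the interpolation and the function name are assumptions made for illustration.

```python
def scheduled_lambda(step: int, target: float, minimum: float = 0.0,
                     warmup_steps: int = 0) -> float:
    """Quadratic ramp from `minimum` to `target`, constant after warmup."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return target
    progress = step / warmup_steps  # in [0, 1) during warmup
    return minimum + (target - minimum) * progress ** 2


# Ramp towards lambda = 1.0 over 4 warmup steps, starting from 0.0
schedule = [scheduled_lambda(s, target=1.0, minimum=0.0, warmup_steps=4)
            for s in range(6)]
# schedule == [0.0, 0.0625, 0.25, 0.5625, 1.0, 1.0]
```

A slow start of this kind lets the ranking loss dominate early training before the sparsity pressure reaches full strength.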

Dense models

Dense models produce a single fixed-size vector per query or document and score with a dot product or cosine similarity. They are commonly initialised from Sentence Transformers checkpoints.
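The practical difference between the two scoring functions is that the dot product is sensitive to vector length while cosine similarity is not. The plain-Python sketch below shows this on toy vectors; the helper names are illustrative and are not XPMIR functions.

```python
import math
from typing import List

Vector = List[float]


def dot_score(q: Vector, d: Vector) -> float:
    """Inner product between a query vector and a document vector."""
    return sum(a * b for a, b in zip(q, d))


def cosine_score(q: Vector, d: Vector) -> float:
    """Inner product of the L2-normalised vectors (0.0 if either is zero)."""
    norm = math.sqrt(dot_score(q, q)) * math.sqrt(dot_score(d, d))
    return dot_score(q, d) / norm if norm > 0 else 0.0


query = [3.0, 4.0]
doc_short = [3.0, 4.0]
doc_long = [6.0, 8.0]  # same direction, twice the norm

dot_scores = (dot_score(query, doc_short), dot_score(query, doc_long))
cos_scores = (cosine_score(query, doc_short), cosine_score(query, doc_long))
# dot_scores differ (25.0 vs 50.0); cos_scores are both 1.0 up to rounding
```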

XPM Config xpmir.neural.dual.Dense(*, doc, bibtex, encoder, query_encoder)

Bases: DualVectorScorer

A scorer based on a pair of (query, document) dense vectors

doc: str

Paper description or title (used in HF Hub README)

bibtex: str

BibTeX citation (used in HF Hub README)

encoder: xpmir.text.encoders.TextEncoderBase

The document (and potentially query) encoder

query_encoder: xpmir.text.encoders.TextEncoderBase

The query encoder (optional; if not defined, encoder is used for queries too)

classmethod from_sentence_transformers(hf_id: str, **kwargs)

Creates a dense model from a Sentence transformer

The list of available models can be found on HuggingFace: https://huggingface.co/models?library=sentence-transformers

Parameters:

hf_id – The HuggingFace ID

XPM Config xpmir.neural.dual.DotDense(*, doc, bibtex, encoder, query_encoder)

Bases: Dense

Dual model based on inner product.

doc: str

Paper description or title (used in HF Hub README)

bibtex: str

BibTeX citation (used in HF Hub README)

encoder: xpmir.text.encoders.TextEncoderBase

The document (and potentially query) encoder

query_encoder: xpmir.text.encoders.TextEncoderBase

The query encoder (optional; if not defined, encoder is used for queries too)

XPM Config xpmir.neural.dual.CosineDense(*, doc, bibtex, encoder, query_encoder)

Bases: Dense

Dual model based on cosine similarity.

doc: str

Paper description or title (used in HF Hub README)

bibtex: str

BibTeX citation (used in HF Hub README)

encoder: xpmir.text.encoders.TextEncoderBase

The document (and potentially query) encoder

query_encoder: xpmir.text.encoders.TextEncoderBase

The query encoder (optional; if not defined, encoder is used for queries too)

Late-interaction (ColBERT)

ColBERT retains per-token embeddings and scores via late interaction (MaxSim). This gives accuracy close to cross-encoders while still allowing document representations to be pre-computed and indexed.
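MaxSim can be sketched in a few lines of plain Python: each query token is matched with its most similar document token, and these maxima are summed. The sketch assumes L2-normalised token vectors and ignores padding and masking; the function names are illustrative rather than XPMIR's.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Sum, over query tokens, of the best-matching document token similarity."""
    if not doc_tokens:
        raise ValueError("a document needs at least one token vector")
    q_vecs = [l2_normalize(q) for q in query_tokens]
    d_vecs = [l2_normalize(d) for d in doc_tokens]
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in d_vecs)
        for q in q_vecs
    )


query_tokens = [[1.0, 0.0], [0.0, 2.0]]
doc_tokens = [[2.0, 0.0], [3.0, 4.0]]

# First query token matches [2, 0] with similarity 1.0; the second matches
# [3, 4] with similarity 0.8, so the score is approximately 1.8.
score = maxsim(query_tokens, doc_tokens)
```

Because the document token vectors do not depend on the query, they can be computed once and stored, which is what makes indexing possible.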

XPM Config xpmir.neural.colbert.ColBERTEncoder(*, doc, bibtex, encoder, query_encoder, dim, query_maxlen, doc_maxlen)

Bases: DualVectorScorer

ColBERT-style dual scorer with late interaction MaxSim.

The document (and optional query) encoder must return TokensRepresentationOutput, i.e. a (batch, max_tokens, hidden_dim) tensor together with the tokenized inputs (providing the attention mask). A trainable linear projection reduces the per-token vectors to dim and the vectors are L2-normalised so the dot product amounts to a cosine similarity.

doc: str

Paper description or title (used in HF Hub README)

bibtex: str

BibTeX citation (used in HF Hub README)

encoder: xpmir.text.encoders.TokenizedTextEncoderBase

The document token encoder (returns one vector per token).

query_encoder: xpmir.text.encoders.TokenizedTextEncoderBase

Optional separate query encoder. When unset, the document encoder is used for queries too.

dim: int = 128

Output dimension of the per-token projection.

query_maxlen: int = 32

Maximum number of tokens kept for a query.

doc_maxlen: int = 180

Maximum number of tokens kept for a document.

document_token_embeddings(records: List[IDTextRecord]) -> List[Tensor]

Encode a batch of documents and return the list of per-token embeddings, one tensor (num_tokens, dim) per document. Padding positions are filtered out.

encode_documents(records: List[IDTextRecord]) -> TokensRepresentationOutput

Encode a list of texts (document or query)

The return value is model dependent

encode_queries(records: List[IDTextRecord]) -> TokensRepresentationOutput

Encode a list of texts (document or query)

The return value is model dependent, but should be a sequence

By default, uses merge

query_token_embeddings(records: List[IDTextRecord]) -> Tensor

Encode a batch of queries and return a dense (batch, query_maxlen, dim) tensor suitable for fast-plaid search.

score_pairs(queries: TokensRepresentationOutput, documents: TokensRepresentationOutput, info: TrainerContext | None = None) -> Tensor

Score the specified pairs of queries/documents.

There are as many queries as documents. The exact type of queries and documents depends on the specific instance of the dual representation scorer.

Parameters:
  • queries (QueriesRep) – The list of encoded queries

  • documents (DocsRep) – The matching list of encoded documents

  • info (Optional[TrainerContext]) – The training context (if learning)

Returns:

A tensor of dimension (N) where N is the number of documents/queries

Return type:

torch.Tensor

score_product(queries: TokensRepresentationOutput, documents: TokensRepresentationOutput, info: TrainerContext | None = None) -> Tensor

Computes the score of all possible pairs of query and document

Parameters:
  • queries (Any) – The encoded queries

  • documents (Any) – The encoded documents

  • info (Optional[TrainerContext]) – The training context (if learning)

Returns:

A tensor of dimension (N, P) where N is the number of queries and P the number of documents

Return type:

torch.Tensor

Sparse models (SPLADE)

SPLADE-family models produce sparse representations with learned term weights. Documents and queries are mapped to high-dimensional sparse vectors over the vocabulary, enabling efficient inverted-index retrieval.
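To see why sparse vectors suit inverted-index retrieval, consider the minimal sketch below. It represents each text as a mapping from term to learned weight and only visits documents that share a term with the query. The weights are made up for illustration and the functions are not part of XPMIR.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVector = Dict[str, float]
InvertedIndex = Dict[str, List[Tuple[str, float]]]


def build_inverted_index(docs: Dict[str, SparseVector]) -> InvertedIndex:
    """Map each term to the (doc_id, weight) postings where it is non-zero."""
    index: InvertedIndex = defaultdict(list)
    for doc_id, vector in docs.items():
        for term, weight in vector.items():
            if weight != 0.0:
                index[term].append((doc_id, weight))
    return index


def search(index: InvertedIndex, query: SparseVector) -> List[Tuple[str, float]]:
    """Accumulate dot products by walking only the query terms' postings."""
    scores: Dict[str, float] = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "d1": {"neural": 1.5, "retrieval": 1.0},
    "d2": {"sparse": 2.0, "retrieval": 0.5},
    "d3": {"cooking": 1.0},
}
index = build_inverted_index(docs)
results = search(index, {"sparse": 1.0, "retrieval": 2.0})
# results == [("d2", 3.0), ("d1", 2.0)]; d3 shares no term and is never scored
```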

Cross-encoders (HuggingFace)

Cross-encoders jointly encode the query and document with a single transformer pass, producing the most accurate relevance scores. They are typically used as re-rankers in a multi-stage pipeline.
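The re-ranking stage of such a pipeline can be sketched as follows. A first-stage retriever is assumed to have produced a short candidate list, and a pair scorer (here a toy term-overlap function standing in for a cross-encoder) scores each query-document pair jointly. All names are hypothetical and chosen for illustration only.

```python
from typing import Callable, List, Tuple


def rerank(query: str, candidates: List[str],
           pair_scorer: Callable[[str, str], float],
           top_k: int) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and keep the top_k best."""
    scored = [(doc, pair_scorer(query, doc)) for doc in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_pair_scorer(query: str, document: str) -> float:
    """Stand-in for a cross-encoder: fraction of query terms in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(document.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


candidates = [
    "neural ranking models",
    "cooking with garlic",
    "ranking with neural cross encoders",
]
top = rerank("neural cross encoders", candidates, toy_pair_scorer, top_k=2)
```

Since each pair needs its own forward pass, the cost grows with the number of candidates, which is why the candidate list is kept small.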

XPM Config xpmir.neural.huggingface.HFCrossScorer(*, doc, bibtex, encoder, tokenizer)

Bases: AbstractModuleScorer

Load a cross-scorer model from HuggingFace.

Example

>>> from xpmir.neural.huggingface import hf_cross_scorer
>>> model, init_tasks = hf_cross_scorer(hf_id="cross-encoder/ms-marco-MiniLM-L-6-v2")

doc: str

Paper description or title (used in HF Hub README)

bibtex: str

BibTeX citation (used in HF Hub README)

encoder: xpmir.text.huggingface.base.HFSequenceClassification

The encoder from Hugging Face

tokenizer: xpmir.text.huggingface.tokenizers.HFTokenizer

The tokenizer for the cross-scorer

XPM Config xpmir.neural.huggingface.HFQueryDocTokenizer(*, model_id, max_length, max_query_length, max_doc_length)

Bases: HFTokenizer

Specific tokenizer for Cross-Scorers that handles query and document truncation.

This tokenizer allows for independent limits on query and document lengths, while ensuring the combined sequence ([CLS] query [SEP] document [SEP]) never exceeds the model’s maximum length.

Truncation strategy:

  1. Initial encoding caps each side at its respective max length (or the total available content limit).

  2. If the combined length still exceeds the total limit, the document is truncated first to make room.

  3. The query is only truncated if the document is entirely consumed and the sequence still exceeds the limit.

This ensures that if a query is short, the document can utilize the remaining space up to the total limit.
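The truncation strategy above can be sketched on plain token lists. The function below is an illustrative reading of the three steps, not the tokenizer's actual code; it assumes three special tokens ([CLS] and two [SEP]) and uses a hypothetical name and signature.

```python
from typing import List, Tuple


def truncate_pair(query: List[int], doc: List[int], max_length: int,
                  max_query_length: int, max_doc_length: int,
                  num_special_tokens: int = 3) -> Tuple[List[int], List[int]]:
    """Truncate query and document so the combined sequence fits max_length."""
    budget = max_length - num_special_tokens  # room left for content tokens
    if budget < 0:
        raise ValueError("max_length is too small for the special tokens")

    # Step 1: cap each side at its own limit (or at the content budget)
    query = query[: min(max_query_length, budget)]
    doc = doc[: min(max_doc_length, budget)]

    # Step 2: if the pair is still too long, shorten the document first
    overflow = len(query) + len(doc) - budget
    if overflow > 0:
        doc = doc[: max(len(doc) - overflow, 0)]

    # Step 3: safeguard, shorten the query only once the document is used up
    # (with the caps applied in step 1 this does not trigger in this sketch)
    overflow = len(query) + len(doc) - budget
    if overflow > 0:
        query = query[: len(query) - overflow]

    return query, doc


doc_tokens = list(range(100, 120))  # 20 document tokens

# A 3-token query leaves 4 of the 7 content slots to the document
q1, d1 = truncate_pair([1, 2, 3], doc_tokens, max_length=10,
                       max_query_length=5, max_doc_length=10)

# A 1-token query lets the document use 6 slots instead
q2, d2 = truncate_pair([1], doc_tokens, max_length=10,
                       max_query_length=5, max_doc_length=10)
```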

model_id: str

The HuggingFace ID of the tokenizer

max_length: int

Maximum length for the tokenizer (can be overridden by the model); the default can be set using the HF config

max_query_length: int

Maximum number of tokens for the query side (defaults to max_doc_length // 2)

max_doc_length: int

Maximum number of tokens for the document side (defaults to max_length)

XPM Config xpmir.neural.huggingface.CrossEncoderModuleLoader(*, value, settings, encoder_path)

Bases: ModuleLoader

ModuleLoader for cross-encoder models.

Saves the model in standard HuggingFace format (config.json + model.safetensors + tokenizer), which is directly loadable by sentence-transformers CrossEncoder.

value: experimaestro.core.objects.config.Config

The configuration that will be serialized

settings: experimaestro.core.objects.config.Config

Optional metadata (validation info, checkpoint epoch, etc.)

encoder_path: path

Path to the encoder checkpoint directory

XPM Config xpmir.neural.huggingface.InitCEFromHFID(*, model, fabric)

Bases: HFModelInitBase

Loads cross-encoder weights from a HuggingFace Hub model ID. This initializer is specific to cross-encoders because n_labels must be 1. It uses model.config.hf_id to resolve the model.

model: xpmir.text.huggingface.base.HFModel

fabric: xpm_torch.configuration.FabricConfiguration

The fabric configuration to use for initialization. When set, model creation runs inside fabric.init_module() so that parameters are allocated directly on the target device and dtype. See https://lightning.ai/docs/fabric/stable/advanced/model_init.html

XPM Config xpmir.text.huggingface.base.HFConfig

Bases: Config

Base configuration for HuggingFace models

XPM Config xpmir.text.huggingface.base.HFConfigID(*, hf_id)

Bases: HFConfig

Configuration identified by a HuggingFace model ID

hf_id: str

HuggingFace model ID (e.g. distilbert-base-uncased)

XPM Config xpmir.text.huggingface.base.HFModelInitBase(*, model, fabric)

Bases: LightweightTask, ABC

Base class for initializing HF models

model: xpmir.text.huggingface.base.HFModel

fabric: xpm_torch.configuration.FabricConfiguration

The fabric configuration to use for initialization. When set, model creation runs inside fabric.init_module() so that parameters are allocated directly on the target device and dtype. See https://lightning.ai/docs/fabric/stable/advanced/model_init.html

XPM Config xpmir.text.huggingface.base.HFSequenceClassification(*, config, n_labels)

Bases: HFModel

HuggingFace model for sequence classification

config: ConfigT

HuggingFace model configuration

n_labels: int = 1

Cross-encoders (Sentence-Transformers)

XPMIR also supports cross-encoders via the Sentence Transformers library. This is particularly useful for models that require specific chat templates or prompt-based ranking (like some LLM-based rankers) that are natively supported by Sentence-Transformers.