Neural models
XPMIR provides implementations of the main neural IR scoring architectures. Each architecture computes relevance scores differently:
Dual models encode queries and documents independently, enabling pre-computation of document representations for fast retrieval.
Dense models (DotDense, CosineDense) are dual models that produce a single vector per input and score via dot product or cosine similarity.
Late-interaction models (ColBERT) keep per-token representations and compute fine-grained token-level interactions at scoring time.
Sparse models (SPLADE) produce sparse bag-of-words representations with learned term weights.
Cross-encoders jointly encode the query-document pair for maximum accuracy, at the cost of not being able to pre-compute representations.
Dual models
Dual models compute a separate representation for documents and queries, which allows pre-computing document representations and scoring efficiently over large collections.
- XPM Config xpmir.neural.DualRepresentationScorer(*, doc, bibtex)
Bases: AbstractModuleScorer, Generic[QueriesRep, DocsRep]
Neural scorer based on (at least partially) independent representations of the document and the query.
This is the base class for all scorers that depend on a map of cosine/inner products between query and document tokens.
- abstractmethod score_pairs(queries: QueriesRep, documents: DocsRep, info: TrainerContext | None = None) -> Tensor
Score the specified pairs of queries/documents.
There are as many queries as documents. The exact type of queries and documents depends on the specific instance of the dual representation scorer.
- Parameters:
queries (QueriesRep) – The list of encoded queries
documents (DocsRep) – The matching list of encoded documents
info (Optional[TrainerContext]) – The training context (if learning)
- Returns:
A tensor of dimension (N) where N is the number of documents/queries
- Return type:
Tensor
- abstractmethod score_product(queries: QueriesRep, documents: DocsRep, info: TrainerContext | None = None) -> Tensor
Computes the score of all possible pairs of query and document
- Parameters:
queries (Any) – The encoded queries
documents (Any) – The encoded documents
info (Optional[TrainerContext]) – The training context (if learning)
- Returns:
A tensor of dimension (N, P) where N is the number of queries and P the number of documents
- Return type:
Tensor
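The difference between the two scoring methods can be illustrated with a minimal sketch. It uses plain Python lists in place of tensors and assumes a dot-product scorer; the free functions below are hypothetical stand-ins, not the library's implementation.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_pairs(queries: List[Vector], documents: List[Vector]) -> List[float]:
    """Score query i against document i only: the output has shape (N)."""
    if len(queries) != len(documents):
        raise ValueError("score_pairs expects as many queries as documents")
    return [dot(q, d) for q, d in zip(queries, documents)]


def score_product(queries: List[Vector], documents: List[Vector]) -> List[List[float]]:
    """Score every query against every document: the output has shape (N, P)."""
    return [[dot(q, d) for d in documents] for q in queries]


queries = [[1.0, 0.0], [0.0, 2.0]]
documents = [[3.0, 1.0], [1.0, 4.0]]

pairs = score_pairs(queries, documents)      # [3.0, 8.0]
product = score_product(queries, documents)  # [[3.0, 1.0], [2.0, 8.0]]
```

When N equals P, the diagonal of the product matrix matches the pairwise scores, which is why in-batch negatives can reuse the same encoded representations.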
- XPM Config xpmir.neural.dual.DualVectorScorer(*, doc, bibtex, encoder, query_encoder)
Bases: DualRepresentationScorer
A scorer based on dual vectorial representations
- encoder: xpmir.text.encoders.TextEncoderBase
The document (and potentially query) encoder
- query_encoder: xpmir.text.encoders.TextEncoderBase
The query encoder (optional; if not defined, encoder is used for queries too)
- XPM Config xpmir.neural.dual.DualModuleLoader(*, value, settings, encoder_path, query_encoder_path)
Bases: ModuleLoader
ModuleLoader for dual encoder models.
Has distinct encoder_path and query_encoder_path DataPaths so each encoder is serialized independently. This enables proper sentence-transformers format on HF Hub export (symmetric vs router/asymmetric).
- value: experimaestro.core.objects.config.Config
The configuration that will be serialized
- settings: experimaestro.core.objects.config.Config
Optional metadata (validation info, checkpoint epoch, etc.)
- encoder_path: path
Path to the document encoder checkpoint
- query_encoder_path: path
Path to the query encoder checkpoint (if separate from doc encoder)
Training hooks
Hooks that can be attached to dual models during training (e.g. for regularisation).
- XPM Config xpmir.neural.dual.DualVectorListener
Bases: TrainingHook
Listener called with the (vectorial) representation of queries and documents
The hook is called just after the query and document representations have been computed.
This can be used for logging purposes, but more importantly, to add regularization losses such as the FlopsRegularizer.
- __call__(context: TrainerContext, queries: Tensor, documents: Tensor)
Hook handler
- Parameters:
context (TrainerContext) – The training context
queries (torch.Tensor) – The query vectors
documents (torch.Tensor) – The document vectors
- Raises:
NotImplementedError – if the handler is not implemented
- XPM Config xpmir.neural.dual.DualVectorScorerListener
Bases: TrainingHook, ABC
Listener called with the (vectorial) representation of queries and documents
The hook is called just after the query and document representations have been computed.
This can be used for logging purposes, but more importantly, to add regularization losses such as the FlopsRegularizer.
- XPM Config xpmir.neural.dual.FlopsRegularizer(*, lambda_q, lambda_d)
Bases: DualVectorListener
The FLOPS regularizer computes
\[FLOPS(q,d) = \lambda_q FLOPS(q) + \lambda_d FLOPS(d)\]
where
\[FLOPS(x) = \left( \frac{1}{d} \sum_{i=1}^d |x_i| \right)^2\]
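As a worked illustration, the formula above can be evaluated literally on a single query vector and a single document vector. This is a minimal sketch in plain Python: the function names are hypothetical, and how the library aggregates over a batch is not shown here.

```python
from typing import Sequence


def flops(x: Sequence[float]) -> float:
    """FLOPS(x): the squared mean of the absolute values of x."""
    if not x:
        raise ValueError("x must be non-empty")
    return (sum(abs(v) for v in x) / len(x)) ** 2


def flops_penalty(q: Sequence[float], d: Sequence[float],
                  lambda_q: float, lambda_d: float) -> float:
    """Weighted sum lambda_q * FLOPS(q) + lambda_d * FLOPS(d)."""
    return lambda_q * flops(q) + lambda_d * flops(d)


q = [0.0, 2.0, 0.0, -2.0]  # mean |x_i| = 1.0, so FLOPS(q) = 1.0
d = [4.0, 0.0, 0.0, 0.0]   # mean |x_i| = 1.0, so FLOPS(d) = 1.0

penalty = flops_penalty(q, d, lambda_q=0.5, lambda_d=0.25)  # 0.75
```

Because the penalty grows with the mean activation, minimising it pushes most components towards zero, which is what makes the resulting vectors sparse.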
- XPM Config xpmir.neural.dual.ScheduledFlopsRegularizer(*, lambda_q, lambda_d, min_lambda_q, min_lambda_d, lambda_warmup_steps)
Bases: FlopsRegularizer
A FLOPS regularizer in which lambda_q and lambda_d vary with the training step. The lambda values grow quadratically until `lambda_warmup_steps` is reached, and then remain constant.
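One way to read this schedule is sketched below. The exact interpolation is an assumption: the sketch ramps quadratically from the minimum value to the target value, and the function and argument names are hypothetical (they only mirror the configuration fields).

```python
def scheduled_lambda(step: int, target: float, minimum: float,
                     warmup_steps: int) -> float:
    """Quadratic ramp from `minimum` to `target`, then constant.

    Assumed form: minimum + (target - minimum) * (step / warmup_steps) ** 2.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return target
    ratio = max(step, 0) / warmup_steps
    return minimum + (target - minimum) * ratio ** 2


# Ramp towards lambda_q = 1.0 over 100 steps, starting from min_lambda_q = 0.0
values = [scheduled_lambda(s, target=1.0, minimum=0.0, warmup_steps=100)
          for s in (0, 50, 100, 200)]
# values == [0.0, 0.25, 1.0, 1.0]
```

A slow start of this kind lets the model learn useful term weights before the sparsity pressure reaches full strength.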
Dense models
Dense models produce a single fixed-size vector per query or document and score with a dot product or cosine similarity. They are commonly initialised from Sentence Transformers checkpoints.
- XPM Config xpmir.neural.dual.Dense(*, doc, bibtex, encoder, query_encoder)
Bases: DualVectorScorer
A scorer based on a pair of (query, document) dense vectors
- encoder: xpmir.text.encoders.TextEncoderBase
The document (and potentially query) encoder
- query_encoder: xpmir.text.encoders.TextEncoderBase
The query encoder (optional; if not defined, encoder is used for queries too)
- classmethod from_sentence_transformers(hf_id: str, **kwargs)
Creates a dense model from a Sentence transformer
The list can be found on HuggingFace https://huggingface.co/models?library=sentence-transformers
- Parameters:
hf_id – The HuggingFace ID
- XPM Config xpmir.neural.dual.DotDense(*, doc, bibtex, encoder, query_encoder)
Bases: Dense
Dual model based on inner product.
- encoder: xpmir.text.encoders.TextEncoderBase
The document (and potentially query) encoder
- query_encoder: xpmir.text.encoders.TextEncoderBase
The query encoder (optional; if not defined, encoder is used for queries too)
- XPM Config xpmir.neural.dual.CosineDense(*, doc, bibtex, encoder, query_encoder)
Bases: Dense
Dual model based on cosine similarity.
- encoder: xpmir.text.encoders.TextEncoderBase
The document (and potentially query) encoder
- query_encoder: xpmir.text.encoders.TextEncoderBase
The query encoder (optional; if not defined, encoder is used for queries too)
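The practical difference between the two dense scorers is whether vector magnitude affects the score. The sketch below shows this with plain Python lists; the helper functions are illustrative only and do not use the DotDense or CosineDense classes.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product: sensitive to both direction and magnitude."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity: the inner product of the L2-normalised vectors."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


query = [1.0, 0.0]
same_direction_small = [2.0, 0.0]
same_direction_large = [10.0, 0.0]
other_direction = [3.0, 4.0]

dot_scores = [dot(query, d) for d in
              (same_direction_small, same_direction_large, other_direction)]
cos_scores = [cosine(query, d) for d in
              (same_direction_small, same_direction_large, other_direction)]
# dot_scores: 2.0, 10.0, 3.0  (the larger vector wins)
# cos_scores: 1.0, 1.0, 0.6   (only the direction matters)
```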
Late-interaction (ColBERT)
ColBERT retains per-token embeddings and scores via late interaction (MaxSim). This gives accuracy close to cross-encoders while still allowing document representations to be pre-computed and indexed.
- XPM Config xpmir.neural.colbert.ColBERTEncoder(*, doc, bibtex, encoder, query_encoder, dim, query_maxlen, doc_maxlen)
Bases: DualVectorScorer
ColBERT-style dual scorer with late interaction MaxSim.
The document (and optional query) encoder must return TokensRepresentationOutput, i.e. a (batch, max_tokens, hidden_dim) tensor together with the tokenized inputs (providing the attention mask). A trainable linear projection reduces the per-token vectors to dim and the vectors are L2-normalised so the dot product amounts to a cosine similarity.
- encoder: xpmir.text.encoders.TokenizedTextEncoderBase
The document token encoder (returns one vector per token).
- query_encoder: xpmir.text.encoders.TokenizedTextEncoderBase
Optional separate query encoder. When unset, the document encoder is used for queries too.
- document_token_embeddings(records: List[IDTextRecord]) -> List[Tensor]
Encode a batch of documents and return the list of per-token embeddings, one tensor (num_tokens, dim) per document. Padding positions are filtered out.
- encode_documents(records: List[IDTextRecord]) -> TokensRepresentationOutput
Encode a list of texts (document or query)
The return value is model dependent
- encode_queries(records: List[IDTextRecord]) -> TokensRepresentationOutput
Encode a list of texts (document or query)
The return value is model dependent, but should be sequence
By default, uses merge
- query_token_embeddings(records: List[IDTextRecord]) -> Tensor
Encode a batch of queries and return a dense (batch, query_maxlen, dim) tensor suitable for fast-plaid search.
- score_pairs(queries: TokensRepresentationOutput, documents: TokensRepresentationOutput, info: TrainerContext | None = None) -> Tensor
Score the specified pairs of queries/documents.
There are as many queries as documents. The exact type of queries and documents depends on the specific instance of the dual representation scorer.
- Parameters:
queries (QueriesRep) – The list of encoded queries
documents (DocsRep) – The matching list of encoded documents
info (Optional[TrainerContext]) – The training context (if learning)
- Returns:
A tensor of dimension (N) where N is the number of documents/queries
- Return type:
Tensor
- score_product(queries: TokensRepresentationOutput, documents: TokensRepresentationOutput, info: TrainerContext | None = None) -> Tensor
Computes the score of all possible pairs of query and document
- Parameters:
queries (Any) – The encoded queries
documents (Any) – The encoded documents
info (Optional[TrainerContext]) – The training context (if learning)
- Returns:
A tensor of dimension (N, P) where N is the number of queries and P the number of documents
- Return type:
Tensor
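The MaxSim operator behind late interaction can be sketched in a few lines. This is a simplified illustration on plain Python lists, assuming the token vectors are already L2-normalised and that padding positions have been removed; it is not the ColBERTEncoder implementation.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product (a cosine similarity when both vectors have unit norm)."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """For each query token, keep its best-matching document token,
    then sum these maxima over the query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Two query tokens and two document tokens, all of unit norm
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [0.6, 0.8]]

score = maxsim(query_tokens, doc_tokens)
# First query token: max(1.0, 0.6) = 1.0; second: max(0.0, 0.8) = 0.8
```

Since each maximum depends only on the stored document token vectors, these can be computed and indexed ahead of time, with only the query encoded at search time.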
Sparse models (SPLADE)
SPLADE-family models produce sparse representations with learned term weights. Documents and queries are mapped to high-dimensional sparse vectors over the vocabulary, enabling efficient inverted-index retrieval.
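To see why sparse vectors suit an inverted index, consider the sketch below. It represents each sparse vector as a dictionary from term to weight; the index layout, function names, and toy weights are illustrative assumptions and do not reflect how XPMIR stores SPLADE representations.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SparseVector = Dict[str, float]


def build_index(docs: Dict[str, SparseVector]) -> Dict[str, List[Tuple[str, float]]]:
    """Map each term to the (doc_id, weight) postings where it is non-zero."""
    index: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for doc_id, vector in docs.items():
        for term, weight in vector.items():
            index[term].append((doc_id, weight))
    return index


def search(index: Dict[str, List[Tuple[str, float]]],
           query: SparseVector) -> List[Tuple[str, float]]:
    """Accumulate dot products by visiting only the query's non-zero terms."""
    scores: Dict[str, float] = defaultdict(float)
    for term, query_weight in query.items():
        for doc_id, doc_weight in index.get(term, []):
            scores[doc_id] += query_weight * doc_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "d1": {"neural": 1.5, "ranking": 0.5},
    "d2": {"sparse": 2.0, "ranking": 1.0},
}
index = build_index(docs)
results = search(index, {"ranking": 1.0, "sparse": 0.5})
# results == [("d2", 2.0), ("d1", 0.5)]
```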
Cross-encoders (HuggingFace)
Cross-encoders jointly encode the query and document in a single transformer pass, which typically yields more accurate relevance scores than dual models. Because nothing can be pre-computed, they are typically used as re-rankers in a multi-stage pipeline.
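The re-ranking stage of such a pipeline can be sketched as follows. The `overlap_score` function is a toy stand-in for a real cross-encoder, and all names here are hypothetical; the point is only that the expensive scorer is applied to the short candidate list produced by a first-stage retriever.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


def rerank(query: str,
           candidates: List[Tuple[str, str]],
           scorer: Callable[[str, str], float],
           top_k: int) -> List[Tuple[str, float]]:
    """Score each (doc_id, text) candidate jointly with the query
    and keep the top_k highest-scoring documents."""
    scored = [(doc_id, scorer(query, text)) for doc_id, text in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Candidates as they might come from a first-stage retriever
candidates = [
    ("d1", "sparse retrieval"),
    ("d2", "neural ranking models"),
    ("d3", "ranking with bm25"),
]
reranked = rerank("neural ranking", candidates, overlap_score, top_k=2)
# reranked == [("d2", 2.0), ("d3", 1.0)]
```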
- XPM Config xpmir.neural.huggingface.HFCrossScorer(*, doc, bibtex, encoder, tokenizer)
Bases: AbstractModuleScorer
Load a cross-scorer model from Hugging Face.
Example
>>> from xpmir.neural.huggingface import hf_cross_scorer
>>> model, init_tasks = hf_cross_scorer(hf_id="cross-encoder/ms-marco-MiniLM-L-6-v2")
- encoder: xpmir.text.huggingface.base.HFSequenceClassification
The encoder from Hugging Face
- tokenizer: xpmir.text.huggingface.tokenizers.HFTokenizer
The tokenizer for the cross-scorer
- XPM Config xpmir.neural.huggingface.HFQueryDocTokenizer(*, model_id, max_length, max_query_length, max_doc_length)
Bases: HFTokenizer
Specific tokenizer for Cross-Scorers that handles query and document truncation.
This tokenizer allows for independent limits on query and document lengths, while ensuring the combined sequence ([CLS] query [SEP] document [SEP]) never exceeds the model’s maximum length.
Truncation strategy:
Initial encoding caps each side at its respective max length (or the total available content limit).
If the combined length still exceeds the total limit, the document is truncated first to make room.
The query is only truncated if the document is entirely consumed and the sequence still exceeds the limit.
This ensures that if a query is short, the document can utilize the remaining space up to the total limit.
- max_length: int
Maximum length for the tokenizer (can be overridden by the model). The default value can be set from the HF config.
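The truncation strategy described above can be sketched on plain lists of token ids. This is an illustration under stated assumptions: three special tokens are reserved ([CLS] and two [SEP]), and the function name and its `num_special` argument are hypothetical rather than part of HFQueryDocTokenizer.

```python
from typing import List, Optional, Tuple


def truncate_pair(query: List[int], doc: List[int], max_length: int,
                  max_query_length: Optional[int] = None,
                  max_doc_length: Optional[int] = None,
                  num_special: int = 3) -> Tuple[List[int], List[int]]:
    """Truncate query and document so that, with the special tokens,
    the combined sequence fits in max_length."""
    budget = max(max_length - num_special, 0)

    # Step 1: cap each side at its own limit, or at the total content budget
    q_cap = budget if max_query_length is None else min(max_query_length, budget)
    d_cap = budget if max_doc_length is None else min(max_doc_length, budget)
    query, doc = query[:q_cap], doc[:d_cap]

    # Step 2: if the pair is still too long, shorten the document first
    overflow = len(query) + len(doc) - budget
    if overflow > 0:
        removed = min(overflow, len(doc))
        doc = doc[:len(doc) - removed]
        overflow -= removed

    # Step 3: shorten the query only once the document is fully consumed
    # (with both caps bounded by the budget, this should not trigger)
    if overflow > 0:
        query = query[:len(query) - overflow]
    return query, doc


long_doc = list(range(100, 120))  # 20 document tokens
q1, d1 = truncate_pair([0, 1, 2, 3], long_doc, max_length=13)
q2, d2 = truncate_pair([0, 1], long_doc, max_length=13)
q3, d3 = truncate_pair([0, 1], long_doc, max_length=13, max_doc_length=6)
# (len(q1), len(d1)) == (4, 6); (len(q2), len(d2)) == (2, 8); len(d3) == 6
```

The second call shows the effect noted above: a shorter query leaves more of the budget to the document.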
- XPM Config xpmir.neural.huggingface.CrossEncoderModuleLoader(*, value, settings, encoder_path)
Bases: ModuleLoader
ModuleLoader for cross-encoder models.
Saves the model in standard HuggingFace format (config.json + model.safetensors + tokenizer), which is directly loadable by sentence-transformers CrossEncoder.
- value: experimaestro.core.objects.config.Config
The configuration that will be serialized
- settings: experimaestro.core.objects.config.Config
Optional metadata (validation info, checkpoint epoch, etc.)
- encoder_path: path
Path to the encoder checkpoint directory
- XPM Config xpmir.neural.huggingface.InitCEFromHFID(*, model, fabric)
Bases: HFModelInitBase
Load cross-encoder weights from a HuggingFace Hub model ID. This initializer is specific to cross-encoders: it must ensure that n_labels is 1. Uses model.config.hf_id to resolve the model.
- fabric: xpm_torch.configuration.FabricConfiguration
The fabric configuration to use for initialization. When set, model creation runs inside fabric.init_module() so that parameters are allocated directly on the target device and dtype. See https://lightning.ai/docs/fabric/stable/advanced/model_init.html
- XPM Config xpmir.text.huggingface.base.HFConfig
Bases: Config
Base configuration for HuggingFace models
- XPM Config xpmir.text.huggingface.base.HFConfigID(*, hf_id)
Bases: HFConfig
Configuration identified by a HuggingFace model ID
- XPM Config xpmir.text.huggingface.base.HFModelInitBase(*, model, fabric)
Bases: LightweightTask, ABC
Base class for initializing HF models
- fabric: xpm_torch.configuration.FabricConfiguration
The fabric configuration to use for initialization. When set, model creation runs inside fabric.init_module() so that parameters are allocated directly on the target device and dtype. See https://lightning.ai/docs/fabric/stable/advanced/model_init.html
Cross-encoders (Sentence-Transformers)
XPMIR also supports cross-encoders via the Sentence Transformers library. This is particularly useful for models that require specific chat templates or prompt-based ranking (like some LLM-based rankers) that are natively supported by Sentence-Transformers.