Learning to Rank
This section covers the learning-to-rank (LTR) components: scorers that assign relevance scores to query-document pairs, samplers that produce training data, and trainers that optimise model parameters. XPMIR supports pointwise, pairwise, batchwise, and distillation training strategies.
Scorers
Scorers assign a relevance score to a (query, document) pair.
AbstractModuleScorer is the base class for scorers with learnable parameters (neural models).
- XPM Config xpmir.rankers.scorer.Scorer(*, doc, bibtex)
Bases: Config, Initializable, EasyLogger, ABC
Query-document scorer
A model able to give a score to a list of documents given a query
- abstractmethod compute(topic: IDTextRecord, documents: Iterable[ScoredDocument]) → List[ScoredDocument]
Score all documents with respect to a single topic.
This method should be implemented by subclasses to provide the actual scoring logic. It is query-atomic (processes one query at a time).
- getRetriever(retriever: Retriever, batch_size: int, top_k=None, device=None)
Returns a two-stage re-ranker built from the given retriever and this scorer
- Parameters:
device – Device for the ranker or None if no change should be made
batch_size – The number of documents in each batch
top_k – Number of documents to re-rank (or None for all)
- initialize(*args, **kwargs)
Main initialization
Calls __initialize__() once (using __initialize__())
- rsv(topic: str | IDTextRecord, documents: List[ScoredDocument] | ScoredDocument | str | List[str]) → List[ScoredDocument]
Compute the Retrieval Status Value (RSV) for a query and a set of documents.
This method is the primary entry point for scoring a set of documents against a single query. It handles input normalization and delegates to the compute() method.
Note: For large-scale evaluation involving multiple queries, using Retriever.retrieve_all() via a TwoStageRetriever is preferred, as it allows for cross-query batching on GPUs.
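The input handling described for rsv() can be illustrated with a small, self-contained sketch. It does not use XPMIR: ScoredDoc, ToyScorer, and the "text" record key are hypothetical stand-ins, and the term-overlap score is only there to make the example run.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a scored document (not the XPMIR class)."""

    text: str
    score: float = 0.0


class ToyScorer:
    """Hypothetical scorer: counts query terms that occur in the document."""

    def compute(self, topic: Dict[str, str], documents: List[ScoredDoc]) -> List[ScoredDoc]:
        # Query-atomic scoring: one topic, many documents
        terms = set(topic["text"].lower().split())
        return [
            ScoredDoc(d.text, float(len(terms & set(d.text.lower().split()))))
            for d in documents
        ]

    def rsv(
        self,
        topic: Union[str, Dict[str, str]],
        documents: Union[str, ScoredDoc, List[Union[str, ScoredDoc]]],
    ) -> List[ScoredDoc]:
        # Normalise the topic: a bare string becomes a record-like dict
        if isinstance(topic, str):
            topic = {"text": topic}
        # Normalise the documents: accept a single item or a list
        if isinstance(documents, (str, ScoredDoc)):
            documents = [documents]
        docs = [ScoredDoc(d) if isinstance(d, str) else d for d in documents]
        # Delegate the actual scoring to compute()
        return self.compute(topic, docs)
```

With this sketch, ToyScorer().rsv("neural ranking", "neural ranking models") and a call with a list of mixed strings and ScoredDoc objects both end up in the same compute() code path.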
- XPM Config xpmir.rankers.scorer.RandomScorer(*, doc, bibtex, random)
Bases: Scorer
A random scorer
- random: xpm_torch.base.Random
The random number generator
- XPM Config xpmir.rankers.scorer.AbstractModuleScorer(*, doc, bibtex)
Base class for all torch-based Modules implementing xpmir.rankers.Scorer.
While compute() (inherited from Scorer) processes documents for a single query, AbstractModuleScorer also supports cross-query batching when called directly through its forward method (aliased as __call__).
When used in a TwoStageRetriever with a batchsize > 0, the retriever uses PointwiseItems batching to maximize GPU utilization across multiple queries.
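The idea behind cross-query batching can be sketched in plain Python. This is an illustration under assumptions, not the PointwiseItems implementation: cross_query_batches is a hypothetical helper that flattens the candidates of several queries into fixed-size batches of (query, document) pairs, so that one batch may mix queries.

```python
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[str, str]  # (query text, document text)


def cross_query_batches(
    candidates: Dict[str, List[str]], batch_size: int
) -> Iterator[List[Pair]]:
    """Yield batches of (query, document) pairs that may span several queries."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Pair] = []
    for query, documents in candidates.items():
        for document in documents:
            batch.append((query, document))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        # Last, possibly smaller, batch
        yield batch
```

Filling each batch regardless of query boundaries keeps batches full even when a query has few candidates, which is the motivation given above for cross-query batching.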
- xpmir.rankers.scorer_retriever(documents: Documents, *, retrievers: RetrieverFactory, scorer: Scorer, key: str = None, **kwargs)
Helper function that returns a two-stage retriever. This is useful with partial, when the scorer is not yet known.
- Parameters:
documents – The document collection
retrievers – A retriever factory
scorer – The scorer
- Returns:
A retriever, obtained by calling scorer.getRetriever()
Retrievers from scorers
Scorers can be wrapped as retrievers through a TwoStageRetriever (see Retrieval).
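The two-stage pattern can be sketched as follows. This is a minimal illustration, not the TwoStageRetriever code: two_stage_retrieve and both callables are hypothetical, and dropping the candidates beyond top_k is an assumption of this sketch.

```python
from typing import Callable, List, Optional, Tuple

Ranked = List[Tuple[str, float]]  # (document text, score), best first


def two_stage_retrieve(
    query: str,
    first_stage: Callable[[str], Ranked],
    score: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> Ranked:
    """Re-rank the candidates of a first-stage retriever with a scorer."""
    candidates = first_stage(query)
    if top_k is not None:
        # Assumption: only the top_k first-stage candidates are kept
        candidates = candidates[:top_k]
    # First-stage scores are discarded and replaced by the scorer's output
    rescored = [(document, score(query, document)) for document, _ in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def overlap(query: str, document: str) -> float:
    """Toy scorer: number of query terms found in the document."""
    return float(len(set(query.split()) & set(document.split())))
```

A cheap first stage (for instance a lexical retriever) narrows the collection, and the more expensive scorer only sees the remaining candidates.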
Naming conventions
The project uses consistent naming for data objects at different layers:
- Records – Low-level data structures (e.g. IDTextRecord, ScoreRecord). Implemented as TypedDict for raw data or identifiers.
- Samples – Data-layer objects (e.g. PairwiseSample). Found in datamaestro; represent raw containers, possibly non-hydrated.
- Items – Model-ready objects (e.g. PointwiseItem, PairwiseItem). Hydrated objects used in the training loop, ready to be converted into tensors.
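The three layers can be pictured with a short sketch. All names below (TextRecord, PairSample, PairItem, hydrate) are hypothetical and only mirror the roles described above; they are not the XPMIR or datamaestro classes.

```python
from dataclasses import dataclass
from typing import Dict, TypedDict


class TextRecord(TypedDict):
    """Record layer: a plain typed dictionary holding raw data."""

    id: str
    text: str


@dataclass
class PairSample:
    """Sample layer: a raw container that may hold identifiers only."""

    query_id: str
    positive_id: str
    negative_id: str


@dataclass
class PairItem:
    """Item layer: hydrated, ready to be tokenised and turned into tensors."""

    query: TextRecord
    positive: TextRecord
    negative: TextRecord


def hydrate(
    sample: PairSample, queries: Dict[str, str], documents: Dict[str, str]
) -> PairItem:
    """Look up the texts behind the identifiers of a sample."""
    return PairItem(
        query=TextRecord(id=sample.query_id, text=queries[sample.query_id]),
        positive=TextRecord(id=sample.positive_id, text=documents[sample.positive_id]),
        negative=TextRecord(id=sample.negative_id, text=documents[sample.negative_id]),
    )
```

Keeping samples identifier-only until hydration makes them cheap to store and shuffle; the text lookup happens just before the item reaches the model.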
Samplers
Samplers generate model-ready items from a dataset and a scorer (used for hard-negative mining or scoring).
- XPM Config xpmir.letor.samplers.ModelBasedSampler(*, dataset, retriever)
Bases: Sampler
Base class for retriever-based samplers
- dataset: datamaestro_ir.data.Adhoc
The IR adhoc dataset
- retriever: xpmir.rankers.retriever.Retriever
A retriever to sample negative documents
Training items
Data classes representing training instances at different granularities.
- class xpmir.letor.records.BatchwiseItems(iterable=(), /)
Bases: BaseItems
Several documents (with associated [pseudo]relevance) per query
Assumes that the number of documents per query is always the same (even though documents themselves can be different)
- class xpmir.letor.records.ListwiseItem(query: QueryT, documents: List[DocT])
Bases: SampleItem[DocT, QueryT]
A listwise Item is a generic data class composed of a query and a list of documents
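The fixed-size assumption stated for BatchwiseItems can be made concrete with a sketch. ListItem and to_grid are hypothetical names, not XPMIR classes; the point is only that a rectangular (queries × documents) layout requires every query to carry the same number of documents.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar

QueryT = TypeVar("QueryT")
DocT = TypeVar("DocT")


@dataclass
class ListItem(Generic[DocT, QueryT]):
    """Hypothetical generic item: one query and its list of documents."""

    query: QueryT
    documents: List[DocT]


def to_grid(items: List[ListItem[str, str]]) -> List[List[str]]:
    """Arrange documents as one row per query, checking that rows align."""
    sizes = {len(item.documents) for item in items}
    if len(sizes) > 1:
        raise ValueError(f"Inconsistent number of documents per query: {sorted(sizes)}")
    return [list(item.documents) for item in items]
```

A rectangular grid of this kind maps directly onto a score matrix with one row per query, which is what batchwise losses typically operate on.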
Document samplers
Samplers that produce documents (without queries). Useful for pre-training objectives or for learning index parameters (e.g. FAISS quantisers).
- XPM Config xpmir.documents.samplers.DocumentSampler(*, documents)
How to sample from a document store
- documents: datamaestro_ir.data.DocumentStore
- XPM Config xpmir.documents.samplers.HeadDocumentSampler(*, documents, max_count, max_ratio)
Bases: DocumentSampler
A basic sampler that iterates over the first documents
If max_count is 0, it iterates over all documents.
- documents: datamaestro_ir.data.DocumentStore
- XPM Config xpmir.documents.samplers.RandomDocumentSampler(*, documents, max_count, max_ratio, random)
Bases: DocumentSampler
A basic sampler that draws documents at random
Either max_count or max_ratio should be non-null.
- documents: datamaestro_ir.data.DocumentStore
- random: xpm_torch.base.Random
Random sampler
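The difference between head and random document sampling can be sketched without XPMIR. The functions below are hypothetical; in particular, treating max_ratio as a fraction of the collection size and taking the smaller bound when both limits are set are assumptions of this sketch, not documented behaviour.

```python
import itertools
import random
from typing import List, Sequence


def sample_size(total: int, max_count: int = 0, max_ratio: float = 0.0) -> int:
    """Number of documents to keep; a value of 0 disables the bound."""
    bounds = [total]
    if max_count > 0:
        bounds.append(max_count)
    if max_ratio > 0:
        # Assumption: max_ratio is a fraction of the collection size
        bounds.append(int(total * max_ratio))
    return min(bounds)


def head_sample(
    documents: Sequence[str], max_count: int = 0, max_ratio: float = 0.0
) -> List[str]:
    """Keep the first documents of the collection, in order."""
    n = sample_size(len(documents), max_count, max_ratio)
    return list(itertools.islice(documents, n))


def random_sample(
    documents: Sequence[str], max_count: int = 0, max_ratio: float = 0.0, seed: int = 0
) -> List[str]:
    """Draw documents uniformly without replacement, reproducibly."""
    n = sample_size(len(documents), max_count, max_ratio)
    rng = random.Random(seed)  # Dedicated generator, so the seed fixes the draw
    return rng.sample(list(documents), n)
```

Head sampling is cheap and deterministic but biased towards the start of the collection; a seeded random draw avoids that bias while remaining reproducible.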
Sample adapters
Transforms applied to samples before they reach the model (e.g. hydrating document text from a store, adding query prefixes).
- XPM Config xpmir.letor.samplers.hydrators.SampleHydrator(*, documentstore, querystore)
Bases: SampleTransform
Base class for document/topic hydrators (deprecated: use StoreHydrator + SamplerAdapter)
- documentstore: datamaestro_ir.data.DocumentStore
The store for document texts if needed
- querystore: xpmir.datasets.adapters.TextStore
The store for query texts if needed
- XPM Config xpmir.letor.samplers.hydrators.SamplePrefixAdding(*, query_prefix, document_prefix)
Bases: SampleTransform
Transforms the query and documents by adding the given prefixes
- XPM Config xpmir.letor.samplers.hydrators.SampleTransformList(*, adapters)
Bases: SampleTransform
A class that groups a list of sample transforms
- adapters: List[xpmir.letor.samplers.hydrators.SampleTransform]
The list of sample transforms to be applied
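Chaining sample transforms can be sketched as plain function composition. The Sample class and the helpers below are hypothetical stand-ins for the idea behind prefix adding and transform lists; applying the adapters in list order is an assumption of this sketch.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Sample:
    """Hypothetical text sample with one query and one document."""

    query: str
    document: str


Transform = Callable[[Sample], Sample]


def add_prefixes(query_prefix: str, document_prefix: str) -> Transform:
    """Build a transform that prepends a prefix to each side of the sample."""

    def transform(sample: Sample) -> Sample:
        return replace(
            sample,
            query=query_prefix + sample.query,
            document=document_prefix + sample.document,
        )

    return transform


def lowercase(sample: Sample) -> Sample:
    """Toy transform: lower-case both texts."""
    return replace(sample, query=sample.query.lower(), document=sample.document.lower())


def transform_list(adapters: List[Transform]) -> Transform:
    """Group transforms into one, applied in list order (assumed)."""

    def transform(sample: Sample) -> Sample:
        for adapter in adapters:
            sample = adapter(sample)
        return sample

    return transform
```

Because the grouped object is itself a transform, it can be passed anywhere a single transform is expected, and the order of the list decides whether prefixes are affected by later steps such as lower-casing.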