Text Representation

The text module groups classes and configurations that compute representations of text, including word embeddings, contextual word embeddings, and document embeddings.
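
To make these notions concrete, here is a minimal, self-contained sketch (plain PyTorch, not the xpmir API) that contrasts static word embeddings with a simple pooled document embedding:

# Illustration only: static word embeddings vs. a pooled document embedding.
import torch
import torch.nn as nn

vocab_size, dim = 10_000, 64
embedding = nn.Embedding(vocab_size, dim)        # static word embeddings

token_ids = torch.tensor([[12, 403, 27, 9]])     # one tokenized text (batch of 1)
word_vectors = embedding(token_ids)              # shape (1, 4, dim): one vector per token
doc_vector = word_vectors.mean(dim=1)            # shape (1, dim): a crude document embedding
print(word_vectors.shape, doc_vector.shape)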

XPM Config xpmir.text.Vocab

Bases: experimaestro.core.objects.Config, xpmir.utils.EasyLogger, TorchModule

Represents a vocabulary and the corresponding neural encoding technique (e.g., an embedding). This class can also handle cross-encoding of a query-document pair (e.g., BERT with [SEP]).
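
As an illustration of the cross-encoding idea (using the Hugging Face tokenizer directly, not the Vocab class itself), a BERT tokenizer joins a query and a document into a single [SEP]-separated sequence:

# Sketch of BERT-style cross-encoding: [CLS] query [SEP] document [SEP]
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
query, document = "cheap flights to tokyo", "Find low fares on flights to Tokyo..."

encoded = tokenizer(query, document, return_tensors="pt")
print(tokenizer.decode(encoded["input_ids"][0]))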

XPM Config xpmir.text.encoders.TextEncoder

Bases: xpmir.text.encoders.Encoder

Vector representation of a text, which can be dense or sparse

forward(texts: List[str]) → torch.Tensor

Returns a matrix encoding the provided texts
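
The following is a minimal stand-in that follows the same contract (a list of strings in, a matrix out). It is a plain PyTorch module rather than an actual xpmir.text.encoders.TextEncoder, and the hashing encoder is purely illustrative:

# Toy dense encoder mirroring the TextEncoder contract: List[str] -> (len(texts), dim) matrix.
from typing import List

import torch
import torch.nn as nn


class BagOfHashesEncoder(nn.Module):
    """Hash whitespace tokens into buckets and average their embeddings."""

    def __init__(self, buckets: int = 1024, dim: int = 128):
        super().__init__()
        self.buckets = buckets
        self.embedding = nn.Embedding(buckets, dim)

    def forward(self, texts: List[str]) -> torch.Tensor:
        rows = []
        for text in texts:
            ids = torch.tensor([hash(tok) % self.buckets for tok in text.split()])
            rows.append(self.embedding(ids).mean(dim=0))
        return torch.stack(rows)


encoder = BagOfHashesEncoder()
print(encoder(["hello world", "text representation in xpmir"]).shape)  # torch.Size([2, 128])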

XPM Config xpmir.text.encoders.DualTextEncoder

Bases: xpmir.text.encoders.Encoder

Dense representation for a pair of texts

This is used, for instance, with BERT models that represent a (query, document) pair

forward(texts: List[Tuple[str, str]])

Computes the representation of a list of pairs of texts
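
A sketch of the same contract using a Hugging Face cross-encoder; the real class integrates with experimaestro, and this function only mirrors the shape of forward():

# Encode (query, document) pairs with BERT and take the [CLS] vector as the dense representation.
from typing import List, Tuple

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")


def encode_pairs(texts: List[Tuple[str, str]]) -> torch.Tensor:
    queries, documents = zip(*texts)
    batch = tokenizer(list(queries), list(documents), padding=True,
                      truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    return output.last_hidden_state[:, 0]  # one [CLS] vector per (query, document) pair


pairs = [("cheap flights", "Find low fares..."), ("python docs", "The Python tutorial...")]
print(encode_pairs(pairs).shape)  # torch.Size([2, 768])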