Text Representation
The text module groups the classes and configurations that compute text representations: word embeddings, contextual word embeddings, and document embeddings.
- XPM Config xpmir.text.encoders.Tokenizer
Bases: Config
Represents a vocabulary and a tokenization method
- XPM Config xpmir.text.encoders.TokensEncoder
Represents a text as a sequence of token representations
- XPM Config xpmir.text.encoders.Encoder
Bases: Module, EasyLogger
Base class for all word and text encoders
- XPM Config xpmir.text.encoders.MeanTextEncoder(*, encoder)
Bases: TextEncoder
Returns the mean of the word embeddings
- encoder: xpmir.text.encoders.TokensEncoder
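As an illustration of what "the mean of the word embeddings" amounts to, here is a minimal sketch in plain Python. It does not call xpmir: the `mean_pool` helper and the toy two-dimensional embeddings are assumptions made for this example, and the library's own implementation may differ (for instance in how it handles batching or padding).

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average a sequence of token vectors into a single text vector.

    Hypothetical helper for illustration only, not part of xpmir.
    """
    if not token_vectors:
        raise ValueError("cannot average an empty token sequence")
    dim = len(token_vectors[0])
    sums = [0.0] * dim
    for vec in token_vectors:
        if len(vec) != dim:
            raise ValueError("all token vectors must share one dimension")
        for i, value in enumerate(vec):
            sums[i] += value
    # Divide each coordinate by the number of tokens
    return [total / len(token_vectors) for total in sums]


# Toy embeddings for a three-token text (made-up values)
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
text_vector = mean_pool(tokens)
print(text_vector)
```

The result has the same dimension as a single token vector, which is why a mean encoder can turn any TokensEncoder output into a fixed-size text representation.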
- XPM Config xpmir.text.encoders.TripletTextEncoder
Bases: Encoder
Generic class for triplet encoders, which encode (query, document, document) triplets
This encoding is used in models such as DuoBERT, which predict whether one query-document pair is preferred to another
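To make the triplet input shape concrete, the following sketch encodes (query, document, document) triplets with a deliberately simple term-overlap feature. Everything here is hypothetical: `encode_triplets`, `overlap`, and `prefers_first` are names invented for this example, and the overlap feature is only a stand-in for the learned representation a model such as DuoBERT would compute.

```python
from typing import List, Tuple

# (query, first document, second document)
Triplet = Tuple[str, str, str]


def overlap(query: str, document: str) -> float:
    """Fraction of query terms that also appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(document.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def encode_triplets(triplets: List[Triplet]) -> List[List[float]]:
    """Return one vector per triplet (toy stand-in for a learned encoder)."""
    return [[overlap(q, d1), overlap(q, d2)] for q, d1, d2 in triplets]


def prefers_first(vector: List[float]) -> bool:
    """Decide whether the first document is preferred to the second."""
    return vector[0] > vector[1]


batch = [
    ("neural ranking", "neural ranking models for search", "cooking pasta at home"),
]
vectors = encode_triplets(batch)
print(vectors, prefers_first(vectors[0]))
```

The point of the sketch is the interface: one representation is produced per triplet, and the preference decision is made from that joint representation rather than from two independently scored documents.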
- XPM Config xpmir.text.encoders.TextEncoder
Bases: Encoder
Vector representation of a text; it can be dense or sparse
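The dense/sparse distinction can be sketched as follows, assuming a dense vector is stored as a list with one value per dimension and a sparse vector as a mapping from dimension index to non-zero value. The helpers `to_sparse` and `sparse_dot` are hypothetical and unrelated to how xpmir stores its vectors.

```python
from typing import Dict, List


def to_sparse(dense: List[float]) -> Dict[int, float]:
    """Keep only the non-zero coordinates, indexed by dimension."""
    return {i: value for i, value in enumerate(dense) if value != 0.0}


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Dot product that only visits dimensions present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller mapping
    return sum(value * b[i] for i, value in a.items() if i in b)


# Made-up 4-dimensional representations of a query and a document
dense_query = [0.0, 2.0, 0.0, 1.0]
dense_doc = [3.0, 0.5, 0.0, 0.0]

dense_score = sum(q * d for q, d in zip(dense_query, dense_doc))
sparse_score = sparse_dot(to_sparse(dense_query), to_sparse(dense_doc))
print(dense_score, sparse_score)
```

Both forms give the same similarity score; the sparse form simply skips the zero coordinates, which matters when the dimension is as large as a vocabulary.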