Text Representation
The text module groups the classes and configurations that compute text representations: word embeddings, contextual word embeddings, and document embeddings.
- XPM Config xpmir.text.encoders.Encoder
Bases: Module, EasyLogger, ABC
Submit type: xpmir.text.encoders.Encoder
Base class for all word and text encoders.
- XPM Config xpmir.text.encoders.TokensEncoder
Submit type: xpmir.text.encoders.TokensEncoder
(deprecated) Represents a text as a sequence of token representations.
Tokenizers
- XPM Config xpmir.text.tokenizers.Tokenizer
Bases: Config
Submit type: xpmir.text.tokenizers.Tokenizer
Represents a vocabulary and a tokenization method.
Deprecated: Use TokenizerBase instead.
- XPM Config xpmir.text.tokenizers.TokenizerBase
Bases: Config, Initializable, Generic[TokenizerInput, TokenizerOutput], ABC
Submit type: xpmir.text.tokenizers.TokenizerBase
Base tokenizer.
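To illustrate what "a vocabulary and a tokenization method" parameterized by an input and an output type can look like, here is a minimal sketch in plain Python. It does not use xpmir: `ToyTokenizerBase`, `WhitespaceTokenizer`, and the `tokenize` method name are hypothetical stand-ins chosen for this example, and the actual `TokenizerBase` interface may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, TypeVar

TokenizerInput = TypeVar("TokenizerInput")
TokenizerOutput = TypeVar("TokenizerOutput")


class ToyTokenizerBase(ABC, Generic[TokenizerInput, TokenizerOutput]):
    """Hypothetical stand-in for a generic tokenizer (not the xpmir class)."""

    @abstractmethod
    def tokenize(self, inputs: List[TokenizerInput]) -> TokenizerOutput:
        """Turn a batch of inputs into the tokenizer's output type."""


class WhitespaceTokenizer(ToyTokenizerBase[str, List[List[int]]]):
    """Maps each text to token ids, growing the vocabulary on the fly."""

    def __init__(self) -> None:
        self.vocabulary: Dict[str, int] = {}

    def tokenize(self, inputs: List[str]) -> List[List[int]]:
        batch = []
        for text in inputs:
            # A new token receives the next free id; a known token keeps its id
            ids = [
                self.vocabulary.setdefault(token, len(self.vocabulary))
                for token in text.lower().split()
            ]
            batch.append(ids)
        return batch
```

Binding the two type parameters in the subclass (`str` in, `List[List[int]]` out) is what lets a downstream encoder declare which tokenizer outputs it accepts.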
Text Encoders
- XPM Config xpmir.text.encoders.TextEncoderBase
Bases: Encoder, Generic[InputType, EncoderOutput]
Submit type: xpmir.text.encoders.TextEncoderBase
Base class for all text encoders.
- XPM Config xpmir.text.encoders.TextEncoder
Bases: TextEncoderBase[str, torch.Tensor]
Submit type: xpmir.text.encoders.TextEncoder
Encodes a text into a vector.
Deprecated since version 1.3: Use TextEncoderBase directly.
- XPM Config xpmir.text.encoders.DualTextEncoder
Bases: TextEncoderBase[Tuple[str, str], torch.Tensor]
Submit type: xpmir.text.encoders.DualTextEncoder
Encodes a pair of texts into a vector.
Deprecated since version 1.3: Use TextEncoderBase directly.
- XPM Config xpmir.text.encoders.TripletTextEncoder
Bases: TextEncoderBase[Tuple[str, str, str], torch.Tensor]
Submit type: xpmir.text.encoders.TripletTextEncoder
Encodes a triplet of texts into a vector.
Deprecated since version 1.3: Use TextEncoderBase directly.
This is used in models such as DuoBERT, where (query, positive, negative) triplets are encoded.
- XPM Config xpmir.text.encoders.TokenizedTextEncoderBase
Bases: TextEncoderBase[InputType, EncoderOutput]
Submit type: xpmir.text.encoders.TokenizedTextEncoderBase
- XPM Config xpmir.text.encoders.TokenizedEncoder
Bases: Encoder, Generic[EncoderOutput, TokenizerOutput]
Submit type: xpmir.text.encoders.TokenizedEncoder
Encodes a tokenized text into a vector.
Tokenizer-based encoders
- XPM Config xpmir.text.encoders.TokenizedTextEncoder(*, tokenizer, encoder)
Bases: TokenizedTextEncoderBase[InputType, EncoderOutput], Generic[InputType, EncoderOutput, TokenizerOutput]
Submit type: xpmir.text.encoders.TokenizedTextEncoder
Encodes a tokenizer input into a vector.
This pipelines two objects:
  - A tokenizer that segments the text;
  - An encoder that returns a representation of the tokens in a vector space.
Parameters:
  - tokenizer: xpmir.text.tokenizers.TokenizerBase[InputType, TokenizerOutput]
  - encoder: xpmir.text.encoders.TokenizedEncoder[TokenizerOutput, EncoderOutput]
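The two-stage pipeline described for TokenizedTextEncoder can be sketched as a simple composition. This is an illustration in plain Python under stated assumptions, not the xpmir implementation: `ToyPipeline`, `split_tokenizer`, and `feature_encoder` are hypothetical names, and plain callables stand in for the tokenizer and encoder configurations.

```python
from typing import Callable, Generic, List, TypeVar

InputType = TypeVar("InputType")
TokenizerOutput = TypeVar("TokenizerOutput")
EncoderOutput = TypeVar("EncoderOutput")


class ToyPipeline(Generic[InputType, TokenizerOutput, EncoderOutput]):
    """Hypothetical composition of a tokenizer and an encoder (not xpmir code)."""

    def __init__(
        self,
        tokenizer: Callable[[List[InputType]], TokenizerOutput],
        encoder: Callable[[TokenizerOutput], EncoderOutput],
    ) -> None:
        self.tokenizer = tokenizer
        self.encoder = encoder

    def __call__(self, inputs: List[InputType]) -> EncoderOutput:
        # Stage 1 segments the raw inputs; stage 2 embeds the resulting tokens
        return self.encoder(self.tokenizer(inputs))


def split_tokenizer(texts: List[str]) -> List[List[str]]:
    """Toy tokenizer: lowercase each text and split it on whitespace."""
    return [text.lower().split() for text in texts]


def feature_encoder(batch: List[List[str]]) -> List[List[List[float]]]:
    """Toy encoder: one 2-d vector per token (token length, count of 'a')."""
    return [
        [[float(len(token)), float(token.count("a"))] for token in tokens]
        for tokens in batch
    ]


pipeline = ToyPipeline(split_tokenizer, feature_encoder)
vectors = pipeline(["A cat", "Bananas"])
```

The tokenizer's output type is the encoder's input type, which mirrors how the `tokenizer` and `encoder` parameters above share `TokenizerOutput`.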
Adapters
- XPM Config xpmir.text.adapters.MeanTextEncoder(*, encoder)
Bases: TokenizedTextEncoderBase[InputType, RepresentationOutput]
Submit type: xpmir.text.adapters.MeanTextEncoder
Returns the mean of the word embeddings.
Parameters:
  - encoder: xpmir.text.encoders.TokenizedTextEncoderBase[InputType, xpmir.text.encoders.RepresentationOutput]
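Averaging word embeddings into a single text vector can be sketched as follows. This is a minimal pure-Python illustration of mean pooling, not the MeanTextEncoder code: `mean_pool` is a hypothetical helper, and it works on one text at a time. A batched tensor implementation would typically also need to ignore padding positions.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average token vectors component-wise into one text vector."""
    if not token_vectors:
        raise ValueError("cannot average an empty sequence of token vectors")
    count = len(token_vectors)
    # zip(*...) groups the i-th component of every token vector together
    return [sum(component) / count for component in zip(*token_vectors)]


# Two tokens with 2-d embeddings yield one 2-d text representation
text_vector = mean_pool([[1.0, 2.0], [3.0, 4.0]])
```

The result has the same dimensionality as each word embedding, whatever the number of tokens in the text.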
- XPM Config xpmir.text.adapters.TopicTextConverter
Bases: Converter[Record, str]
Submit type: xpmir.text.adapters.TopicTextConverter
Extracts the text from a topic.