Text Representation
The text module groups the classes and configurations that compute text representations: word embeddings, contextual word embeddings, and document embeddings.
- XPM Config xpmir.text.encoders.Encoder
Bases: Module, EasyLogger, ABC
Submit type: xpmir.text.encoders.Encoder
Base class for all word and text encoders
- XPM Config xpmir.text.encoders.TokensEncoder
Submit type: xpmir.text.encoders.TokensEncoder
(deprecated) Represent a text as a sequence of token representations
Tokenizers
- XPM Config xpmir.text.tokenizers.Tokenizer
Bases: Config
Submit type: xpmir.text.tokenizers.Tokenizer
Represents a vocabulary and a tokenization method
Deprecated: Use TokenizerBase instead
- XPM Config xpmir.text.tokenizers.TokenizerBase
Bases: Config, Initializable, Generic[TokenizerInput, TokenizerOutput], ABC
Submit type: xpmir.text.tokenizers.TokenizerBase
Base tokenizer
Text Encoders
- XPM Config xpmir.text.encoders.TextEncoderBase
Bases: Encoder, Generic[InputType, EncoderOutput]
Submit type: xpmir.text.encoders.TextEncoderBase
Base class for all text encoders
- XPM Config xpmir.text.encoders.TextEncoder
Bases: TextEncoderBase[str, torch.Tensor]
Submit type: xpmir.text.encoders.TextEncoder
Encodes a text into a vector
Deprecated since version 1.3: Use TextEncoderBase directly
- XPM Config xpmir.text.encoders.DualTextEncoder
Bases: TextEncoderBase[Tuple[str, str], torch.Tensor]
Submit type: xpmir.text.encoders.DualTextEncoder
Encodes a pair of texts into a vector
Deprecated since version 1.3: Use TextEncoderBase directly
- XPM Config xpmir.text.encoders.TripletTextEncoder
Bases: TextEncoderBase[Tuple[str, str, str], torch.Tensor]
Submit type: xpmir.text.encoders.TripletTextEncoder
Encodes a triplet of texts into a vector
Deprecated since version 1.3: Use TextEncoderBase directly
This is used in models such as DuoBERT where we encode (query, positive, negative) triplets.
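The three deprecated classes above differ only in the input type they bind on the generic base (a single text, a pair, or a triplet). The sketch below illustrates that idea with stand-in classes; `SketchTextEncoderBase` and `SketchTripletEncoder` are hypothetical names that do not exist in xpmir, and the toy encoder returns plain Python lists rather than a `torch.Tensor`.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Tuple, TypeVar

InputType = TypeVar("InputType")
EncoderOutput = TypeVar("EncoderOutput")


class SketchTextEncoderBase(ABC, Generic[InputType, EncoderOutput]):
    """Hypothetical stand-in for a generic text encoder base class."""

    @abstractmethod
    def __call__(self, texts: List[InputType]) -> EncoderOutput:
        ...


# A (query, positive, negative) triplet, as in the description above
Triplet = Tuple[str, str, str]


class SketchTripletEncoder(SketchTextEncoderBase[Triplet, List[List[float]]]):
    """Toy encoder: one 3-d vector per triplet, holding the character
    length of each of the three texts (illustration only)."""

    def __call__(self, texts: List[Triplet]) -> List[List[float]]:
        return [[float(len(q)), float(len(p)), float(len(n))] for q, p, n in texts]


triplet_vectors = SketchTripletEncoder()(
    [("what is ir", "ir is retrieval", "cats")]
)
print(triplet_vectors)
```

Binding the type parameters on the base class directly, as done here, is the pattern the deprecation notes point to: no dedicated single, dual, or triplet class is needed.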
- XPM Config xpmir.text.encoders.TokenizedTextEncoderBase
Bases: TextEncoderBase[InputType, EncoderOutput]
Submit type: xpmir.text.encoders.TokenizedTextEncoderBase
- XPM Config xpmir.text.encoders.TokenizedEncoder
Bases: Encoder, Generic[EncoderOutput, TokenizerOutput]
Submit type: xpmir.text.encoders.TokenizedEncoder
Encodes a tokenized text into a vector
Tokenizer-based encoders
- XPM Config xpmir.text.encoders.TokenizedTextEncoder(*, tokenizer, encoder)
Bases: TokenizedTextEncoderBase[InputType, EncoderOutput], Generic[InputType, EncoderOutput, TokenizerOutput]
Submit type: xpmir.text.encoders.TokenizedTextEncoder
Encodes a tokenizer input into a vector
This pipelines two objects:
1. A tokenizer that segments the text;
2. An encoder that returns a representation of the tokens in a vector space.
- tokenizer: xpmir.text.tokenizers.TokenizerBase[InputType, TokenizerOutput]
- encoder: xpmir.text.encoders.TokenizedEncoder[TokenizerOutput, EncoderOutput]
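The two-stage pipeline described above can be sketched in plain Python. This is an illustration of the composition only, not xpmir's implementation: every class name here (`SketchTokenizer`, `SketchTokenizedEncoder`, `SketchTokenizedTextEncoder`, `WhitespaceTokenizer`, `LengthEncoder`) and the method names `tokenize` and `encode` are assumptions made for the example.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, TypeVar

InputType = TypeVar("InputType")
TokenizerOutput = TypeVar("TokenizerOutput")
EncoderOutput = TypeVar("EncoderOutput")


class SketchTokenizer(ABC, Generic[InputType, TokenizerOutput]):
    """Hypothetical tokenizer interface: raw inputs -> tokenized batch."""

    @abstractmethod
    def tokenize(self, texts: List[InputType]) -> TokenizerOutput:
        ...


class SketchTokenizedEncoder(ABC, Generic[TokenizerOutput, EncoderOutput]):
    """Hypothetical encoder interface: tokenized batch -> representation."""

    @abstractmethod
    def encode(self, tokenized: TokenizerOutput) -> EncoderOutput:
        ...


class SketchTokenizedTextEncoder(Generic[InputType, TokenizerOutput, EncoderOutput]):
    """Chains a tokenizer and an encoder; the tokenizer output type
    must match the encoder input type."""

    def __init__(self, tokenizer, encoder):
        self.tokenizer = tokenizer
        self.encoder = encoder

    def __call__(self, texts: List[InputType]) -> EncoderOutput:
        return self.encoder.encode(self.tokenizer.tokenize(texts))


class WhitespaceTokenizer(SketchTokenizer[str, List[List[int]]]):
    """Toy tokenizer: splits on whitespace and assigns ids on first sight."""

    def __init__(self) -> None:
        self.vocabulary: Dict[str, int] = {}

    def tokenize(self, texts: List[str]) -> List[List[int]]:
        return [
            [self.vocabulary.setdefault(tok, len(self.vocabulary)) for tok in text.split()]
            for text in texts
        ]


class LengthEncoder(SketchTokenizedEncoder[List[List[int]], List[List[float]]]):
    """Toy encoder: [number of tokens, number of distinct tokens] per text."""

    def encode(self, tokenized: List[List[int]]) -> List[List[float]]:
        return [[float(len(ids)), float(len(set(ids)))] for ids in tokenized]


pipeline = SketchTokenizedTextEncoder(WhitespaceTokenizer(), LengthEncoder())
vectors = pipeline(["a b a", "c"])
print(vectors)
```

Keeping the tokenizer and the encoder as separate objects means either one can be swapped without touching the other, as long as the shared `TokenizerOutput` type still lines up.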
Adapters
- XPM Config xpmir.text.adapters.MeanTextEncoder(*, encoder)
Bases: TokenizedTextEncoderBase[InputType, RepresentationOutput]
Submit type: xpmir.text.adapters.MeanTextEncoder
Returns the mean of the word embeddings
- encoder: xpmir.text.encoders.TokenizedTextEncoderBase[InputType, xpmir.text.encoders.RepresentationOutput]
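As a rough illustration of mean pooling over word embeddings, the sketch below averages the per-token vectors of each text while skipping padded positions. It uses plain Python lists instead of tensors, and the helper `mean_pool` and its mask argument are assumptions for the example; whether and how MeanTextEncoder handles padding is not stated here.

```python
from typing import List


def mean_pool(
    token_vectors: List[List[List[float]]], mask: List[List[bool]]
) -> List[List[float]]:
    """Averages token vectors per text, ignoring positions where mask is False.

    token_vectors: batch x tokens x dimension
    mask:          batch x tokens (True for real tokens, False for padding)
    """
    pooled = []
    for vectors, keep in zip(token_vectors, mask):
        kept = [v for v, k in zip(vectors, keep) if k]
        if not kept:
            raise ValueError("cannot average a text with no real tokens")
        dim = len(kept[0])
        pooled.append([sum(v[d] for v in kept) / len(kept) for d in range(dim)])
    return pooled


batch = [
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],  # last position is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
mask = [[True, True, False], [True, True, True]]

means = mean_pool(batch, mask)
print(means)
```

Dividing by the number of real tokens rather than the padded length keeps short texts from being pulled toward the zero vector.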
- XPM Config xpmir.text.adapters.TopicTextConverter
Bases: Converter[Record, str]
Submit type: xpmir.text.adapters.TopicTextConverter
Extracts the text from a topic