Text Embedding with Longformer
author: Kyle He
Description
This operator uses Longformer to convert long text to embeddings.
The Longformer model was presented in "Longformer: The Long-Document Transformer" by Iz Beltagy, Matthew E. Peters, and Arman Cohan [1][2].
Transformer-based models struggle with long sequences because their self-attention operation scales quadratically with sequence length. To address this limitation, Longformer introduces an attention mechanism that scales linearly with sequence length, making it practical to process documents of thousands of tokens or longer [2].
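To make the quadratic versus linear scaling concrete, the sketch below counts how many attention scores each scheme computes. It is a simplified illustration, not the operator's or the paper's implementation: the function names and the window size of 256 tokens per side are assumptions chosen for this example, and it models only a local sliding window, leaving out the global attention that Longformer adds on selected tokens.

```python
def full_attention_pairs(seq_len: int) -> int:
    """Full self-attention: every token scores against every token."""
    return seq_len * seq_len


def windowed_attention_pairs(seq_len: int, window: int) -> int:
    """Sliding-window attention: each token scores against itself and
    at most `window` neighbours on each side (fewer near the edges)."""
    total = 0
    for i in range(seq_len):
        lo = max(0, i - window)
        hi = min(seq_len - 1, i + window)
        total += hi - lo + 1
    return total


# Hypothetical window size, chosen only for illustration.
WINDOW = 256

for n in (512, 1024, 4096):
    print(n, full_attention_pairs(n), windowed_attention_pairs(n, WINDOW))
```

Once the sequence is longer than the window, each additional token adds a fixed number of scores in the windowed scheme, whereas in full attention each additional token adds a number of scores that grows with the sequence length.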
Reference
[2] https://arxiv.org/pdf/2004.05150.pdf
from towhee import ops
text_encoder = ops.text_embedding.longformer(model_name="allenai/longformer-base-4096")
text_embedding = text_encoder("Hello, world.")
Factory Constructor
Create the operator via the following factory method:
ops.text_embedding.longformer(model_name)
Interface
A text embedding operator takes a sentence, paragraph, or document as a string and outputs an embedding vector as an ndarray that captures the input's core semantic elements.
Parameters:
text: str
The input text as a string.
Returns: numpy.ndarray
The text embedding extracted by the model.
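A common use of the returned embeddings is to compare two texts by cosine similarity. The sketch below is a minimal illustration under stated assumptions: it assumes each embedding is a one-dimensional vector of floats (if the operator returns one vector per token, pool them into a single vector first), and it uses short hand-written lists as stand-ins for real embeddings. The names `cosine_similarity`, `vec_a`, `vec_b`, and `vec_c` are hypothetical and not part of the operator.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in vectors; real embeddings would come from the operator's output.
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

print(cosine_similarity(vec_a, vec_b))  # identical direction
print(cosine_similarity(vec_a, vec_c))  # orthogonal vectors
```

The function iterates over its inputs element by element, so a one-dimensional ndarray can be passed in place of a list.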
Code Example
Use the pretrained model ('allenai/longformer-base-4096') to generate a text embedding for the sentence "Hello, world.".
Write the pipeline in simplified style:
import towhee

# Build a DataCollection from the raw text (glob is meant for file paths)
towhee.dc(["Hello, world."]) \
    .text_embedding.longformer(model_name="allenai/longformer-base-4096") \
    .show()
Write the same pipeline with explicit input and output names:
import towhee

# 'text' names the input column; 'vec' names the embedding column
towhee.dc['text'](["Hello, world."]) \
    .text_embedding.longformer['text', 'vec'](model_name="allenai/longformer-base-4096") \
    .select('vec') \
    .show()