
Updated 7 months ago

text-embedding

Text Embedding with data2vec

author: David Wang


Description

This operator extracts text features with data2vec. The core idea of data2vec is to predict latent representations of the full input from a masked view of that input, in a self-distillation setup that uses a standard Transformer architecture.
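The self-distillation objective described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not this operator's code, which only runs inference with an already pre-trained model. Random arrays stand in for Transformer layer outputs. The teacher is an exponential moving average (EMA) of the student. Training targets are the average of the teacher's top-K normalized layer outputs computed on the unmasked input, and the student regresses those targets at the masked positions with a Smooth L1 loss. The helper names (`layer_norm`, `build_targets`, `smooth_l1`, `ema_update`), the ~15% mask rate, `top_k=3`, `tau=0.999`, and the use of parameter-free layer normalization are illustrative choices and do not come from this README.

```python
import numpy as np

# Toy sketch of a data2vec-style training objective (hypothetical helpers,
# random stand-ins for real Transformer activations).


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Parameter-free normalization of each token vector over the feature axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def build_targets(teacher_layers: list, top_k: int) -> np.ndarray:
    # Average the normalized outputs of the top-K teacher layers.
    # teacher_layers: list of (seq_len, dim) arrays ordered bottom to top.
    top = teacher_layers[-top_k:]
    return np.mean([layer_norm(h) for h in top], axis=0)


def smooth_l1(pred: np.ndarray, target: np.ndarray, beta: float = 1.0) -> float:
    # Quadratic for small errors (< beta), linear for large ones.
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(loss.mean())


def ema_update(teacher: dict, student: dict, tau: float) -> dict:
    # teacher <- tau * teacher + (1 - tau) * student, parameter by parameter.
    return {k: tau * teacher[k] + (1.0 - tau) * student[k] for k in teacher}


# --- Toy run with random stand-ins (hypothetical sizes) ---
rng = np.random.default_rng(0)
seq_len, dim, num_layers, top_k = 8, 4, 6, 3

# Randomly mask about 15% of token positions; force one so the loss is defined.
mask = rng.random(seq_len) < 0.15
mask[0] = True

# Teacher layer outputs computed on the full (unmasked) input.
teacher_layers = [rng.normal(size=(seq_len, dim)) for _ in range(num_layers)]
targets = build_targets(teacher_layers, top_k)

# Student predictions computed from the masked input (random stand-in here).
student_pred = rng.normal(size=(seq_len, dim))
loss = smooth_l1(student_pred[mask], targets[mask])

# After a gradient step on the student (not shown), refresh the teacher.
teacher_params = {"w": np.zeros(2)}
student_params = {"w": np.ones(2)}
teacher_params = ema_update(teacher_params, student_params, tau=0.999)

print(f"masked positions: {int(mask.sum())}, loss: {loss:.4f}")
```

Because the teacher sees the whole input while the student sees only the masked view, the student has to infer contextualized representations rather than reconstruct raw tokens; the slowly moving EMA teacher keeps the targets stable between steps.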


Code Example

Use the pre-trained model to generate a text embedding for the sentence "Hello, world.".

Write a pipeline with explicit input/output name specifications:

from towhee import pipe, ops, DataCollection

p = (
    pipe.input('text')
        .map('text', 'vec', ops.text_embedding.data2vec(model_name='facebook/data2vec-text-base'))
        .output('text', 'vec')
)

DataCollection(p('Hello, world.')).show()


Factory Constructor

Create the operator via the following factory method:

data2vec(model_name='facebook/data2vec-text-base')

Parameters:

model_name: str

The model name as a string. Defaults to "facebook/data2vec-text-base".

Supported model name:

  • facebook/data2vec-text-base


Interface

Parameters:

text: str

The input text as a string.

Returns: numpy.ndarray

The text embedding extracted by the model.
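This README does not state the shape of the returned array. A common next step is to compare embeddings, for example with cosine similarity. The minimal NumPy sketch assumes that any leading axes (such as a batch or token axis, if the operator returns per-token hidden states) can be mean-pooled into a single vector. `to_sentence_vector` and `cosine_similarity` are hypothetical helpers, and the input arrays are stand-ins for the pipeline's `vec` output.

```python
import numpy as np


def to_sentence_vector(emb: np.ndarray) -> np.ndarray:
    # Hypothetical helper: average over every leading axis (e.g. batch or
    # token axes, if present), leaving one 1-D vector of length `dim`.
    emb = np.asarray(emb, dtype=np.float64)
    return emb.reshape(-1, emb.shape[-1]).mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Hypothetical helper: cosine of the angle between two pooled vectors.
    # Not guarded against all-zero vectors.
    a = to_sentence_vector(a)
    b = to_sentence_vector(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Stand-in arrays; in practice these would come from the pipeline's 'vec' column.
vec_a = np.array([[1.0, 0.0, 1.0], [1.0, 0.0, 1.0]])  # pretend shape (tokens, dim)
vec_b = np.array([1.0, 0.0, 1.0])                     # pretend shape (dim,)
print(cosine_similarity(vec_a, vec_b))
```

Mean pooling is only one way to reduce per-token outputs to a sentence vector; if the operator already returns a single vector, the pooling step leaves it unchanged.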

More Resources

Latest commit: ac0e1e1e16 "Update README" by Jael Gu (12 commits in total)

  • .gitattributes (1.1 KiB), last commit "Initial commit" 3 years ago
  • README.md (3.8 KiB), last commit "Update README" 7 months ago
  • __init__.py (730 B), last commit "change data2vec_text to data2vec." 3 years ago
  • data2vec_text.py (1.1 KiB), last commit "update the operator." 3 years ago
  • requirements.txt (39 B), last commit "change data2vec_text to data2vec." 3 years ago
  • result.png (19 KiB), last commit "update the readme." 2 years ago