

Text Embedding with data2vec

author: David Wang


Description

This operator extracts text features with data2vec. data2vec trains a standard Transformer in a self-distillation setup: the model is given a masked view of the input and learns to predict latent representations of the full, unmasked input.
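The training idea described above can be sketched in a few lines of plain Python. This is a toy illustration, not the operator's code and not the real model: `encode` is a hypothetical stand-in for a Transformer, and the exponential-moving-average teacher and smooth L1 loss follow the data2vec paper rather than anything stated in this README.

```python
def encode(weights, tokens):
    """Toy encoder (hypothetical): add the sequence mean to each token, then scale element-wise."""
    dim = len(weights)
    mean = [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]
    return [[weights[d] * (tok[d] + mean[d]) for d in range(dim)] for tok in tokens]


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss averaged over the elements of one vector."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def ema_update(teacher, student, decay=0.999):
    """Move the teacher weights a small step toward the student weights."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def masked_loss(student, teacher, tokens, mask):
    """Regress student outputs at masked positions onto teacher outputs for the full input."""
    if not any(mask):
        raise ValueError("at least one position must be masked")
    # The teacher sees the full input and provides the latent targets.
    targets = encode(teacher, tokens)
    # The student sees a masked view; masked tokens are replaced by zeros here.
    masked_view = [[0.0] * len(tok) if m else tok for tok, m in zip(tokens, mask)]
    preds = encode(student, masked_view)
    # Only masked positions contribute to the loss.
    losses = [smooth_l1(p, t) for p, t, m in zip(preds, targets, mask) if m]
    return sum(losses) / len(losses)
```

In a training loop, the student would be updated to reduce `masked_loss`, and `ema_update` would then refresh the teacher from the student. The pre-trained model used by this operator has already been through that process, so none of this runs at inference time.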


Code Example

Use the pre-trained model to generate a text embedding for the sentence "Hello, world."

Write the pipeline in simplified style:

import towhee

towhee.dc(["Hello, world."]) \
      .text_embedding.data2vec() \
      .show()


Factory Constructor

Create the operator via the following factory method:

data2vec(model_name='facebook/data2vec-text-base')

Parameters:

model_name: str

The name of the pre-trained model, as a string. The default value is "facebook/data2vec-text-base".

Supported model names:

  • facebook/data2vec-text-base
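Because only one model name is listed as supported, it can help to check the name before constructing the operator. The helper below is a hypothetical sketch: `SUPPORTED_MODELS` and `resolve_model_name` are illustrative names that are not part of the operator, and the tuple simply mirrors the list above.

```python
# Hypothetical helper, not part of the operator: it only mirrors the list above.
SUPPORTED_MODELS = ("facebook/data2vec-text-base",)
DEFAULT_MODEL = "facebook/data2vec-text-base"


def resolve_model_name(model_name: str = DEFAULT_MODEL) -> str:
    """Return model_name unchanged if it is listed as supported, else raise ValueError."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model name {model_name!r}; expected one of {SUPPORTED_MODELS}"
        )
    return model_name
```

The returned string could then be passed as the `model_name` argument of `data2vec(...)`.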


Interface

Parameters:

text: str

The input text, as a string.

Returns: numpy.ndarray

The text embedding extracted by the model.
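A common use of the returned array is comparing two texts. The sketch below assumes only that the embedding is a `numpy.ndarray` whose last axis is the feature dimension; the exact output shape is not stated here, so the hypothetical `pool` helper averages over any leading axes (for example batch and token axes) before computing cosine similarity.

```python
import numpy as np


def pool(embedding: np.ndarray) -> np.ndarray:
    """Average an array of shape (..., dim) down to a single vector of length dim."""
    arr = np.asarray(embedding, dtype=np.float64)
    return arr.reshape(-1, arr.shape[-1]).mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two pooled embeddings; 0.0 if either vector is all zeros."""
    va, vb = pool(a), pool(b)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom > 0 else 0.0
```

Mean pooling is one simple choice among several; if the operator already returns a single vector per text, `pool` leaves it unchanged.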
