# Text Embedding with data2vec

*author: David Wang*


<br />


## Description

This operator extracts features for text with [data2vec](https://arxiv.org/abs/2202.03555). The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture.
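
To make the self-distillation idea concrete, here is a toy NumPy sketch of one training step. It is not the data2vec implementation: the `encode` function is a made-up stand-in for the Transformer, the squared-error loss and the decay `tau` are illustrative assumptions, and the paper's actual target construction and loss differ in detail. It only shows the roles involved: a teacher that sees the full input, a student that sees a masked view, a loss on the masked positions, and a teacher that tracks the student through an exponential moving average.

```python
import numpy as np

rng = np.random.default_rng(0)


def encode(weights, tokens):
    # Hypothetical stand-in for a Transformer encoder: every position is
    # mixed with the sequence mean (a crude form of context) and projected.
    context = tokens.mean(axis=0, keepdims=True)
    return np.tanh((tokens + context) @ weights)


def ema_update(teacher_w, student_w, tau=0.999):
    # The teacher weights follow the student weights as a moving average.
    return tau * teacher_w + (1.0 - tau) * student_w


seq_len, dim = 6, 4                      # arbitrary toy sizes
tokens = rng.normal(size=(seq_len, dim))
student_w = rng.normal(size=(dim, dim))
teacher_w = student_w.copy()

# Mask two positions; the student sees a placeholder vector there.
mask = np.zeros(seq_len, dtype=bool)
mask[[1, 4]] = True
masked_tokens = np.where(mask[:, None], np.zeros(dim), tokens)

targets = encode(teacher_w, tokens)              # teacher: full input
predictions = encode(student_w, masked_tokens)   # student: masked view

# Regress the teacher's latent targets at the masked positions only.
loss = np.mean((predictions[mask] - targets[mask]) ** 2)

# A gradient step on student_w would go here; then the teacher is refreshed.
teacher_w = ema_update(teacher_w, student_w)
print(f"toy loss on masked positions: {loss:.4f}")
```

The point of the sketch is that the student never regresses raw tokens; its targets are the teacher's contextual representations of the unmasked input.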

<br />


## Code Example

Use the pre-trained model to generate a text embedding for the sentence "Hello, world.".

Write a pipeline that names its inputs and outputs explicitly:

```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('text')
        .map('text', 'vec', ops.text_embedding.data2vec(model_name='facebook/data2vec-text-base'))
        .output('text', 'vec')
)

DataCollection(p('Hello, world.')).show()
```
<img src="./result.png" width="800px"/>


<br />


## Factory Constructor

Create the operator via the following factory method:

***data2vec(model_name='facebook/data2vec-text-base')***

**Parameters:**

  ***model_name***: *str*

The name of the pre-trained model to load, as a string.
The default value is "facebook/data2vec-text-base".

Supported model names:
- facebook/data2vec-text-base

<br />


## Interface


**Parameters:**

  ***text***: *str*

The input text, as a string.


**Returns:** *numpy.ndarray*

The text embedding extracted by the model.
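
A common next step is to compare two embeddings. The sketch below is an illustration rather than part of the operator: this README does not state the shape of the returned array, so the hypothetical helper `to_sentence_vector` mean-pools over any leading axes (for example batch and token axes) before the comparison, and random arrays of an arbitrary size stand in for real operator output.

```python
import numpy as np


def to_sentence_vector(embedding):
    """Collapse an embedding to one 1-D vector.

    Assumption: the last axis is the feature axis; all other axes
    (batch, tokens) are averaged away.
    """
    arr = np.asarray(embedding, dtype=np.float64)
    if arr.ndim == 1:
        return arr
    return arr.reshape(-1, arr.shape[-1]).mean(axis=0)


def cosine_similarity(a, b):
    """Cosine similarity of two embeddings after pooling; 0.0 for zero vectors."""
    va, vb = to_sentence_vector(a), to_sentence_vector(b)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    if denom == 0.0:
        return 0.0
    return float(va @ vb / denom)


# Stand-ins for two operator outputs; the shapes here are made up.
rng = np.random.default_rng(0)
vec_a = rng.normal(size=(1, 5, 8))
vec_b = rng.normal(size=(1, 7, 8))

print(cosine_similarity(vec_a, vec_a))
print(cosine_similarity(vec_a, vec_b))
```

With real output, replace `vec_a` and `vec_b` with the `vec` values produced by the pipeline for two different texts.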