# Text Embedding with data2vec

*author: David Wang*


<br />


## Description

This operator extracts text features with [data2vec](https://arxiv.org/abs/2202.03555). The model is trained to predict latent representations of the full input from a masked view of that input, in a self-distillation setup built on a standard Transformer architecture.
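The training idea above can be illustrated with a toy NumPy sketch. It is not the operator's code (the operator only runs inference with a pre-trained model) and it is heavily simplified. It assumes, following the data2vec paper, that the teacher's weights are an exponential moving average of the student's and that the student regresses the teacher's latent outputs at masked positions with a Smooth L1 loss. The function names, the default `tau` and `beta` values, and the random arrays standing in for Transformer outputs are all hypothetical.

```python
import numpy as np


def ema_update(teacher, student, tau=0.999):
    """Move teacher weights toward the student: tau * teacher + (1 - tau) * student."""
    return {name: tau * teacher[name] + (1.0 - tau) * student[name] for name in teacher}


def smooth_l1(pred, target, beta=1.0):
    """Element-wise Smooth L1: quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)


def masked_latent_loss(student_out, teacher_out, mask):
    """Average the regression loss over masked positions only.

    student_out: (seq_len, dim) predictions computed from the masked input
    teacher_out: (seq_len, dim) latent targets computed from the full input
    mask:        (seq_len,) boolean, True where the input token was masked
    """
    per_token = smooth_l1(student_out, teacher_out).mean(axis=-1)
    return float(per_token[mask].mean())


# Toy arrays standing in for real Transformer outputs (illustrative only).
rng = np.random.default_rng(0)
teacher_out = rng.normal(size=(6, 4))
student_out = teacher_out + 0.1
mask = np.array([False, True, False, True, True, False])

loss = masked_latent_loss(student_out, teacher_out, mask)
print(f"masked latent loss: {loss:.4f}")
```

In the paper the targets are built from several teacher layers and normalized; the sketch skips that step to keep the student/teacher relationship visible.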

<br />


## Code Example

Use the pre-trained model to generate a text embedding for the sentence "Hello, world."

Write a pipeline with explicit input and output names:

```python
from towhee.dc2 import pipe, ops, DataCollection

p = (
    pipe.input('text')
        .map('text', 'vec', ops.text_embedding.data2vec(model_name='facebook/data2vec-text-base'))
        .output('text', 'vec')
)

DataCollection(p('Hello, world.')).show()
```
<img src="./result.png" width="800px"/>


<br />


## Factory Constructor

Create the operator via the following factory method:

***data2vec(model_name='facebook/data2vec-text-base')***

**Parameters:**

***model_name:*** *str*

The name of the pre-trained model, as a string.
The default value is "facebook/data2vec-text-base".

Supported model names:
- facebook/data2vec-text-base

<br />


## Interface


**Parameters:**

***text:*** *str*

The input text, as a string.


**Returns:** *numpy.ndarray*

The text embedding extracted by the model.
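
A common next step is to compare two embeddings. The sketch below is one way to do that with NumPy. The exact shape of the returned array is not documented here, so the hypothetical helper `to_sentence_vector` assumes it may be a single vector or a per-token matrix and mean-pools over tokens. Mean pooling is an assumption made for illustration, not something the operator prescribes. The small arrays stand in for real operator outputs.

```python
import numpy as np


def to_sentence_vector(embedding):
    """Collapse an array of shape (dim,), (tokens, dim) or (1, tokens, dim) into (dim,)."""
    arr = np.asarray(embedding, dtype=np.float64)
    # Flatten every leading axis into one "token" axis, then average over it.
    return arr.reshape(-1, arr.shape[-1]).mean(axis=0)


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Stand-in arrays; in practice these would be the operator's outputs.
emb_a = np.array([[1.0, 0.0], [0.0, 1.0]])  # 2 tokens, dim 2
emb_b = np.array([[2.0, 2.0]])              # 1 token, dim 2

score = cosine_similarity(to_sentence_vector(emb_a), to_sentence_vector(emb_b))
print(f"cosine similarity: {score:.3f}")
```

Scores close to 1 indicate vectors pointing in nearly the same direction.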