# Text Embedding with data2vec

*author: David Wang*


<br />



## Description

This operator extracts features from text with [data2vec](https://arxiv.org/abs/2202.03555). The core idea is to predict latent representations of the full input data based on a masked view of the input, in a self-distillation setup, using a standard Transformer architecture.
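As a rough intuition, that objective can be sketched in a few dozen lines of PyTorch. The sketch below is illustrative only, not the actual data2vec training code: a student encoder sees a masked view of the tokens, a teacher (an exponential-moving-average copy of the student) sees the full input, and the student regresses the teacher's averaged top-K layer representations at the masked positions. All sizes, the mask-token id, and the EMA rate are made-up illustration values; the paper's target normalization and teacher-momentum schedule are omitted.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Toy Transformer encoder that returns every layer's hidden states."""
    def __init__(self, vocab=1000, dim=64, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            for _ in range(n_layers)
        ])

    def forward(self, tokens):
        x = self.embed(tokens)
        hidden = []
        for block in self.blocks:
            x = block(x)
            hidden.append(x)
        return hidden

student = TinyEncoder()
teacher = copy.deepcopy(student)           # teacher weights track the student via EMA
for p in teacher.parameters():
    p.requires_grad_(False)

tokens = torch.randint(4, 1000, (2, 16))   # toy batch of token ids
mask = torch.rand(tokens.shape) < 0.15     # positions the student must predict
masked = tokens.masked_fill(mask, 3)       # 3 = hypothetical [MASK] token id

with torch.no_grad():                      # the teacher sees the *unmasked* input
    target = torch.stack(teacher(tokens)[-3:]).mean(0)   # average of top-3 layers

pred = student(masked)[-1]                 # the student sees the masked view
loss = F.smooth_l1_loss(pred[mask], target[mask])
loss.backward()

with torch.no_grad():                      # EMA update of the teacher
    tau = 0.999
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(tau).add_(ps, alpha=1 - tau)
```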

<br />


## Code Example

Use the pre-trained model to generate a text embedding for the sentence "Hello, world.".

*Write a pipeline with explicit input and output name specifications:*

```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('text')
        .map('text', 'vec', ops.text_embedding.data2vec(model_name='facebook/data2vec-text-base'))
        .output('text', 'vec')
)

DataCollection(p('Hello, world.')).show()
```
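To read the embedding back programmatically rather than rendering a table, the result of a pipeline call can be unpacked with `.get()` (a sketch; the embedding's shape depends on the model and operator version):

```python
# Unpack the named outputs of the pipeline call; `vec` is a numpy.ndarray.
text, vec = p('Hello, world.').get()
print(vec.shape)
```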
<img src="./result.png" width="800px"/>


<br />



## Factory Constructor

Create the operator via the following factory method:

***data2vec(model_name='facebook/data2vec-text-base')***

**Parameters:**

***model_name***: *str*

The model name as a string. The default value is "facebook/data2vec-text-base".

Supported model names:
- facebook/data2vec-text-base
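For example, a sketch of constructing the operator explicitly with its default model (this is the same object the pipeline above passes to `.map()`):

```python
from towhee import ops

# Instantiate the operator from the factory with the (default) model name.
op = ops.text_embedding.data2vec(model_name='facebook/data2vec-text-base')
```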

<br />


## Interface


**Parameters:**

***text***: *str*

The input text as a string.



**Returns:** *numpy.ndarray*

The text embedding extracted by the model.


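Towhee operators can generally also be invoked standalone; assuming that holds here, a minimal sketch of the interface is:

```python
import numpy as np
from towhee import ops

op = ops.text_embedding.data2vec(model_name='facebook/data2vec-text-base')
vec = op('Hello, world.')  # str in, numpy.ndarray out
assert isinstance(vec, np.ndarray)
```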





## More Resources

- [What is a Transformer Model? An Engineer's Guide](https://zilliz.com/glossary/transformer-models): A transformer model is a neural network architecture. It's proficient in converting a particular type of input into a distinct output. Its core strength lies in its ability to handle inputs and outputs of different sequence lengths. It does this by encoding the input into a matrix with predefined dimensions and then combining that with another attention matrix to decode. This transformation unfolds through a sequence of collaborative layers, which deconstruct words into their corresponding numerical representations. At its heart, a transformer model is a bridge between disparate linguistic structures, employing sophisticated neural network configurations to decode and manipulate human language input. An example of a transformer model is GPT-3, which ingests human language and generates text output.
- [The guide to text-embedding-ada-002 model  | OpenAI](https://zilliz.com/ai-models/text-embedding-ada-002): text-embedding-ada-002: OpenAI's legacy text embedding model; average price/performance compared to text-embedding-3-large and text-embedding-3-small.
- [Sentence Transformers for Long-Form Text - Zilliz blog](https://zilliz.com/learn/Sentence-Transformers-for-Long-Form-Text): Deep diving into modern transformer-based embeddings for long-form text.
- [Massive Text Embedding Benchmark (MTEB)](https://zilliz.com/glossary/massive-text-embedding-benchmark-(mteb)): A standardized way to evaluate text embedding models across a range of tasks and languages, leading to better text embedding models for your app
- [Tutorial: Diving into Text Embedding Models | Zilliz Webinar](https://zilliz.com/event/tutorial-text-embedding-models): Register for a free webinar diving into text embedding models in a presentation and tutorial
- [The guide to text-embedding-3-small | OpenAI](https://zilliz.com/ai-models/text-embedding-3-small): text-embedding-3-small: OpenAI’s small text embedding model optimized for accuracy and efficiency with a lower cost.
- [Evaluating Your Embedding Model - Zilliz blog](https://zilliz.com/learn/evaluating-your-embedding-model): Review some practical examples to evaluate different text embedding models.