# Text Embedding with data2vec
*author: David Wang*
## Description
This operator extracts features from text with [data2vec](https://arxiv.org/abs/2202.03555). The core idea is to predict latent representations of the full input data from a masked view of the input, in a self-distillation setup, using a standard Transformer architecture.
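Under the hood, the operator presumably wraps the Hugging Face checkpoint for this model. Below is a minimal sketch of direct feature extraction with the `transformers` library, assuming the checkpoint loads as `Data2VecTextModel`; the operator's exact pooling and post-processing may differ.
```python
import torch
from transformers import AutoTokenizer, Data2VecTextModel

# Load the tokenizer and encoder from the Hugging Face hub.
tokenizer = AutoTokenizer.from_pretrained('facebook/data2vec-text-base')
model = Data2VecTextModel.from_pretrained('facebook/data2vec-text-base')

# Tokenize a sentence and run a forward pass without gradients.
inputs = tokenizer('Hello, world.', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# Token-level features, shape (1, sequence_length, hidden_size).
features = outputs.last_hidden_state
```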
## Code Example
Use the pre-trained model to generate a text embedding for the sentence "Hello, world.".

Write a pipeline with explicit input/output name specifications:
```python
from towhee import pipe, ops, DataCollection
# Build a pipeline: read text, map it to an embedding, and output both.
p = (
    pipe.input('text')
        .map('text', 'vec', ops.text_embedding.data2vec(model_name='facebook/data2vec-text-base'))
        .output('text', 'vec')
)

# Run the pipeline on a sentence and display the result.
DataCollection(p('Hello, world.')).show()
```
## Factory Constructor
Create the operator via the following factory method:
***data2vec(model_name='facebook/data2vec-text-base')***
**Parameters:**
***model_name***: *str*
The name of the model to load. The default value is "facebook/data2vec-text-base".
Supported model name:
- facebook/data2vec-text-base
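For illustration, the operator can also be constructed directly with `ops` (a sketch; the pipeline example above is the documented usage). Omitting `model_name` falls back to the default.
```python
from towhee import ops

# These two constructions are equivalent, since
# 'facebook/data2vec-text-base' is the default model name.
op = ops.text_embedding.data2vec()
op = ops.text_embedding.data2vec(model_name='facebook/data2vec-text-base')
```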
## Interface
**Parameters:**
***text***: *str*
The input text to embed.
**Returns:** *numpy.ndarray*
The text embedding extracted by the model.
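Calling the operator on a string should return the embedding directly. A hedged sketch, assuming towhee's standalone operator-call pattern applies here; the exact output shape depends on the operator's pooling (768 is the hidden size of the base model).
```python
from towhee import ops

# Standalone usage: construct the operator, then call it on a string.
op = ops.text_embedding.data2vec(model_name='facebook/data2vec-text-base')
vec = op('Hello, world.')

print(type(vec))  # <class 'numpy.ndarray'>
print(vec.shape)  # e.g. (768,), depending on the operator's pooling
```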
## More Resources
- [What is a Transformer Model? An Engineer's Guide](https://zilliz.com/glossary/transformer-models): A transformer model is a neural network architecture. It's proficient in converting a particular type of input into a distinct output. Its core strength lies in its ability to handle inputs and outputs of different sequence length. It does this through encoding the input into a matrix with predefined dimensions and then combining that with another attention matrix to decode. This transformation unfolds through a sequence of collaborative layers, which deconstruct words into their corresponding numerical representations. At its heart, a transformer model is a bridge between disparate linguistic structures, employing sophisticated neural network configurations to decode and manipulate human language input. An example of a transformer model is GPT-3, which ingests human language and generates text output.
- [The guide to text-embedding-ada-002 model | OpenAI](https://zilliz.com/ai-models/text-embedding-ada-002): text-embedding-ada-002: OpenAI's legacy text embedding model; average price/performance compared to text-embedding-3-large and text-embedding-3-small.
- [Sentence Transformers for Long-Form Text - Zilliz blog](https://zilliz.com/learn/Sentence-Transformers-for-Long-Form-Text): Deep diving into modern transformer-based embeddings for long-form text.
- [Massive Text Embedding Benchmark (MTEB)](https://zilliz.com/glossary/massive-text-embedding-benchmark-(mteb)): A standardized way to evaluate text embedding models across a range of tasks and languages, leading to better text embedding models for your app.
- [Tutorial: Diving into Text Embedding Models | Zilliz Webinar](https://zilliz.com/event/tutorial-text-embedding-models): Register for a free webinar diving into text embedding models in a presentation and tutorial.
- [The guide to text-embedding-3-small | OpenAI](https://zilliz.com/ai-models/text-embedding-3-small): text-embedding-3-small: OpenAI's small text embedding model optimized for accuracy and efficiency with a lower cost.
- [Evaluating Your Embedding Model - Zilliz blog](https://zilliz.com/learn/evaluating-your-embedding-model): Review some practical examples to evaluate different text embedding models.