# Text Embedding with Realm

*author: Jael Gu*
## Description

A text embedding operator takes a sentence, paragraph, or document as an input string and outputs an embedding vector in ndarray that captures the input's core semantic elements. This operator uses the REALM model, a retrieval-augmented language model that first retrieves documents from a textual knowledge corpus and then utilizes the retrieved documents to process question-answering tasks.[1] The original model was proposed in *REALM: Retrieval-Augmented Language Model Pre-Training* by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.[2]

### References

[1] https://huggingface.co/docs/transformers/model_doc/realm
[2] https://arxiv.org/abs/2002.08909
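The operator builds on the REALM implementation in Hugging Face Transformers (see reference [1]). As a rough illustration of what happens under the hood, here is a minimal sketch of calling the pre-trained embedder directly; it assumes the `RealmTokenizer` and `RealmEmbedder` classes from `transformers`, and the exact output field may differ across library versions:

```python
from transformers import RealmTokenizer, RealmEmbedder

# Load the tokenizer and pre-trained embedder (model name taken from this README).
tokenizer = RealmTokenizer.from_pretrained("google/realm-cc-news-pretrained-embedder")
model = RealmEmbedder.from_pretrained("google/realm-cc-news-pretrained-embedder")

# Tokenize the input text and run it through the embedder.
inputs = tokenizer("Hello, world.", return_tensors="pt")
outputs = model(**inputs)

# `projected_score` holds the projected embedding; convert it to a numpy ndarray.
embedding = outputs.projected_score.detach().numpy().squeeze()
print(embedding.shape)
```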
## Code Example

Use the pre-trained model "google/realm-cc-news-pretrained-embedder" to generate a text embedding for the sentence "Hello, world.".

*Write the pipeline*:

```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('text')
        .map('text', 'vec', ops.text_embedding.realm(model_name="google/realm-cc-news-pretrained-embedder"))
        .output('text', 'vec')
)

DataCollection(p('Hello, world.')).show()
```
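If you need the raw vector rather than the rendered table, you can read it from the pipeline result instead. A short sketch, assuming the standard towhee result accessor `get()`; the embedding dimensionality depends on the model:

```python
# Run the same pipeline and unpack one result row instead of displaying it.
res = p('Hello, world.')
text, vec = res.get()  # one row: the input text and its embedding

# `vec` is a numpy.ndarray; its length depends on the model used.
print(type(vec), vec.shape)
```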
## Factory Constructor

Create the operator via the following factory method:

***text_embedding.realm(model_name="google/realm-cc-news-pretrained-embedder")***

**Parameters:**

***model_name***: *str*

The model name in string. The default value is "google/realm-cc-news-pretrained-embedder".

Supported model names:
- google/realm-cc-news-pretrained-embedder
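The constructed operator can also be used on its own, outside a pipeline. A minimal sketch, assuming towhee operators are directly callable once instantiated:

```python
from towhee import ops
import numpy

# Instantiate the operator with the default (and only supported) model.
op = ops.text_embedding.realm(model_name="google/realm-cc-news-pretrained-embedder")

# Calling the operator on a string returns the embedding as a numpy.ndarray.
vec = op('Hello, world.')
assert isinstance(vec, numpy.ndarray)
```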
## Interface

The operator takes a piece of text in string as input. It loads the tokenizer and pre-trained model using the model name and then returns the text embedding in ndarray.

**Parameters:**

***text***: *str*

The text in string.

**Returns**:

*numpy.ndarray*

The text embedding extracted by the model.

# More Resources

- [The guide to text-embedding-ada-002 model | OpenAI](https://zilliz.com/ai-models/text-embedding-ada-002): text-embedding-ada-002: OpenAI's legacy text embedding model; average price/performance compared to text-embedding-3-large and text-embedding-3-small.
- [Massive Text Embedding Benchmark (MTEB)](https://zilliz.com/glossary/massive-text-embedding-benchmark-(mteb)): A standardized way to evaluate text embedding models across a range of tasks and languages, leading to better text embedding models for your app.
- [The guide to mistral-embed | Mistral AI](https://zilliz.com/ai-models/mistral-embed): mistral-embed: a specialized embedding model for text data with a context window of 8,000 tokens. Optimized for similarity retrieval and RAG applications.
- [Tutorial: Diving into Text Embedding Models | Zilliz Webinar](https://zilliz.com/event/tutorial-text-embedding-models): Register for a free webinar diving into text embedding models in a presentation and tutorial.
- [The guide to text-embedding-3-small | OpenAI](https://zilliz.com/ai-models/text-embedding-3-small): text-embedding-3-small: OpenAI's small text embedding model optimized for accuracy and efficiency with a lower cost.
- [The guide to voyage-large-2 | Voyage AI](https://zilliz.com/ai-models/voyage-large-2): voyage-large-2: general-purpose text embedding model; optimized for retrieval quality; ideal for tasks like summarization, clustering, and classification.
- [Training Text Embeddings with Jina AI - Zilliz blog](https://zilliz.com/blog/training-text-embeddings-with-jina-ai): In a recent talk by Bo Wang, he discussed the creation of Jina text embeddings for modern vector search and RAG systems. He also shared methodologies for training embedding models that effectively encode extensive information, along with guidance o