# BERT Text Embedding Operator (PyTorch)

Authors: Kyle He

## Overview

This operator transforms text into an embedding using BERT[1]
(Bidirectional Encoder Representations from Transformers).

## Interface

```python
__call__(self, text: str)
```

**Args:**

- text:
  - the text to be embedded
  - supported types: str

**Returns:**

The operator returns a tuple `Tuple[('embs', numpy.ndarray)]` containing the following field:

- embs:
  - embeddings of the text
  - data type: `numpy.ndarray`
  - shape: `(768,)`

## Requirements

Install the required Python packages listed in [requirements.txt](./requirements.txt).

## How it works

The `towhee/torch-bert` operator is based on the Hugging Face Transformers library[2].
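
As a rough illustration of how a BERT model from Hugging Face Transformers can turn a string into a single vector, here is a minimal sketch. It is not the operator's actual source: the helper name `embed_text`, the `bert-base-uncased` checkpoint, and the choice of mean pooling over tokens are all assumptions made for this example.

```python
# Assumption: this README does not name the checkpoint; "bert-base-uncased"
# is used here only because its hidden size is 768.
MODEL_NAME = "bert-base-uncased"


def embed_text(text: str, model_name: str = MODEL_NAME):
    """Hypothetical helper: return one embedding vector for `text` as a numpy.ndarray."""
    # Imported lazily so that importing this module does not require
    # torch or transformers to be installed.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name)
    model.eval()  # switch off dropout for inference

    # Tokenize into PyTorch tensors; BERT accepts at most 512 tokens.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

    with torch.no_grad():
        outputs = model(**inputs)
        # last_hidden_state has shape (1, seq_len, hidden_size).
        # Averaging over the token axis is one common pooling choice
        # (an assumption here, not necessarily what the operator does).
        pooled = outputs.last_hidden_state.mean(dim=1)

    # Drop the batch dimension and convert to a NumPy array.
    return pooled.squeeze(0).numpy()
```

Calling `embed_text("Hello, world.")` downloads the checkpoint on first use. With a BERT-base checkpoint the result should have shape `(768,)`, which matches the output documented in the Interface section.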

## Reference

[1]. https://arxiv.org/pdf/1810.04805.pdf

[2]. https://huggingface.co/docs/transformers

## More Resources

- [The guide to text-embedding-ada-002 model  | OpenAI](https://zilliz.com/ai-models/text-embedding-ada-002): text-embedding-ada-002: OpenAI's legacy text embedding model; average price/performance compared to text-embedding-3-large and text-embedding-3-small.
- [Sentence Transformers for Long-Form Text - Zilliz blog](https://zilliz.com/learn/Sentence-Transformers-for-Long-Form-Text): Deep diving into modern transformer-based embeddings for long-form text.
- [What is BERT (Bidirectional Encoder Representations from Transformers)? - Zilliz blog](https://zilliz.com/learn/what-is-bert): Learn what Bidirectional Encoder Representations from Transformers (BERT) is and how it uses pre-training and fine-tuning to achieve its remarkable performance.
- [Training Your Own Text Embedding Model - Zilliz blog](https://zilliz.com/learn/training-your-own-text-embedding-model): Explore how to train your text embedding model using the `sentence-transformers` library and generate our training data by leveraging a pre-trained LLM.
- [The guide to gte-base-en-v1.5 | Alibaba](https://zilliz.com/ai-models/gte-base-en-v1.5): gte-base-en-v1.5: specialized for English text; Built upon the transformer++ encoder backbone (BERT + RoPE + GLU)
- [Training Text Embeddings with Jina AI  - Zilliz blog](https://zilliz.com/blog/training-text-embeddings-with-jina-ai): In a recent talk by Bo Wang, he discussed the creation of Jina text embeddings for modern vector search and RAG systems. He also shared methodologies for training embedding models that effectively encode extensive information, along with guidance o