BERT Text Embedding Operator (PyTorch)
Authors: Kyle He
Overview
This operator transforms text into an embedding using BERT[1], which stands for Bidirectional Encoder Representations from Transformers.
Interface
__call__(self, text: str)
Args:
- text:
  - the text to be embedded
  - supported types: str
Returns:
The operator returns a tuple Tuple[('embs', numpy.ndarray)] containing the following fields:
- embs:
  - embedding of the text
  - data type: numpy.ndarray
  - shape: (768,)
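To make the return contract above concrete, here is an illustrative sketch; the field name, dtype, and shape come from the description above, while the class name `Outputs` is hypothetical and not taken from this repository:

```python
from typing import NamedTuple
import numpy

# Illustrative return contract only: a named tuple whose single 'embs'
# field holds the text embedding. The class name 'Outputs' is assumed;
# the field name, dtype, and shape follow the interface described above.
class Outputs(NamedTuple):
    embs: numpy.ndarray  # embedding of the input text, shape (768,)
```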
Requirements
You can install the required Python packages listed in requirements.txt.
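For example, a standard install from the repository root:

```bash
pip install -r requirements.txt
```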
How it works
The towhee/torch-bert operator is based on Hugging Face Transformers[2].
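For illustration, here is a minimal sketch of how such a BERT text embedding can be computed with the Transformers library. The checkpoint ('bert-base-uncased') and the mean-pooling strategy are assumptions for the sketch; the operator's actual implementation may differ:

```python
# Minimal sketch of a BERT text embedding via Hugging Face Transformers.
# The checkpoint and pooling strategy below are assumptions; this is not
# necessarily the exact implementation used by this operator.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

inputs = tokenizer('Hello, Towhee!', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token states into one 768-dimensional numpy.ndarray.
embs = outputs.last_hidden_state.mean(dim=1).squeeze(0).numpy()
print(embs.shape)  # (768,)
```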
Reference
[1]. https://arxiv.org/pdf/1810.04805.pdf
[2]. https://huggingface.co/docs/transformers
More Resources
- The guide to text-embedding-ada-002 model | OpenAI: text-embedding-ada-002: OpenAI's legacy text embedding model; average price/performance compared to text-embedding-3-large and text-embedding-3-small.
- Sentence Transformers for Long-Form Text - Zilliz blog: Deep diving into modern transformer-based embeddings for long-form text.
- What is BERT (Bidirectional Encoder Representations from Transformers)? - Zilliz blog: Learn what Bidirectional Encoder Representations from Transformers (BERT) is and how it uses pre-training and fine-tuning to achieve its remarkable performance.
- Training Your Own Text Embedding Model - Zilliz blog: Explore how to train your own text embedding model using the sentence-transformers library and generate training data by leveraging a pre-trained LLM.
- The guide to gte-base-en-v1.5 | Alibaba: gte-base-en-v1.5: specialized for English text; built upon the transformer++ encoder backbone (BERT + RoPE + GLU).
- Training Text Embeddings with Jina AI - Zilliz blog: In a recent talk by Bo Wang, he discussed the creation of Jina text embeddings for modern vector search and RAG systems. He also shared methodologies for training embedding models that effectively encode extensive information.