# Sentence Embedding with OpenAI

*author: Junjie, Jael*

<br />

## Description

A sentence embedding operator generates one embedding vector, as a numpy.ndarray, for each input text.
The embedding represents the semantic information of the whole input text as one vector.
This operator is implemented with embedding models from [OpenAI](https://platform.openai.com/docs/guides/embeddings).
Please note you need an [OpenAI API key](https://platform.openai.com/account/api-keys) to access OpenAI.
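
Because each text maps to a single vector, two texts can be compared by comparing their vectors. The sketch below computes cosine similarity in plain Python. The function name `cosine_similarity` and the three-dimensional toy vectors are illustrative assumptions, not output of this operator; real OpenAI embeddings have far more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (hypothetical values).
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]   # same direction as vec_a
vec_c = [-3.0, 0.0, 1.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

A value close to 1 indicates vectors pointing in nearly the same direction, while a value close to 0 indicates unrelated directions.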

<br />

## Code Example

Use the pre-trained model 'text-embedding-ada-002' to generate an embedding for the sentence "Hello, world.".

*Write a pipeline with explicit input and output names:*

```python
import os

from towhee import pipe, ops, DataCollection

# Assumption: the API key is stored in the OPENAI_API_KEY environment variable.
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']

p = (
    pipe.input('text')
        .map('text', 'vec',
             ops.sentence_embedding.openai(model_name='text-embedding-ada-002', api_key=OPENAI_API_KEY))
        .output('text', 'vec')
)

# Run the pipeline on one sentence and display the text next to its embedding.
DataCollection(p('Hello, world.')).show()
```

<br />

## Factory Constructor

Create the operator via the following factory method:

***sentence_embedding.openai(model_name='text-embedding-ada-002', api_key=None)***

**Parameters:**

***model_name***: *str*

The model name as a string, defaults to 'text-embedding-ada-002'. Supported model names:
- text-embedding-ada-002
- text-similarity-davinci-001
- text-similarity-curie-001
- text-similarity-babbage-001
- text-similarity-ada-001

***api_key***: *str=None*

The OpenAI API key as a string, defaults to None.
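
Both parameters can be checked before the operator is constructed. The following is a minimal sketch under stated assumptions: the helper `build_openai_kwargs` is hypothetical and not part of Towhee, the model list is copied from this page, and reading the key from an `OPENAI_API_KEY` environment variable is an assumed convention.

```python
import os
from typing import Dict, Mapping, Optional

# Model names copied from the supported list on this page.
SUPPORTED_MODELS = (
    'text-embedding-ada-002',
    'text-similarity-davinci-001',
    'text-similarity-curie-001',
    'text-similarity-babbage-001',
    'text-similarity-ada-001',
)


def build_openai_kwargs(model_name: str = 'text-embedding-ada-002',
                        env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Validate the model name and look up the API key (hypothetical helper)."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f'unsupported model name: {model_name!r}')
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if env is None else env
    api_key = env.get('OPENAI_API_KEY')
    if not api_key:
        raise RuntimeError('OPENAI_API_KEY is not set')
    return {'model_name': model_name, 'api_key': api_key}


# Example with a fake environment so that no real key is needed.
kwargs = build_openai_kwargs(env={'OPENAI_API_KEY': 'sk-placeholder'})
print(kwargs['model_name'])
```

The resulting dictionary could then be passed to the factory method as `ops.sentence_embedding.openai(**kwargs)`.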

<br />

## Interface

The operator takes a piece of text as a string.
It returns a text embedding as a numpy.ndarray.

***\_\_call\_\_(txt)***

**Parameters:**

***text***: *str*

The input text as a string.

**Returns**:

*numpy.ndarray or list*

The text embedding extracted by the model.
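
Since the return type is documented as either numpy.ndarray or list, downstream code may want to normalize the result into one form. A minimal sketch, assuming only that array-like results expose a `tolist()` method (as numpy.ndarray does); the helper name `to_float_list` is hypothetical.

```python
from typing import Any, List


def to_float_list(embedding: Any) -> List[float]:
    """Convert an ndarray-like or list embedding into a plain list of floats."""
    # numpy.ndarray provides tolist(); plain lists are used as they are.
    values = embedding.tolist() if hasattr(embedding, 'tolist') else embedding
    return [float(v) for v in values]


# A plain list stands in for the operator output here (hypothetical values).
print(to_float_list([0.1, 2, -3.5]))
```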

<br />

***supported_model_names()***

Get a list of supported model names.