Sentence Embedding with OpenAI

author: Junjie, Jael

Description

A sentence embedding operator generates one embedding vector, as a numpy.ndarray, for each input text. The embedding represents the semantic information of the whole input text as a single vector. This operator is implemented with embedding models from OpenAI. Note that you need an OpenAI API key to access the OpenAI API.

Code Example

Use the pre-trained model 'text-embedding-ada-002' to generate an embedding for the sentence "Hello, world.".

Write a pipeline with explicit input and output names:

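import os

from towhee import pipe, ops, DataCollection

# Assumption: the API key is stored in the OPENAI_API_KEY environment variable.
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')

p = (
    pipe.input('text')
        .map('text', 'vec',
             ops.sentence_embedding.openai(model_name='text-embedding-ada-002', api_key=OPENAI_API_KEY))
        .output('text', 'vec')
)

# Run the pipeline on one sentence and display the text next to its embedding.
DataCollection(p('Hello, world.')).show()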

Factory Constructor

Create the operator via the following factory method:

sentence_embedding.openai(model_name='text-embedding-ada-002', api_key=None)

Parameters:

model_name: str

The model name as a string. Defaults to 'text-embedding-ada-002'. Supported model names:

text-embedding-ada-002
text-similarity-davinci-001
text-similarity-curie-001
text-similarity-babbage-001
text-similarity-ada-001

api_key: str=None

The OpenAI API key as a string. Defaults to None.
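
The two parameters above can be checked before the operator is created. The following is a minimal sketch, not part of the operator: resolve_openai_args is a hypothetical helper, and falling back to the OPENAI_API_KEY environment variable is an assumption made for illustration. The model names are copied from the list above.

```python
import os

# Model names copied from the "Supported model names" list above.
SUPPORTED_MODELS = (
    'text-embedding-ada-002',
    'text-similarity-davinci-001',
    'text-similarity-curie-001',
    'text-similarity-babbage-001',
    'text-similarity-ada-001',
)


def resolve_openai_args(model_name='text-embedding-ada-002', api_key=None, env=None):
    """Hypothetical helper: validate the factory arguments and return them as a dict."""
    env = os.environ if env is None else env
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f'Unsupported model name: {model_name!r}')
    # Assumption: use the OPENAI_API_KEY environment variable when no key is passed.
    key = api_key or env.get('OPENAI_API_KEY')
    if not key:
        raise ValueError('No API key passed and OPENAI_API_KEY is not set')
    return {'model_name': model_name, 'api_key': key}
```

The returned dict could then be unpacked into the factory call, for example ops.sentence_embedding.openai(**resolve_openai_args()).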

Interface

The operator takes a piece of text as a string and returns a text embedding as a numpy.ndarray.

__call__(txt)

Parameters:

txt: str

The input text as a string.

Returns:

numpy.ndarray or list

The text embedding extracted by the model.
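
Because the return value may be either a numpy.ndarray or a list, downstream code may want to handle both. The sketch below is an illustration only: as_float_list and cosine_similarity are hypothetical helpers, not part of the operator, and they assume one-dimensional embeddings. It compares two embeddings with cosine similarity, using only the standard library.

```python
import math


def as_float_list(vec):
    """Convert a 1-D embedding (numpy.ndarray or list) to a list of floats."""
    # numpy arrays expose tolist(); any other sequence is copied with list().
    values = vec.tolist() if hasattr(vec, 'tolist') else list(vec)
    return [float(v) for v in values]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two embeddings of equal length."""
    a, b = as_float_list(a), as_float_list(b)
    if len(a) != len(b):
        raise ValueError('Embeddings must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError('Cosine similarity is undefined for a zero vector')
    return dot / norm
```

Values close to 1 indicate embeddings that point in nearly the same direction, while values near 0 indicate unrelated directions.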

supported_model_names()

Get a list of supported model names.
