Sentence Embedding with Sentence Transformers

Description

This operator takes a sentence or a list of sentences (as strings) as input. For each sentence it generates an embedding vector, returned as a numpy.ndarray, that captures the sentence's core semantic content. The operator is implemented with pre-trained models from Sentence Transformers.

Code Example

Use the pre-trained model "all-MiniLM-L12-v2" to generate a text embedding for the sentence "This is a sentence.".

Write a pipeline with explicitly named inputs and outputs:

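from towhee.dc2 import pipe, ops, DataCollection

# Read 'sentence', embed it into 'vec', and return both columns.
p = (
    pipe.input('sentence')
        .map('sentence', 'vec', ops.sentence_embedding.sbert(model_name='all-MiniLM-L12-v2'))
        .output('sentence', 'vec')
)

# Run the pipeline on one sentence and display the result.
DataCollection(p('This is a sentence.')).show()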

Factory Constructor

Create the operator via the following factory method:

sentence_embedding.sbert(model_name='all-MiniLM-L12-v2')

Parameters:

model_name: str

The name of the model, as a string. For the available model names, refer to the SBert Doc. Note that only the models listed by supported_model_names are tested. Refer to Towhee Pipeline for model performance.

device: str

The device on which to run the model; defaults to None. If None, the operator uses 'cuda' automatically when CUDA is available.
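The default described for device can be sketched as a small function. This is only an illustration, not the operator's actual code: resolve_device is a hypothetical helper, and the fallback to 'cpu' when CUDA is unavailable is an assumption, since this README only states the 'cuda' case. In practice the cuda_available flag would typically come from torch.cuda.is_available().

```python
from typing import Optional


def resolve_device(device: Optional[str], cuda_available: bool) -> str:
    """Hypothetical helper mirroring the documented default for `device`."""
    if device is not None:
        # An explicitly requested device is used as-is.
        return device
    # Documented: None selects 'cuda' when CUDA is available.
    # Assumed (not stated in this README): otherwise fall back to 'cpu'.
    return 'cuda' if cuda_available else 'cpu'


print(resolve_device(None, cuda_available=True))    # cuda
print(resolve_device(None, cuda_available=False))   # cpu
print(resolve_device('cpu', cuda_available=True))   # cpu
```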

Interface

The operator takes a sentence or a list of sentences (as strings) as input. It loads the tokenizer and pre-trained model identified by the model name, and returns the text embedding(s) as numpy.ndarray.

__call__(txt)

Parameters:

txt: Union[List[str], str]

A single sentence as a string, or a list of such sentences.

Returns:

Union[List[numpy.ndarray], numpy.ndarray]

If the input is a single sentence, the operator returns one embedding vector as a numpy.ndarray of shape (dim,). If the input is a list of sentences, it returns a list of embedding vectors, each a numpy.ndarray of shape (dim,).
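Because the return type depends on the input type, downstream code may want to normalize both forms into a single 2-D array. The sketch below shows one way to do that; as_matrix is a hypothetical helper that is not part of the operator, and the zero and one vectors with dim=4 are stand-ins for real embeddings.

```python
from typing import List, Union

import numpy as np


def as_matrix(result: Union[np.ndarray, List[np.ndarray]]) -> np.ndarray:
    """Hypothetical helper: normalize either return form to shape (n, dim)."""
    if isinstance(result, np.ndarray):
        # Single sentence: a (dim,) vector becomes a (1, dim) matrix.
        return result.reshape(1, -1)
    # List of sentences: stack n vectors of shape (dim,) into (n, dim).
    return np.stack(result)


# Stand-in vectors with dim=4; a real call would return model embeddings.
single = np.zeros(4, dtype=np.float32)
batch = [np.zeros(4, dtype=np.float32), np.ones(4, dtype=np.float32)]

print(as_matrix(single).shape)  # (1, 4)
print(as_matrix(batch).shape)   # (2, 4)
```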

supported_model_names(format=None)

Get a list of all supported model names, or only those supported for a specified model format.

Parameters:

format: str

The model format, such as 'pytorch'; defaults to None. If None, the full list of supported model names is returned.

from towhee import ops

# Get the underlying operator, then query the names of all tested models.
op = ops.sentence_embedding.sbert().get_op()
full_list = op.supported_model_names()
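Since only the models returned by supported_model_names are tested, it can be useful to check a model name against that list before building a pipeline. The following is a minimal sketch: check_model_name is a hypothetical helper, and the one-element list stands in for the real return value of op.supported_model_names().

```python
from typing import Iterable


def check_model_name(model_name: str, supported: Iterable[str]) -> bool:
    """Hypothetical helper: report whether `model_name` is in the tested list."""
    return model_name in set(supported)


# Stand-in list; in practice this would be op.supported_model_names().
supported = ['all-MiniLM-L12-v2']

print(check_model_name('all-MiniLM-L12-v2', supported))  # True
print(check_model_name('my-custom-model', supported))    # False
```

A name that fails this check may still load, but according to the note under model_name it has not been tested with this operator.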

author: Jael Gu
