Text Embedding with Transformers

author: Jael Gu


Description

A text embedding operator takes a sentence, paragraph, or document as a string input and outputs an embedding vector as a numpy.ndarray that captures the input's core semantic elements. This operator is implemented with pretrained models from Hugging Face Transformers.
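
Conceptually, the operator tokenizes the input, runs it through the chosen pretrained model, and turns the hidden states into a single vector. The snippet below is a rough, standalone sketch of that process using Hugging Face Transformers directly; the choice of "distilbert-base-cased" and the mean pooling over token embeddings are illustrative assumptions, and the operator's actual pooling strategy may differ.

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "distilbert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Tokenize the sentence and run a forward pass without tracking gradients.
inputs = tokenizer("Hello, world.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last hidden state into a single vector (an assumption; the
# operator may pool differently) and convert it to a numpy.ndarray.
embedding = outputs.last_hidden_state.mean(dim=1).squeeze(0).numpy()
print(type(embedding), embedding.shape)  # <class 'numpy.ndarray'> (768,)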


Code Example

Use the pretrained model 'distilbert-base-cased' to generate a text embedding for the sentence "Hello, world.".

Write the pipeline:

from towhee import dc

# Stream the input sentence through the text_embedding.transformers operator
# with the 'distilbert-base-cased' model and collect the embeddings into a list.
dc.stream(["Hello, world."]) \
  .text_embedding.transformers(model_name="distilbert-base-cased") \
  .to_list()
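
With a single input sentence, to_list() should return a list holding one text embedding as a numpy.ndarray.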


Factory Constructor

Create the operator via the following factory method:

text_embedding.transformers(model_name="bert-base-uncased")

Parameters:

model_name: str

The model name as a string. You can get the list of supported model names by calling get_model_list from auto_transformers.py.
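
As a sketch (assuming auto_transformers.py from this operator is importable, e.g. after downloading the operator files locally), the supported names could be inspected like this:

# List the model names this operator supports. The exact way get_model_list
# is exposed in auto_transformers.py may differ; treat this as a sketch.
from auto_transformers import get_model_list

supported = get_model_list()
print(len(supported), "supported models")
print(supported[:5])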


Interface

The operator takes a text string as input, loads the tokenizer and pre-trained model by model name, and returns the text embedding as a numpy.ndarray.

Parameters:

text: str

The input text as a string.

Returns:

numpy.ndarray

The text embedding extracted by the model.
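
A common downstream use of the returned array is semantic comparison between texts. The sketch below assumes two 1-D embeddings have already been produced by this operator (for example via the pipeline in the Code Example) and compares them with cosine similarity:

import numpy as np

def cosine_similarity(vec_a: np.ndarray, vec_b: np.ndarray) -> float:
    # Cosine similarity between two 1-D embedding vectors.
    return float(np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))

# Hypothetical usage: score = cosine_similarity(vec_hello, vec_world)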
