Text Embedding with Transformers

author: Jael Gu

Description

A text embedding operator takes a sentence, paragraph, or document as a string and outputs an embedding vector as an ndarray that captures the input's core semantic elements. This operator is implemented with pretrained models from Hugging Face Transformers.

Code Example

Use the pretrained model 'distilbert-base-cased' to generate a text embedding for the sentence "Hello, world.".

Write the pipeline in simplified style:

from towhee import dc


dc.stream(["Hello, world."]) \
  .text_embedding.transformers('distilbert-base-cased') \
  .show()

Write the same pipeline with explicit input and output names:

from towhee import dc


dc.stream['txt'](["Hello, world."]) \
  .text_embedding.transformers['txt', 'vec']('distilbert-base-cased') \
  .select('txt', 'vec') \
  .show()
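To make the ['txt', 'vec'] notation more concrete, the sketch below mimics it in plain Python, assuming each row is a record with named fields: the operator reads the 'txt' field and writes its result to 'vec'. The helpers apply_named and fake_embed are hypothetical and are not part of Towhee; fake_embed is only a placeholder that returns the text length instead of a real embedding.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def fake_embed(text: str) -> List[float]:
    # Placeholder only: a real operator would run a transformer model here.
    return [float(len(text))]


def apply_named(
    rows: Iterable[Dict[str, object]],
    op: Callable[[str], List[float]],
    src: str,
    dst: str,
) -> Iterator[Dict[str, object]]:
    # Hypothetical helper: read field `src`, store the operator output in `dst`.
    for row in rows:
        yield {**row, dst: op(row[src])}


rows = [{"txt": "Hello, world."}]
result = list(apply_named(rows, fake_embed, "txt", "vec"))
```

Each resulting row keeps its original 'txt' field next to the new 'vec' field, which is why both can be selected afterwards.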

Factory Constructor

Create the operator via the following factory method:

text_embedding.transformers(model_name="bert-base-uncased")

Parameters:

model_name: str

The name of the pretrained model, as a string. You can get the list of supported model names by calling get_model_list.

Interface

The operator takes a piece of text as a string. It loads the tokenizer and the pretrained model identified by the model name, and returns the text embedding as an ndarray.

Parameters:

text: str

The input text, as a string.

Returns:

numpy.ndarray

The text embedding extracted by the model.
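The interface above can be approximated with Hugging Face Transformers directly. This is a minimal sketch under stated assumptions, not the operator's actual source: the class name TransformersTextEmbedding is hypothetical, the PyTorch backend is assumed, and returning the last hidden state (one vector per token) is an assumption, since this page does not say whether the embedding is per token or pooled.

```python
class TransformersTextEmbedding:
    """Hypothetical stand-in for the operator; not Towhee's actual class."""

    def __init__(self, model_name: str = "bert-base-uncased"):
        # Imported lazily so this module loads even without transformers installed.
        from transformers import AutoModel, AutoTokenizer

        # Load the tokenizer and the pretrained model by model name.
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.model.eval()  # inference mode: disables dropout

    def __call__(self, text: str) -> "numpy.ndarray":
        import torch

        inputs = self.tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            outputs = self.model(**inputs)
        # last_hidden_state has shape (1, num_tokens, hidden_size).
        # Assumption: drop the batch axis and return per-token vectors.
        return outputs.last_hidden_state.squeeze(0).numpy()
```

Calling TransformersTextEmbedding('distilbert-base-cased')("Hello, world.") would download the model on first use and return a 2-D array with one row per token. If a single vector per text is needed, averaging over the token axis is one common choice.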
