Text Embedding with Transformers

author: Jael Gu

Description

A text embedding operator takes a sentence, paragraph, or document as a string and outputs an embedding vector as an ndarray that captures the input's core semantic elements. This operator uses the REALM model, a retrieval-augmented language model that first retrieves documents from a textual knowledge corpus and then uses the retrieved documents to answer questions [1]. The model was originally proposed in "REALM: Retrieval-Augmented Language Model Pre-Training" by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang [2].

References

[1] https://huggingface.co/docs/transformers/model_doc/realm

[2] https://arxiv.org/abs/2002.08909

Code Example

Use the pretrained model "google/realm-cc-news-pretrained-embedder" to generate a text embedding for the sentence "Hello, world.".

Write the pipeline:

from towhee import dc


# Wrap the chain in parentheses so Python parses it as a single expression.
(
    dc.stream(["Hello, world."])
      .text_embedding.realm(model_name="google/realm-cc-news-pretrained-embedder")
      .show()
)
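Because the operator returns plain vectors, two texts can be compared by the cosine similarity of their embeddings. The helper below is a minimal sketch and is not part of this operator: `cosine_similarity` is a hypothetical name, and the toy vectors are assumed values standing in for real embeddings. It accepts any 1-D sequence of floats, which includes a 1-D numpy.ndarray.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length 1-D vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumed values, illustration only).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Values near 1 indicate vectors pointing in the same direction, and values near 0 indicate unrelated directions.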

Factory Constructor

Create the operator via the following factory method:

text_embedding.realm(model_name="google/realm-cc-news-pretrained-embedder")

Parameters:

model_name: str

The name of the model, as a string. You can get the list of supported model names by calling get_model_list from realm.py.

Interface

The operator takes a piece of text as a string. It loads the tokenizer and the pre-trained model by model name, then returns the text embedding as an ndarray.

Parameters:

text: str

The input text, as a string.

Returns:

numpy.ndarray

The text embedding extracted by the model.
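The interface above (load a tokenizer and model by name, run the text through the model, return an ndarray) can be sketched directly with Hugging Face Transformers. This is an illustrative approximation under stated assumptions, not necessarily how realm.py is written: `embed_text` is a hypothetical name, the sketch assumes `torch` and a `transformers` release that still ships the REALM classes, and it takes the embedder's `projected_score` output as the embedding.

```python
def embed_text(
    text: str,
    model_name: str = "google/realm-cc-news-pretrained-embedder",
) -> "numpy.ndarray":
    """Return a 1-D embedding for `text` (hypothetical helper, illustrative sketch)."""
    # Imported lazily so that defining the function does not require the heavy dependencies.
    import torch
    from transformers import AutoTokenizer, RealmEmbedder

    # Load the tokenizer and the pre-trained embedder by model name.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = RealmEmbedder.from_pretrained(model_name)
    model.eval()

    # Tokenize the input string into PyTorch tensors.
    inputs = tokenizer(text, return_tensors="pt")

    # Run a forward pass without tracking gradients.
    with torch.no_grad():
        outputs = model(**inputs)

    # projected_score has shape (batch_size, projection_dim); drop the batch axis
    # and convert the tensor to a NumPy array.
    return outputs.projected_score.squeeze(0).detach().cpu().numpy()
```

Calling `embed_text("Hello, world.")` downloads the checkpoint on first use and reloads the model on every call; a real operator would more likely load the tokenizer and model once and reuse them.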
