
Text Embedding with REALM

author: Jael Gu


Description

A text embedding operator takes a sentence, paragraph, or document as a string and outputs an embedding vector, as an ndarray, that captures the input's core semantic elements. This operator uses REALM, a retrieval-augmented language model that first retrieves documents from a textual knowledge corpus and then uses the retrieved documents to answer questions. [1] The original model was proposed in REALM: Retrieval-Augmented Language Model Pre-Training by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. [2]

References

[1] https://huggingface.co/docs/transformers/model_doc/realm

[2] https://arxiv.org/abs/2002.08909


Code Example

Use the pre-trained model "google/realm-cc-news-pretrained-embedder" to generate a text embedding for the sentence "Hello, world.".

Write the pipeline:

import towhee

towhee.dc(["Hello, world."]) \
      .text_embedding.realm(model_name="google/realm-cc-news-pretrained-embedder")
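A common next step after generating embeddings is to compare them. The following is a minimal sketch of cosine similarity in plain Python. The helper name `cosine_similarity` and the three short vectors are placeholders chosen for illustration; they are not part of this operator and are not actual model outputs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder vectors for illustration only, not real REALM embeddings.
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]   # same direction as emb_a
emb_c = [-3.0, 0.0, 1.0]  # orthogonal to emb_a

score_ab = cosine_similarity(emb_a, emb_b)  # close to 1: same direction
score_ac = cosine_similarity(emb_a, emb_c)  # 0: orthogonal vectors
```

Because the function only iterates over its arguments, it should also accept one-dimensional ndarrays such as those returned by the operator.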


Factory Constructor

Create the operator via the following factory method:

text_embedding.realm(model_name="google/realm-cc-news-pretrained-embedder")

Parameters:

model_name: str

The name of the pre-trained model, as a string. The default value is "google/realm-cc-news-pretrained-embedder".

Supported model names:

  • google/realm-cc-news-pretrained-embedder


Interface

The operator takes a piece of text as a string. It loads the tokenizer and the pre-trained model by model name, then returns the text embedding as an ndarray.

Parameters:

text: str

The input text, as a string.

Returns:

numpy.ndarray

The text embedding extracted by the model.
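The interface above can be approximated directly with Hugging Face Transformers. This is a minimal sketch, not the operator's actual source. It assumes an installed `transformers` release that still includes `RealmTokenizer` and `RealmEmbedder`, plus PyTorch. The helper names `load_realm` and `embed_text` are hypothetical, and using `projected_score` as the embedding is an assumption about how the operator works.

```python
from typing import Any, Callable, Tuple

DEFAULT_MODEL = "google/realm-cc-news-pretrained-embedder"


def load_realm(model_name: str = DEFAULT_MODEL) -> Tuple[Any, Any]:
    """Load the tokenizer and embedder (hypothetical helper; downloads weights on first use)."""
    # Imported lazily so the rest of the sketch can be read without transformers installed.
    from transformers import RealmEmbedder, RealmTokenizer

    tokenizer = RealmTokenizer.from_pretrained(model_name)
    model = RealmEmbedder.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def embed_text(text: str, tokenizer: Callable[..., Any], model: Callable[..., Any]) -> Any:
    """Return one embedding vector for `text` (hypothetical helper)."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    # RealmEmbedder exposes its projected embedding as `projected_score`,
    # with a leading batch dimension of 1 for a single input text.
    return outputs.projected_score.squeeze(0).detach().numpy()


# Usage (requires network access to fetch the checkpoint):
# tokenizer, model = load_realm()
# vec = embed_text("Hello, world.", tokenizer, model)
```

Passing the tokenizer and model in as arguments keeps loading separate from inference, so the model is loaded once and reused across calls.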
