
Text Embedding with REALM

author: Jael Gu


Description

A text embedding operator takes a sentence, paragraph, or document as a string and outputs an embedding vector, as a numpy ndarray, that captures the input's core semantic elements. This operator uses the REALM model, a retrieval-augmented language model that first retrieves documents from a textual knowledge corpus and then uses the retrieved documents to perform question-answering tasks. [1] The original model was proposed in REALM: Retrieval-Augmented Language Model Pre-Training by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang. [2]
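In the REALM paper [2], the retriever scores a document z for an input x by the inner product of their embeddings, f(x, z) = Embed_input(x) · Embed_doc(z), and turns those scores into a retrieval distribution with a softmax. The sketch below shows only that scoring step in plain Python. The three-dimensional vectors are made-up toy values standing in for embedder outputs, and `retrieval_distribution` is a hypothetical helper, not part of this operator.

```python
import math
from typing import Dict, List


def inner_product(a: List[float], b: List[float]) -> float:
    """Relevance score f(x, z) as the dot product of two embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def retrieval_distribution(query: List[float], docs: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over inner-product scores: p(z | x) = exp f(x, z) / sum_z' exp f(x, z')."""
    scores = {name: inner_product(query, vec) for name, vec in docs.items()}
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Toy embeddings; real ones would come from the REALM embedder.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
probs = retrieval_distribution(query_vec, doc_vecs)
print(max(probs, key=probs.get))  # doc_a
```

The document whose embedding points most nearly in the same direction as the query gets the highest retrieval probability; the probabilities always sum to 1.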

References

[1] https://huggingface.co/docs/transformers/model_doc/realm

[2] https://arxiv.org/abs/2002.08909


Code Example

Use the pre-trained model "google/realm-cc-news-pretrained-embedder" to generate a text embedding for the sentence "Hello, world.".

Write the pipeline:

from towhee import pipe, ops, DataCollection

p = (
    pipe.input('text')
        .map('text', 'vec', ops.text_embedding.realm(model_name="google/realm-cc-news-pretrained-embedder"))
        .output('text', 'vec')
)

DataCollection(p('Hello, world.')).show()     
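The pipeline above returns each input text together with its vector. This README does not say how the vectors are meant to be compared; cosine similarity is a common choice (an assumption here, not something the operator does itself). The sketch below uses small made-up arrays in place of real `vec` outputs, and `cosine_similarity` is a hypothetical helper name.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cannot compare a zero vector")
    return float(np.dot(a, b) / denom)


# Toy vectors standing in for embeddings returned by the pipeline.
v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([2.0, 4.0, 6.0])   # same direction as v1
v3 = np.array([-3.0, 0.0, 1.0])  # orthogonal to v1
print(round(cosine_similarity(v1, v2), 6))  # 1.0
print(round(cosine_similarity(v1, v3), 6))  # 0.0
```

Values near 1.0 indicate embeddings pointing in the same direction; values near 0.0 indicate unrelated directions.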


Factory Constructor

Create the operator via the following factory method:

text_embedding.realm(model_name="google/realm-cc-news-pretrained-embedder")

Parameters:

model_name: str

The model name as a string. The default value is "google/realm-cc-news-pretrained-embedder".

Supported model names:

  • google/realm-cc-news-pretrained-embedder
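This README does not say whether the operator rejects names outside the supported list. The sketch below shows one hypothetical way a constructor could apply the documented default and supported list; `resolve_model_name`, `DEFAULT_MODEL`, and `SUPPORTED_MODELS` are illustrative names, not part of the operator's API.

```python
from typing import Optional

# Mirrors the default and the supported-model list documented above.
DEFAULT_MODEL = "google/realm-cc-news-pretrained-embedder"
SUPPORTED_MODELS = frozenset({DEFAULT_MODEL})


def resolve_model_name(model_name: Optional[str] = None) -> str:
    """Return the model name to load, falling back to the default and rejecting unlisted names."""
    name = DEFAULT_MODEL if model_name is None else model_name
    if name not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model_name {name!r}; expected one of {sorted(SUPPORTED_MODELS)}"
        )
    return name
```

Calling `resolve_model_name()` returns the default name, while an unlisted name raises `ValueError` at construction time rather than failing later during model loading.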


Interface

The operator takes a piece of text as a string input. It loads the tokenizer and pre-trained model specified by the model name, then returns the text embedding as a numpy ndarray.

Parameters:

text: str

The input text.

Returns:

numpy.ndarray

The text embedding extracted by the model.
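The interface described above (load the tokenizer and model once from the model name, then map each input string to an ndarray) can be sketched as follows. `TextEmbeddingOperator`, `loader`, and `toy_loader` are hypothetical names and this is not the contents of realm.py; the toy loader only stands in for real model loading, which in the actual operator presumably goes through Hugging Face transformers [1].

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

Tokenizer = Callable[[str], List[int]]
Model = Callable[[List[int]], Sequence[float]]


class TextEmbeddingOperator:
    """Hypothetical operator shape: load tokenizer and model once, embed many texts."""

    def __init__(self, model_name: str, loader: Callable[[str], Tuple[Tokenizer, Model]]):
        self.model_name = model_name
        # Loading happens once, at construction, not on every call.
        self.tokenizer, self.model = loader(model_name)

    def __call__(self, text: str) -> np.ndarray:
        token_ids = self.tokenizer(text)
        vector = self.model(token_ids)
        return np.asarray(vector, dtype=np.float32)


def toy_loader(model_name: str) -> Tuple[Tokenizer, Model]:
    """Stand-in for real loading so the sketch runs without downloading a model."""
    def tokenizer(text: str) -> List[int]:
        # Character codes stand in for token ids.
        return [ord(ch) for ch in text]

    def model(token_ids: List[int]) -> List[float]:
        # Not an encoder: a fixed 2-d summary (length, mean id) that only shows the data flow.
        return [float(len(token_ids)), sum(token_ids) / max(len(token_ids), 1)]

    return tokenizer, model


op = TextEmbeddingOperator("google/realm-cc-news-pretrained-embedder", toy_loader)
vec = op("Hello, world.")
print(type(vec).__name__, vec.shape)  # ndarray (2,)
```

Keeping loading in the constructor means repeated calls (for example, one per row in a pipeline) reuse the same tokenizer and model.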

More Resources
