Text Embedding with dpr

author: Kyle He

Description

This operator uses Dense Passage Retrieval (DPR) to convert long text to embeddings.

Dense Passage Retrieval (DPR) is a set of tools and models for state-of-the-art open-domain Q&A research[1]. It was introduced in the paper "Dense Passage Retrieval for Open-Domain Question Answering" by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih[2].

The paper's abstract summarizes the approach: "In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework"[2].
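To make the dual-encoder idea concrete, here is a minimal sketch of how dense retrieval ranks passages once embeddings exist. It assumes the question and the passages have already been encoded into vectors and scores them by inner product, the similarity used in the DPR paper. The toy 3-dimensional vectors and the helper name rank_passages are illustrative only and are not part of this operator.

```python
import numpy as np


def rank_passages(question_vec: np.ndarray, passage_vecs: np.ndarray):
    """Rank passages by inner-product similarity to the question (hypothetical helper)."""
    # One score per passage: dot product of each passage row with the question vector.
    scores = passage_vecs @ question_vec
    # Sort passage indices from highest score to lowest.
    order = np.argsort(-scores)
    return order, scores[order]


# Toy embeddings standing in for real DPR outputs (real ones are much larger).
question = np.array([1.0, 0.0, 1.0])
passages = np.array([
    [0.9, 0.1, 0.8],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5],
])

order, scores = rank_passages(question, passages)
print(order.tolist())  # [0, 2, 1]
```

In a real setup the passage vectors would come from a context encoder such as the one this operator wraps, and the question vector from the matching question encoder.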

References

[1] https://huggingface.co/docs/transformers/model_doc/dpr

[2] https://arxiv.org/abs/2004.04906

Code Example

Use the pretrained model "facebook/dpr-ctx_encoder-single-nq-base" to generate a text embedding for the sentence "Hello, world.".

Write the pipeline:

import towhee

towhee.dc(["Hello, world."]) \
      .text_embedding.dpr(model_name="facebook/dpr-ctx_encoder-single-nq-base")

Factory Constructor

Create the operator via the following factory method:

text_embedding.dpr(model_name="facebook/dpr-ctx_encoder-single-nq-base")

Parameters:

model_name: str

The name of the pretrained model, as a string. The default value is "facebook/dpr-ctx_encoder-single-nq-base". To list the supported model names, call get_model_list from dpr.py.

Interface

The operator takes a text string as input. It loads the tokenizer and the pretrained model by model name, then returns the text embedding as an ndarray.

Parameters:

text: str

The input text, as a string.

Returns:

numpy.ndarray

The text embedding extracted by the model.
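For readers who want to see roughly what such an interface involves, the following is a sketch built directly on the Hugging Face transformers DPR classes[1]. It is an approximation under stated assumptions, not this operator's actual source: the function name dpr_embed is hypothetical, it handles context-encoder checkpoints only, and it assumes the embedding is the encoder's pooled output.

```python
import numpy as np


def dpr_embed(
    text: str,
    model_name: str = "facebook/dpr-ctx_encoder-single-nq-base",
) -> np.ndarray:
    """Return a DPR context embedding for `text` (hypothetical helper, not the operator's code)."""
    # Imported lazily so that defining this function does not require
    # torch and transformers to be installed.
    import torch
    from transformers import DPRContextEncoder, DPRContextEncoderTokenizer

    # Load the tokenizer and pretrained encoder by model name.
    tokenizer = DPRContextEncoderTokenizer.from_pretrained(model_name)
    model = DPRContextEncoder.from_pretrained(model_name)
    model.eval()

    # Tokenize, truncating inputs longer than the encoder's 512-token limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

    # Run the encoder without tracking gradients.
    with torch.no_grad():
        outputs = model(**inputs)

    # pooler_output has shape (batch, hidden); drop the batch dimension.
    return outputs.pooler_output.squeeze(0).numpy()
```

Calling dpr_embed("Hello, world.") downloads the checkpoint on first use and returns a one-dimensional array. A question-encoder checkpoint would need DPRQuestionEncoder and its tokenizer instead.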
