# Text Embedding with Transformers

*author: [Jael Gu](https://github.com/jaelgu)*

<br />

## Description

A text embedding operator takes a sentence, paragraph, or document as a string input
and outputs an embedding vector as a NumPy ndarray that captures the input's core semantic elements.
This operator is implemented with pre-trained models from [Huggingface Transformers](https://huggingface.co/docs/transformers).

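The description does not say how the model's per-token outputs are reduced to a single vector per input. One common approach is masked mean pooling, which averages the hidden states of the real tokens and ignores padding. It is shown here only as an illustration and may not match what this operator does internally. The sketch uses made-up NumPy arrays, and `mean_pool` is a hypothetical helper, not part of the operator.

```python
import numpy as np


def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors into one sentence vector, skipping padding.

    hidden_states: (seq_len, hidden_dim) per-token outputs of a model.
    attention_mask: (seq_len,) with 1 for real tokens and 0 for padding.
    """
    mask = attention_mask.astype(hidden_states.dtype)[:, None]  # (seq_len, 1)
    summed = (hidden_states * mask).sum(axis=0)                 # (hidden_dim,)
    count = max(mask.sum(), 1.0)                                # avoid division by zero
    return summed / count


# Hypothetical model output: 4 token positions, hidden size 3; the last one is padding.
hidden = np.array([
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
    [9.0, 9.0, 9.0],  # padding position, excluded by the mask
])
mask = np.array([1, 1, 1, 0])

embedding = mean_pool(hidden, mask)
print(embedding)  # [2. 2. 2.]
```

Other strategies, such as taking the first (`[CLS]`) token's hidden state, produce vectors of the same shape. Embeddings from different pooling strategies are not interchangeable, so stick to one strategy per index.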
					
						
<br />

## Code Example

Use the pre-trained model 'distilbert-base-cased'
to generate a text embedding for the sentence "Hello, world.".

*Write the pipeline*:

```python
import towhee

(
    towhee.dc(["Hello, world."])  # a DataCollection holding one input string
          .text_embedding.transformers(model_name="distilbert-base-cased")  # embed each string
)
```

*Write the same pipeline with explicitly named inputs and outputs:*

```python
import towhee

(
    towhee.dc['text'](["Hello, world."])  # store each input string in the 'text' field
          .text_embedding.transformers['text', 'vec'](model_name="distilbert-base-cased")  # read 'text', write the embedding to 'vec'
          .show()  # render the results as a table
)
```

<img src="./result.png" width="800px"/>

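Embedding vectors like the `vec` column above are usually compared with cosine similarity. Texts with related meaning tend to score closer to 1. The sketch below assumes you already have two embeddings as 1-D NumPy arrays. The values are made up rather than real model output, and `cosine_similarity` is a hypothetical helper, not an operator method.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D embedding vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return float(np.dot(a, b) / denom)


# Made-up vectors standing in for real text embeddings.
vec_a = np.array([0.2, 0.1, 0.7])
vec_b = np.array([0.4, 0.2, 1.4])   # same direction as vec_a, twice as long
vec_c = np.array([-0.7, 0.0, 0.2])  # orthogonal to vec_a

print(round(cosine_similarity(vec_a, vec_b), 4))  # 1.0
```

Cosine similarity ignores vector length, so `vec_a` and `vec_b` score as identical even though their magnitudes differ.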
					
						
<br />

## Factory Constructor

Create the operator via the following factory method:

***text_embedding.transformers(model_name="bert-base-uncased")***

**Parameters:**

***model_name***: *str*

The name of the pre-trained model, as a string.
The default model name is "bert-base-uncased".

Supported model names:

<details><summary>Albert</summary>

  - albert-base-v1
  - albert-large-v1
  - albert-xlarge-v1
  - albert-xxlarge-v1
  - albert-base-v2
  - albert-large-v2
  - albert-xlarge-v2
  - albert-xxlarge-v2
</details>

<details><summary>Bart</summary>

  - facebook/bart-large
</details>

<details><summary>Bert</summary>

  - bert-base-cased
  - bert-large-cased
  - bert-large-uncased
  - bert-base-multilingual-uncased
  - bert-base-multilingual-cased
  - bert-base-chinese
  - bert-base-german-cased
  - bert-large-uncased-whole-word-masking
  - bert-large-cased-whole-word-masking
  - bert-large-uncased-whole-word-masking-finetuned-squad
  - bert-large-cased-whole-word-masking-finetuned-squad
  - bert-base-cased-finetuned-mrpc
  - bert-base-german-dbmdz-cased
  - bert-base-german-dbmdz-uncased
  - cl-tohoku/bert-base-japanese-whole-word-masking
  - cl-tohoku/bert-base-japanese-char
  - cl-tohoku/bert-base-japanese-char-whole-word-masking
  - TurkuNLP/bert-base-finnish-cased-v1
  - TurkuNLP/bert-base-finnish-uncased-v1
  - wietsedv/bert-base-dutch-cased
</details>

<details><summary>BertGeneration</summary>

  - google/bert_for_seq_generation_L-24_bbc_encoder
</details>

<details><summary>BigBird</summary>

  - google/bigbird-roberta-base
  - google/bigbird-roberta-large
  - google/bigbird-base-trivia-itc
</details>

<details><summary>BigBirdPegasus</summary>

  - google/bigbird-pegasus-large-arxiv
  - google/bigbird-pegasus-large-pubmed
  - google/bigbird-pegasus-large-bigpatent
</details>

<details><summary>CamemBert</summary>

  - camembert-base
  - Musixmatch/umberto-commoncrawl-cased-v1
  - Musixmatch/umberto-wikipedia-uncased-v1
</details>

<details><summary>Canine</summary>

  - google/canine-s
  - google/canine-c
</details>

<details><summary>Convbert</summary>

  - YituTech/conv-bert-base
  - YituTech/conv-bert-medium-small
  - YituTech/conv-bert-small
</details>

<details><summary>CTRL</summary>

  - ctrl
</details>

<details><summary>DeBERTa</summary>

  - microsoft/deberta-base
  - microsoft/deberta-large
  - microsoft/deberta-xlarge
  - microsoft/deberta-base-mnli
  - microsoft/deberta-large-mnli
  - microsoft/deberta-xlarge-mnli
  - microsoft/deberta-v2-xlarge
  - microsoft/deberta-v2-xxlarge
  - microsoft/deberta-v2-xlarge-mnli
  - microsoft/deberta-v2-xxlarge-mnli
</details>

<details><summary>DistilBert</summary>

  - distilbert-base-uncased
  - distilbert-base-uncased-distilled-squad
  - distilbert-base-cased
  - distilbert-base-cased-distilled-squad
  - distilbert-base-german-cased
  - distilbert-base-multilingual-cased
  - distilbert-base-uncased-finetuned-sst-2-english
</details>

<details><summary>Electra</summary>

  - google/electra-small-generator
  - google/electra-base-generator
  - google/electra-large-generator
  - google/electra-small-discriminator
  - google/electra-base-discriminator
  - google/electra-large-discriminator
</details>

<details><summary>Flaubert</summary>

  - flaubert/flaubert_small_cased
  - flaubert/flaubert_base_uncased
  - flaubert/flaubert_base_cased
  - flaubert/flaubert_large_cased
</details>

<details><summary>FNet</summary>

  - google/fnet-base
  - google/fnet-large
</details>

<details><summary>FSMT</summary>

  - facebook/wmt19-ru-en
</details>

<details><summary>Funnel</summary>

  - funnel-transformer/small
  - funnel-transformer/small-base
  - funnel-transformer/medium
  - funnel-transformer/medium-base
  - funnel-transformer/intermediate
  - funnel-transformer/intermediate-base
  - funnel-transformer/large
  - funnel-transformer/large-base
  - funnel-transformer/xlarge-base
  - funnel-transformer/xlarge
</details>

<details><summary>GPT</summary>

  - openai-gpt
  - gpt2
  - gpt2-medium
  - gpt2-large
  - gpt2-xl
  - distilgpt2
  - EleutherAI/gpt-neo-1.3B
  - EleutherAI/gpt-j-6B
</details>

<details><summary>I-Bert</summary>

  - kssteven/ibert-roberta-base
</details>

<details><summary>LED</summary>

  - allenai/led-base-16384
</details>

<details><summary>MobileBert</summary>

  - google/mobilebert-uncased
</details>

<details><summary>MPNet</summary>

  - microsoft/mpnet-base
</details>

<details><summary>Nystromformer</summary>

  - uw-madison/nystromformer-512
</details>

<details><summary>Reformer</summary>

  - google/reformer-crime-and-punishment
</details>

<details><summary>Splinter</summary>

  - tau/splinter-base
  - tau/splinter-base-qass
  - tau/splinter-large
  - tau/splinter-large-qass
</details>

<details><summary>SqueezeBert</summary>

  - squeezebert/squeezebert-uncased
  - squeezebert/squeezebert-mnli
  - squeezebert/squeezebert-mnli-headless
</details>

<details><summary>TransfoXL</summary>

  - transfo-xl-wt103
</details>

<details><summary>XLM</summary>

  - xlm-mlm-en-2048
  - xlm-mlm-ende-1024
  - xlm-mlm-enfr-1024
  - xlm-mlm-enro-1024
  - xlm-mlm-tlm-xnli15-1024
  - xlm-mlm-xnli15-1024
  - xlm-clm-enfr-1024
  - xlm-clm-ende-1024
  - xlm-mlm-17-1280
  - xlm-mlm-100-1280
</details>

<details><summary>XLMRoberta</summary>

  - xlm-roberta-base
  - xlm-roberta-large
  - xlm-roberta-large-finetuned-conll02-dutch
  - xlm-roberta-large-finetuned-conll02-spanish
  - xlm-roberta-large-finetuned-conll03-english
  - xlm-roberta-large-finetuned-conll03-german
</details>

<details><summary>XLNet</summary>

  - xlnet-base-cased
  - xlnet-large-cased
</details>

<details><summary>Yoso</summary>

  - uw-madison/yoso-4096
</details>

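Because `model_name` must match one of the names above exactly, a typo usually surfaces only when the model is loaded. The sketch below shows a hypothetical pre-check, `check_model_name`, that uses Python's standard `difflib` module to suggest close matches. `SUPPORTED` is a small, hand-copied subset of the list above. In practice you might fill it from the operator's `supported_model_names()` instead.

```python
import difflib

# A hand-copied subset of the supported names listed above (not the full list).
SUPPORTED = (
    "bert-base-cased",
    "bert-large-uncased",
    "distilbert-base-cased",
    "distilbert-base-uncased",
    "xlm-roberta-base",
    "google/electra-small-discriminator",
)


def check_model_name(name: str, supported: tuple[str, ...] = SUPPORTED) -> str:
    """Return `name` unchanged if it is supported, else raise with close matches."""
    if name in supported:
        return name
    # get_close_matches ranks candidates by similarity ratio, best first.
    suggestions = difflib.get_close_matches(name, supported, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unsupported model name {name!r}.{hint}")


print(check_model_name("distilbert-base-cased"))  # distilbert-base-cased
try:
    check_model_name("bert-base-cassed")  # typo: extra "s"
except ValueError as err:
    print(err)
```

Failing fast with a suggestion is cheaper than waiting for a download attempt to fail on an unknown name.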
					
						
<br />

## Interface

The operator takes a piece of text as a string input.
It loads the tokenizer and pre-trained model specified by the model name,
and returns the text embedding as a NumPy ndarray.

***__call__(txt)***

**Parameters:**

***txt***: *str*

The input text, as a string.

**Returns**:

*numpy.ndarray*

The text embedding extracted by the model.

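When unit-testing code that consumes this operator, a lightweight stand-in with the same call shape can help: it takes a string and returns a 1-D `numpy.ndarray`. The class below, `FakeTextEmbedding`, is a hypothetical test double, not the real operator. It derives a deterministic vector from a hash of the text, so its output carries no semantic meaning. The default dimension of 768 is an assumption: BERT-base-sized models use a hidden size of 768, but other supported models differ.

```python
import hashlib

import numpy as np


class FakeTextEmbedding:
    """Hypothetical test double mimicking `__call__(txt) -> numpy.ndarray`."""

    def __init__(self, dim: int = 768):
        self.dim = dim

    def __call__(self, txt: str) -> np.ndarray:
        if not isinstance(txt, str):
            raise TypeError("txt must be a str")
        # Seed a random generator from a stable hash so the same text
        # always maps to the same vector, across runs and processes.
        digest = hashlib.sha256(txt.encode("utf-8")).digest()
        seed = int.from_bytes(digest[:8], "little")
        rng = np.random.default_rng(seed)
        return rng.standard_normal(self.dim).astype(np.float32)


op = FakeTextEmbedding()
vec = op("Hello, world.")
print(vec.shape, vec.dtype)  # (768,) float32
```

`hashlib` is used instead of the built-in `hash()` because string hashing is randomized per process, which would make the fake vectors non-reproducible.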
					
						
***save_model(format='pytorch', path='default')***

Save the model locally in the specified format.

**Parameters:**

***format***: *str*

The format of the saved model, defaults to 'pytorch'.

***path***: *str*

The path where the model is saved. By default, the model is saved to the operator directory.

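The documentation only states that `path='default'` means the operator directory. The sketch below shows one plausible way a `format`/`path` pair could be resolved to a file location. The `saved/<format>/` layout, the file suffixes, the `op_dir` argument, and the helper name `resolve_save_path` are all hypothetical assumptions, so check the operator's source for where files actually land.

```python
from pathlib import Path

# Hypothetical format -> file suffix mapping; the real operator may differ.
SUFFIXES = {"pytorch": ".pt", "torchscript": ".pt", "onnx": ".onnx"}


def resolve_save_path(model_name: str, format: str = "pytorch",
                      path: str = "default", op_dir: str = ".") -> Path:
    """Sketch of where a saved model file could be written.

    `format` and `path` mirror the documented save_model() parameters;
    `op_dir` stands in for the operator directory (an assumption).
    """
    if format not in SUFFIXES:
        raise ValueError(f"unknown format {format!r}, expected one of {sorted(SUFFIXES)}")
    # 'default' means "inside the operator directory"; anything else is used as given.
    base = (Path(op_dir) / "saved" / format) if path == "default" else Path(path)
    # Hub names such as "google/fnet-base" contain "/", so flatten them into one file name.
    filename = model_name.replace("/", "-") + SUFFIXES[format]
    return base / filename


print(resolve_save_path("google/fnet-base", format="onnx", op_dir="ops/transformers").as_posix())
# ops/transformers/saved/onnx/google-fnet-base.onnx
print(resolve_save_path("gpt2", path="exports").as_posix())
# exports/gpt2.pt
```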
					
						
***supported_model_names(format=None)***

Get a list of all supported model names, or of the model names supported for a specified model format.

**Parameters:**

***format***: *str*

The model format, such as 'pytorch', 'torchscript', or 'onnx'. Defaults to None, which returns all supported model names.

```python
from towhee import ops

# Create the operator with default settings and get the underlying operator object.
op = ops.text_embedding.transformers().get_op()
full_list = op.supported_model_names()               # every supported model name
onnx_list = op.supported_model_names(format='onnx')  # names supported for the 'onnx' format
print(f'Onnx-support/Total Models: {len(onnx_list)}/{len(full_list)}')
```

    2022-12-13 16:25:15,916 - 140704500614336 - auto_transformers.py-auto_transformers:68 - WARNING: The operator is initialized without specified model.
    Onnx-support/Total Models: 111/126
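
Conceptually, `supported_model_names(format=None)` filters a collection of models by the formats each one supports. The sketch below is a hypothetical, self-contained version of that idea. The registry, the placeholder names `model-a` to `model-c`, and their format sets are made up for illustration and say nothing about which real models support ONNX.

```python
from typing import Optional

# Hypothetical registry: model name -> formats it can be saved in (illustrative only).
REGISTRY = {
    "model-a": {"pytorch", "torchscript", "onnx"},
    "model-b": {"pytorch", "torchscript"},
    "model-c": {"pytorch", "onnx"},
}


def supported_model_names(format: Optional[str] = None) -> list[str]:
    """Return all model names, or only those supporting `format`, sorted."""
    if format is None:
        return sorted(REGISTRY)
    return sorted(name for name, formats in REGISTRY.items() if format in formats)


full_list = supported_model_names()
onnx_list = supported_model_names(format="onnx")
print(f"Onnx-support/Total Models: {len(onnx_list)}/{len(full_list)}")  # Onnx-support/Total Models: 2/3
```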