@@ -1,6 +1,6 @@
 # Text Embedding with Transformers
 
-*author: Jael Gu*
+*author: [Jael Gu](https://github.com/jaelgu)*
 
 <br />
 
@@ -15,7 +15,7 @@ This operator is implemented with pretrained models from [Huggingface Transforme
 ## Code Example
 
 Use the pretrained model 'distilbert-base-cased'
-to generate a text embedding for the sentence "Hello, world.".
+to generate a text embedding for the sentence "Hello, world.".
 
 *Write the pipeline*:
 
@@ -38,10 +38,10 @@ Create the operator via the following factory method
 
 ***model_name***: *str*
 
-The model name in string.
+The model name in string.
 The default model name is "bert-base-uncased".
 
-Supported model names:
+Supported model names:
 
 <details><summary>Albert</summary>
 
@@ -59,7 +59,7 @@ Supported model names:
 
 - facebook/bart-large
 </details>
-
+
 <details><summary>Bert</summary>
 
 - bert-base-cased
@@ -82,9 +82,9 @@ Supported model names:
 - TurkuNLP/bert-base-finnish-uncased-v1
 - wietsedv/bert-base-dutch-cased
 </details>
-
+
 <details><summary>BertGeneration</summary>
-
+
 - google/bert_for_seq_generation_L-24_bbc_encoder
 </details>
 
@@ -101,7 +101,7 @@ Supported model names:
 - google/bigbird-pegasus-large-pubmed
 - google/bigbird-pegasus-large-bigpatent
 </details>
-
+
 <details><summary>CamemBert</summary>
 
 - camembert-base
@@ -110,11 +110,11 @@ Supported model names:
 </details>
 
 <details><summary>Canine</summary>
-
+
 - google/canine-s
 - google/canine-c
 </details>
-
+
 <details><summary>Convbert</summary>
 
 - YituTech/conv-bert-base
@@ -229,7 +229,7 @@ Supported model names:
 
 - uw-madison/nystromformer-512
 </details>
-
+
 <details><summary>Reformer</summary>
 
 - google/reformer-crime-and-punishment
@@ -281,7 +281,7 @@ Supported model names:
 
 <details><summary>XLNet</summary>
 
-- xlnet-base-cased
+- xlnet-base-cased
 - xlnet-large-cased
 </details>
 
@@ -303,12 +303,11 @@ and then return text embedding in ndarray.
 
 ***text***: *str*
 
-The text in string.
+The text in string.
 
 
 **Returns**:
 
 *numpy.ndarray*
 
-The text embedding extracted by model.
-
+The text embedding extracted by model.
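
The interface documented in this README (a `model_name` string in, a `text` string per call, a `numpy.ndarray` out) could be approximated with the Hugging Face Transformers library directly. The following is only a minimal sketch, not the operator's actual implementation: the function name `embed_text` is hypothetical, and returning the model's last hidden state (one vector per token) is an assumption about what "text embedding" means here. It also assumes `torch` and `transformers` are installed and that the chosen checkpoint loads through `AutoModel`.

```python
def embed_text(text: str, model_name: str = "bert-base-uncased"):
    """Return token-level features for `text` as a numpy.ndarray.

    Hypothetical helper for illustration; the default model name mirrors
    the default documented in the README.
    """
    # Heavy dependencies are imported lazily so that importing this module
    # stays cheap and does not fail when they are absent.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Load the tokenizer and the pretrained weights for the requested model.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # disable dropout for deterministic inference

    # Tokenize the single input string into PyTorch tensors (batch size 1).
    inputs = tokenizer(text, return_tensors="pt")

    # Run the forward pass without tracking gradients.
    with torch.no_grad():
        outputs = model(**inputs)

    # last_hidden_state has shape (1, num_tokens, hidden_size);
    # drop the batch axis and convert to a NumPy array.
    return outputs.last_hidden_state.squeeze(0).numpy()
```

Calling `embed_text("Hello, world.", model_name="distilbert-base-cased")` would download the checkpoint on first use and return a 2-D array with one row per token. If a single vector per sentence is needed, a pooling step (for example, averaging over the token axis) would have to be added; the README does not say whether the operator does this.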