
update the operator.

Signed-off-by: wxywb <xy.wang@zilliz.com>
Branch: main | wxywb committed 2 years ago | commit e00ae18f81
4 changed files:
  1. README.md (84 changes)
  2. __init__.py (2 changes)
  3. opus_mt.py (4 changes)
  4. result.png (binary)

README.md

@@ -1,2 +1,84 @@
# opus_mt
# Machine Translation with Opus-MT
*author: David Wang*
<br />
## Description
A machine translation operator translates a sentence, paragraph, or document from the source language
to the target language. This operator uses models trained on [OPUS](https://opus.nlpl.eu/) data by Helsinki-NLP.
More details can be found at [Helsinki-NLP/Opus-MT](https://github.com/Helsinki-NLP/Opus-MT).
<br />
## Code Example
Use the pre-trained model 'opus-mt-en-zh'
to generate the Chinese translation for the sentence "Hello, world.".
*Write the pipeline*:
```python
import towhee
(
    towhee.dc(["Hello, world."])
        .machine_translation.opus_mt(model_name="opus-mt-en-zh")
)
```
*Write the same pipeline with explicit input/output name specifications:*
```python
import towhee
(
    towhee.dc['text'](["Hello, world."])
        .machine_translation.opus_mt['text', 'vec'](model_name="opus-mt-en-zh")
        .show()
)
```
<img src="./result.png" width="800px"/>
<br />
## Factory Constructor
Create the operator via the following factory method:
***machine_translation.opus_mt(model_name="opus-mt-en-zh")***
**Parameters:**
***model_name***: *str*
The name of the model, as a string.
The default is "opus-mt-en-zh".
Supported model names:
- opus-mt-en-zh
- opus-mt-zh-en
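
For example, the reverse translation direction can be selected by passing the other supported model name. A minimal sketch in the same pipeline style as the Code Example above (the Chinese input sentence here is only an illustration):

```python
import towhee

# Translate Chinese to English by choosing the other supported checkpoint.
(
    towhee.dc(["你好，世界。"])
        .machine_translation.opus_mt(model_name="opus-mt-zh-en")
)
```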
<br />
## Interface
The operator takes a piece of text (a string) as input. It loads the tokenizer and the pre-trained model
by model name, then returns the translated text as a string.
***__call__(text)***
**Parameters:**
***text***: *str*
The source-language text, as a string.
**Returns**:
*str*
The translated text, in the target language.

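For completeness, a minimal sketch of calling the operator directly. The import path is hypothetical and assumes the OpusMT class defined in this repository is importable locally:

```python
# Hypothetical local import of the operator class defined in opus_mt.py.
from opus_mt import OpusMT

op = OpusMT('opus-mt-en-zh')        # loads tokenizer and model by name
translation = op('Hello, world.')   # __call__(text) -> str
print(translation)
```
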
__init__.py

@@ -14,5 +14,5 @@
 from .opus_mt import OpusMT
-def opus_mt(model_name: str):
+def opus_mt(model_name: str = 'opus-mt-en-zh'):
     return OpusMT(model_name)

opus_mt.py

@@ -37,8 +37,8 @@ class OpusMT(NNOperator):
         self.model = AutoModelForSeq2SeqLM.from_pretrained(config['model'])
         self.model.to(self.device)
-    def __call__(self, data):
-        input_ids = self.tokenizer(data, return_tensors='pt', padding=True)['input_ids'].to(self.device)
+    def __call__(self, text):
+        input_ids = self.tokenizer(text, return_tensors='pt', padding=True)['input_ids'].to(self.device)
         outputs = self.model.generate(input_ids)
         decoded = self.tokenizer.decode(outputs[0].detach().cpu(), skip_special_tokens=True)
         return decoded
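
For context, the renamed __call__ above wraps a standard Hugging Face seq2seq generation flow. A minimal standalone sketch of the same steps, assuming the transformers library and the Helsinki-NLP/opus-mt-en-zh checkpoint (the operator resolves the model from its config rather than a hard-coded name):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hard-coded checkpoint for illustration only.
checkpoint = 'Helsinki-NLP/opus-mt-en-zh'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Tokenize, generate, and decode, mirroring OpusMT.__call__.
input_ids = tokenizer('Hello, world.', return_tensors='pt', padding=True)['input_ids']
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```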

result.png

Binary file not shown (new image, 12 KiB).
