update the operator.

Signed-off-by: wxywb <xy.wang@zilliz.com>
3 years ago · e00ae18f81
4 changed files with 86 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,84 @@
-# opus_mt
+# Machine Translation with Opus-MT
 *author: David Wang*
 <br />
 ## Description
 A machine translation operator translates a sentence, paragraph, or document from a source language
 to a target language. This operator uses models trained on [OPUS](https://opus.nlpl.eu/) data by Helsinki-NLP.
 More details can be found in [Helsinki-NLP/Opus-MT](https://github.com/Helsinki-NLP/Opus-MT).
 <br />
 ## Code Example
 Use the pre-trained model 'opus-mt-en-zh'
 to generate the Chinese translation for the sentence "Hello, world.".
 *Write the pipeline*:
 ```python
 import towhee
 (
    towhee.dc(["Hello, world."])
          .machine_translation.opus_mt(model_name="opus-mt-en-zh")
 )
 ```
 *Write the same pipeline with explicit input/output name specifications:*
 ```python
 import towhee
 (
    towhee.dc['text'](["Hello, world."])
          .machine_translation.opus_mt['text', 'vec'](model_name="opus-mt-en-zh")
          .show()
 )
 ```
 <img src="./result.png" width="800px"/>
 <br />
 ## Factory Constructor
 Create the operator via the following factory method:
 ***machine_translation.opus_mt(model_name="opus-mt-en-zh")***
 **Parameters:**
 ***model_name***: *str*
 The name of the pre-trained model to load.
 Defaults to "opus-mt-en-zh".
 Supported model names:
 - opus-mt-en-zh
 - opus-mt-zh-en
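
Since only two model names are listed, it can help to validate `model_name` before any weights are loaded. The following is a minimal sketch, not the operator's actual code: `SUPPORTED_MODELS` and `resolve_model` are hypothetical names, and the mapping to `Helsinki-NLP/...` Hugging Face Hub identifiers is an assumption about where the checkpoints live.

```python
# Hypothetical lookup table: short operator name -> assumed Hub checkpoint id.
SUPPORTED_MODELS = {
    "opus-mt-en-zh": "Helsinki-NLP/opus-mt-en-zh",
    "opus-mt-zh-en": "Helsinki-NLP/opus-mt-zh-en",
}


def resolve_model(model_name: str = "opus-mt-en-zh") -> str:
    """Return the checkpoint id for a supported model name, or raise ValueError."""
    try:
        return SUPPORTED_MODELS[model_name]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_MODELS))
        raise ValueError(
            f"Unsupported model name {model_name!r}; expected one of: {supported}"
        ) from None
```

Failing early with the list of supported names gives a clearer error than a failed download for a misspelled checkpoint.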
 <br />
 ## Interface
 The operator takes a text string as input.
 It loads the tokenizer and pre-trained model by model name,
 and then returns the translated text as a string.
 ***__call__(text)***
 **Parameters:**
 ***text***: *str*
 	The source-language text.
 **Returns**:
 *str*
 	The target language text.
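
The interface above translates one string per call. When several sentences need translating at once, a batch-aware variant might look like the sketch below. It is an illustration, not part of this operator: `translate_batch` is a hypothetical helper, and it assumes the tokenizer and model are Hugging Face Transformers objects that are loaded elsewhere and passed in.

```python
from typing import Any, List


def translate_batch(
    texts: List[str], tokenizer: Any, model: Any, device: str = "cpu"
) -> List[str]:
    """Translate several sentences in one generate() call (hypothetical helper)."""
    # Pad so that sentences of different lengths fit into one tensor.
    encoded = tokenizer(texts, return_tensors="pt", padding=True)
    input_ids = encoded["input_ids"].to(device)
    # Pass the attention mask so padded positions are ignored during generation.
    attention_mask = encoded["attention_mask"].to(device)
    outputs = model.generate(input_ids, attention_mask=attention_mask)
    # batch_decode returns one string per input sequence.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With Transformers installed, the arguments could be an `AutoTokenizer` and an `AutoModelForSeq2SeqLM` loaded from the same checkpoint. The result is a list with one translation per input, in input order.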
--- a/__init__.py
+++ b/__init__.py
@@ -14,5 +14,5 @@
 from .opus_mt import OpusMT
-def opus_mt(model_name: str):
+def opus_mt(model_name: str = 'opus-mt-en-zh'):
    return OpusMT(model_name)
--- a/opus_mt.py
+++ b/opus_mt.py
@@ -37,8 +37,8 @@ class OpusMT(NNOperator):
        self.model = AutoModelForSeq2SeqLM.from_pretrained(config['model'])
        self.model.to(self.device)
-   def __call__(self, data):
-       input_ids = self.tokenizer(data, return_tensors='pt', padding=True)['input_ids'].to(self.device)
+   def __call__(self, text):
+       input_ids = self.tokenizer(text, return_tensors='pt', padding=True)['input_ids'].to(self.device)
        outputs = self.model.generate(input_ids)
        decoded = self.tokenizer.decode(outputs[0].detach().cpu(), skip_special_tokens=True)
        return decoded
--- a/result.png
+++ b/result.png