
update the operator.

Signed-off-by: wxywb <xy.wang@zilliz.com>
Branch: main | wxywb committed 2 years ago | commit e00ae18f81
4 changed files:
  1. README.md (84 changes)
  2. __init__.py (2 changes)
  3. opus_mt.py (4 changes)
  4. result.png (binary)

README.md

@@ -1,2 +1,84 @@
# opus_mt
# Machine Translation with Opus-MT
*author: David Wang*
<br />
## Description
A machine translation operator translates a sentence, paragraph, or document from the source language
to the target language. This operator uses models trained on [OPUS](https://opus.nlpl.eu/) data by Helsinki-NLP.
More details can be found at [Helsinki-NLP/Opus-MT](https://github.com/Helsinki-NLP/Opus-MT).
<br />
## Code Example
Use the pre-trained model 'opus-mt-en-zh'
to generate the Chinese translation for the sentence "Hello, world.".
*Write the pipeline*:
```python
import towhee
(
    towhee.dc(["Hello, world."])
        .machine_translation.opus_mt(model_name="opus-mt-en-zh")
)
```
*Write the same pipeline with explicit input/output name specifications:*
```python
import towhee
(
    towhee.dc['text'](["Hello, world."])
        .machine_translation.opus_mt['text', 'vec'](model_name="opus-mt-en-zh")
        .show()
)
```
<img src="./result.png" width="800px"/>
<br />
## Factory Constructor
Create the operator via the following factory method:
***machine_translation.opus_mt(model_name="opus-mt-en-zh")***
**Parameters:**
***model_name***: *str*
The name of the model, as a string.
The default is "opus-mt-en-zh".
Supported model names:
- opus-mt-en-zh
- opus-mt-zh-en
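
For example, the reverse translation direction can be selected by passing the other supported model name. A minimal sketch in the same pipeline style as the Code Example above (the Chinese input sentence here is only an illustration):

```python
import towhee

# Translate Chinese to English by choosing the other supported checkpoint.
(
    towhee.dc(["你好，世界。"])
        .machine_translation.opus_mt(model_name="opus-mt-zh-en")
)
```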
<br />
## Interface
The operator takes a piece of text (a string) as input. It loads the tokenizer and the pre-trained model
by model name, then returns the translated text as a string.
***__call__(text)***
**Parameters:**
***text***: *str*
The source-language text, as a string.
**Returns**:
*str*
The translated text, in the target language.

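For completeness, a minimal sketch of calling the operator directly. The import path is hypothetical and assumes the OpusMT class defined in this repository is importable locally:

```python
# Hypothetical local import of the operator class defined in opus_mt.py.
from opus_mt import OpusMT

op = OpusMT('opus-mt-en-zh')        # loads tokenizer and model by name
translation = op('Hello, world.')   # __call__(text) -> str
print(translation)
```
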
__init__.py

@@ -14,5 +14,5 @@
 from .opus_mt import OpusMT
-def opus_mt(model_name: str):
+def opus_mt(model_name: str = 'opus-mt-en-zh'):
     return OpusMT(model_name)

opus_mt.py

@@ -37,8 +37,8 @@ class OpusMT(NNOperator):
         self.model = AutoModelForSeq2SeqLM.from_pretrained(config['model'])
         self.model.to(self.device)
-    def __call__(self, data):
-        input_ids = self.tokenizer(data, return_tensors='pt', padding=True)['input_ids'].to(self.device)
+    def __call__(self, text):
+        input_ids = self.tokenizer(text, return_tensors='pt', padding=True)['input_ids'].to(self.device)
         outputs = self.model.generate(input_ids)
         decoded = self.tokenizer.decode(outputs[0].detach().cpu(), skip_special_tokens=True)
         return decoded
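
For context, the renamed __call__ above wraps a standard Hugging Face seq2seq generation flow. A minimal standalone sketch of the same steps, assuming the transformers library and the Helsinki-NLP/opus-mt-en-zh checkpoint (the operator resolves the model from its config rather than a hard-coded name):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hard-coded checkpoint for illustration only.
checkpoint = 'Helsinki-NLP/opus-mt-en-zh'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Tokenize, generate, and decode, mirroring OpusMT.__call__.
input_ids = tokenizer('Hello, world.', return_tensors='pt', padding=True)['input_ids']
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```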

result.png

Binary file not shown (new image, 12 KiB).
