diff --git a/README.md b/README.md
index 81792b8..95817a4 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,84 @@
-# opus_mt
+# Machine Translation with Opus-MT
 
+*author: David Wang*
+
+
+
+## Description
+
+A machine translation operator translates a sentence, paragraph, or document from a source language
+to a target language. This operator uses models trained on [OPUS](https://opus.nlpl.eu/) data by Helsinki-NLP.
+More details can be found in [Helsinki-NLP/Opus-MT](https://github.com/Helsinki-NLP/Opus-MT).
+
+
+
+## Code Example
+
+Use the pre-trained model 'opus-mt-en-zh'
+to generate the Chinese translation for the sentence "Hello, world.".
+
+*Write the pipeline*:
+
+```python
+import towhee
+
+(
+    towhee.dc(["Hello, world."])
+          .machine_translation.opus_mt(model_name="opus-mt-en-zh")
+)
+```
+
+*Write the same pipeline with explicit input/output name specifications:*
+
+```python
+import towhee
+
+(
+    towhee.dc['text'](["Hello, world."])
+          .machine_translation.opus_mt['text', 'vec'](model_name="opus-mt-en-zh")
+          .show()
+)
+```
+
+
+
+
+
+## Factory Constructor
+
+Create the operator via the following factory method:
+
+***machine_translation.opus_mt(model_name="opus-mt-en-zh")***
+
+**Parameters:**
+
+***model_name***: *str*
+
+The name of the model, as a string.
+The default model name is "opus-mt-en-zh".
+
+Supported model names:
+ - opus-mt-en-zh
+ - opus-mt-zh-en
+
+
+
+## Interface
+
+The operator takes a piece of text as a string.
+It loads the tokenizer and the pre-trained model using the model name,
+and then returns the translated text as a string.
+
+***__call__(text)***
+
+**Parameters:**
+
+***text***: *str*
+
+The source language text, as a string.
+
+**Returns**:
+
+*str*
+
+The target language text.
diff --git a/__init__.py b/__init__.py
index 89f28f3..0c60ce3 100644
--- a/__init__.py
+++ b/__init__.py
@@ -14,5 +14,5 @@
 from .opus_mt import OpusMT
 
 
-def opus_mt(model_name: str):
+def opus_mt(model_name: str = 'opus-mt-en-zh'):
     return OpusMT(model_name)
diff --git a/opus_mt.py b/opus_mt.py
index df14415..9860433 100644
--- a/opus_mt.py
+++ b/opus_mt.py
@@ -37,8 +37,8 @@ class OpusMT(NNOperator):
         self.model = AutoModelForSeq2SeqLM.from_pretrained(config['model'])
         self.model.to(self.device)
 
-    def __call__(self, data):
-        input_ids = self.tokenizer(data, return_tensors='pt', padding=True)['input_ids'].to(self.device)
+    def __call__(self, text):
+        input_ids = self.tokenizer(text, return_tensors='pt', padding=True)['input_ids'].to(self.device)
         outputs = self.model.generate(input_ids)
         decoded = self.tokenizer.decode(outputs[0].detach().cpu(), skip_special_tokens=True)
         return decoded
diff --git a/result.png b/result.png
new file mode 100644
index 0000000..f4607fc
Binary files /dev/null and b/result.png differ
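
For reference, the `__call__` body in the `opus_mt.py` hunk can be read as three steps: tokenize, generate, decode. The following is a minimal standalone sketch of that flow, not the operator's actual implementation. `SUPPORTED_MODELS`, `resolve_model`, and `translate` are hypothetical names introduced here, and the mapping from the short model names to `Helsinki-NLP/...` Hugging Face repository IDs is an assumption about what `config['model']` holds.

```python
# Hypothetical sketch: these names are illustrative and do not appear in the diff.
# Assumption: each short model name maps to a Helsinki-NLP repository on the
# Hugging Face Hub.
SUPPORTED_MODELS = {
    "opus-mt-en-zh": "Helsinki-NLP/opus-mt-en-zh",
    "opus-mt-zh-en": "Helsinki-NLP/opus-mt-zh-en",
}


def resolve_model(model_name: str = "opus-mt-en-zh") -> str:
    """Return the assumed Hub repository ID for a supported model name."""
    try:
        return SUPPORTED_MODELS[model_name]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_MODELS))
        raise ValueError(
            f"Unsupported model name {model_name!r}; expected one of: {supported}"
        ) from None


def translate(text, tokenizer, model, device="cpu"):
    """Mirror the three steps of __call__: tokenize, generate, decode."""
    # Tokenize the source text into a tensor of input IDs on the target device.
    input_ids = tokenizer(text, return_tensors="pt", padding=True)["input_ids"].to(device)
    # Let the seq2seq model generate target-language token IDs.
    outputs = model.generate(input_ids)
    # Decode only the first generated sequence, as the diff does.
    return tokenizer.decode(outputs[0].detach().cpu(), skip_special_tokens=True)
```

With `transformers` installed, a tokenizer and model could be loaded with `AutoTokenizer.from_pretrained(resolve_model())` and `AutoModelForSeq2SeqLM.from_pretrained(resolve_model())` and passed to `translate`. As in the diff, only `outputs[0]` is decoded, so passing a list of sentences would return just the first translation.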