# Machine Translation with Opus-MT

*author: David Wang*

<br />

## Description
A machine translation operator translates a sentence, paragraph, or document from a source language into a target language. The underlying models were trained by Helsinki-NLP on [OPUS](https://opus.nlpl.eu/) data. More details can be found in [Helsinki-NLP/Opus-MT](https://github.com/Helsinki-NLP/Opus-MT).

<br />

## Code Example
Use the pre-trained model 'opus-mt-en-zh' to generate the Chinese translation of the sentence "Hello, world.".

*Write the pipeline*:

```python
import towhee

(
    towhee.dc(["Hello, world."])
    .machine_translation.opus_mt(model_name="opus-mt-en-zh")
)
```
*Write the same pipeline with explicit input/output name specifications:*

```python
import towhee

(
    towhee.dc['text'](["Hello, world."])
    .machine_translation.opus_mt['text', 'vec'](model_name="opus-mt-en-zh")
    .show()
)
```
<img src="./result.png" width="800px"/>

<br />

## Factory Constructor
Create the operator via the following factory method:

***machine_translation.opus_mt(model_name="opus-mt-en-zh")***

**Parameters:**

***model_name***: *str*

The name of the pre-trained model. The default is "opus-mt-en-zh".

Supported model names:

- opus-mt-en-zh (English to Chinese)
- opus-mt-zh-en (Chinese to English)
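Both supported names appear to follow an `opus-mt-{source}-{target}` pattern, and Helsinki-NLP publishes checkpoints with the same names under the `Helsinki-NLP/` namespace on the Hugging Face Hub. The sketch below shows how a name could be validated and split into its language pair. `SUPPORTED_MODELS`, `parse_model_name`, and the Hub mapping are hypothetical illustrations, not part of the operator's API.

```python
# Hypothetical helper for illustration only; not part of the Towhee operator's API.
SUPPORTED_MODELS = ("opus-mt-en-zh", "opus-mt-zh-en")
PREFIX = "opus-mt-"


def parse_model_name(model_name: str) -> dict:
    """Split an Opus-MT model name into its language pair.

    Assumes the 'opus-mt-{src}-{tgt}' naming pattern and that the checkpoint
    is hosted under the 'Helsinki-NLP/' namespace on the Hugging Face Hub.
    """
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"unsupported model {model_name!r}; expected one of {SUPPORTED_MODELS}"
        )
    # 'opus-mt-en-zh' -> 'en-zh' -> ('en', 'zh')
    src, tgt = model_name[len(PREFIX):].split("-", 1)
    return {"source": src, "target": tgt, "hub_id": f"Helsinki-NLP/{model_name}"}


print(parse_model_name("opus-mt-en-zh"))
# → {'source': 'en', 'target': 'zh', 'hub_id': 'Helsinki-NLP/opus-mt-en-zh'}
```

Restricting input to the supported list keeps the naive split safe; other Opus-MT names (for example multilingual targets) would need a different parser.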
<br />

## Interface
The operator takes a piece of text as a string input. It loads the tokenizer and the pre-trained model specified by the model name, and then returns the translated text as a string.

***__call__(text)***

**Parameters:**

***text***: *str*

The text in the source language.

**Returns:**

*str*

The text translated into the target language.
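The operator's source is not shown here, but the interface above (load a tokenizer and a pre-trained model by name, then return a translated string) matches how Marian-based Opus-MT checkpoints are commonly used through the Hugging Face `transformers` library. The following is a minimal sketch under that assumption, not the operator's actual implementation. The class name `OpusMTSketch` and the `Helsinki-NLP/` Hub prefix are assumptions; calling it requires `transformers`, `torch`, and `sentencepiece`, plus network access to download the checkpoint.

```python
class OpusMTSketch:
    """Hypothetical stand-in for the operator, assuming a transformers backend."""

    def __init__(self, model_name: str = "opus-mt-en-zh"):
        # Assumption: Opus-MT checkpoints are hosted under 'Helsinki-NLP/' on the Hub.
        self.hub_id = f"Helsinki-NLP/{model_name}"
        self._tokenizer = None
        self._model = None

    def _load(self) -> None:
        # Load once, on first call; importing lazily lets the class be
        # constructed even where transformers is not installed.
        if self._model is not None:
            return
        from transformers import MarianMTModel, MarianTokenizer

        self._tokenizer = MarianTokenizer.from_pretrained(self.hub_id)
        self._model = MarianMTModel.from_pretrained(self.hub_id)

    def __call__(self, text: str) -> str:
        self._load()
        # Tokenize the source text into PyTorch tensors (a batch of one).
        batch = self._tokenizer([text], return_tensors="pt")
        # Generate target-language token ids, then decode them to a string.
        output_ids = self._model.generate(**batch)
        return self._tokenizer.decode(output_ids[0], skip_special_tokens=True)


op = OpusMTSketch("opus-mt-en-zh")
# op("Hello, world.") downloads the checkpoint on first use and returns the translation.
```

Loading lazily mirrors the "load by model name, then translate" flow described above while keeping construction cheap; a real operator might instead load eagerly in its constructor.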