diff --git a/README.md b/README.md
index 937264b..82c6522 100644
--- a/README.md
+++ b/README.md
@@ -133,3 +133,38 @@ op = ops.sentence_embedding.transformers().get_op()
 full_list = op.supported_model_names()
 onnx_list = op.supported_model_names(format='onnx')
 ```
+
+## Fine-tune
+### Requirement
+To train this operator, you need to install the following packages in addition to the dependencies in requirements.txt:
+```bash
+python -m pip install datasets evaluate scikit-learn
+```
+### Get started
+
+In short, you only need to construct an op instance and pass in a few configurations to train it on the specified task.
+```python
+import towhee
+
+bert_op = towhee.ops.sentence_embedding.transformers(model_name='bert-base-uncased').get_op()
+data_args = {
+    'dataset_name': 'wikitext',
+    'dataset_config_name': 'wikitext-2-raw-v1',
+}
+training_args = {
+    'num_train_epochs': 3,  # increase the number of epochs for a better metric
+    'per_device_train_batch_size': 8,
+    'per_device_eval_batch_size': 8,
+    'do_train': True,
+    'do_eval': True,
+    'output_dir': './tmp/test-mlm',
+    'overwrite_output_dir': True
+}
+bert_op.train(task='mlm', data_args=data_args, training_args=training_args)
+
+```
+For more information, refer to the [examples](https://github.com/towhee-io/examples/tree/main/fine_tune/6_train_language_modeling_tasks).
+
+### Dive deep and customize your training
+You can modify the [training script](https://towhee.io/text-embedding/transformers/src/branch/main/train_clm_with_hf_trainer.py) to suit your own needs.
+You can also refer to the original [Hugging Face Transformers training examples](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling).