# Evaluation

## Model performance in sentence similarity

1. Download SentEval & test data
```bash
git clone https://github.com/facebookresearch/SentEval.git
cd SentEval/data/downstream
./get_transfer_data.bash
```

2. Run test script
```bash
python transformers_test.py MODEL_NAME
```
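
For orientation, sentence-similarity benchmarks of this kind are commonly scored by taking the cosine similarity between the two sentence embeddings of each pair and then computing the Spearman correlation between those similarities and the human-annotated scores. The sketch below illustrates that scoring step in plain Python. It is an assumption about the general technique, not a description of what `transformers_test.py` does internally, and the helper names (`cosine_similarity`, `average_ranks`, `spearman`) and toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): three sentence pairs with made-up 2-d embeddings
# and made-up gold similarity scores.
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([1.0, 0.0], [0.5, 0.5]),
    ([1.0, 0.0], [0.0, 1.0]),
]
gold_scores = [4.8, 2.5, 0.3]
predicted = [cosine_similarity(u, v) for u, v in pairs]
print(spearman(predicted, gold_scores))
```

A rank-based correlation is used here because only the ordering of the predicted similarities matters, not their absolute scale.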

## QPS Test

Note that `qps_test.py` uses:
- `localhost:8000`: the address the Triton client connects to
- `'Hello, world.'`: the test sentence

```bash
python qps_test.py --model paraphrase-albert-small-v2 --pipe --onnx --triton --num 100
```

**Args:**
- `--model`: required, string, model name
- `--pipe`: optional, on/off flag to enable the QPS test for pipe
- `--onnx`: optional, on/off flag to enable the QPS test for ONNX
- `--triton`: optional, on/off flag to enable the QPS test for Triton (make sure the Triton client is ready first)
- `--num`: optional, integer, defaults to 100, batch size in each loop (10 loops in total)
- `--device`: optional, integer, defaults to -1, CUDA device index (-1 runs on CPU)
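
As a rough illustration of the arguments above, the sketch below builds an equivalent command-line parser with `argparse` and measures queries per second (QPS) over 10 loops. It is not the actual source of `qps_test.py`: the names `build_parser`, `measure_qps`, and `dummy_infer` are hypothetical, and it assumes that "batch size in each loop" means `--num` inference calls per loop, with the per-loop rates averaged.

```python
import argparse
import time


def build_parser():
    """Parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="QPS test sketch")
    parser.add_argument("--model", required=True, type=str, help="model name")
    parser.add_argument("--pipe", action="store_true", help="run the pipe QPS test")
    parser.add_argument("--onnx", action="store_true", help="run the ONNX QPS test")
    parser.add_argument("--triton", action="store_true", help="run the Triton QPS test")
    parser.add_argument("--num", type=int, default=100, help="calls per loop")
    parser.add_argument("--device", type=int, default=-1, help="CUDA index, -1 for CPU")
    return parser


def measure_qps(infer, sentence, num, loops=10):
    """Call infer(sentence) num times per loop; return the mean QPS over all loops."""
    rates = []
    for _ in range(loops):
        start = time.perf_counter()
        for _ in range(num):
            infer(sentence)
        # Guard against a zero reading from a coarse timer.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(num / elapsed)
    return sum(rates) / len(rates)


def dummy_infer(sentence):
    """Stand-in for a real model call (hypothetical)."""
    return [float(len(sentence))]


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--model", "paraphrase-albert-small-v2", "--pipe"])
if args.pipe:
    qps = measure_qps(dummy_infer, "Hello, world.", args.num)
    print(f"pipe QPS for {args.model}: {qps:.1f}")
```

Replacing `dummy_infer` with a real inference callable would give a comparable number for each backend, since every backend is timed over the same sentence and the same number of calls.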