Evaluation Method

  • Model performance (WIP)
  • Pipeline speed

Model Performance

Build an image classification system based on similarity search over embeddings.

The core steps in performance.py:

  1. create a new Milvus collection on each run
  2. extract embeddings using a pretrained model, with the model name specified by --model
  3. select the inference backend with --format, which accepts pytorch or onnx
  4. insert and search embeddings in the Milvus collection without building an index
  5. measure performance as accuracy at top 1, 5, and 10 (see the sketch after this list):
    1. vote for the prediction among the top-k search results (the most frequent label wins)
    2. compare the final prediction with the ground truth
    3. calculate the percentage of correct predictions over all queries
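
For illustration, the top-k voting in step 5 might look like the sketch below. This is a hypothetical helper, not the actual code in performance.py; it assumes search_results holds one list of neighbor labels per query, nearest first.

from collections import Counter

def topk_accuracy(search_results, ground_truths, k):
    # Majority vote: the most frequent label among the k nearest
    # neighbors becomes the prediction for that query.
    correct = 0
    for labels, truth in zip(search_results, ground_truths):
        prediction = Counter(labels[:k]).most_common(1)[0][0]
        correct += prediction == truth
    # Fraction of correct predictions over all queries.
    return correct / len(ground_truths)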

Example Usage

# Option 1:
python performance.py --model MODEL_NAME --format pytorch
python performance.py --model MODEL_NAME --format onnx

# Option 2:
chmod +x performance.sh
./performance.sh

Pipeline Speed

A QPS test of the embedding pipeline, covering the steps below (a pipeline sketch follows the list):

  1. load the image from a path (pipe.input)
  2. decode the image into an array (ops.image_decode)
  3. generate the image embedding (preprocess, model inference, post-process)
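
A minimal version of such a pipeline might look like the following sketch, assuming Towhee's pipe API and the image_embedding.timm operator; the operator choice and file path here are illustrative, not the script's exact configuration.

from towhee import pipe, ops

# Load -> decode -> embed, mirroring the three steps above.
img_pipe = (
    pipe.input('path')
        .map('path', 'img', ops.image_decode())
        .map('img', 'vec', ops.image_embedding.timm(model_name='resnet50'))
        .output('vec')
)

res = img_pipe('../towhee/jpeg')  # res.get() yields the embedding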

Three methods with different pipeline speeds are compared:

  • Towhee pipe (the regular method)
  • ONNX Runtime (local model inference with ONNX)
  • Triton server with ONNX enabled (requests sent as a client)
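
For the ONNX Runtime method, local inference might look roughly like this sketch; the file name model.onnx and the 1x3x224x224 float input shape are assumptions, not the script's actual settings.

import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
input_name = session.get_inputs()[0].name
image = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in for a preprocessed image
embedding = session.run(None, {input_name: image})[0]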

Example Usage

Please note that qps_test.py uses:

  • localhost:8000: the address the Triton client connects to
  • ../towhee/jpeg: the test image path

python qps_test.py --model 'resnet50' --pipe --onnx --triton --num 100 --device cuda:0
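
With --triton enabled, the script sends requests as a client to the Triton server at localhost:8000. A bare-bones request using the official tritonclient package might look like this sketch; the model, input, and output names are placeholders.

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url='localhost:8000')
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput('INPUT0', list(image.shape), 'FP32')]
inputs[0].set_data_from_numpy(image)
result = client.infer('pipeline', inputs)   # 'pipeline' is a placeholder model name
embedding = result.as_numpy('OUTPUT0')      # 'OUTPUT0' is a placeholder output name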

Args:

  • --model: required, string, model name
  • --pipe: optional flag; enables the QPS test for the Towhee pipe
  • --onnx: optional flag; enables the QPS test for ONNX Runtime
  • --triton: optional flag; enables the QPS test for Triton (make sure the Triton server is up first)
  • --num: optional, integer, defaults to 100; batch size in each loop (10 loops in total)
  • --device: optional, string, defaults to 'cpu'
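
As a rough picture of how these flags translate into a QPS number, a hypothetical timing helper (not qps_test.py itself) could run --num queries per loop for 10 loops, treating each batch as sequential queries for simplicity:

import time

def measure_qps(embed_fn, image_path, num=100, loops=10):
    # Run `num` queries per loop for `loops` loops and report
    # queries per second over the whole run.
    start = time.perf_counter()
    for _ in range(loops):
        for _ in range(num):
            embed_fn(image_path)
    elapsed = time.perf_counter() - start
    return loops * num / elapsed

For example, measure_qps(img_pipe, '../towhee/jpeg') would time the Towhee pipe from the earlier sketch.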
