# Object Detection using Detectron2

*author: [filip-halt](https://github.com/filip-halt), [fzliu](https://github.com/fzliu)*

<br />

## Description

This operator uses Facebook's [Detectron2](https://github.com/facebookresearch/detectron2) library to compute bounding boxes, class labels, and class scores for detected objects in a given image.

<br />

## Code Example

```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('path')
        .map('path', 'img', ops.image_decode())
        .map('img', ('boxes', 'classes', 'scores'), ops.object_detection.detectron2(model_name='retinanet_resnet50'))
        .output('img', 'boxes', 'classes', 'scores')
)

DataCollection(p('./example.jpg')).show()
```

<img src="./result.png" alt="result" height="140px"/>

## Factory Constructor

Create the operator via the following factory method:

***object_detection.detectron2(model_name='retinanet_resnet50', thresh=0.5, num_classes=1000, skip_preprocess=False)***

**Parameters:**

***model_name:*** `str`

A string indicating which model to use. Available options:

1. `faster_rcnn_resnet50_c4`
2. `faster_rcnn_resnet50_dc5`
3. `faster_rcnn_resnet50_fpn`
4. `faster_rcnn_resnet101_c4`
5. `faster_rcnn_resnet101_dc5`
6. `faster_rcnn_resnet101_fpn`
7. `faster_rcnn_resnext101`
8. `retinanet_resnet50`
9. `retinanet_resnet101`

***thresh:*** `float`

The minimum confidence score a detection must reach to be returned (default value: `0.5`). Lower values return more detections but admit more false positives; higher values return fewer detections, each with a higher confidence score.
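The effect of `thresh` can be sketched on made-up detections. This is only an illustration of score thresholding, not the operator's internal code: the helper `filter_by_score`, the sample data, and the strict `>` comparison are all assumptions.

```python
import numpy as np

def filter_by_score(boxes, classes, scores, thresh=0.5):
    """Hypothetical helper: keep detections whose score exceeds `thresh`."""
    scores = np.asarray(scores, dtype=float)
    keep = scores > thresh  # strict comparison is an assumption
    kept_boxes = [b for b, k in zip(boxes, keep) if k]
    kept_classes = [c for c, k in zip(classes, keep) if k]
    return kept_boxes, kept_classes, scores[keep]

# Made-up detections, for illustration only
boxes = [
    np.array([10, 20, 110, 220]),
    np.array([30, 40, 90, 100]),
    np.array([0, 0, 50, 50]),
]
classes = ['person', 'dog', 'chair']
scores = np.array([0.92, 0.55, 0.31])

# A lower threshold keeps more detections, a higher one keeps fewer
for t in (0.3, 0.5, 0.9):
    _, kept, _ = filter_by_score(boxes, classes, scores, thresh=t)
    print(t, kept)
```

With this sample data, raising the threshold from `0.3` to `0.9` shrinks the kept labels from all three down to only `person`.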

### Interface

This operator takes an image as input, detects the objects that appear in it, and generates a bounding box around each one.

**Parameters:**

**img**: `towhee._types.Image`

The image to run detection on, as a Towhee `Image`.

**Return**: `Tuple[List[numpy.ndarray], List[str], numpy.ndarray]`

The return value is a tuple of `(boxes, classes, scores)`. `boxes` is a list of bounding boxes. Each bounding box is a 1-dimensional numpy array of length 4 holding the top-left and bottom-right corners, i.e. `numpy.array([x1, y1, x2, y2])`. `classes` is a list with the predicted label for each bounding box. `scores` holds the confidence score for each bounding box, in the same order as `boxes` and `classes`.
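As a rough sketch of how the returned tuple might be consumed, the snippet below pairs each box with its label and score and derives the box size from the corner coordinates. The helper `summarize_detections` and the sample values are hypothetical; in a real pipeline the three inputs would come from the operator's output.

```python
import numpy as np

def summarize_detections(boxes, classes, scores):
    """Hypothetical helper: pair each box with its label and score."""
    results = []
    for box, label, score in zip(boxes, classes, scores):
        # Each box is [x1, y1, x2, y2]: top-left and bottom-right corners
        x1, y1, x2, y2 = (float(v) for v in box)
        results.append({
            'label': label,
            'score': float(score),
            'width': x2 - x1,
            'height': y2 - y1,
        })
    return results

# Made-up output in the documented (boxes, classes, scores) layout
boxes = [np.array([10, 20, 110, 220]), np.array([30, 40, 90, 100])]
classes = ['person', 'dog']
scores = np.array([0.92, 0.55])

for det in summarize_detections(boxes, classes, scores):
    print(det)
```

The widths and heights are expressed in the same units as the box coordinates.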

