# Object Detection using Detectron2

*author: [filip-halt](https://github.com/filip-halt), [fzliu](https://github.com/fzliu)*

<br />

## Description

This operator uses Facebook's [Detectron2](https://github.com/facebookresearch/detectron2) library to compute bounding boxes, class labels, and class scores for detected objects in a given image.

<br />

## Code Example

```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('path')
        .map('path', 'img', ops.image_decode())
        .map('img', ('boxes', 'classes', 'scores'), ops.object_detection.detectron2(model_name='retinanet_resnet50'))
        .output('img', 'boxes', 'classes', 'scores')
)

DataCollection(p('./example.jpg')).show()
```

<img src="./result.png" alt="result" height="140px"/>

## Factory Constructor

Create the operator via the following factory method:

***object_detection.detectron2(model_name='retinanet_resnet50', thresh=0.5, num_classes=1000, skip_preprocess=False)***

**Parameters:**

***model_name:*** `str`

A string indicating which model to use. Available options:

1. `faster_rcnn_resnet50_c4`
2. `faster_rcnn_resnet50_dc5`
3. `faster_rcnn_resnet50_fpn`
4. `faster_rcnn_resnet101_c4`
5. `faster_rcnn_resnet101_dc5`
6. `faster_rcnn_resnet101_fpn`
7. `faster_rcnn_resnext101`
8. `retinanet_resnet50`
9. `retinanet_resnet101`

***thresh:*** `float`

The score threshold above which an object counts as detected (default value: `0.5`). Lower values return more detections but admit more false positives; higher values return fewer detections, each with a higher confidence score.
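
The trade-off can be illustrated with a small sketch that is independent of Detectron2. The detections and the helper `filter_by_score` below are hypothetical, and the `>=` comparison is an assumption; the operator's actual filtering happens inside the model and may differ in detail.

```python
# Hypothetical detections: (label, confidence score) pairs, made up for illustration.
detections = [("person", 0.92), ("dog", 0.61), ("bicycle", 0.48), ("cat", 0.15)]


def filter_by_score(detections, thresh=0.5):
    """Keep detections whose score is at least `thresh` (assumed comparison)."""
    return [(label, score) for label, score in detections if score >= thresh]


# A lower threshold keeps more (and weaker) detections; a higher one keeps fewer.
for thresh in (0.1, 0.5, 0.9):
    kept = filter_by_score(detections, thresh)
    labels = [label for label, _ in kept]
    print(f"thresh={thresh}: {len(kept)} kept -> {labels}")
```

In this sketch, lowering `thresh` lets the low-scoring `bicycle` and `cat` entries through, which is the "more objects at the expense of accuracy" effect described above.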

### Interface

This operator takes an image as input. It detects the objects that appear in the image and generates a bounding box around each one.

**Parameters:**

  **img**: `towhee._types.Image`
    The input image, as a Towhee `Image`.

**Return**: `List[numpy.ndarray[4], ...], List[str], numpy.ndarray`

The return value is a tuple of `(boxes, classes, scores)`. `boxes` is a list of bounding boxes, each represented as a 1-dimensional numpy array holding the top-left and bottom-right corners, i.e. `numpy.ndarray([x1, y1, x2, y2])`. `classes` is a list with the predicted label for each bounding box. `scores` holds the confidence score for each bounding box and its predicted class, so entry `i` of all three describes the same detection.
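
As a rough sketch of how the three return values line up, the snippet below walks them in parallel and derives each box's width and height. The helper `summarize` and the sample values are hypothetical; plain lists stand in for the numpy arrays the operator returns, which unpack the same way.

```python
def summarize(boxes, classes, scores):
    """Pair each box with its label and score, and compute the box size."""
    rows = []
    for box, label, score in zip(boxes, classes, scores):
        # Each box is [x1, y1, x2, y2]: top-left corner, then bottom-right corner.
        x1, y1, x2, y2 = (float(v) for v in box)
        rows.append({
            "label": label,
            "score": float(score),
            "width": x2 - x1,
            "height": y2 - y1,
        })
    return rows


# Made-up values standing in for the operator's output.
boxes = [[10, 20, 110, 220], [50, 60, 90, 100]]
classes = ["person", "dog"]
scores = [0.92, 0.61]

rows = summarize(boxes, classes, scores)
for row in rows:
    print(row)
```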


## More Resources

- [Approximate Nearest Neighbors Oh Yeah (Annoy) - Zilliz blog](https://zilliz.com/learn/approximate-nearest-neighbor-oh-yeah-ANNOY): Discover the capabilities of Annoy, an innovative algorithm revolutionizing approximate nearest neighbor searches for enhanced efficiency and precision.
- [CLIP Object Detection: Merging AI Vision with Language Understanding - Zilliz blog](https://zilliz.com/learn/CLIP-object-detection-merge-AI-vision-with-language-understanding): CLIP Object Detection combines CLIP's text-image understanding with object detection tasks, allowing CLIP to locate and identify objects in images using texts.
- [What is a Convolutional Neural Network? An Engineer's Guide](https://zilliz.com/glossary/convolutional-neural-network): Convolutional Neural Network is a type of deep neural network that processes images, speeches, and videos. Let's find out more about CNN.
- [Using Vector Search to Better Understand Computer Vision Data - Zilliz blog](https://zilliz.com/blog/use-vector-search-to-better-understand-computer-vision-data): How Vector Search improves your understanding of Computer Vision Data
- [Understanding ImageNet: A Key Resource for Computer Vision and AI Research](https://zilliz.com/glossary/imagenet): The large-scale image database with over 14 million annotated images. Learn how this dataset supports advancements in computer vision.
- [What is Detection Transformers (DETR)? - Zilliz blog](https://zilliz.com/learn/detection-transformers-detr-end-to-end-object-detection-with-transformers): DETR (DEtection TRansformer) is a deep learning model for end-to-end object detection using transformers.
- [What is approximate nearest neighbor search (ANNS)?](https://zilliz.com/glossary/anns): Learn how to use Approximate nearest neighbor search (ANNS) for efficient nearest-neighbor search in large datasets.