Object Detection using Detectron2

Description

This operator uses Facebook's Detectron2 library to compute bounding boxes, class labels, and class scores for detected objects in a given image.

Code Example

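from towhee import pipe, ops, DataCollection

# Build a pipeline: read an image path, decode it, then run detection.
p = (
    pipe.input('path')
        # Decode the file at 'path' into an image.
        .map('path', 'img', ops.image_decode())
        # Run the detector; it emits boxes, class labels, and scores.
        .map('img', ('boxes', 'classes', 'scores'), ops.object_detection.detectron2(model_name='retinanet_resnet50'))
        .output('img', 'boxes', 'classes', 'scores')
)

# Run the pipeline on a local image and display the results.
DataCollection(p('./example.jpg')).show()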

Factory Constructor

Create the operator via the following factory method:

object_detection.detectron2(model_name='retinanet_resnet50', thresh=0.5, num_classes=1000, skip_preprocess=False)

Parameters:

model_name: str

A string indicating which model to use. Available options:

faster_rcnn_resnet50_c4
faster_rcnn_resnet50_dc5
faster_rcnn_resnet50_fpn
faster_rcnn_resnet101_c4
faster_rcnn_resnet101_dc5
faster_rcnn_resnet101_fpn
faster_rcnn_resnext101
retinanet_resnet50
retinanet_resnet101

thresh: float

The minimum confidence score a detection must have to be returned (default value: 0.5). Lower this value to detect more objects at the cost of more false positives, or raise it to return fewer, higher-confidence detections.
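The effect of thresh can be sketched in plain Python. This is only an illustration with made-up detections: the operator applies the threshold internally, and filter_by_score and the sample values below are hypothetical names and data, not part of the operator's API.

```python
from typing import List, Sequence, Tuple

# (box, label, score) for one detection
Detection = Tuple[Sequence[float], str, float]


def filter_by_score(boxes, classes, scores, thresh: float = 0.5) -> List[Detection]:
    """Keep only detections whose confidence score exceeds `thresh`.

    Hypothetical helper for illustration; it mimics score thresholding
    and is not the operator's own implementation.
    """
    return [
        (box, label, float(score))
        for box, label, score in zip(boxes, classes, scores)
        if float(score) > thresh
    ]


# Made-up detections standing in for real model output.
sample_boxes = [[12, 30, 200, 340], [220, 45, 400, 310], [5, 5, 40, 60]]
sample_classes = ["person", "dog", "ball"]
sample_scores = [0.92, 0.61, 0.34]

# A higher threshold keeps fewer, more confident detections.
for t in (0.3, 0.5, 0.8):
    kept = filter_by_score(sample_boxes, sample_classes, sample_scores, thresh=t)
    print(f"thresh={t}: {[label for _, label, _ in kept]}")
```

In this toy data, moving the threshold from 0.3 to 0.8 drops the low-confidence detections first, which is the trade-off described above.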

Interface

This operator takes an image as input. It detects the objects that appear in the image and generates a bounding box around each one.

Parameters:

img: towhee._types.Image

Image data wrapped as a Towhee Image.

Return: List[numpy.ndarray[4], ...], List[str], numpy.ndarray

The return value is a tuple of (boxes, classes, scores). boxes is a list of bounding boxes, each represented as a 1-dimensional numpy array holding the top-left and bottom-right corners, i.e. numpy.ndarray([x1, y1, x2, y2]). classes is a list of predicted labels, one per bounding box. scores holds the confidence score for each bounding box and its predicted class.
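As a rough sketch of how the returned tuple might be consumed, the snippet below pairs each box with its label and score and derives the box width and height from the corner coordinates. The to_records helper and the sample values are hypothetical; plain lists stand in for the numpy arrays the operator returns, and the code only assumes each box can be unpacked into four numbers.

```python
from typing import Dict, List, Sequence, Union

Record = Dict[str, Union[str, float]]


def to_records(
    boxes: Sequence[Sequence[float]],
    classes: Sequence[str],
    scores: Sequence[float],
) -> List[Record]:
    """Combine parallel (boxes, classes, scores) outputs into one record per detection.

    Hypothetical helper for illustration, not part of the operator.
    """
    records: List[Record] = []
    for box, label, score in zip(boxes, classes, scores):
        # Each box is [x1, y1, x2, y2]: top-left and bottom-right corners.
        x1, y1, x2, y2 = (float(v) for v in box)
        records.append(
            {
                "label": label,
                "score": float(score),
                "width": x2 - x1,
                "height": y2 - y1,
            }
        )
    return records


# Made-up values standing in for real operator output.
example_boxes = [[10, 20, 110, 220], [50, 60, 90, 100]]
example_classes = ["person", "dog"]
example_scores = [0.97, 0.66]

for record in to_records(example_boxes, example_classes, example_scores):
    print(record)
```

Keeping one record per detection makes it easier to sort or filter results by label or score than working with three parallel sequences.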

