The return value is a tuple of `(boxes, classes, scores)`. `boxes` is a list of bounding boxes. Each bounding box is represented as a 1-dimensional numpy array consisting of the top-left and the bottom-right corners, i.e. `numpy.ndarray([x1, y1, x2, y2])`. `classes` is a list of prediction labels for each bounding box. `scores` is a list of confidence scores corresponding to each class and bounding box.
The return value is a tuple of `(boxes, classes, scores)`. `boxes` is a list of bounding boxes. Each bounding box is represented as a 1-dimensional numpy array consisting of the top-left and the bottom-right corners, i.e. `numpy.ndarray([x1, y1, x2, y2])`. `classes` is a list of prediction labels for each bounding box. `scores` is a list of confidence scores corresponding to each class and bounding box.
# More Resources
- [Approximate Nearest Neighbors Oh Yeah (Annoy) - Zilliz blog](https://zilliz.com/learn/approximate-nearest-neighbor-oh-yeah-ANNOY): Discover the capabilities of Annoy, an innovative algorithm revolutionizing approximate nearest neighbor searches for enhanced efficiency and precision.
- [CLIP Object Detection: Merging AI Vision with Language Understanding - Zilliz blog](https://zilliz.com/learn/CLIP-object-detection-merge-AI-vision-with-language-understanding): CLIP Object Detection combines CLIP's text-image understanding with object detection tasks, allowing CLIP to locate and identify objects in images using texts.
- [What is a Convolutional Neural Network? An Engineer's Guide](https://zilliz.com/glossary/convolutional-neural-network): Convolutional Neural Network is a type of deep neural network that processes images, speeches, and videos. Let's find out more about CNN.
- [Using Vector Search to Better Understand Computer Vision Data - Zilliz blog](https://zilliz.com/blog/use-vector-search-to-better-understand-computer-vision-data): How Vector Search improves your understanding of Computer Vision Data
- [Understanding ImageNet: A Key Resource for Computer Vision and AI Research](https://zilliz.com/glossary/imagenet): The large-scale image database with over 14 million annotated images. Learn how this dataset supports advancements in computer vision.
- [What is Detection Transformers (DETR)? - Zilliz blog](https://zilliz.com/learn/detection-transformers-detr-end-to-end-object-detection-with-transformers): DETR (DEtection TRansformer) is a deep learning model for end-to-end object detection using transformers.
- [What is approximate nearest neighbor search (ANNS)?](https://zilliz.com/glossary/anns): Learn how to use Approximate nearest neighbor search (ANNS) for efficient nearest-neighbor search in large datasets.