@ -75,3 +75,14 @@ The operator takes an image as input. It first detects the objects appeared in t
The return value is a tuple of (boxes, classes, scores). The *boxes* is a list of bounding boxes. Each bounding box is represented by the top-left and the bottom right points, i.e. (x1, y1, x2, y2). The *classes* is a list of prediction labels. The *scores* is a list of the confidence scores.
# More Resources
- [CLIP Object Detection: Merging AI Vision with Language Understanding - Zilliz blog](https://zilliz.com/learn/CLIP-object-detection-merge-AI-vision-with-language-understanding): CLIP Object Detection combines CLIP's text-image understanding with object detection tasks, allowing CLIP to locate and identify objects in images using texts.
- [Computer Vision with FiftyOne | Milvus & Zilliz Cloud](https://zilliz.com/product/integrations/FiftyOne): nan
- [What is a Convolutional Neural Network? An Engineer's Guide](https://zilliz.com/glossary/convolutional-neural-network): Convolutional Neural Network is a type of deep neural network that processes images, speeches, and videos. Let's find out more about CNN.
- [Understanding Computer Vision - Zilliz blog](https://zilliz.com/learn/what-is-computer-vision): Computer Vision is a field of Artificial Intelligence that enables machines to capture and interpret visual information from the world just like humans do.
- [Using Vector Search to Better Understand Computer Vision Data - Zilliz blog](https://zilliz.com/blog/use-vector-search-to-better-understand-computer-vision-data): How Vector Search improves your understanding of Computer Vision Data
- [What are Vision Transformers (ViT)? - Zilliz blog](https://zilliz.com/learn/understanding-vision-transformers-vit): Vision Transformers (ViTs) are neural network models that use transformers to perform computer vision tasks like object detection and image classification.
- [What is Detection Transformers (DETR)? - Zilliz blog](https://zilliz.com/learn/detection-transformers-detr-end-to-end-object-detection-with-transformers): DETR (DEtection TRansformer) is a deep learning model for end-to-end object detection using transformers.