@@ -76,6 +76,8 @@ The return value is a tuple of (boxes, classes, scores). The *boxes* is a list o
# More Resources
- [CLIP Object Detection: Merging AI Vision with Language Understanding - Zilliz blog](https://zilliz.com/learn/CLIP-object-detection-merge-AI-vision-with-language-understanding): CLIP Object Detection combines CLIP's text-image understanding with object detection tasks, allowing CLIP to locate and identify objects in images using text descriptions.
@@ -85,4 +87,3 @@ The return value is a tuple of (boxes, classes, scores). The *boxes* is a list o
- [Using Vector Search to Better Understand Computer Vision Data - Zilliz blog](https://zilliz.com/blog/use-vector-search-to-better-understand-computer-vision-data): How Vector Search improves your understanding of Computer Vision Data
- [What are Vision Transformers (ViT)? - Zilliz blog](https://zilliz.com/learn/understanding-vision-transformers-vit): Vision Transformers (ViTs) are neural network models that use transformers to perform computer vision tasks like object detection and image classification.
- [What is Detection Transformers (DETR)? - Zilliz blog](https://zilliz.com/learn/detection-transformers-detr-end-to-end-object-detection-with-transformers): DETR (DEtection TRansformer) is a deep learning model for end-to-end object detection using transformers.