# Image Embedding with data2vec

*author: David Wang*
<br />
## Description
This operator extracts features from an image with [data2vec](https://arxiv.org/abs/2202.03555). The core idea is to predict latent representations of the full input data from a masked view of the input, in a self-distillation setup, using a standard Transformer architecture.
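
For intuition, the sketch below loads the same checkpoint directly through Hugging Face `transformers` and mean-pools the per-patch hidden states into one vector. The pooling choice is an illustrative assumption, not necessarily what this operator does internally.

```python
# Rough sketch of data2vec feature extraction via Hugging Face transformers.
# The mean-pooling step is an assumption for illustration; the towhee
# operator's internal pooling may differ.
import torch
from PIL import Image
from transformers import AutoImageProcessor, Data2VecVisionModel

model_name = 'facebook/data2vec-vision-base-ft1k'
processor = AutoImageProcessor.from_pretrained(model_name)
model = Data2VecVisionModel.from_pretrained(model_name)

image = Image.open('towhee.jpeg').convert('RGB')
inputs = processor(images=image, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

# pool the per-patch hidden states into a single embedding vector
vec = outputs.last_hidden_state.mean(dim=1).squeeze(0).numpy()
print(vec.shape)  # (768,) for the base model
```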

<br />
## Code Example
Load an image from 'towhee.jpeg' to generate an image embedding.

*Write a pipeline with explicit input/output name specifications:*
```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('path')
        # decode the image file at 'path' into a towhee image
        .map('path', 'img', ops.image_decode())
        # embed the decoded image with data2vec
        .map('img', 'vec', ops.image_embedding.data2vec(model_name='facebook/data2vec-vision-base-ft1k'))
        .output('img', 'vec')
)

DataCollection(p('towhee.jpeg')).show()
```

<img src="./result2.png" alt="result2" style="height:60px;"/>

<br />
## Factory Constructor

Create the operator via the following factory method:
***data2vec(model_name='facebook/data2vec-vision-base-ft1k')***

**Parameters:**
***model_name***: *str*

The name of the model to use. The default value is "facebook/data2vec-vision-base-ft1k".
Supported model names:
- facebook/data2vec-vision-base-ft1k
- facebook/data2vec-vision-large-ft1k
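
Switching variants only changes the `model_name` argument. As a sketch reusing the pipeline from the Code Example above, the large checkpoint would be selected like this; it is heavier but typically yields a higher-dimensional embedding:

```python
# Same pipeline as in the Code Example; only the checkpoint changes.
from towhee import pipe, ops

p_large = (
    pipe.input('path')
        .map('path', 'img', ops.image_decode())
        .map('img', 'vec', ops.image_embedding.data2vec(model_name='facebook/data2vec-vision-large-ft1k'))
        .output('vec')
)
```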

<br />
## Interface
An image embedding operator takes a [towhee image](link/to/towhee/image/api/doc) as input. It uses the pre-trained model specified by the model name to generate an image embedding as a numpy.ndarray.

**Parameters:**
***img:*** *towhee.types.Image (a sub-class of numpy.ndarray)*

The decoded image data in towhee.types.Image (numpy.ndarray).
**Returns:** *numpy.ndarray*

The image embedding extracted by the model.
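
As a quick usage sketch, the embedding can be pulled into plain Python and, for instance, L2-normalized before a cosine-similarity search. This assumes the pipeline `p` from the Code Example and towhee's `.get()` accessor on the pipeline result:

```python
# Minimal sketch, assuming the pipeline `p` from the Code Example above
# and that .get() returns the declared outputs in order ('img', 'vec').
import numpy as np

img, vec = p('towhee.jpeg').get()

assert isinstance(vec, np.ndarray)   # the embedding is a plain ndarray
vec = vec / np.linalg.norm(vec)      # L2-normalize before cosine search
print(vec.shape, vec.dtype)
```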

# More Resources
- [What is a Transformer Model? An Engineer's Guide](https://zilliz.com/glossary/transformer-models): A transformer model is a neural network architecture proficient at converting a particular type of input into a distinct output. Its core strength lies in its ability to handle inputs and outputs of different sequence lengths. It does this by encoding the input into a matrix with predefined dimensions and then combining that with an attention matrix to decode. This transformation unfolds through a sequence of collaborative layers, which deconstruct words into their corresponding numerical representations.

  At its heart, a transformer model is a bridge between disparate linguistic structures, employing sophisticated neural network configurations to decode and manipulate human language input. An example of a transformer model is GPT-3, which ingests human language and generates text output.
- [How to Get the Right Vector Embeddings - Zilliz blog](https://zilliz.com/blog/how-to-get-the-right-vector-embeddings): A comprehensive introduction to vector embeddings and how to generate them with popular open-source models.
- [Transforming Text: The Rise of Sentence Transformers in NLP - Zilliz blog](https://zilliz.com/learn/transforming-text-the-rise-of-sentence-transformers-in-nlp): Everything you need to know about the Transformers model, exploring its architecture, implementation, and limitations.
- [What Are Vector Embeddings?](https://zilliz.com/glossary/vector-embeddings): Learn the definition of vector embeddings, how to create vector embeddings, and more.
- [What is Detection Transformers (DETR)? - Zilliz blog](https://zilliz.com/learn/detection-transformers-detr-end-to-end-object-detection-with-transformers): DETR (DEtection TRansformer) is a deep learning model for end-to-end object detection using transformers.
- [Image Embeddings for Enhanced Image Search - Zilliz blog](https://zilliz.com/learn/image-embeddings-for-enhanced-image-search): Image embeddings are the core of modern computer vision algorithms. Understand their implementation and use cases, and explore different image embedding models.
- [Enhancing Information Retrieval with Sparse Embeddings | Zilliz Learn - Zilliz blog](https://zilliz.com/learn/enhancing-information-retrieval-learned-sparse-embeddings): Explore the inner workings, advantages, and practical applications of learned sparse embeddings with the Milvus vector database.
- [An Introduction to Vector Embeddings: What They Are and How to Use Them - Zilliz blog](https://zilliz.com/learn/everything-you-should-know-about-vector-embeddings): In this blog post, we will understand the concept of vector embeddings and explore their applications, best practices, and tools for working with embeddings.