# Image Embedding with data2vec

*author: David Wang*

<br />
## Description

This operator extracts features from an image with [data2vec](https://arxiv.org/abs/2202.03555). The core idea is to predict latent representations of the full input data from a masked view of the input, in a self-distillation setup, using a standard Transformer architecture.
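As a rough illustration of this objective, here is a toy NumPy sketch (not the actual data2vec implementation, which uses Transformer encoders, EMA teacher updates, and a smooth-L1 loss): a teacher network encodes the full input to produce latent targets, while the student encodes a masked view and regresses onto those targets at the masked positions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    # Stand-in for a Transformer encoder: one linear layer + tanh.
    # (A real encoder would let unmasked context inform masked positions.)
    return np.tanh(x @ w)

# Toy "image": a sequence of 8 patch embeddings of dimension 4.
patches = rng.normal(size=(8, 4))
w_student = 0.1 * rng.normal(size=(4, 4))
w_teacher = w_student.copy()  # teacher starts as a copy (an EMA of the student in practice)

# Teacher sees the full input and produces latent targets.
targets = encode(patches, w_teacher)

# Student sees a masked view: every other patch is zeroed out.
mask = np.zeros(8, dtype=bool)
mask[::2] = True
masked = patches.copy()
masked[mask] = 0.0
predictions = encode(masked, w_student)

# data2vec regresses student predictions onto teacher targets at the
# masked positions (plain MSE here instead of the paper's smooth L1).
loss = np.mean((predictions[mask] - targets[mask]) ** 2)
print(loss)
```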

<br />
## Code Example

Load an image from path './towhee.jpeg' to generate an image embedding.

*Write a pipeline with explicit input/output name specifications:*
```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('path')
        .map('path', 'img', ops.image_decode())
        .map('img', 'vec', ops.image_embedding.data2vec(model_name='facebook/data2vec-vision-base-ft1k'))
        .output('img', 'vec')
)

DataCollection(p('towhee.jpeg')).show()
```

<img src="./result2.png" alt="result2" style="height:60px;"/>

<br />

## Factory Constructor

Create the operator via the following factory method:
***data2vec(model_name='facebook/data2vec-vision-base-ft1k')***

**Parameters:**

***model_name***: *str*

The model name in string. The default value is "facebook/data2vec-vision-base-ft1k".

Supported model names:
- facebook/data2vec-vision-base-ft1k
- facebook/data2vec-vision-large-ft1k
<br />

## Interface
An image embedding operator takes a [towhee image](link/to/towhee/image/api/doc) as input.
It uses the pre-trained model specified by the model name to generate an image embedding in ndarray.

**Parameters:**

***img:*** *towhee.types.Image (a sub-class of numpy.ndarray)*

The decoded image data in towhee.types.Image (numpy.ndarray).

**Returns:** *numpy.ndarray*

The image embedding extracted by the model.
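Since the operator returns a plain numpy.ndarray, downstream tasks such as image similarity search reduce to ordinary vector math. A minimal sketch using random vectors in place of real embeddings (assumed 768-dimensional, matching the base model's hidden size):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random vectors stand in for embeddings returned by the operator.
rng = np.random.default_rng(42)
vec_a = rng.normal(size=768)
vec_b = rng.normal(size=768)

print(cosine_similarity(vec_a, vec_a))  # a vector with itself: ~1.0
print(cosine_similarity(vec_a, vec_b))
```

In a vector database such as Milvus, this comparison is performed index-side; the operator's job is only to produce the ndarray to insert or query with.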

<br />

# More Resources
- [What is a Transformer Model? An Engineer's Guide](https://zilliz.com/glossary/transformer-models): A transformer model is a neural network architecture. It's proficient in converting a particular type of input into a distinct output. Its core strength lies in its ability to handle inputs and outputs of different sequence lengths. It does this by encoding the input into a matrix with predefined dimensions and then combining that with another attention matrix to decode. This transformation unfolds through a sequence of collaborative layers, which deconstruct words into their corresponding numerical representations.

  At its heart, a transformer model is a bridge between disparate linguistic structures, employing sophisticated neural network configurations to decode and manipulate human language input. An example of a transformer model is GPT-3, which ingests human language and generates text output.
- [How to Get the Right Vector Embeddings - Zilliz blog](https://zilliz.com/blog/how-to-get-the-right-vector-embeddings): A comprehensive introduction to vector embeddings and how to generate them with popular open-source models.
- [Transforming Text: The Rise of Sentence Transformers in NLP - Zilliz blog](https://zilliz.com/learn/transforming-text-the-rise-of-sentence-transformers-in-nlp): Everything you need to know about the Transformers model, exploring its architecture, implementation, and limitations.
- [What Are Vector Embeddings?](https://zilliz.com/glossary/vector-embeddings): Learn the definition of vector embeddings, how to create vector embeddings, and more.
- [What is Detection Transformers (DETR)? - Zilliz blog](https://zilliz.com/learn/detection-transformers-detr-end-to-end-object-detection-with-transformers): DETR (DEtection TRansformer) is a deep learning model for end-to-end object detection using transformers.
- [Image Embeddings for Enhanced Image Search - Zilliz blog](https://zilliz.com/learn/image-embeddings-for-enhanced-image-search): Image embeddings are the core of modern computer vision algorithms. Understand their implementation and use cases, and explore different image embedding models.
- [Enhancing Information Retrieval with Sparse Embeddings | Zilliz Learn - Zilliz blog](https://zilliz.com/learn/enhancing-information-retrieval-learned-sparse-embeddings): Explore the inner workings, advantages, and practical applications of learned sparse embeddings with the Milvus vector database.
- [An Introduction to Vector Embeddings: What They Are and How to Use Them - Zilliz blog](https://zilliz.com/learn/everything-you-should-know-about-vector-embeddings): In this blog post, we will understand the concept of vector embeddings and explore their applications, best practices, and tools for working with embeddings.