# Audio Embedding with data2vec

*author: David Wang*


<br />


## Description

This operator extracts features for audio with [data2vec](https://arxiv.org/abs/2202.03555). The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture.

<br />


## Code Example

Generate embeddings for the audio "test.wav".

*Write a pipeline with explicit input/output name specifications:*

```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('path')
        .map('path', 'frame', ops.audio_decode.ffmpeg(sample_rate=16000))
        .map('frame', 'vecs', ops.audio_embedding.data2vec(model_name='facebook/data2vec-audio-base-960h'))
        .output('path', 'vecs')
)

DataCollection(p('test.wav')).show()
```

<img src="./result.png" width="800px"/>

<br />


## Factory Constructor

Create the operator via the following factory method:

***data2vec(model_name='facebook/data2vec-audio-base-960h')***

**Parameters:**


  ***model_name***: *str*

The model name, as a string.
The default value is "facebook/data2vec-audio-base-960h".

Supported model names:
- facebook/data2vec-audio-base-960h
- facebook/data2vec-audio-large-960h
- facebook/data2vec-audio-base
- facebook/data2vec-audio-base-100h
- facebook/data2vec-audio-base-10m
- facebook/data2vec-audio-large
- facebook/data2vec-audio-large-100h
- facebook/data2vec-audio-large-10m
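
Because `model_name` must be one of the names listed above, it can help to validate it before building a pipeline. The following is a minimal sketch; the `SUPPORTED_MODELS` constant and the `check_model_name` helper are hypothetical names used for illustration and are not part of towhee.

```python
# Hypothetical helper, not part of towhee: validates a model name against
# the list of supported names documented in this README.
SUPPORTED_MODELS = (
    'facebook/data2vec-audio-base-960h',
    'facebook/data2vec-audio-large-960h',
    'facebook/data2vec-audio-base',
    'facebook/data2vec-audio-base-100h',
    'facebook/data2vec-audio-base-10m',
    'facebook/data2vec-audio-large',
    'facebook/data2vec-audio-large-100h',
    'facebook/data2vec-audio-large-10m',
)


def check_model_name(model_name: str) -> str:
    """Return model_name unchanged if it is supported, else raise ValueError."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f'Unsupported model name: {model_name!r}. '
            f'Expected one of: {", ".join(SUPPORTED_MODELS)}'
        )
    return model_name


# The validated name can then be passed to the factory method, e.g.
# ops.audio_embedding.data2vec(model_name=check_model_name(name))
```

Failing early with a clear message is usually easier to debug than an error raised later while the model is being loaded.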

<br />


## Interface

An audio embedding operator generates vectors as a numpy.ndarray, given an audio file path or towhee audio frames.


**Parameters:**

  ***data:*** *List[towhee.types.audio_frame.AudioFrame]*

The input is a list of towhee audio frames. Together, the frames should represent audio longer than 0.9s.

**Returns:** *numpy.ndarray*

The audio embedding extracted by the model.
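
The 0.9s minimum on input length can be checked before a file is sent through the pipeline. The following is a minimal sketch, assuming the input is an uncompressed WAV file that Python's standard `wave` module can read; the helper names `frames_to_seconds`, `wav_duration_seconds`, and `is_long_enough` are hypothetical and not part of towhee.

```python
import wave

# Minimum duration stated in the Interface section of this README.
MIN_SECONDS = 0.9


def frames_to_seconds(n_frames: int, frame_rate: int) -> float:
    """Convert a sample-frame count to a duration in seconds."""
    return n_frames / frame_rate


def wav_duration_seconds(path: str) -> float:
    """Read the WAV header and return the duration of the file in seconds."""
    with wave.open(path, 'rb') as wav:
        return frames_to_seconds(wav.getnframes(), wav.getframerate())


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """Return True if the audio is strictly longer than min_seconds."""
    return wav_duration_seconds(path) > min_seconds
```

For example, `is_long_enough('test.wav')` could be used to skip clips that are too short. Other formats such as MP3 would need a different way to read the duration.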


## More Resources

- [What is a Transformer Model? An Engineer's Guide](https://zilliz.com/glossary/transformer-models): A transformer model is a neural network architecture. It's proficient in converting a particular type of input into a distinct output. Its core strength lies in its ability to handle inputs and outputs of different sequence length. It does this through encoding the input into a matrix with predefined dimensions and then combining that with another attention matrix to decode. This transformation unfolds through a sequence of collaborative layers, which deconstruct words into their corresponding numerical representations. At its heart, a transformer model is a bridge between disparate linguistic structures, employing sophisticated neural network configurations to decode and manipulate human language input. An example of a transformer model is GPT-3, which ingests human language and generates text output.
- [Exploring Multimodal Embeddings with FiftyOne and Milvus - Zilliz blog](https://zilliz.com/blog/exploring-multimodal-embeddings-with-fiftyone-and-milvus): This post explored how multimodal embeddings work with Voxel51 and Milvus.
- [How to Get the Right Vector Embeddings - Zilliz blog](https://zilliz.com/blog/how-to-get-the-right-vector-embeddings): A comprehensive introduction to vector embeddings and how to generate them with popular open-source models.
- [What is BERT (Bidirectional Encoder Representations from Transformers)? - Zilliz blog](https://zilliz.com/learn/what-is-bert): Learn what Bidirectional Encoder Representations from Transformers (BERT) is and how it uses pre-training and fine-tuning to achieve its remarkable performance.
- [Transforming Text: The Rise of Sentence Transformers in NLP - Zilliz blog](https://zilliz.com/learn/transforming-text-the-rise-of-sentence-transformers-in-nlp): Everything you need to know about the Transformers model, exploring its architecture, implementation, and limitations.
- [What Are Vector Embeddings?](https://zilliz.com/glossary/vector-embeddings): Learn the definition of vector embeddings, how to create vector embeddings, and more.
- [Vector Database Use Case: Audio Similarity Search - Zilliz](https://zilliz.com/vector-database-use-cases/audio-similarity-search): Building agile and reliable audio similarity search with Zilliz vector database (fully managed Milvus).
- [Enhancing Information Retrieval with Sparse Embeddings | Zilliz Learn - Zilliz blog](https://zilliz.com/learn/enhancing-information-retrieval-learned-sparse-embeddings): Explore the inner workings, advantages, and practical applications of learned sparse embeddings with the Milvus vector database.
- [An Introduction to Vector Embeddings: What They Are and How to Use Them - Zilliz blog](https://zilliz.com/learn/everything-you-should-know-about-vector-embeddings): In this blog post, we will understand the concept of vector embeddings and explore its applications, best practices, and tools for working with embeddings.