# Audio Embedding with data2vec

*author: David Wang*


<br />


## Description

This operator extracts features for audio with [data2vec](https://arxiv.org/abs/2202.03555). The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture.

<br />


## Code Example

Generate embeddings for the audio "test.wav".

*Write a pipeline that names its inputs and outputs explicitly:*

```python
from towhee.dc2 import pipe, ops, DataCollection

p = (
    pipe.input('path')
        .map('path', 'frame', ops.audio_decode.ffmpeg(sample_rate=16000))
        .map('frame', 'vecs', ops.audio_embedding.data2vec(model_name='facebook/data2vec-audio-base-960h'))
        .output('path', 'vecs')
)

DataCollection(p('test.wav')).show()
```

<img src="./result.png" width="800px"/>

<br />


## Factory Constructor

Create the operator via the following factory method:

***data2vec(model_name='facebook/data2vec-audio-base-960h')***

**Parameters:**


  ***model_name***: *str*

The model name, given as a string.
The default value is "facebook/data2vec-audio-base-960h".

Supported model names:

- facebook/data2vec-audio-base-960h
- facebook/data2vec-audio-large-960h
- facebook/data2vec-audio-base
- facebook/data2vec-audio-base-100h
- facebook/data2vec-audio-base-10m
- facebook/data2vec-audio-large
- facebook/data2vec-audio-large-100h
- facebook/data2vec-audio-large-10m
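
As a minimal sketch, a model name can be checked against the list above before it is passed to the operator. `SUPPORTED_MODELS` and `check_model_name` are hypothetical names used here for illustration; they are not part of towhee or of this operator.

```python
# Hypothetical helper, not part of towhee or this operator.
# The names below are copied from the supported model list above.
SUPPORTED_MODELS = (
    "facebook/data2vec-audio-base-960h",
    "facebook/data2vec-audio-large-960h",
    "facebook/data2vec-audio-base",
    "facebook/data2vec-audio-base-100h",
    "facebook/data2vec-audio-base-10m",
    "facebook/data2vec-audio-large",
    "facebook/data2vec-audio-large-100h",
    "facebook/data2vec-audio-large-10m",
)


def check_model_name(model_name: str) -> str:
    """Return model_name unchanged if it is listed, otherwise raise ValueError."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model name: {model_name!r}. "
            f"Expected one of: {', '.join(SUPPORTED_MODELS)}"
        )
    return model_name
```

The returned string can then be used as the `model_name` argument, so a typo fails early with a readable message rather than at model loading time.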

<br />


## Interface

An audio embedding operator generates vectors as numpy.ndarray, given an audio file path or towhee audio frames.


**Parameters:**

  ***data***: *List[towhee.types.audio_frame.AudioFrame]*

The input audio data is a list of towhee audio frames. The input should represent audio longer than 0.9 s.

**Returns:** *numpy.ndarray*

   The audio embedding extracted by the model.
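
The minimum-length requirement on the input can be checked before calling the operator. The sketch below assumes you already know the total number of samples and the sample rate of the decoded audio (16000 Hz in the code example above). `is_long_enough` is a hypothetical helper, not part of towhee, and how you obtain the sample count from the audio frames is left open.

```python
# Hypothetical helper, not part of towhee or this operator.
def is_long_enough(num_samples: int, sample_rate: int = 16000,
                   min_seconds: float = 0.9) -> bool:
    """Return True if the audio lasts strictly longer than min_seconds.

    num_samples is the total number of samples per channel and
    sample_rate is in Hz; both are assumed to be known by the caller.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    duration_seconds = num_samples / sample_rate
    return duration_seconds > min_seconds


# One second of audio at 16 kHz passes; half a second does not.
print(is_long_enough(16000))  # True
print(is_long_enough(8000))   # False
```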