# Audio Embedding with data2vec

*author: David Wang*


<br />


## Description

This operator extracts features for audio with [data2vec](https://arxiv.org/abs/2202.03555). The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture.
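The training idea can be made concrete with a toy numpy sketch of the data2vec objective. It rests on these simplifying assumptions:

- The teacher's per-block outputs are given as plain arrays.
- The targets are the average of the top-K blocks after instance normalization over time.
- The student is scored only at masked time steps, using a Smooth L1 loss.
- The teacher weights follow an exponential moving average (EMA) of the student.

All function names and the `beta`/`tau`/`top_k` values are hypothetical. This is not code from the operator, which loads an already-trained checkpoint.

```python
import numpy as np


def instance_norm(h, eps=1e-5):
    """Normalize each feature dimension of a (T, D) array over the time axis."""
    mean = h.mean(axis=0, keepdims=True)
    std = h.std(axis=0, keepdims=True)
    return (h - mean) / (std + eps)


def data2vec_targets(teacher_layers, top_k):
    """Average the instance-normalized outputs of the teacher's top-K blocks.

    teacher_layers: list of (T, D) arrays, one per Transformer block, computed
    by the teacher on the *unmasked* input.
    """
    top = teacher_layers[-top_k:]
    return np.mean([instance_norm(h) for h in top], axis=0)


def smooth_l1(pred, target, beta=0.25):
    """Element-wise Smooth L1: quadratic below beta, linear above it."""
    diff = np.abs(pred - target)
    return np.where(diff <= beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)


def masked_loss(student_pred, targets, mask, beta=0.25):
    """Score the student only at time steps where its input was masked."""
    return float(smooth_l1(student_pred[mask], targets[mask], beta).mean())


def ema_update(teacher, student, tau):
    """Teacher parameters track the student as an exponential moving average."""
    return {name: tau * teacher[name] + (1.0 - tau) * student[name] for name in teacher}


# Toy usage: 12 teacher blocks, 50 time steps, 16-dim features.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(50, 16)) for _ in range(12)]
targets = data2vec_targets(layers, top_k=8)
mask = np.zeros(50, dtype=bool)
mask[10:20] = True  # pretend time steps 10..19 were masked for the student
student_pred = rng.normal(size=(50, 16))
print(masked_loss(student_pred, targets, mask))
```

The student sees only the masked view. The teacher sees the full input. This is why the targets are "latent representations of the full input data".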

<br />


## Code Example

Generate embeddings for the audio "test.wav".


*Write the pipeline in simplified style:*

```python
import towhee

(
    towhee.glob('test.wav')
          .audio_decode.ffmpeg()
          .runas_op(func=lambda x:[y[0] for y in x])
          .audio_embedding.data2vec()
          .show()
)

```

*Write the same pipeline with explicit input/output name specifications:*

```python
import towhee

(
    towhee.glob['path']('test.wav')
          .audio_decode.ffmpeg['path', 'frames']()
          .runas_op['frames', 'frames'](func=lambda x:[y[0] for y in x])
          .audio_embedding.data2vec['frames', 'vecs'](model_name="facebook/data2vec-audio-base-960h")
          .select['path', 'vecs']()
          .show()
)
```


<br />


## Factory Constructor

Create the operator via the following factory method:

***data2vec(model_name='facebook/data2vec-audio-base-960h')***

**Parameters:**


  ***model_name***: *str*

The name of the pretrained model.
The default value is "facebook/data2vec-audio-base-960h".

Supported model names:
- facebook/data2vec-audio-base-960h
- facebook/data2vec-audio-large-960h
- facebook/data2vec-audio-base
- facebook/data2vec-audio-base-100h
- facebook/data2vec-audio-base-10m
- facebook/data2vec-audio-large
- facebook/data2vec-audio-large-100h
- facebook/data2vec-audio-large-10m
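To catch a typo before building a pipeline, you can check `model_name` against the list above with a small guard like the one below. `SUPPORTED_MODELS`, `DEFAULT_MODEL`, and `resolve_model_name` are hypothetical names for this sketch. The operator may or may not perform such a check itself. The fallback value comes from the "default value" sentence above.

```python
SUPPORTED_MODELS = (
    "facebook/data2vec-audio-base-960h",
    "facebook/data2vec-audio-large-960h",
    "facebook/data2vec-audio-base",
    "facebook/data2vec-audio-base-100h",
    "facebook/data2vec-audio-base-10m",
    "facebook/data2vec-audio-large",
    "facebook/data2vec-audio-large-100h",
    "facebook/data2vec-audio-large-10m",
)
DEFAULT_MODEL = "facebook/data2vec-audio-base-960h"


def resolve_model_name(model_name=None):
    """Return a supported checkpoint name, falling back to the documented default."""
    if model_name is None:
        return DEFAULT_MODEL
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"unsupported model_name {model_name!r}; choose one of: "
            + ", ".join(SUPPORTED_MODELS)
        )
    return model_name


print(resolve_model_name())                                     # default checkpoint
print(resolve_model_name("facebook/data2vec-audio-large-960h"))  # explicit choice
```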

<br />


## Interface

This audio embedding operator generates an embedding vector, returned as a numpy.ndarray, from a list of towhee audio frames.


**Parameters:**

***data***: *List[towhee.types.audio_frame.AudioFrame]*

The input audio data, given as a list of towhee audio frames. The input should represent audio longer than 0.9 seconds.

**Returns:** *numpy.ndarray*

The audio embedding extracted by the model.
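You can check the 0.9 s minimum before calling the operator. The sketch below makes two assumptions. First, after the `runas_op` step in the examples, each frame is a 1-D numpy array of mono samples. Second, the audio is sampled at 16 kHz, the rate the `facebook/data2vec-audio-*` checkpoints are generally used with. `audio_duration` and `is_long_enough` are hypothetical helpers.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono input
MIN_SECONDS = 0.9     # minimum length stated in the Interface section


def audio_duration(frames, sample_rate=SAMPLE_RATE):
    """Total duration in seconds of a list of 1-D sample arrays."""
    return sum(len(f) for f in frames) / sample_rate


def is_long_enough(frames, sample_rate=SAMPLE_RATE, min_seconds=MIN_SECONDS):
    """True if the frames together cover more than min_seconds of audio."""
    return audio_duration(frames, sample_rate) > min_seconds


frames = [np.zeros(1024, dtype=np.float32) for _ in range(20)]  # 20 x 1024 samples
print(audio_duration(frames))  # 1.28
print(is_long_enough(frames))  # True
```

If your frames keep a channel axis, take one channel (or average across channels) before summing lengths. Otherwise the duration will be miscounted.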