# Audio Embedding with Vggish

*Author: [Jael Gu](https://github.com/jaelgu)*

<br />

## Description

The audio embedding operator converts input audio into dense vectors that represent the semantics of its clips.
Each vector represents an audio clip with a fixed length of about 0.96 s.
This operator is built on top of [VGGish](https://github.com/tensorflow/models/tree/master/research/audioset/vggish) with PyTorch.
The model is a [VGG](https://arxiv.org/abs/1409.1556) variant pre-trained on [AudioSet](https://research.google.com/audioset), a large-scale audio dataset.
As its original authors suggest, it is suitable for extracting high-level features or for warming up a larger model.
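As a rough guide to how an input is cut into clips, the sketch below reproduces the framing arithmetic of the reference TensorFlow VGGish code: a log-mel spectrogram with 25 ms windows and a 10 ms hop, grouped into non-overlapping 96-frame (0.96 s) examples. It assumes 16 kHz mono input and that the towhee operator keeps the reference framing; the helpers `count_windows` and `expected_num_clips` are hypothetical.

```python
# Framing constants from the reference VGGish implementation (vggish_params.py).
# Assumption: the towhee operator works on 16 kHz mono audio with the same framing.
SAMPLE_RATE = 16000
STFT_WINDOW_SECONDS = 0.025
STFT_HOP_SECONDS = 0.010
EXAMPLE_WINDOW_SECONDS = 0.96
EXAMPLE_HOP_SECONDS = 0.96


def count_windows(length: int, window: int, hop: int) -> int:
    """Number of full windows of size `window`, stepping by `hop`, over `length` items."""
    if length < window:
        return 0
    return 1 + (length - window) // hop


def expected_num_clips(num_samples: int) -> int:
    """Estimate how many embeddings (rows) a 16 kHz mono waveform produces."""
    stft_window = round(SAMPLE_RATE * STFT_WINDOW_SECONDS)  # 400 samples
    stft_hop = round(SAMPLE_RATE * STFT_HOP_SECONDS)        # 160 samples
    spectrogram_frames = count_windows(num_samples, stft_window, stft_hop)

    frames_per_second = 1.0 / STFT_HOP_SECONDS                      # 100 frames/s
    example_window = round(EXAMPLE_WINDOW_SECONDS * frames_per_second)  # 96 frames
    example_hop = round(EXAMPLE_HOP_SECONDS * frames_per_second)        # 96 frames
    return count_windows(spectrogram_frames, example_window, example_hop)
```

For example, a 10 s recording (160 000 samples) yields 10 clips, while anything shorter than 15 600 samples (0.975 s) yields none under these assumptions.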

<br />

## Code Example

Generate embeddings for the audio file "test.wav".

*Write a pipeline with explicit input/output name specifications:*

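```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('path')
        # Decode the audio file into a list of towhee audio frames
        .map('path', 'frame', ops.audio_decode.ffmpeg())
        # Embed the frames into an array of shape (num_clips, 128)
        .map('frame', 'vecs', ops.audio_embedding.vggish())
        .output('path', 'vecs')
)

# Run the pipeline on a local file and display the result as a table
DataCollection(p('./test.wav')).show()
```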

<img src="./result.png" width="800px"/>
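The pipeline emits `vecs` with one 128-dimensional row per clip. To compare whole recordings, one common option (an assumption here, not something the operator does itself) is to mean-pool the rows into a single vector and measure cosine similarity. The sketch uses random arrays in place of real embeddings; `pool_clips` and `cosine_similarity` are hypothetical helpers.

```python
import numpy as np


def pool_clips(vecs: np.ndarray) -> np.ndarray:
    """Mean-pool a (num_clips, 128) array into a single 128-d vector."""
    if vecs.ndim != 2 or vecs.shape[0] == 0:
        raise ValueError("expected a non-empty 2-D array of clip embeddings")
    return vecs.mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(np.dot(a, b) / denom)


# Stand-in data shaped like the operator output: (num_clips, 128)
rng = np.random.default_rng(0)
vecs_a = rng.normal(size=(10, 128)).astype(np.float32)
vecs_b = rng.normal(size=(7, 128)).astype(np.float32)

score = cosine_similarity(pool_clips(vecs_a), pool_clips(vecs_b))
print(f"similarity: {score:.3f}")
```

Mean pooling discards temporal order; if that matters, comparing clip-level rows directly is an alternative.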

<br />

## Factory Constructor

Create the operator via the following factory method:

***audio_embedding.vggish(weights_path=None, framework="pytorch")***

**Parameters:**

*weights_path: str*

The path to the model weights. If None, the default pre-trained weights are loaded.

*framework: str*

The framework of the model implementation.
The default value is "pytorch" since the model is implemented in PyTorch.

<br />

## Interface

The operator takes towhee audio frames and returns the embeddings as a numpy.ndarray.

**Parameters:**

*data: List[towhee.types.audio_frame.AudioFrame]*

The input audio data, as a list of towhee audio frames.
The input should represent audio longer than 0.96 s.
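If audio comes from another source, the reference VGGish code averages multi-channel audio to mono and scales 16-bit samples to [-1.0, 1.0) before framing. The sketch below mirrors that preprocessing and a minimum-length check, assuming audio already resampled to 16 kHz and stored as a (samples, channels) array; whether the towhee operator applies exactly these steps is an assumption, and `to_mono_float` and `long_enough` are hypothetical helpers.

```python
import numpy as np

# Assumption: reference VGGish framing at 16 kHz, where one embedding needs
# 96 log-mel frames = one 400-sample STFT window plus 95 hops of 160 samples.
MIN_SAMPLES = 400 + 95 * 160  # 15600 samples, i.e. 0.975 s


def to_mono_float(data: np.ndarray) -> np.ndarray:
    """Scale int16 samples to [-1.0, 1.0) and average the channels of a (samples, channels) array."""
    if data.dtype == np.int16:
        data = data.astype(np.float32) / 32768.0
    if data.ndim > 1:
        data = data.mean(axis=1)
    return data.astype(np.float32)


def long_enough(num_samples: int) -> bool:
    """True if a 16 kHz mono waveform is long enough for at least one embedding."""
    return num_samples >= MIN_SAMPLES
```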


**Returns:**

*numpy.ndarray*

Audio embeddings of shape (num_clips, 128).
Each embedding represents the features of one audio clip of about 0.96 s.
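Because each row covers one clip, it can help to track which part of the recording a row describes. The sketch below assumes non-overlapping 0.96 s clips starting at time zero, as in the reference VGGish framing; the exact offsets used by the towhee operator may differ slightly, and `clip_spans` is a hypothetical helper.

```python
import numpy as np

CLIP_SECONDS = 0.96  # assumed clip length and hop (non-overlapping), as in reference VGGish


def clip_spans(vecs: np.ndarray, clip_seconds: float = CLIP_SECONDS):
    """Pair each row of a (num_clips, 128) array with its approximate [start, end) time in seconds."""
    if vecs.ndim != 2 or vecs.shape[1] != 128:
        raise ValueError(f"expected shape (num_clips, 128), got {vecs.shape}")
    return [
        (round(i * clip_seconds, 2), round((i + 1) * clip_seconds, 2), row)
        for i, row in enumerate(vecs)
    ]


vecs = np.zeros((3, 128), dtype=np.float32)  # stand-in for real operator output
for start, end, row in clip_spans(vecs):
    print(f"{start:.2f}s - {end:.2f}s -> vector of length {row.shape[0]}")
```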