# Audio Embedding with Vggish

*Author: [Jael Gu](https://github.com/jaelgu)*

<br />

## Description

The audio embedding operator converts input audio into dense vectors that represent the semantics of the audio.
Each vector represents an audio clip with a fixed length of around 0.9s.
This operator is built on top of [VGGish](https://github.com/tensorflow/models/tree/master/research/audioset/vggish) with PyTorch.
The model is a [VGG](https://arxiv.org/abs/1409.1556) variant pre-trained on the large-scale audio dataset [AudioSet](https://research.google.com/audioset).
As its authors suggest, it is suitable for extracting high-level features or warming up a larger model.

<br />

## Code Example

Generate embeddings for the audio "test.wav".

*Write a pipeline with explicit input/output name specifications:*

```python
from towhee import pipe, ops

p = (
      pipe.input('path')
          .map('path', 'frame', ops.audio_decode.ffmpeg())
          .map('frame', 'vecs', ops.audio_embedding.vggish())
          .output('vecs')
)

p('test.wav').get()[0]
```
    | [-0.4931737, -0.40068552, -0.032327592, ...] shape=(10, 128) |
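The pipeline returns one 128-dimensional vector per clip, so a common follow-up step is to pool the rows into a single vector per file and compare files by cosine similarity. The following is a minimal sketch of that idea using only the standard library. The helper names `pool_clips` and `cosine_similarity` are hypothetical and not part of Towhee, and the tiny 3-dimensional rows stand in for real 128-dimensional embeddings.

```python
import math
from statistics import fmean


def pool_clips(clip_vectors):
    """Average per-clip embeddings column by column into one vector."""
    return [fmean(column) for column in zip(*clip_vectors)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Stand-ins for two (num_clips, 128) results; real rows hold 128 values.
audio_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
audio_b = [[2.0, 0.0, 2.0]]

vec_a = pool_clips(audio_a)
vec_b = pool_clips(audio_b)
print(cosine_similarity(vec_a, vec_b))
```

Mean pooling is only one possible choice; it discards the order of clips, which may or may not matter for a given task. When the embeddings are already a NumPy array, `vecs.mean(axis=0)` computes the same pooled vector.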


<br />

## Factory Constructor

Create the operator via the following factory method:

***audio_embedding.vggish(weights_path=None, framework="pytorch")***

**Parameters:**

*weights_path: str*

The path to the model weights. If None, the default model weights are loaded.

*framework: str*

The framework of the model implementation.
The default value is "pytorch" since the model is implemented in PyTorch.

<br />

## Interface

The audio embedding operator takes Towhee audio frames and returns embeddings as a numpy.ndarray.

**Parameters:**

*data: List[towhee.types.audio_frame.AudioFrame]*

Input audio data is a list of Towhee audio frames.
The input data should represent audio longer than 0.9s.


**Returns**:

*numpy.ndarray*

Audio embeddings in shape (num_clips, 128).
Each embedding represents the features of an audio clip about 0.9s long.
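To get a feel for how `num_clips` relates to audio length, the sketch below estimates the number of output rows from the sample count. It assumes the framing parameters of the reference VGGish implementation (16 kHz audio, 25 ms STFT window, 10 ms hop, non-overlapping examples of 96 frames); this operator may frame audio differently, so treat the result as an estimate. The function name `estimate_num_clips` is hypothetical.

```python
def estimate_num_clips(num_samples, sample_rate=16000):
    """Estimate how many embedding rows an audio signal yields.

    Assumes reference VGGish framing: 25 ms STFT window, 10 ms hop,
    and non-overlapping examples of 96 spectrogram frames (~0.96 s).
    """
    stft_window = int(round(0.025 * sample_rate))  # samples per STFT window
    stft_hop = int(round(0.010 * sample_rate))     # samples between windows
    frames_per_clip = 96

    if num_samples < stft_window:
        return 0
    num_frames = 1 + (num_samples - stft_window) // stft_hop
    if num_frames < frames_per_clip:
        return 0  # too short to fill a single clip
    return 1 + (num_frames - frames_per_clip) // frames_per_clip


# Clip counts for 10 s, 1 s, and 0.5 s of 16 kHz audio.
for seconds in (10, 1, 0.5):
    print(seconds, estimate_num_clips(int(seconds * 16000)))
```

Under these assumptions, audio too short to fill one clip yields no embedding at all, which is consistent with the requirement that the input be longer than about 0.9s.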


## More Resources

- [Exploring Multimodal Embeddings with FiftyOne and Milvus - Zilliz blog](https://zilliz.com/blog/exploring-multimodal-embeddings-with-fiftyone-and-milvus): This post explored how multimodal embeddings work with Voxel51 and Milvus.
- [How to Get the Right Vector Embeddings - Zilliz blog](https://zilliz.com/blog/how-to-get-the-right-vector-embeddings): A comprehensive introduction to vector embeddings and how to generate them with popular open-source models.
- [Audio Retrieval Based on Milvus - Zilliz blog](https://zilliz.com/blog/audio-retrieval-based-on-milvus): Create an audio retrieval system using Milvus, an open-source vector database. Classify and analyze sound data in real time.
- [Evaluating Your Embedding Model - Zilliz blog](https://zilliz.com/learn/evaluating-your-embedding-model): Review some practical examples to evaluate different text embedding models.
- [Vector Database Use Case: Audio Similarity Search - Zilliz](https://zilliz.com/vector-database-use-cases/audio-similarity-search): Building agile and reliable audio similarity search with Zilliz vector database (fully managed Milvus).
- [Sparse and Dense Embeddings: A Guide for Effective Information Retrieval with Milvus | Zilliz Webinar](https://zilliz.com/event/sparse-and-dense-embeddings-webinar): Zilliz webinar covering what sparse and dense embeddings are and when you'd want to use one over the other.
- [An Introduction to Vector Embeddings: What They Are and How to Use Them - Zilliz blog](https://zilliz.com/learn/everything-you-should-know-about-vector-embeddings): In this blog post, we will understand the concept of vector embeddings and explore its applications, best practices, and tools for working with embeddings.