# Audio Embedding with VGGish

*Author: Jael Gu*


## Description

The audio embedding operator converts an input audio clip into a dense vector that represents the clip's semantics.
This operator is built on top of the VGGish model with PyTorch.
The model was originally implemented in [TensorFlow](https://github.com/tensorflow/models/tree/master/research/audioset/vggish).
It is pre-trained on [AudioSet](https://research.google.com/audioset), a large-scale audio dataset.
As the original authors suggest, it is suitable for extracting high-level features or for warming up a larger model.

```python
import numpy as np
from towhee import ops

audio_encoder = ops.audio_embedding.vggish()

# Path or url as input
audio_embedding = audio_encoder("/audio/path/or/url/")

# Audio data as input
audio_data = np.zeros((441344, 2))
sample_rate = 44100
audio_embedding = audio_encoder(audio_data, sample_rate)
```
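The zero-filled array above is silent, so a non-trivial input can be handy when trying the operator out. The following is a minimal sketch that builds a test signal with the same `(samples, channels)` layout and sample rate as the example. The 440 Hz tone, the 0.5 amplitude, and the `float32` dtype are illustrative assumptions, not requirements stated by the operator.

```python
import numpy as np

sample_rate = 44100   # Hz, same as the example above
n_samples = 441344    # roughly 10 seconds at 44.1 kHz

# Time axis in seconds, then a 440 Hz sine tone (illustrative choice).
t = np.arange(n_samples) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)

# Duplicate the tone across two channels -> shape (n_samples, 2).
audio_data = np.stack([tone, tone], axis=1).astype(np.float32)

duration = n_samples / sample_rate
print(audio_data.shape, f"{duration:.2f} s")
```

An array built this way could then be passed as `audio_encoder(audio_data, sample_rate)`, in the same way as the zero-filled array.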

## Factory Constructor

Create the operator via the following factory method:

***ops.audio_embedding.vggish()***


## Interface

Given an audio file path, the audio embedding operator returns the audio's embeddings as a `numpy.ndarray`.


**Parameters:**

	None.


**Returns**: *numpy.ndarray*

	Audio embeddings.
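
Since the operator returns a plain `numpy.ndarray`, the embeddings can be compared with ordinary NumPy code. The sketch below assumes the output has shape `(n_segments, 128)`, one 128-dimensional VGGish vector per audio segment. That shape is an assumption to check against the actual output. The helpers `clip_vector` and `cosine_similarity` are hypothetical names introduced for illustration, and random arrays stand in for real embeddings.

```python
import numpy as np


def clip_vector(embeddings: np.ndarray) -> np.ndarray:
    """Average per-segment embeddings (n_segments, dim) into one vector (dim,)."""
    return np.atleast_2d(embeddings).mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two 1-D vectors; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


# Stand-in arrays; in practice these would be outputs of the operator.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(10, 128))
emb_b = emb_a + 0.01 * rng.normal(size=(10, 128))  # slightly perturbed copy

similarity = cosine_similarity(clip_vector(emb_a), clip_vector(emb_b))
print(f"similarity: {similarity:.3f}")
```

Averaging over segments is only one simple way to get a single vector per clip. Keeping the per-segment vectors may be preferable when timing within the clip matters.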


## Code Example

Generate embeddings for the audio "test.wav".

*Write the pipeline in simplified style:*

```python
from towhee import dc

# Wrap the chain in parentheses so it can span multiple lines.
(
    dc.glob('test.wav')
      .audio_embedding.vggish()
      .show()
)
```

*Write the same pipeline with explicit input/output names:*

```python
from towhee import dc

# Name the input column 'path' and the output column 'vecs'.
(
    dc.glob['path']('test.wav')
      .audio_embedding.vggish['path', 'vecs']()
      .select('vecs')
      .show()
)
```