# Audio Embedding with VGGish

*Author: Jael Gu*

## Description

The audio embedding operator converts an input audio clip into a dense vector that represents the clip's semantics. It is built on a PyTorch implementation of the VGGish model, which was originally implemented in [TensorFlow](https://github.com/tensorflow/models/tree/master/research/audioset/vggish). The model is pre-trained on the large-scale audio dataset [AudioSet](https://research.google.com/audioset). As suggested by the original implementation, it is suitable for extracting high-level features or for warming up a larger model.

```python
import numpy as np
from towhee import ops

audio_encoder = ops.audio_embedding.vggish()

# Path or url as input
audio_embedding = audio_encoder("/audio/path/or/url/")

# Audio data as input
audio_data = np.zeros((441344, 2))
sample_rate = 44100
audio_embedding = audio_encoder(audio_data, sample_rate)
```

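The array in the usage snippet is all zeros. For a slightly more realistic smoke test, a synthetic tone in the same `(num_samples, num_channels)` layout can be generated with NumPy. This is only a sketch: the helper name `make_stereo_tone` is hypothetical and not part of towhee, and the value range and dtype the operator expects are assumptions that are not confirmed here.

```python
import numpy as np


def make_stereo_tone(duration_s=2.0, sample_rate=44100, freq_hz=440.0):
    """Return a sine tone shaped (num_samples, 2).

    Hypothetical helper for illustration only; not part of towhee.
    """
    num_samples = int(round(duration_s * sample_rate))
    t = np.arange(num_samples) / sample_rate
    # Amplitude 0.5 keeps the signal well inside [-1, 1] (assumed range).
    mono = 0.5 * np.sin(2.0 * np.pi * freq_hz * t)
    # Duplicate the mono signal into two identical channels.
    return np.stack([mono, mono], axis=1)


audio_data = make_stereo_tone()
sample_rate = 44100
print(audio_data.shape)
```

The resulting `audio_data` and `sample_rate` could then be passed to the operator in the same way as the zero-filled array.
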
## Factory Constructor

Create the operator via the following factory method:

***ops.audio_embedding.vggish()***

## Interface

Given an audio file path or URL, or raw audio data together with its sample rate, the audio embedding operator generates embedding vectors as a numpy.ndarray.

**Parameters:**

None.

**Returns**: *numpy.ndarray*

Audio embeddings.

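A common next step with the returned embeddings is to compare two clips. The sketch below assumes the operator returns one 128-dimensional vector per audio frame, that is, an array of shape `(num_frames, 128)` as in the original VGGish release. The actual output shape of this operator should be checked before relying on this. The helper names `clip_vector` and `cosine_similarity` are hypothetical, and random arrays stand in for real operator output.

```python
import numpy as np


def clip_vector(frame_embeddings):
    """Average frame-level embeddings into one L2-normalized clip vector."""
    frames = np.atleast_2d(np.asarray(frame_embeddings, dtype=np.float32))
    pooled = frames.mean(axis=0)
    norm = np.linalg.norm(pooled)
    # Leave an all-zero vector untouched to avoid division by zero.
    return pooled / norm if norm > 0 else pooled


def cosine_similarity(a, b):
    """Cosine similarity of two vectors that are already L2-normalized."""
    return float(np.dot(a, b))


# Stand-in data: random arrays replace real operator output here.
rng = np.random.default_rng(0)
emb_a = rng.random((10, 128))  # assumed shape: (num_frames, 128)
emb_b = rng.random((7, 128))

vec_a = clip_vector(emb_a)
vec_b = clip_vector(emb_b)
print(cosine_similarity(vec_a, vec_b))
```

Mean pooling is only one way to obtain a clip-level vector; keeping the frame-level vectors may be preferable when temporal detail matters.
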
## Code Example

Generate embeddings for the audio "test.wav".

*Write the pipeline in simplified style*:

```python
from towhee import dc

# Parentheses allow the method chain to span several lines.
(
    dc.glob('test.wav')
      .audio_embedding.vggish()
      .show()
)
```

*Write the same pipeline with explicit input/output name specifications:*

```python
from towhee import dc

# Parentheses allow the method chain to span several lines.
(
    dc.glob['path']('test.wav')
      .audio_embedding.vggish['path', 'vecs']()
      .select('vecs')
      .show()
)
```