
Update

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
main
Jael Gu, 2 years ago
commit 5941e632a6
  1. README.md (79 changed lines)
  2. vggish.py (3 changed lines)
  3. vggish_input.py (4 changed lines)

README.md (79 changed lines)

@@ -6,72 +6,73 @@
 ## Description
 The audio embedding operator converts an input audio into a dense vector which can be used to represent the audio clip's semantics.
-This operator is built on top of the VGGish model with Pytorch.
-It is originally implemented in [Tensorflow](https://github.com/tensorflow/models/tree/master/research/audioset/vggish).
-The model is pre-trained with a large scale of audio dataset [AudioSet](https://research.google.com/audioset).
-```python
-import numpy as np
-from towhee import ops
-audio_encoder = ops.audio_embedding.vggish()
-# Path or url as input
-audio_embedding = audio_encoder("/audio/path/or/url/")
-# Audio data as input
-audio_data = np.zeros((2, 441344))
-sample_rate = 44100
-audio_embedding = audio_encoder(audio_data, sample_rate)
-```
-## Factory Constructor
-Create the operator via the following factory method
-***ops.audio_embedding.vggish()***
-**Parameters:**
-None.
-## Interface
-An audio embedding operator generates vectors in numpy.ndarray given an audio file path.
-**Returns**: *numpy.ndarray*
-  Audio embeddings in shape (num_clips, 128).
-## Code Example
-Generate embeddings for the audio "test.wav".
-*Write the pipeline in simplified style*:
-```python
-from towhee import dc
-dc.glob('test.wav')
-.audio_embedding.vggish()
-.show()
-```
-*Write the same pipeline with explicit inputs/outputs name specifications:*
-```python
-from towhee import dc
-dc.glob['path']('test.wav')
-.audio_embedding.vggish['path', 'vecs']()
-.select('vecs')
-.show()
-```
+This operator is built on top of [VGGish](https://github.com/tensorflow/models/tree/master/research/audioset/vggish) with Pytorch.
+The model is a [VGG](https://arxiv.org/abs/1409.1556) variant pre-trained with a large scale of audio dataset [AudioSet](https://research.google.com/audioset).
+As suggested, it is suitable to extract features at high level or warm up a larger model.
+## Code Example
+Generate embeddings for the audio "test.wav".
+*Write the pipeline in simplified style*:
+```python
+from towhee import dc
+dc.glob('test.wav')
+.audio_decode()
+.time_window(range=30)
+.audio_embedding.vggish()
+.show()
+```
+*Write the same pipeline with explicit inputs/outputs name specifications:*
+```python
+from towhee import dc
+dc.glob['path']('test.wav')
+.audio_decode['path', 'audio']()
+.time_window['audio', 'frames'](range=30)
+.audio_embedding.vggish['frames', 'vecs']()
+.select('vecs')
+.show()
+```
+## Factory Constructor
+Create the operator via the following factory method
+***audio_embedding.vggish(weights_path=None, framework="pytorch")***
+**Parameters:**
+*weights_path: str*
+  The path to model weights. If None, it will load default model weights.
+*framework: str*
+  The framework of model implementation. Default value is "pytorch" since the model is implemented in Pytorch.
+## Interface
+An audio embedding operator generates vectors in numpy.ndarray given an audio file path or a [towhee audio](link/to/AudioFrame/api/doc).
+**Parameters:**
+*Union[str, towhee.types.Audio]*
+  The audio path or link in string. Or audio input data in towhee audio frames.
+**Returns**:
+*numpy.ndarray*
+  Audio embeddings in shape (num_clips, 128).

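For readers landing on this commit without the rest of the README, a minimal sketch of calling the operator directly rather than inside a `dc` pipeline. The `ops`-style call is taken from the pre-update README; forwarding the new `weights_path`/`framework` keywords through it, and the local file `test.wav`, are assumptions.

```python
# Minimal sketch: invoking the operator outside a DataCollection pipeline.
# The ops-style constructor comes from the pre-update README; passing the new
# weights_path/framework keywords through it is an assumption.
from towhee import ops

# weights_path=None loads the default pre-trained VGGish weights.
audio_encoder = ops.audio_embedding.vggish(weights_path=None, framework="pytorch")

# A file path (or URL) is one of the documented input types.
embedding = audio_encoder("test.wav")

# Per the Interface section, the output is a numpy.ndarray of shape (num_clips, 128).
print(embedding.shape)
```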
vggish.py (3 changed lines)

@@ -27,8 +27,7 @@ from towhee.operator.base import NNOperator
 from towhee.models.vggish.torch_vggish import VGG
 from towhee import register
-sys.path.append(str(Path(__file__).parent))
-import vggish_input
+import .vggish_input
 warnings.filterwarnings('ignore')
 log = logging.getLogger()
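The hunk above swaps a path-based import for a dotted one. As a reference point only, a minimal sketch of the two import styles involved, assuming `vggish.py` and `vggish_input.py` sit in the same directory; note that Python's grammar only accepts relative imports in the `from` form.

```python
# Sketch of the import patterns touched by the hunk above, assuming
# vggish_input.py sits next to this file.
import sys
from pathlib import Path

# Path-based style (the lines removed above): make the sibling module
# importable as a top-level name.
sys.path.append(str(Path(__file__).parent))
import vggish_input

# Package-relative style: Python only accepts the "from" spelling, e.g.
#   from . import vggish_input
# a bare "import .vggish_input" is rejected by the parser.
```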

vggish_input.py (4 changed lines)

@@ -21,8 +21,8 @@ import torch
 import numpy as np
 import resampy
-import mel_features
-import vggish_params
+import .mel_features
+import .vggish_params
 import torchaudio
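vggish_input.py is the preprocessing half of the operator, as in the upstream TensorFlow release: resampy resamples the waveform, mel_features computes the log-mel features, and vggish_params holds the constants. A hedged usage sketch follows; the `waveform_to_examples(data, sample_rate)` helper and the roughly (num_examples, 96, 64) output framing come from the upstream VGGish code and are assumed, not confirmed, for this Pytorch port.

```python
# Hedged sketch: feeding raw audio through the ported preprocessing module.
# waveform_to_examples(data, sample_rate) and the ~0.96 s patches of
# 96 frames x 64 mel bands are taken from the upstream TensorFlow VGGish
# code; whether this port keeps the same helper is an assumption.
import numpy as np

import vggish_input  # the module whose imports change in the hunk above

sample_rate = 16000
waveform = np.random.uniform(-1.0, 1.0, size=3 * sample_rate)  # 3 s of noise

examples = vggish_input.waveform_to_examples(waveform, sample_rate)
print(examples.shape)  # expected to be roughly (num_examples, 96, 64)
```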
