diff --git a/README.md b/README.md
index 1a1ae63..88f2a39 100644
--- a/README.md
+++ b/README.md
@@ -6,72 +6,73 @@
 ## Desription
 The audio embedding operator converts an input audio into a dense vector which can be used to represent the audio clip's semantics.
-This operator is built on top of the VGGish model with Pytorch.
-It is originally implemented in [Tensorflow](https://github.com/tensorflow/models/tree/master/research/audioset/vggish).
-The model is pre-trained with a large scale of audio dataset [AudioSet](https://research.google.com/audioset).
+This operator is built on top of [VGGish](https://github.com/tensorflow/models/tree/master/research/audioset/vggish) with PyTorch.
+The model is a [VGG](https://arxiv.org/abs/1409.1556) variant pre-trained on the large-scale audio dataset [AudioSet](https://research.google.com/audioset).
 As suggested, it is suitable to extract features at high level or warm up a larger model.
-```python
-import numpy as np
-from towhee import ops
+## Code Example
-audio_encoder = ops.audio_embedding.vggish()
+Generate embeddings for the audio "test.wav".
-# Path or url as input
-audio_embedding = audio_encoder("/audio/path/or/url/")
+*Write the pipeline in simplified style:*
-# Audio data as input
-audio_data = np.zeros((2, 441344))
-sample_rate = 44100
-audio_embedding = audio_encoder(audio_data, sample_rate)
-```
+```python
+from towhee import dc
-## Factory Constructor
+(dc.glob('test.wav')
+  .audio_decode()
+  .time_window(range=30)
+  .audio_embedding.vggish()
+  .show())
+```
-Create the operator via the following factory method
+*Write the same pipeline with explicit input/output name specifications:*
-***ops.audio_embedding.vggish()***
+```python
+from towhee import dc
+(dc.glob['path']('test.wav')
+  .audio_decode['path', 'audio']()
+  .time_window['audio', 'frames'](range=30)
+  .audio_embedding.vggish['frames', 'vecs']()
+  .select('vecs')
+  .show())
+```
-## Interface
+## Factory Constructor
-An audio embedding operator generates vectors in numpy.ndarray given an audio file path.
+Create the operator via the following factory method:
+***audio_embedding.vggish(weights_path=None, framework="pytorch")***
 **Parameters:**
-​ None.
+​ *weights_path: str*
+​ The path to the model weights. If None, the default model weights are loaded.
-**Returns**: *numpy.ndarray*
+​ *framework: str*
-​ Audio embeddings in shape (num_clips, 128).
+​ The framework of the model implementation.
+The default value is "pytorch" since the model is implemented in PyTorch.
+## Interface
+An audio embedding operator generates vectors in numpy.ndarray given an audio file path or a [towhee audio](link/to/AudioFrame/api/doc).
-## Code Example
-Generate embeddings for the audio "test.wav".
+**Parameters:**
- *Write the pipeline in simplified style*:
+​ *Union[str, towhee.types.Audio]*
-```python
-from towhee import dc
+​ The audio path or URL as a string,
+or the audio input data as towhee audio frames.
-dc.glob('test.wav')
-  .audio_embedding.vggish()
-  .show()
-```
-*Write a same pipeline with explicit inputs/outputs name specifications:*
+**Returns**:
-```python
-from towhee import dc
+​ *numpy.ndarray*
-dc.glob['path']('test.wav')
-  .audio_embedding.vggish['path', 'vecs']()
-  .select('vecs')
-  .show()
-```
+​ Audio embeddings in shape (num_clips, 128).
diff --git a/vggish.py b/vggish.py
index 69c3e3f..4a91cc5 100644
--- a/vggish.py
+++ b/vggish.py
@@ -27,8 +27,7 @@ from towhee.operator.base import NNOperator
 from towhee.models.vggish.torch_vggish import VGG
 from towhee import register
-sys.path.append(str(Path(__file__).parent))
-import vggish_input
+from . import vggish_input
 warnings.filterwarnings('ignore')
 log = logging.getLogger()
diff --git a/vggish_input.py b/vggish_input.py
index 56e6176..1bc2a13 100644
--- a/vggish_input.py
+++ b/vggish_input.py
@@ -21,8 +21,8 @@ import torch
 import numpy as np
 import resampy
-import mel_features
-import vggish_params
+from . import mel_features
+from . import vggish_params
 import torchaudio
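Python only accepts package-relative imports in the `from . import module` form; a bare `import .module` is rejected by the parser, so the relative imports of `vggish_input`, `mel_features` and `vggish_params` in this change have to use the `from` spelling. The sketch below is an illustration only, not operator code: it checks both spellings with the built-in `compile()` and then builds a throwaway package (the name `vggish_op` and its file contents are hypothetical) to show that `from . import ...` resolves once the modules are imported as part of a package.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def is_valid_syntax(source: str) -> bool:
    """Return True if `source` parses as Python, False on SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


def load_demo_package():
    """Create a throwaway package whose module uses a relative import.

    The package name `vggish_op` and the file contents are hypothetical;
    they only mimic vggish_input.py importing vggish_params.py.
    """
    root = Path(tempfile.mkdtemp())
    pkg = root / "vggish_op"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "vggish_params.py").write_text("SAMPLE_RATE = 16000\n")
    (pkg / "vggish_input.py").write_text(
        "from . import vggish_params\n"  # resolved against the parent package
        "RATE = vggish_params.SAMPLE_RATE\n"
    )
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()  # make sure the new directory is seen
    return importlib.import_module("vggish_op.vggish_input")


if __name__ == "__main__":
    print(is_valid_syntax("import .vggish_input"))        # False
    print(is_valid_syntax("from . import vggish_input"))  # True
    print(load_demo_package().RATE)                       # 16000
```

One consequence of dropping the `sys.path.append(...)` workaround: the operator files must now be imported as a package. Running `vggish.py` directly as a script leaves it without a parent package, and the relative import fails with an `ImportError`.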
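The `(num_clips, 128)` return shape depends on how VGGish cuts a waveform into examples. The sketch below assumes this operator keeps the framing constants of the reference TensorFlow `vggish_params.py`: 16 kHz input, a 25 ms STFT window with a 10 ms hop, and non-overlapping 0.96 s examples. Under that assumption, it estimates the number of clips for a mono waveform that has already been resampled. The constant and function names are hypothetical, and the code only reproduces the framing arithmetic. It does not call the operator.

```python
from typing import Tuple

# Assumed framing constants, taken from the reference VGGish vggish_params.py.
SAMPLE_RATE = 16000
STFT_WINDOW_SECONDS = 0.025
STFT_HOP_SECONDS = 0.010
EXAMPLE_WINDOW_SECONDS = 0.96
EXAMPLE_HOP_SECONDS = 0.96
EMBEDDING_SIZE = 128


def num_frames(length: int, window: int, hop: int) -> int:
    """Count the full windows that fit in `length`; 0 if none fit."""
    if length < window:
        return 0
    return 1 + (length - window) // hop


def expected_embedding_shape(num_samples: int) -> Tuple[int, int]:
    """Estimated embedding shape for a mono waveform sampled at 16 kHz."""
    stft_window = round(SAMPLE_RATE * STFT_WINDOW_SECONDS)  # 400 samples
    stft_hop = round(SAMPLE_RATE * STFT_HOP_SECONDS)        # 160 samples
    mel_frames = num_frames(num_samples, stft_window, stft_hop)

    frames_per_second = 1.0 / STFT_HOP_SECONDS                          # 100 log-mel frames/s
    example_window = round(EXAMPLE_WINDOW_SECONDS * frames_per_second)  # 96 frames
    example_hop = round(EXAMPLE_HOP_SECONDS * frames_per_second)        # 96 frames
    num_clips = num_frames(mel_frames, example_window, example_hop)
    return (num_clips, EMBEDDING_SIZE)


if __name__ == "__main__":
    for seconds in (1, 10, 30):
        print(seconds, expected_embedding_shape(seconds * SAMPLE_RATE))
```

Under these assumptions, a 10 s clip yields 10 embeddings and a 30 s window yields 31. (The 30 s case matches `time_window(range=30)` only if `range` is measured in seconds, which the README does not state.) Inputs shorter than 15,600 samples (0.975 s at 16 kHz) produce no embeddings at all.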