From 3759ac24e2b8ef748de3c9ec7b67404b8830a3c9 Mon Sep 17 00:00:00 2001
From: Jael Gu
Date: Thu, 7 Apr 2022 17:37:24 +0800
Subject: [PATCH] Update README

Signed-off-by: Jael Gu
---
 README.md | 58 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/README.md b/README.md
index 666f69b..e639c0d 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,7 @@

*Author: Jael Gu*

+
## Description

@@ -11,34 +12,36 @@ This operator is built on top of [VGGish](https://github.com/tensorflow/models/t

The model is a [VGG](https://arxiv.org/abs/1409.1556) variant pre-trained on the large-scale audio dataset [AudioSet](https://research.google.com/audioset). As suggested by its authors, it is suitable for extracting high-level features or warming up a larger model.

+
+

## Code Example

Generate embeddings for the audio "test.wav".

- *Write the pipeline in simplified style*:
+*Write the pipeline in simplified style*:

```python
-from towhee import dc
+import towhee

-dc.glob('test.wav')
-  .audio_decode()
-  .time_window(range=10)
-  .audio_embedding.vggish()
-  .show()
+towhee.glob('test.wav') \
+      .audio_decode() \
+      .time_window(range=10) \
+      .audio_embedding.vggish() \
+      .show()
```

| [-0.4931737, -0.40068552, -0.032327592, ...] shape=(10, 128) |

*Write the same pipeline with explicit input/output name specifications:*

```python
-from towhee import dc
-
-dc.glob['path']('test.wav')
-  .audio_decode['path', 'audio']()
-  .time_window['audio', 'frames'](range=10)
-  .audio_embedding.vggish['frames', 'vecs']()
-  .select('vecs')
-  .to_vec()
+import towhee
+
+towhee.glob['path']('test.wav') \
+      .audio_decode['path', 'audio']() \
+      .time_window['audio', 'frames'](range=10) \
+      .audio_embedding.vggish['frames', 'vecs']() \
+      .select('vecs') \
+      .to_vec()
```

[array([[-0.4931737 , -0.40068552, -0.03232759, ..., -0.33428153, 0.1333081 , -0.25221825],
@@ -54,6 +57,8 @@
 [-0.4886143 , -0.40098593, -0.03175077, ..., -0.3325425 , 0.13271847, -0.25159872]], dtype=float32)]

+
+

## Factory Constructor

Create the operator via the following factory method

@@ -62,34 +67,33 @@ Create the operator via the following factory method

**Parameters:**

-​ *weights_path: str*
+*weights_path: str*

-​ The path to model weights. If None, it will load default model weights.
+The path to the model weights. If None, it loads the default model weights.

-​ *framework: str*
+*framework: str*

-​ The framework of model implementation.
+The framework of the model implementation.

The default value is "pytorch" since the model is implemented in PyTorch.

-## Interface
+
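A minimal sketch of using the parameters above: it assumes they can be passed where the operator is referenced in a pipeline, mirroring how `range=10` is supplied to `time_window` in the code example; the values shown are the documented defaults, not new behavior.

```python
import towhee

# Hedged sketch: pass the constructor parameters documented above at the point
# where the operator appears in the pipeline. weights_path=None is assumed to
# fall back to the default pre-trained VGGish weights; "pytorch" is the only
# framework mentioned in this README.
towhee.glob['path']('test.wav') \
      .audio_decode['path', 'audio']() \
      .time_window['audio', 'frames'](range=10) \
      .audio_embedding.vggish['frames', 'vecs'](weights_path=None,
                                                framework='pytorch') \
      .select('vecs') \
      .to_vec()
```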
-An audio embedding operator generates vectors in numpy.ndarray given an audio file path or a [towhee audio](link/to/AudioFrame/api/doc). +## Interface +An audio embedding operator generates vectors in numpy.ndarray given an audio file path or towhee audio frames. **Parameters:** -​ *Union[str, towhee.types.Audio]* +*Union[str, towhee.types.Audio (a sub-class of numpy.ndarray)]* -​ The audio path or link in string. +The audio path or link in string. Or audio input data in towhee audio frames. The input data should represent for an audio longer than 0.9s. **Returns**: -​ *numpy.ndarray* +*numpy.ndarray* -​ Audio embeddings in shape (num_clips, 128). +Audio embeddings in shape (num_clips, 128). Each embedding stands for features of an audio clip with length of 0.9s. - -