From 16a513386e1be1f105ad80acdaa7cccd246b305c Mon Sep 17 00:00:00 2001
From: Jael Gu
Date: Fri, 1 Apr 2022 15:51:38 +0800
Subject: [PATCH] Update README

Signed-off-by: Jael Gu
---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 7c4ecf7..13b32ac 100644
--- a/README.md
+++ b/README.md
@@ -6,6 +6,7 @@
 ## Desription

 The audio embedding operator converts an input audio into a dense vector which can be used to represent the audio clip's semantics.
+Each vector represents an audio clip with a fixed length of around 0.9s.

 This operator is built on top of [VGGish](https://github.com/tensorflow/models/tree/master/research/audioset/vggish) with Pytorch. The model is a [VGG](https://arxiv.org/abs/1409.1556) variant pre-trained with a large scale of audio dataset [AudioSet](https://research.google.com/audioset). As suggested, it is suitable to extract features at high level or warm up a larger model.

@@ -81,6 +82,7 @@ An audio embedding operator generates vectors in numpy.ndarray given an audio fi
 ​ The audio path or link in string. Or audio input data in towhee audio frames.
+The input data should represent audio longer than 0.9s.

 **Returns**:
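The two added lines tie each output vector to a fixed slice of roughly 0.9s of audio and require inputs longer than that. The sketch below shows why, estimating how many vectors a clip of a given duration yields. It is a minimal sketch, assuming the operator frames audio the same way as the reference VGGish code in the TensorFlow models repository (`vggish_params.py`): audio at 16 kHz, a 25 ms STFT window with a 10 ms hop, and non-overlapping examples of 96 log-mel frames (0.96 s) each. The helper names `frame_count` and `num_embeddings` are hypothetical and are not part of the towhee operator's API.

```python
# Assumed constants, mirroring vggish_params.py in the TensorFlow VGGish reference.
# The towhee operator is assumed (not confirmed) to use the same framing.
SAMPLE_RATE = 16000          # Hz; audio is resampled to this rate
STFT_WINDOW_SECONDS = 0.025  # 25 ms analysis window per log-mel frame
STFT_HOP_SECONDS = 0.010     # 10 ms hop between log-mel frames
EXAMPLE_FRAMES = 96          # 96 log-mel frames = 0.96 s per embedding
EXAMPLE_HOP_FRAMES = 96      # examples do not overlap


def frame_count(length: int, window: int, hop: int) -> int:
    """Number of complete windows of size `window` taken every `hop` steps (0 if too short)."""
    if length < window:
        return 0
    return 1 + (length - window) // hop


def num_embeddings(duration_seconds: float) -> int:
    """Hypothetical helper: estimate how many VGGish-style vectors a clip yields."""
    num_samples = int(round(duration_seconds * SAMPLE_RATE))
    window = int(round(STFT_WINDOW_SECONDS * SAMPLE_RATE))  # 400 samples
    hop = int(round(STFT_HOP_SECONDS * SAMPLE_RATE))        # 160 samples
    mel_frames = frame_count(num_samples, window, hop)      # log-mel frames
    return frame_count(mel_frames, EXAMPLE_FRAMES, EXAMPLE_HOP_FRAMES)


if __name__ == "__main__":
    for seconds in (0.9, 0.975, 10.0):
        print(seconds, num_embeddings(seconds))
```

Under these assumptions a 0.9s clip yields no vector at all. A clip needs 15,600 samples (0.975 s) before the first 96-frame example is complete, and a 10 s clip yields 10 vectors. This is consistent with the README asking for input somewhat longer than 0.9s.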