diff --git a/README.md b/README.md
index 7c4ecf7..13b32ac 100644
--- a/README.md
+++ b/README.md
@@ -6,6 +6,7 @@
 ## Desription
 
 The audio embedding operator converts an input audio into a dense vector which can be used to represent the audio clip's semantics.
+Each vector represents an audio clip with a fixed length of around 0.9s.
 This operator is built on top of [VGGish](https://github.com/tensorflow/models/tree/master/research/audioset/vggish) with Pytorch.
 The model is a [VGG](https://arxiv.org/abs/1409.1556) variant pre-trained with a large scale of audio dataset [AudioSet](https://research.google.com/audioset).
 As suggested, it is suitable to extract features at high level or warm up a larger model.
@@ -81,6 +82,7 @@ An audio embedding operator generates vectors in numpy.ndarray given an audio fi
 
 ​ The audio path or link in string.
 Or audio input data in towhee audio frames.
+The input data should represent an audio clip longer than 0.9s.
 
 **Returns**:
 