This pipeline extracts features from a given audio file using a VGGish model implemented in PyTorch. The model is pre-trained in a supervised manner on [AudioSet](https://research.google.com/audioset/), which contains over 2 million sound clips.
> You can refer to [Getting Started with Towhee](https://towhee.io/) for more details. If you have any questions, you can [submit an issue to the towhee repository](https://github.com/towhee-io/towhee/issues).
By default, the pipeline uses [towhee/audio-decoder](https://towhee.io/towhee/audio-decoder) to load the audio file at the given path as a list of audio frames, each stored as an ndarray.
Then `time-window` combines the audio frames into a list of ndarrays, each of which represents a fixed-length audio clip.
Finally, the [towhee/torch-vggish](https://hub.towhee.io/towhee/torch-vggish) operator generates a list of audio embeddings for each audio clip.
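
The windowing step can be pictured with a minimal NumPy sketch. This is not the `time-window` implementation: the helper name `group_frames_into_clips`, the 16 kHz sample rate, the 1-second clip length, and the choice to drop a trailing partial clip are all assumptions made for illustration.

```python
import numpy as np


def group_frames_into_clips(frames, sample_rate, clip_seconds):
    """Hypothetical helper: join decoded frames and cut fixed-length clips."""
    samples_per_clip = int(round(sample_rate * clip_seconds))
    # Join the per-frame arrays into one continuous 1-D signal.
    samples = np.concatenate(frames)
    # Assumption: a trailing partial clip is dropped, not padded.
    n_clips = len(samples) // samples_per_clip
    return [
        samples[i * samples_per_clip:(i + 1) * samples_per_clip]
        for i in range(n_clips)
    ]


# Stand-in for decoder output: 50 frames of 1024 samples each (51200 samples).
sample_rate = 16000  # assumed sample rate
frames = [np.zeros(1024, dtype=np.float32) for _ in range(50)]

clips = group_frames_into_clips(frames, sample_rate, clip_seconds=1.0)
print(len(clips), clips[0].shape)  # 3 (16000,)
```

Each resulting clip is what an embedding operator would then consume; the actual window length and padding behavior used by the pipeline may differ.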