From e66de3b745407943a098f371bd9a282f8b8eb24f Mon Sep 17 00:00:00 2001
From: Jael Gu
Date: Thu, 23 Dec 2021 19:21:34 +0800
Subject: [PATCH] Add readme

Signed-off-by: Jael Gu
---
 README.md | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 19a229d..19426bd 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,48 @@
-# audio-embedding-clmr
+# Pipeline: Audio Embedding using CLMR
 
-This is another test repo
\ No newline at end of file
+Authors: Jael Gu
+
+## Overview
+
+The pipeline uses a pre-trained CLMR model to extract embeddings from a given audio file. It first converts the input audio to a wave file with a sample rate of 22050 Hz. The model then splits the audio data into shorter clips of a fixed length. Finally, it generates a vector for each clip; together, these vectors compose the fingerprint of the input audio.
+
+## Interface
+
+**Input Arguments:**
+
+- filepath:
+  - the input audio
+  - supported types: `str` (path to the audio)
+
+**Pipeline Output:**
+
+The pipeline returns a tuple `Tuple[('embs', numpy.ndarray)]` containing the following fields:
+
+- embs:
+  - embeddings of the input audio
+  - data type: numpy.ndarray
+  - shape: (num_clips, 512)
+
+## How to use
+
+1. Install [Towhee](https://github.com/towhee-io/towhee)
+
+```bash
+$ pip3 install towhee
+```
+
+> You can refer to [Getting Started with Towhee](https://towhee.io/) for more details. If you have any questions, you can [submit an issue to the towhee repository](https://github.com/towhee-io/towhee/issues).
+
+2. Run it with Towhee
+
+```python
+>>> from towhee import pipeline
+
+>>> embedding_pipeline = pipeline('towhee/audio-embedding-clmr')
+>>> embedding = embedding_pipeline('path/to/your/audio')
+```
+
+## How it works
+
+This pipeline includes one main operator: [audio embedding](https://hub.towhee.io/towhee/audio-embedding-operator-template) (implemented as [towhee/clmr-magnatagatune](https://hub.towhee.io/towhee/clmr-magnatagatune)). The audio embedding operator encodes fixed-length clips of audio data and outputs a set of vectors for the given audio.
+
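
The clip-splitting step described in the README's Overview can be illustrated with a minimal NumPy sketch. This is not the operator's actual implementation. The clip length of 59049 samples is an assumed value (the README only says clips have a fixed length), dropping the trailing partial clip is an assumption, and `placeholder_encoder` is a hypothetical stand-in for the CLMR model that only reproduces the documented output shape `(num_clips, 512)`.

```python
import numpy as np

SAMPLE_RATE = 22050  # from the README: input is converted to 22050 Hz
CLIP_LENGTH = 59049  # assumption: samples per clip; the README only says "fixed length"
EMB_DIM = 512        # from the README: output shape is (num_clips, 512)


def split_into_clips(waveform: np.ndarray, clip_length: int = CLIP_LENGTH) -> np.ndarray:
    """Split a mono waveform into non-overlapping clips.

    Assumption: a trailing remainder shorter than clip_length is dropped.
    """
    num_clips = len(waveform) // clip_length
    return waveform[: num_clips * clip_length].reshape(num_clips, clip_length)


def placeholder_encoder(clips: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the CLMR model.

    Averages each clip over EMB_DIM contiguous chunks so that the result has
    one EMB_DIM-dimensional vector per clip. It carries no musical meaning.
    """
    chunks = np.array_split(clips, EMB_DIM, axis=1)
    return np.stack([chunk.mean(axis=1) for chunk in chunks], axis=1)


# Ten seconds of silence as dummy input in place of a decoded audio file.
waveform = np.zeros(10 * SAMPLE_RATE, dtype=np.float32)
clips = split_into_clips(waveform)
embs = placeholder_encoder(clips)
print(clips.shape, embs.shape)
```

With these assumed values, a ten-second input yields three full clips, so the number of rows in `embs` grows with the audio's duration while the second dimension stays at 512.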