|
|
@ -13,7 +13,7 @@ The pre-trained model used here is from the paper **PANNs: Large-Scale Pretraine |
|
|
|
|
|
|
|
## Code Example |
|
|
|
|
|
|
|
Predict labels and generate embeddings given the audio path "test.wav". |
|
|
|
|
|
|
|
|
|
|
*Write the pipeline in simplified style*: |
|
|
|
|
|
|
@ -25,7 +25,7 @@ import towhee |
|
|
|
.audio_decode.ffmpeg() |
|
|
|
.runas_op(func=lambda x:[y[0] for y in x]) |
|
|
|
.audio_classification.panns() |
|
|
|
.show() |
|
|
|
|
|
|
) |
|
|
|
``` |
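The `runas_op` step in the pipeline above applies an ordinary Python function to the decoder output. As a rough illustration only, assuming each decoded item is a sequence whose first element is the frame data (the sample values and the name `take_first` are made up for this sketch), the function keeps just that first element:

```python
# Same logic as the lambda passed to runas_op in the pipeline above.
def take_first(items):
    return [item[0] for item in items]

# Hypothetical stand-in for decoder output: (frame_data, extra_info) pairs.
decoded = [("frame_0", 0.0), ("frame_1", 0.5), ("frame_2", 1.0)]

frames = take_first(decoded)
print(frames)  # ['frame_0', 'frame_1', 'frame_2']
```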
|
|
|
|
|
|
@ -39,9 +39,11 @@ import towhee |
|
|
|
.audio_decode.ffmpeg['path', 'frames']() |
|
|
|
.runas_op['frames', 'frames'](func=lambda x:[y[0] for y in x]) |
|
|
|
.audio_classification.panns['frames', ('labels', 'scores', 'vec')]() |
|
|
|
|
|
|
.select['path', 'labels', 'scores', 'vec']() |
|
|
|
.show() |
|
|
|
) |
|
|
|
``` |
|
|
|
<img src="./result.png" width="800px"/> |
|
|
|
|
|
|
|
<br /> |
|
|
|
|
|
|
@ -93,4 +95,3 @@ The input data should represent for an audio longer than 2s. |
|
|
|
- labels: a list of the top-k labels predicted by the model. |
|
|
|
- scores: a list of scores corresponding to the labels, each indicating how likely its label is. |
|
|
|
- vec: an audio embedding generated by the model, with shape (2048,). |
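
The three outputs listed above can be consumed with plain Python. The following is a minimal sketch with made-up values: the label names and scores are hypothetical, and a plain list stands in for the 2048-dimensional embedding so the example has no dependencies.

```python
# Hypothetical outputs shaped like the description above (not real model results).
labels = ["Speech", "Music", "Silence"]
scores = [0.82, 0.11, 0.03]
vec = [0.0] * 2048  # placeholder for the (2048,) embedding

# labels and scores are parallel lists, so pair them up to find the best match.
top_label, top_score = max(zip(labels, scores), key=lambda pair: pair[1])
print(f"{top_label}: {top_score:.2f}")  # Speech: 0.82

# Sanity-check the embedding length before storing or indexing it.
assert len(vec) == 2048
```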
|
|
|
|
|
|
|