audio-classification
Audio Classification with PANNs
Author: Jael Gu
Description
The audio classification operator classifies the given audio data using the 527 labels of the large-scale AudioSet dataset. The pre-trained model used here is from the paper PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition (paper link).
Code Example
Predict labels and generate embeddings given the audio path "test.wav".
Write a pipeline with explicit input and output names:
from towhee import pipe, ops, DataCollection

# Decode the audio file into frames, then classify and embed it with PANNs.
p = (
    pipe.input('path')
        .map('path', 'frame', ops.audio_decode.ffmpeg())
        .map('frame', ('labels', 'scores', 'vec'), ops.audio_classification.panns())
        .output('path', 'labels', 'scores', 'vec')
)
DataCollection(p('./test.wav')).show()

Factory Constructor
Create the operator via the following factory method:
audio_classification.panns(weights_path=None, framework='pytorch', sample_rate=32000, topk=5)
Parameters:
weights_path: str
The path to model weights. If None, it will load default model weights.
framework: str
The framework of the model implementation. Default value is "pytorch" since the model is implemented in PyTorch.
sample_rate: int
The target sample rate of the audio data after conversion, defaults to 32000.
topk: int
The number of labels and corresponding scores to be returned, sorted by probability from high to low. Default value is 5.
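The effect of topk can be illustrated with a minimal sketch in plain Python. This is not the operator's own code; the function name top_k and the demo labels and scores are hypothetical, chosen only to show how the highest-scoring labels are selected and ordered.

```python
from typing import List, Sequence, Tuple


def top_k(labels: Sequence[str], scores: Sequence[float], k: int = 5) -> Tuple[List[str], List[float]]:
    """Return the k highest-scoring labels and their scores, highest first."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    # Sort class indices by score in descending order and keep the first k.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [labels[i] for i in order], [scores[i] for i in order]


# Hypothetical labels and scores, for illustration only.
demo_labels = ["Speech", "Music", "Dog", "Rain"]
demo_scores = [0.10, 0.70, 0.05, 0.15]
print(top_k(demo_labels, demo_scores, k=2))
```

With k=2 the sketch keeps only the two best-scoring entries, mirroring how a smaller topk shortens both returned lists.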
Interface
Given towhee audio frames, the operator returns the predicted labels, their scores, and an audio embedding as a numpy.ndarray.
Parameters:
data: List[towhee.types.audio_frame.AudioFrame]
Input audio data is a list of towhee audio frames. The input data should represent an audio clip longer than 2 seconds.
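The length requirement can be checked before calling the operator. The sketch below assumes you already know the total number of samples per channel; the helper names duration_seconds and is_long_enough are hypothetical and are not part of towhee. At the default sample rate of 32000, two seconds corresponds to 64000 samples.

```python
def duration_seconds(num_samples: int, sample_rate: int = 32000) -> float:
    """Duration of a clip, given its per-channel sample count and sample rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def is_long_enough(num_samples: int, sample_rate: int = 32000, min_seconds: float = 2.0) -> bool:
    """True if the clip is strictly longer than min_seconds."""
    return duration_seconds(num_samples, sample_rate) > min_seconds


# A 3-second clip at 32 kHz has 96000 samples per channel.
print(duration_seconds(96000), is_long_enough(96000))
```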
Returns:
labels, scores, vec: Tuple[List[str], List[float], numpy.ndarray]
- labels: a list of the topk labels predicted by the model.
- scores: a list of scores corresponding to labels, representing probabilities.
- vec: an audio embedding generated by the model, with shape (2048,).
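One possible use of the returned vec is to compare two audio clips. The source does not prescribe a similarity metric; the sketch below assumes cosine similarity, a common choice for embeddings, and uses short hypothetical vectors in place of real 2048-dimensional outputs. The function name cosine_similarity is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional stand-ins for two (2048,) embeddings.
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]
print(cosine_similarity(vec_a, vec_b))
```

A numpy.ndarray of shape (2048,) can be passed directly, since the function only iterates over its elements.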