# Audio Embedding with data2vec

*author: David Wang*
## Description

This operator extracts features for audio with [data2vec](https://arxiv.org/abs/2202.03555). The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture.
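The self-distillation objective described above can be sketched in NumPy. This is a simplified illustration under stated assumptions, not this operator's code or the reference implementation: random arrays stand in for the teacher's layer outputs (computed on the unmasked input) and the student's predictions (computed on the masked input); targets average the instance-normalized outputs of the top-K teacher layers; the loss is a Smooth L1 regression evaluated only at masked time steps; and the teacher's weights are an exponential moving average (EMA) of the student's. The values of `top_k`, `beta`, and `tau` are illustrative assumptions (the paper anneals τ during training rather than keeping it fixed).

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    # Normalize each feature channel over the time axis (x has shape [T, D]).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def build_targets(teacher_layers, top_k):
    # teacher_layers: list of [T, D] teacher outputs on the UNMASKED input.
    # Targets are the average of the instance-normalized top-k layer outputs.
    normalized = [instance_norm(h) for h in teacher_layers[-top_k:]]
    return np.mean(normalized, axis=0)


def masked_smooth_l1(student_out, targets, mask, beta=0.25):
    # Smooth L1 regression loss, evaluated only at masked time steps (mask: [T] bool).
    diff = np.abs(student_out[mask] - targets[mask])
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(loss.mean())


def ema_update(teacher, student, tau=0.999):
    # The teacher's weights follow an exponential moving average of the student's.
    return {name: tau * teacher[name] + (1.0 - tau) * student[name] for name in teacher}


# Toy usage: random arrays stand in for real model activations.
rng = np.random.default_rng(0)
T, D, num_layers = 50, 16, 12
teacher_layers = [rng.standard_normal((T, D)) for _ in range(num_layers)]
targets = build_targets(teacher_layers, top_k=8)
mask = rng.random(T) < 0.5
student_out = rng.standard_normal((T, D))
print(targets.shape, masked_smooth_l1(student_out, targets, mask))
```

Because the teacher sees the full input while the student sees only a masked view, the student must infer contextualized representations of the hidden time steps rather than reconstruct raw audio.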
## Code Example

Generate embeddings for the audio "test.wav".

*Write a pipeline with explicit inputs/outputs name specifications:*

```python
from towhee import pipe, ops, DataCollection

p = (
    pipe.input('path')
        # Decode the audio file into frames at a 16 kHz sample rate
        .map('path', 'frame', ops.audio_decode.ffmpeg(sample_rate=16000))
        # Embed the decoded frames with the data2vec model
        .map('frame', 'vecs', ops.audio_embedding.data2vec(model_name='facebook/data2vec-audio-base-960h'))
        .output('path', 'vecs')
)

# Run the pipeline on test.wav and display the path with its embedding
DataCollection(p('test.wav')).show()
```
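Since the operator expects input longer than 0.9 s, short clips can be filtered out before they reach the pipeline. A minimal sketch, assuming the decoded audio is available as a 1-D NumPy array of mono samples at the 16 kHz rate used above; `long_enough` is a hypothetical helper, not a Towhee function, and it works on raw samples rather than towhee audio frames.

```python
import numpy as np

MIN_SECONDS = 0.9  # minimum input length stated for the operator


def long_enough(samples, sample_rate=16000, min_seconds=MIN_SECONDS):
    # samples: hypothetical 1-D array of mono samples at `sample_rate` Hz.
    duration = samples.shape[0] / sample_rate
    return duration > min_seconds


print(long_enough(np.zeros(16000)))  # 1.0 s -> True
print(long_enough(np.zeros(8000)))   # 0.5 s -> False
```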
## Factory Constructor

Create the operator via the following factory method:

***data2vec(model_name='facebook/data2vec-audio-base-960h')***

**Parameters:**

***model_name***: *str*

The model name as a string. The default value is "facebook/data2vec-audio-base-960h". Supported model names:

- facebook/data2vec-audio-base-960h
- facebook/data2vec-audio-large-960h
- facebook/data2vec-audio-base
- facebook/data2vec-audio-base-100h
- facebook/data2vec-audio-base-10m
- facebook/data2vec-audio-large
- facebook/data2vec-audio-large-100h
- facebook/data2vec-audio-large-10m
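The supported names follow a regular pattern: a model size (`base` or `large`), optionally followed by a suffix such as `960h`, `100h`, or `10m`, which presumably indicates how much labeled speech the checkpoint was fine-tuned on (check the Hugging Face model cards to confirm). A minimal sketch that validates a `model_name` against the list above before building a pipeline and splits it into those parts; `SUPPORTED_MODELS` and `parse_model_name` are hypothetical helpers, not part of the Towhee API.

```python
# Hypothetical copy of the supported-model list from this README.
SUPPORTED_MODELS = (
    "facebook/data2vec-audio-base-960h",
    "facebook/data2vec-audio-large-960h",
    "facebook/data2vec-audio-base",
    "facebook/data2vec-audio-base-100h",
    "facebook/data2vec-audio-base-10m",
    "facebook/data2vec-audio-large",
    "facebook/data2vec-audio-large-100h",
    "facebook/data2vec-audio-large-10m",
)


def parse_model_name(name):
    # Reject names outside the supported list before building a pipeline.
    if name not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model_name {name!r}")
    # "facebook/data2vec-audio-large-100h" -> ["data2vec", "audio", "large", "100h"]
    parts = name.split("/", 1)[1].split("-")
    size = parts[2]
    suffix = parts[3] if len(parts) > 3 else None
    return size, suffix


print(parse_model_name("facebook/data2vec-audio-large-100h"))  # ('large', '100h')
print(parse_model_name("facebook/data2vec-audio-base"))        # ('base', None)
```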
## Interface

An audio embedding operator generates vectors as numpy.ndarray from an audio file path or towhee audio frames.

**Parameters:**

***data:*** *List[towhee.types.audio_frame.AudioFrame]*

Input audio data is a list of towhee audio frames. The input should represent audio longer than 0.9 s.

**Returns:** *numpy.ndarray*

The audio embedding extracted by the model.

# More Resources

- [What is a Transformer Model? An Engineer's Guide](https://zilliz.com/glossary/transformer-models): A transformer model is a neural network architecture that converts one type of input into a different type of output. Its core strength is handling input and output sequences of different lengths. It does this by encoding the input into a matrix with predefined dimensions and combining that with an attention matrix to decode. The transformation passes through a sequence of layers that turn words into numerical representations, which lets the model interpret and manipulate human language input. An example of a transformer model is GPT-3, which ingests human language and generates text output.
- [Exploring Multimodal Embeddings with FiftyOne and Milvus - Zilliz blog](https://zilliz.com/blog/exploring-multimodal-embeddings-with-fiftyone-and-milvus): This post explores how multimodal embeddings work with Voxel51 and Milvus.
- [How to Get the Right Vector Embeddings - Zilliz blog](https://zilliz.com/blog/how-to-get-the-right-vector-embeddings): A comprehensive introduction to vector embeddings and how to generate them with popular open-source models.
- [What is BERT (Bidirectional Encoder Representations from Transformers)? - Zilliz blog](https://zilliz.com/learn/what-is-bert): Learn what Bidirectional Encoder Representations from Transformers (BERT) is and how it uses pre-training and fine-tuning to achieve its remarkable performance.
- [Transforming Text: The Rise of Sentence Transformers in NLP - Zilliz blog](https://zilliz.com/learn/transforming-text-the-rise-of-sentence-transformers-in-nlp): Everything you need to know about the Transformer model, exploring its architecture, implementation, and limitations.
- [What Are Vector Embeddings?](https://zilliz.com/glossary/vector-embeddings): Learn the definition of vector embeddings, how to create vector embeddings, and more.
- [Vector Database Use Case: Audio Similarity Search - Zilliz](https://zilliz.com/vector-database-use-cases/audio-similarity-search): Building agile and reliable audio similarity search with Zilliz vector database (fully managed Milvus).
- [Enhancing Information Retrieval with Sparse Embeddings | Zilliz Learn - Zilliz blog](https://zilliz.com/learn/enhancing-information-retrieval-learned-sparse-embeddings): Explore the inner workings, advantages, and practical applications of learned sparse embeddings with the Milvus vector database.
- [An Introduction to Vector Embeddings: What They Are and How to Use Them - Zilliz blog](https://zilliz.com/learn/everything-you-should-know-about-vector-embeddings): In this blog post, we will understand the concept of vector embeddings and explore its applications, best practices, and tools for working with embeddings.