LIVIVO - Search results -

Search results

Result 1 - 7 of total 7

Book ; Online: On the Effect of Data-Augmentation on Local Embedding Properties in the Contrastive Learning of Music Audio Representations

McCallum, Matthew C. / Davies, Matthew E. P. / Henkel, Florian / Kim, Jaehun / Sandberg, Samuel E.

2024

Abstract	Audio embeddings are crucial tools in understanding large catalogs of music. Typically embeddings are evaluated on the basis of the performance they provide in a wide range of downstream tasks, however few studies have investigated the local properties of the embedding spaces themselves which are important in nearest neighbor algorithms, commonly used in music search and recommendation. In this work we show that when learning audio representations on music datasets via contrastive learning, musical properties that are typically homogeneous within a track (e.g., key and tempo) are reflected in the locality of neighborhoods in the resulting embedding space. By applying appropriate data augmentation strategies, localisation of such properties can not only be reduced but the localisation of other attributes is increased. For example, locality of features such as pitch and tempo that are less relevant to non-expert listeners, may be mitigated while improving the locality of more salient features such as genre and mood, achieving state-of-the-art performance in nearest neighbor retrieval accuracy. Similarly, we show that the optimal selection of data augmentation strategies for contrastive learning of music audio embeddings is dependent on the downstream task, highlighting this as an important embedding design decision.
Comment	Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
Keywords	Computer Science - Sound ; Computer Science - Information Retrieval ; Computer Science - Machine Learning ; Computer Science - Multimedia ; Electrical Engineering and Systems Science - Audio and Speech Processing
Subject code	780
Publishing date	2024-01-16
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)
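
The abstract above refers to nearest neighbor retrieval accuracy in an embedding space. As a rough illustration only, the sketch below scores how often an item's nearest neighbors share its label. The cosine similarity measure, the majority vote, the function names and the toy data are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_label_accuracy(embeddings, labels, k=1):
    """Fraction of items whose k nearest neighbors (majority vote) share their label.

    Hypothetical helper: the similarity measure and voting rule are assumptions.
    """
    correct = 0
    for i, query in enumerate(embeddings):
        # Rank every other item by similarity to the query, most similar first.
        sims = [
            (cosine_similarity(query, other), j)
            for j, other in enumerate(embeddings)
            if j != i
        ]
        sims.sort(reverse=True)
        votes = Counter(labels[j] for _, j in sims[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(embeddings)


# Toy data (invented): two tight clusters in a 2-D embedding space.
toy_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
toy_genres = ["rock", "rock", "jazz", "jazz"]
toy_accuracy = knn_label_accuracy(toy_embeddings, toy_genres, k=1)
```

If the labels follow the clusters, as in the toy data, every neighbor agrees with its query; labels that cut across the clusters would score lower, which is the kind of locality difference the abstract describes.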

Book ; Online: Similar but Faster: Manipulation of Tempo in Music Audio Embeddings for Tempo Prediction and Search

McCallum, Matthew C. / Henkel, Florian / Kim, Jaehun / Sandberg, Samuel E. / Davies, Matthew E. P.

2024

Abstract	Audio embeddings enable large scale comparisons of the similarity of audio files for applications such as search and recommendation. Due to the subjectivity of audio similarity, it can be desirable to design systems that answer not only whether audio is similar, but similar in what way (e.g., with respect to tempo, mood or genre). Previous works have proposed disentangled embedding spaces where subspaces representing specific, yet possibly correlated, attributes can be weighted to emphasize those attributes in downstream tasks. However, no research has been conducted into the independence of these subspaces, nor their manipulation, in order to retrieve tracks that are similar but different in a specific way. Here, we explore the manipulation of tempo in embedding spaces as a case-study towards this goal. We propose tempo translation functions that allow for efficient manipulation of tempo within a pre-existing embedding space whilst maintaining other properties such as genre. As this translation is specific to tempo it enables retrieval of tracks that are similar but have specifically different tempi. We show that such a function can be used as an efficient data augmentation strategy for both training of downstream tempo predictors, and improved nearest neighbor retrieval of properties largely independent of tempo.
Comment	Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
Keywords	Computer Science - Sound ; Computer Science - Digital Libraries ; Computer Science - Information Retrieval ; Computer Science - Machine Learning ; Electrical Engineering and Systems Science - Audio and Speech Processing
Subject code	006
Publishing date	2024-01-16
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Book ; Online: Tempo estimation as fully self-supervised binary classification

Henkel, Florian / Kim, Jaehun / McCallum, Matthew C. / Sandberg, Samuel E. / Davies, Matthew E. P.

2024

Abstract	This paper addresses the problem of global tempo estimation in musical audio. Given that annotating tempo is time-consuming and requires certain musical expertise, few publicly available data sources exist to train machine learning models for this task. Towards alleviating this issue, we propose a fully self-supervised approach that does not rely on any human labeled data. Our method builds on the fact that generic (music) audio embeddings already encode a variety of properties, including information about tempo, making them easily adaptable for downstream tasks. While recent work in self-supervised tempo estimation aimed to learn a tempo specific representation that was subsequently used to train a supervised classifier, we reformulate the task into the binary classification problem of predicting whether a target track has the same or a different tempo compared to a reference. While the former still requires labeled training data for the final classification model, our approach uses arbitrary unlabeled music data in combination with time-stretching for model training as well as a small set of synthetically created reference samples for predicting the final tempo. Evaluation of our approach in comparison with the state-of-the-art reveals highly competitive performance when the constraint of finding the precise tempo octave is relaxed.
Comment	Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
Keywords	Computer Science - Sound ; Computer Science - Machine Learning ; Electrical Engineering and Systems Science - Audio and Speech Processing
Subject code	006
Publishing date	2024-01-16
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)
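
The abstract above mentions relaxing the constraint of finding the precise tempo octave but does not name a metric. In the tempo-estimation literature this relaxation is commonly measured with the pair of metrics usually called Accuracy 1 (estimate within a tolerance of the annotated tempo) and Accuracy 2 (additionally accepting factors of 2, 3, 1/2 and 1/3). The sketch below assumes those conventions and a 4% tolerance; whether the paper uses exactly these settings is an assumption, and the toy values are invented.

```python
def within_tolerance(estimate_bpm, reference_bpm, tolerance=0.04):
    """True if the estimate lies within +/- tolerance of the reference tempo."""
    return abs(estimate_bpm - reference_bpm) <= tolerance * reference_bpm


def accuracy1(estimates, references, tolerance=0.04):
    """Share of estimates that match the annotated tempo itself."""
    hits = sum(within_tolerance(e, r, tolerance) for e, r in zip(estimates, references))
    return hits / len(references)


# Factors accepted under the octave-relaxed metric (assumed convention).
OCTAVE_FACTORS = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)


def accuracy2(estimates, references, tolerance=0.04):
    """Share of estimates that match the annotated tempo or an accepted multiple of it."""
    hits = sum(
        any(within_tolerance(e, r * f, tolerance) for f in OCTAVE_FACTORS)
        for e, r in zip(estimates, references)
    )
    return hits / len(references)


# Toy values (invented): one exact hit, one double-tempo estimate, one miss.
toy_references = [120.0, 90.0, 140.0]
toy_estimates = [121.0, 180.0, 100.0]
toy_acc1 = accuracy1(toy_estimates, toy_references)
toy_acc2 = accuracy2(toy_estimates, toy_references)
```

In the toy data the 180 BPM estimate for a 90 BPM track counts as an error under the strict metric and as a hit under the relaxed one, which is why the two scores can differ noticeably for the same system.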

Book ; Online: Tempo vs. Pitch: understanding self-supervised tempo estimation

Morais, Giovana / Davies, Matthew E. P. / Queiroz, Marcelo / Fuentes, Magdalena

2023

Abstract	Self-supervision methods learn representations by solving pretext tasks that do not require human-generated labels, alleviating the need for time-consuming annotations. These methods have been applied in computer vision, natural language processing, environmental sound analysis, and recently in music information retrieval, e.g. for pitch estimation. Particularly in the context of music, there are few insights about the fragility of these models regarding different distributions of data, and how they could be mitigated. In this paper, we explore these questions by dissecting a self-supervised model for pitch estimation adapted for tempo estimation via rigorous experimentation with synthetic data. Specifically, we study the relationship between the input representation and data distribution for self-supervised tempo estimation.
Comment	5 pages, 3 figures, published at the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing
Keywords	Computer Science - Sound ; Computer Science - Machine Learning ; Electrical Engineering and Systems Science - Audio and Speech Processing
Subject code	780 ; 004
Publishing date	2023-04-13
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Book ; Online: Symbolic music generation conditioned on continuous-valued emotions

Sulun, Serkan / Davies, Matthew E. P. / Viana, Paula

2022

Abstract	In this paper we present a new approach for the generation of multi-instrument symbolic music driven by musical emotion. The principal novelty of our approach centres on conditioning a state-of-the-art transformer based on continuous-valued valence and arousal labels. In addition, we provide a new large-scale dataset of symbolic music paired with emotion labels in terms of valence and arousal. We evaluate our approach in a quantitative manner in two ways, first by measuring its note prediction accuracy, and second via a regression task in the valence-arousal plane. Our results demonstrate that our proposed approaches outperform conditioning using control tokens which is representative of the current state of the art.
Comment	Published in IEEE Access
Keywords	Electrical Engineering and Systems Science - Audio and Speech Processing ; Computer Science - Artificial Intelligence ; Computer Science - Multimedia
Publishing date	2022-03-30
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Book ; Online: On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks

Sulun, Serkan / Davies, Matthew E. P.

2020

Abstract	In this paper, we address a sub-topic of the broad domain of audio enhancement, namely musical audio bandwidth extension. We formulate the bandwidth extension problem using deep neural networks, where a band-limited signal is provided as input to the network, with the goal of reconstructing a full-bandwidth output. Our main contribution centers on the impact of the choice of low pass filter when training and subsequently testing the network. For two different state of the art deep architectures, ResNet and U-Net, we demonstrate that when the training and testing filters are matched, improvements in signal-to-noise ratio (SNR) of up to 7 dB can be obtained. However, when these filters differ, the improvement falls considerably and under some training conditions results in a lower SNR than the band-limited input. To circumvent this apparent overfitting to filter shape, we propose a data augmentation strategy which utilizes multiple low pass filters during training and leads to improved generalization to unseen filtering conditions at test time.
Comment	Qualitative examples on https://serkansulun.com/bwe. Source code on https://github.com/serkansulun/deep-music-enhancer
Keywords	Electrical Engineering and Systems Science - Audio and Speech Processing ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning ; Computer Science - Sound
Subject code	780
Publishing date	2020-11-14
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)
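
The abstract above reports results as improvements in signal-to-noise ratio. As a minimal sketch of what such a comparison could look like, the code below uses the standard time-domain definition, SNR = 10 log10(signal power / error power), and compares a band-limited input with a reconstructed output against the same full-bandwidth reference. The function name and the toy signals are assumptions for illustration; the paper's exact evaluation procedure is not given in this record.

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB of an estimate relative to a reference signal."""
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, estimate))
    if noise_power == 0.0:
        return math.inf  # identical signals: no error energy
    return 10.0 * math.log10(signal_power / noise_power)


# Toy signals (invented): the "network output" is closer to the reference
# than the "band-limited input" is.
full_band = [1.0, -1.0, 1.0, -1.0]
band_limited_input = [0.5, -0.5, 0.5, -0.5]
network_output = [0.9, -0.9, 0.9, -0.9]

input_snr = snr_db(full_band, band_limited_input)
output_snr = snr_db(full_band, network_output)
snr_improvement = output_snr - input_snr  # positive when the output is closer
```

A negative value of `snr_improvement` would correspond to the case the abstract describes, where mismatched filters leave the output worse than the band-limited input.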

Book ; Online: TIV.lib: an open-source library for the tonal description of musical audio

Ramires, António / Bernardes, Gilberto / Davies, Matthew E. P. / Serra, Xavier

2020

Abstract	In this paper, we present TIV.lib, an open-source library for the content-based tonal description of musical audio signals. Its main novelty relies on the perceptually-inspired Tonal Interval Vector space based on the Discrete Fourier transform, from which multiple instantaneous and global representations, descriptors and metrics are computed - e.g., harmonic change, dissonance, diatonicity, and musical key. The library is cross-platform, implemented in Python and the graphical programming language Pure Data, and can be used in both online and offline scenarios. Of note is its potential for enhanced Music Information Retrieval, where tonal descriptors sit at the core of numerous methods and applications.
Keywords	Electrical Engineering and Systems Science - Audio and Speech Processing ; Computer Science - Sound
Publishing date	2020-08-26
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)
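
The abstract above says the Tonal Interval Vector space is based on the Discrete Fourier transform. As a hedged illustration, the sketch below takes the 12-point DFT of an energy-normalised chroma vector and keeps coefficients 1 to 6. This is not the TIV.lib API: the function name is hypothetical, and the perceptual weights used by the library are not reproduced here, so unit weights stand in as a placeholder.

```python
import cmath
import math


def tonal_interval_vector(chroma, weights=(1.0,) * 6):
    """Sketch of a Tonal Interval Vector: DFT coefficients k = 1..6 of a 12-bin chroma.

    The unit weights are a placeholder assumption, not the library's values.
    """
    if len(chroma) != 12:
        raise ValueError("expected a 12-bin chroma vector")
    total = sum(chroma)
    if total == 0:
        return [0j] * 6  # silent frame: no tonal content
    normalised = [c / total for c in chroma]
    tiv = []
    for k in range(1, 7):
        coefficient = sum(
            c * cmath.exp(-2j * math.pi * k * n / 12)
            for n, c in enumerate(normalised)
        )
        tiv.append(weights[k - 1] * coefficient)
    return tiv


# Toy chroma vectors (invented): a C major triad and the same triad
# transposed up a whole tone (D major).
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
d_major = c_major[-2:] + c_major[:-2]

c_major_magnitudes = [abs(z) for z in tonal_interval_vector(c_major)]
d_major_magnitudes = [abs(z) for z in tonal_interval_vector(d_major)]
```

Because transposition is a circular shift of the chroma vector, the two chords differ only in the phases of their coefficients, so the magnitude lists match.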
