LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-7 of 7

  1. Book ; Online: On the Effect of Data-Augmentation on Local Embedding Properties in the Contrastive Learning of Music Audio Representations

    McCallum, Matthew C. / Davies, Matthew E. P. / Henkel, Florian / Kim, Jaehun / Sandberg, Samuel E.

    2024  

    Abstract Audio embeddings are crucial tools in understanding large catalogs of music. Typically embeddings are evaluated on the basis of the performance they provide in a wide range of downstream tasks, however few studies have investigated the local properties of the embedding spaces themselves which are important in nearest neighbor algorithms, commonly used in music search and recommendation. In this work we show that when learning audio representations on music datasets via contrastive learning, musical properties that are typically homogeneous within a track (e.g., key and tempo) are reflected in the locality of neighborhoods in the resulting embedding space. By applying appropriate data augmentation strategies, localisation of such properties can not only be reduced but the localisation of other attributes is increased. For example, locality of features such as pitch and tempo that are less relevant to non-expert listeners, may be mitigated while improving the locality of more salient features such as genre and mood, achieving state-of-the-art performance in nearest neighbor retrieval accuracy. Similarly, we show that the optimal selection of data augmentation strategies for contrastive learning of music audio embeddings is dependent on the downstream task, highlighting this as an important embedding design decision.

    Comment: Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
    Keywords Computer Science - Sound ; Computer Science - Information Retrieval ; Computer Science - Machine Learning ; Computer Science - Multimedia ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Subject code 780
    Publishing date 2024-01-16
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
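
The neighborhood locality this abstract discusses arises from a standard contrastive objective. A minimal sketch of that objective (InfoNCE), assuming embedding vectors compared by cosine similarity; dimensions, names, and the augmentation are illustrative and not taken from the paper:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: pull the augmented view of the same
    clip close, push embeddings of other tracks away."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                      # positive sits in slot 0

rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
positive = anchor + 0.05 * rng.normal(size=128)       # augmented view: nearby
negatives = [rng.normal(size=128) for _ in range(8)]  # other tracks
loss_close = info_nce(anchor, positive, negatives)
loss_far = info_nce(anchor, negatives[0], negatives[1:])
```

Which augmentations generate the positive view is exactly the design decision the paper studies: augmenting a property away (e.g. key or tempo) keeps it from dominating the resulting neighborhoods.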


  2. Book ; Online: Similar but Faster: Manipulation of Tempo in Music Audio Embeddings for Tempo Prediction and Search

    McCallum, Matthew C. / Henkel, Florian / Kim, Jaehun / Sandberg, Samuel E. / Davies, Matthew E. P.

    2024  

    Abstract Audio embeddings enable large scale comparisons of the similarity of audio files for applications such as search and recommendation. Due to the subjectivity of audio similarity, it can be desirable to design systems that answer not only whether audio is similar, but similar in what way (e.g., wrt. tempo, mood or genre). Previous works have proposed disentangled embedding spaces where subspaces representing specific, yet possibly correlated, attributes can be weighted to emphasize those attributes in downstream tasks. However, no research has been conducted into the independence of these subspaces, nor their manipulation, in order to retrieve tracks that are similar but different in a specific way. Here, we explore the manipulation of tempo in embedding spaces as a case-study towards this goal. We propose tempo translation functions that allow for efficient manipulation of tempo within a pre-existing embedding space whilst maintaining other properties such as genre. As this translation is specific to tempo it enables retrieval of tracks that are similar but have specifically different tempi. We show that such a function can be used as an efficient data augmentation strategy for both training of downstream tempo predictors, and improved nearest neighbor retrieval of properties largely independent of tempo.

    Comment: Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
    Keywords Computer Science - Sound ; Computer Science - Digital Libraries ; Computer Science - Information Retrieval ; Computer Science - Machine Learning ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Subject code 006
    Publishing date 2024-01-16
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
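
The tempo translation idea can be illustrated with a toy linear version: shift only a designated tempo subspace by a direction scaled with the log tempo ratio, leaving the remaining dimensions (standing in for genre and similar properties) untouched. The split index and the direction vector are assumptions for this sketch, not the paper's learned function:

```python
import numpy as np

D, TEMPO_DIMS = 16, 4
rng = np.random.default_rng(1)
direction = rng.normal(size=TEMPO_DIMS)  # stand-in for a learned direction

def translate_tempo(emb, ratio):
    """Move the tempo-related dimensions by log2(ratio) along `direction`,
    leaving all other embedding dimensions exactly as they were."""
    out = emb.copy()
    out[:TEMPO_DIMS] += np.log2(ratio) * direction
    return out

emb = rng.normal(size=D)
shifted = translate_tempo(emb, 2.0)   # "same track, double tempo"
```

Because only the tempo subspace moves, a nearest-neighbor query with `shifted` retrieves tracks that are similar overall but at a specifically different tempo, which is the retrieval behavior the abstract describes.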


  3. Book ; Online: Tempo estimation as fully self-supervised binary classification

    Henkel, Florian / Kim, Jaehun / McCallum, Matthew C. / Sandberg, Samuel E. / Davies, Matthew E. P.

    2024  

    Abstract This paper addresses the problem of global tempo estimation in musical audio. Given that annotating tempo is time-consuming and requires certain musical expertise, few publicly available data sources exist to train machine learning models for this task. Towards alleviating this issue, we propose a fully self-supervised approach that does not rely on any human labeled data. Our method builds on the fact that generic (music) audio embeddings already encode a variety of properties, including information about tempo, making them easily adaptable for downstream tasks. While recent work in self-supervised tempo estimation aimed to learn a tempo specific representation that was subsequently used to train a supervised classifier, we reformulate the task into the binary classification problem of predicting whether a target track has the same or a different tempo compared to a reference. While the former still requires labeled training data for the final classification model, our approach uses arbitrary unlabeled music data in combination with time-stretching for model training as well as a small set of synthetically created reference samples for predicting the final tempo. Evaluation of our approach in comparison with the state-of-the-art reveals highly competitive performance when the constraint of finding the precise tempo octave is relaxed.

    Comment: Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
    Keywords Computer Science - Sound ; Computer Science - Machine Learning ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Subject code 006
    Publishing date 2024-01-16
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
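
The reformulation into binary classification needs no human labels because time-stretching supplies them. A toy sketch of the pair construction, with a placeholder standing in for a real time-stretching routine:

```python
import random

FACTORS = [0.8, 1.0, 1.25]   # illustrative stretch factors

def stretch(track_id, factor):
    # placeholder for a real time-stretching routine (e.g. a phase vocoder)
    return (track_id, factor)

def make_pair(track_id, same, rng):
    """Build a (target, reference, label) triple from one unlabeled track.
    The label says only 'same tempo or not' -- no BPM annotation involved."""
    f = rng.choice(FACTORS)
    g = f if same else rng.choice([x for x in FACTORS if x != f])
    return stretch(track_id, f), stretch(track_id, g), int(same)

rng = random.Random(0)
pairs = [make_pair(i, same=bool(i % 2), rng=rng) for i in range(6)]
```

At inference time the same comparison is run against a small set of synthetic references of known tempo, as the abstract describes, so the final BPM falls out of purely relative judgments.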


  4. Book ; Online: Tempo vs. Pitch: understanding self-supervised tempo estimation

    Morais, Giovana / Davies, Matthew E. P. / Queiroz, Marcelo / Fuentes, Magdalena

    2023  

    Abstract Self-supervision methods learn representations by solving pretext tasks that do not require human-generated labels, alleviating the need for time-consuming annotations. These methods have been applied in computer vision, natural language processing, environmental sound analysis, and recently in music information retrieval, e.g. for pitch estimation. Particularly in the context of music, there are few insights about the fragility of these models regarding different distributions of data, and how they could be mitigated. In this paper, we explore these questions by dissecting a self-supervised model for pitch estimation adapted for tempo estimation via rigorous experimentation with synthetic data. Specifically, we study the relationship between the input representation and data distribution for self-supervised tempo estimation.

    Comment: 5 pages, 3 figures, published on 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing
    Keywords Computer Science - Sound ; Computer Science - Machine Learning ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Subject code 780 ; 004
    Publishing date 2023-04-13
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
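
The controlled synthetic-data methodology the abstract mentions can be illustrated with the simplest possible stimulus: a click track whose ground-truth tempo is known by construction, so an estimator's behavior can be probed exactly. Sample rate and tempo below are arbitrary illustrative choices:

```python
import numpy as np

def click_track(bpm, sr=1000, seconds=4):
    """Unit impulses at a fixed inter-onset interval derived from `bpm`."""
    sig = np.zeros(sr * seconds)
    period = int(round(sr * 60.0 / bpm))   # samples between clicks
    sig[::period] = 1.0
    return sig

sig = click_track(120, sr=1000)
onsets = np.flatnonzero(sig)
ioi = np.diff(onsets).mean()               # inter-onset interval in samples
bpm_est = 60.0 * 1000 / ioi                # recover the ground-truth tempo
```

With stimuli like this, the input representation and the data distribution can be varied independently, which is the kind of dissection the paper performs.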


  5. Book ; Online: Symbolic music generation conditioned on continuous-valued emotions

    Sulun, Serkan / Davies, Matthew E. P. / Viana, Paula

    2022  

    Abstract In this paper we present a new approach for the generation of multi-instrument symbolic music driven by musical emotion. The principal novelty of our approach centres on conditioning a state-of-the-art transformer based on continuous-valued valence and arousal labels. In addition, we provide a new large-scale dataset of symbolic music paired with emotion labels in terms of valence and arousal. We evaluate our approach in a quantitative manner in two ways, first by measuring its note prediction accuracy, and second via a regression task in the valence-arousal plane. Our results demonstrate that our proposed approaches outperform conditioning using control tokens which is representative of the current state of the art.

    Comment: Published in IEEE Access
    Keywords Electrical Engineering and Systems Science - Audio and Speech Processing ; Computer Science - Artificial Intelligence ; Computer Science - Multimedia
    Publishing date 2022-03-30
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
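
Conditioning on continuous values, rather than on discrete control tokens, can be sketched generically as projecting the (valence, arousal) pair into the model dimension and prepending it to the input sequence so the decoder can attend to the emotion target. The projection is a random stand-in for learned weights, and the mechanism is an illustration, not the paper's exact architecture:

```python
import numpy as np

D_MODEL = 8
rng = np.random.default_rng(2)
W_cond = rng.normal(size=(2, D_MODEL))   # learned in a real model

def prepend_condition(tokens, valence, arousal):
    """Project continuous (valence, arousal) into the model dimension and
    prepend the result as an extra position in the input sequence."""
    cond = np.array([valence, arousal]) @ W_cond   # shape (D_MODEL,)
    return np.vstack([cond, tokens])

tokens = rng.normal(size=(5, D_MODEL))   # five note-event embeddings
seq = prepend_condition(tokens, valence=0.7, arousal=-0.2)
```

Because valence and arousal enter as real numbers, the conditioning signal is not quantized into a fixed token vocabulary, which is the contrast with control tokens the abstract draws.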


  6. Book ; Online: On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks

    Sulun, Serkan / Davies, Matthew E. P.

    2020  

    Abstract In this paper, we address a sub-topic of the broad domain of audio enhancement, namely musical audio bandwidth extension. We formulate the bandwidth extension problem using deep neural networks, where a band-limited signal is provided as input to the network, with the goal of reconstructing a full-bandwidth output. Our main contribution centers on the impact of the choice of low pass filter when training and subsequently testing the network. For two different state of the art deep architectures, ResNet and U-Net, we demonstrate that when the training and testing filters are matched, improvements in signal-to-noise ratio (SNR) of up to 7dB can be obtained. However, when these filters differ, the improvement falls considerably and under some training conditions results in a lower SNR than the band-limited input. To circumvent this apparent overfitting to filter shape, we propose a data augmentation strategy which utilizes multiple low pass filters during training and leads to improved generalization to unseen filtering conditions at test time.

    Comment: Qualitative examples on https://serkansulun.com/bwe. Source code on https://github.com/serkansulun/deep-music-enhancer
    Keywords Electrical Engineering and Systems Science - Audio and Speech Processing ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning ; Computer Science - Sound
    Subject code 780
    Publishing date 2020-11-14
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
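
The proposed augmentation can be illustrated by drawing a different low-pass filter per training example, so the network never sees a single fixed filter shape. A simple moving average stands in here for a properly designed low-pass filter, and the kernel lengths are arbitrary illustrative cutoffs:

```python
import numpy as np

def lowpass(x, kernel_len):
    """Crude low-pass: moving average with a boxcar kernel."""
    k = np.ones(kernel_len) / kernel_len
    return np.convolve(x, k, mode="same")

def augment(x, rng):
    # a different cutoff per example prevents overfitting to one filter shape
    return lowpass(x, rng.choice([3, 5, 9, 15]))

rng = np.random.default_rng(3)
x = rng.normal(size=256)   # noise standing in for an audio excerpt
y = augment(x, rng)
```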


  7. Book ; Online: TIV.lib: an open-source library for the tonal description of musical audio

    Ramires, António / Bernardes, Gilberto / Davies, Matthew E. P. / Serra, Xavier

    2020  

    Abstract In this paper, we present TIV.lib, an open-source library for the content-based tonal description of musical audio signals. Its main novelty relies on the perceptually-inspired Tonal Interval Vector space based on the Discrete Fourier transform, from which multiple instantaneous and global representations, descriptors and metrics are computed - e.g., harmonic change, dissonance, diatonicity, and musical key. The library is cross-platform, implemented in Python and the graphical programming language Pure Data, and can be used in both online and offline scenarios. Of note is its potential for enhanced Music Information Retrieval, where tonal descriptors sit at the core of numerous methods and applications.
    Keywords Electrical Engineering and Systems Science - Audio and Speech Processing ; Computer Science - Sound
    Publishing date 2020-08-26
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
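
The core computation behind a Tonal Interval Vector can be sketched as the weighted DFT of an energy-normalised 12-bin chroma vector, keeping coefficients 1 to 6. The per-coefficient weights below follow the general TIV recipe but are illustrative; consult TIV.lib itself for the exact values it uses:

```python
import numpy as np

WEIGHTS = np.array([2.0, 11.0, 17.0, 16.0, 19.0, 7.0])  # illustrative

def tiv(chroma):
    """Tonal Interval Vector: weighted DFT coefficients T(1)..T(6) of an
    energy-normalised chroma vector (complex-valued)."""
    c = np.asarray(chroma, dtype=float)
    c = c / c.sum()                  # energy normalisation
    spectrum = np.fft.fft(c)
    return WEIGHTS * spectrum[1:7]

c_major = np.zeros(12)
c_major[[0, 4, 7]] = 1.0             # C, E, G
v = tiv(c_major)
g_major = np.roll(c_major, 7)        # transposition = chroma rotation
```

Transposition is a circular shift of the chroma vector, so the TIV magnitudes are transposition-invariant; descriptors built on them (dissonance, diatonicity, harmonic change) inherit that property.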

