LIVIVO - The Search Portal for Life Sciences

Advanced search

Search results

Results 1–4 of 4

Search options

  1. Book ; Online: Formant Estimation and Tracking using Probabilistic Heat-Maps

    Shrem, Yosi / Kreuk, Felix / Keshet, Joseph

    2022  

    Abstract Formants are the spectral maxima that result from acoustic resonances of the human vocal tract, and their accurate estimation is among the most fundamental speech processing problems. Recent work has shown that these frequencies can be estimated accurately using deep learning techniques. However, when presented with speech from a domain other than the one they were trained on, these methods exhibit a decline in performance, limiting their use as generic tools. The contribution of this paper is to propose a new network architecture that performs well on a variety of different speaker and speech domains. Our proposed model is composed of a shared encoder that takes a spectrogram as input and outputs a domain-invariant representation. Then, multiple decoders further process this representation, each responsible for predicting a different formant while considering the lower formant predictions. An advantage of our model is that it is based on heatmaps that generate a probability distribution over formant predictions. Results suggest that our proposed model better represents the signal over various domains and leads to better formant frequency tracking and estimation.

    Comment: interspeech 2022
    Keywords Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Subject code 006
    Publishing date 2022-06-23
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

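The heat-map idea in this record's abstract — a decoder emitting a probability distribution over frequency bins, from which a formant estimate is read off — can be illustrated with a minimal sketch. All names and shapes here are hypothetical stand-ins, not the authors' code: raw per-bin scores are passed through a softmax, and the expected frequency under the resulting distribution serves as the point estimate.

```python
import numpy as np

def formant_from_heatmap(scores, freq_axis):
    """Turn per-bin decoder scores into a formant frequency estimate.

    scores    : (n_bins,) raw scores for one spectrogram frame (hypothetical)
    freq_axis : (n_bins,) center frequency of each bin in Hz
    """
    # Softmax converts raw scores into a probability distribution over bins.
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    # Expected frequency under the distribution is the point estimate.
    return float(probs @ freq_axis)

# Toy example: scores peaked at 500 Hz on a 0-4000 Hz axis (50 Hz bins).
freqs = np.linspace(0, 4000, 81)
scores = -((freqs - 500.0) / 100.0) ** 2  # quadratic peak at 500 Hz
print(round(formant_from_heatmap(scores, freqs)))  # → 500
```

Taking the expectation rather than the argmax gives sub-bin resolution and keeps the estimate differentiable, which is one reason heat-map outputs are popular for continuous regression targets.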

  2. Book ; Online: The Vocal Signature of Social Anxiety

    Alon-Ronen, Or / Shrem, Yosi / Keshet, Yossi / Gilboa-Schechtman, Eva

    Exploration using Hypothesis-Testing and Machine-Learning Approaches

    2022  

    Abstract Background - Social anxiety (SA) is a common and debilitating condition, negatively affecting life quality even at sub-diagnostic thresholds. We sought to characterize SA's acoustic signature using hypothesis-testing and machine learning (ML) approaches. Methods - Participants formed spontaneous utterances responding to instructions to refuse or consent to commands of alleged peers. Vocal properties (e.g., intensity and duration) of these utterances were analyzed. Results - Our prediction that, as compared to low-SA (n=31), high-SA (n=32) individuals exhibit a less confident vocal speech signature, especially with respect to refusal utterances, was only partially supported by the classical hypothesis-testing approach. However, the results of the ML analyses, and specifically the decision tree classifier, were consistent with such speech patterns in SA. Using a Gaussian Process (GP) classifier, we were able to distinguish between high- and low-SA individuals with high (75.6%) accuracy and good (.83 AUC) separability. We also expected and found that vocal properties differentiated between refusal and consent utterances. Conclusions - Our findings provide further support for the usefulness of the ML approach for the study of psychopathology, highlighting the utility of developing automatic techniques to create behavioral markers of SAD. Clinically, the simplicity and accessibility of these procedures may encourage people to seek professional help.
    Keywords Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Publishing date 2022-07-18
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

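The Gaussian Process classification step this abstract describes can be sketched with scikit-learn. The synthetic features below stand in for the vocal measurements (intensity, duration); the actual feature set, group means, and kernel choice are assumptions for illustration, not the study's data or configuration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Hypothetical 2-D vocal features: [intensity (dB), duration (s)].
# High-SA speakers simulated with slightly lower intensity, longer duration.
low_sa = rng.normal(loc=[65.0, 0.40], scale=[2.0, 0.04], size=(31, 2))
high_sa = rng.normal(loc=[62.0, 0.48], scale=[2.0, 0.04], size=(32, 2))

X = np.vstack([low_sa, high_sa])
y = np.array([0] * 31 + [1] * 32)  # 0 = low SA, 1 = high SA

# GP classifier with an RBF kernel; hyperparameters are fit during training.
clf = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
clf.fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```

A GP classifier also exposes calibrated class probabilities via `predict_proba`, which is what makes AUC-style separability measures (like the .83 reported above) natural to compute with it.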

  3. Article: Using automated acoustic analysis to explore the link between planning and articulation in second language speech production.

    Goldrick, Matthew / Shrem, Yosi / Kilbourn-Ceron, Oriana / Baus, Cristina / Keshet, Joseph

    Language, cognition and neuroscience

    2020  Volume 36, Issue 7, Page(s) 824–839

    Abstract Speakers learning a second language show systematic differences from native speakers in the retrieval, planning, and articulation of speech. A key challenge in examining the interrelationship between these differences at various stages of production is the need for manual annotation of fine-grained properties of speech. We introduce a new method for automatically analyzing voice onset time (VOT), a key phonetic feature indexing differences in sound systems cross-linguistically. In contrast to previous approaches, our method allows reliable measurement of prevoicing, a dimension of VOT variation used by many languages. Analysis of VOTs, word durations, and reaction times from German-speaking learners of Spanish (Baus et al., 2013) suggests that while there are links between the factors impacting planning and articulation, these two processes also exhibit some degree of independence. We discuss the implications of these findings for theories of speech production and future research in bilingual language processing.
    Language English
    Publishing date 2020-08-19
    Publishing country England
    Document type Journal Article
    ZDB-ID 2753366-9
    ISSN (online) 2327-3801
    ISSN (print) 2327-3798
    DOI 10.1080/23273798.2020.1805118
    Database MEDical Literature Analysis and Retrieval System OnLINE


  4. Book ; Online: Dr.VOT

    Shrem, Yosi / Goldrick, Matthew / Keshet, Joseph

    Measuring Positive and Negative Voice Onset Time in the Wild

    2019  

    Abstract Voice Onset Time (VOT), a key measurement of speech for basic research and applied medical studies, is the time between the onset of a stop burst and the onset of voicing. When the voicing onset precedes burst onset the VOT is negative; if voicing onset follows the burst, it is positive. In this work, we present a deep-learning model for accurate and reliable measurement of VOT in naturalistic speech. The proposed system addresses two critical issues: it can measure positive and negative VOT equally well, and it is trained to be robust to variation across annotations. Our approach is based on the structured prediction framework, where the feature functions are defined to be RNNs. These learn to capture segmental variation in the signal. Results suggest that our method substantially improves over the current state-of-the-art. In contrast to previous work, our Deep and Robust VOT annotator, Dr.VOT, can successfully estimate negative VOTs while maintaining state-of-the-art performance on positive VOTs. This high level of performance generalizes to new corpora without further retraining.
    Index Terms: structured prediction, multi-task learning, adversarial training, recurrent neural networks, sequence segmentation.

    Comment: interspeech 2019
    Keywords Electrical Engineering and Systems Science - Audio and Speech Processing ; Computer Science - Machine Learning ; Computer Science - Sound ; Statistics - Machine Learning
    Subject code 006
    Publishing date 2019-10-27
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

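The sign convention this abstract defines — VOT is negative when voicing precedes the burst (prevoicing), positive when it follows — reduces to a single subtraction once both onsets have been located. The helper below is an illustrative sketch of that convention only, not part of the Dr.VOT system.

```python
def voice_onset_time(burst_onset_ms, voicing_onset_ms):
    """VOT in milliseconds: voicing onset minus burst onset.

    Negative result: voicing precedes the burst (prevoicing).
    Positive result: voicing follows the burst (e.g., aspirated stops).
    """
    return voicing_onset_ms - burst_onset_ms

# Prevoiced stop: voicing begins 20 ms before the burst.
print(voice_onset_time(120, 100))  # → -20

# Aspirated stop: voicing begins 60 ms after the burst.
print(voice_onset_time(120, 180))  # → 60
```

The hard part the paper addresses is, of course, locating the two onsets automatically in naturalistic speech; once a model predicts both landmarks, the VOT itself follows directly from this definition.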