LIVIVO - Search results -

Result 1 - 7 of total 7

Book ; Online: ED-TTS

Tang, Haobin / Zhang, Xulong / Cheng, Ning / Xiao, Jing / Wang, Jianzong

Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis

2024

Abstract	Existing emotional speech synthesis methods often utilize an utterance-level style embedding extracted from reference audio, neglecting the inherent multi-scale property of speech prosody. We introduce ED-TTS, a multi-scale emotional speech synthesis model that leverages Speech Emotion Diarization (SED) and Speech Emotion Recognition (SER) to model emotions at different levels. Specifically, our proposed approach integrates the utterance-level emotion embedding extracted by SER with fine-grained frame-level emotion embedding obtained from SED. These embeddings are used to condition the reverse process of the denoising diffusion probabilistic model (DDPM). Additionally, we employ cross-domain SED to accurately predict soft labels, addressing the challenge of a scarcity of fine-grained emotion-annotated datasets for supervising emotional TTS training. Comment: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024)
Keywords	Electrical Engineering and Systems Science - Audio and Speech Processing ; Computer Science - Sound
Subject code	004
Publishing date	2024-01-16
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Book ; Online: QI-TTS

Tang, Haobin / Zhang, Xulong / Wang, Jianzong / Cheng, Ning / Xiao, Jing

Questioning Intonation Control for Emotional Speech Synthesis

2023

Abstract	Recent expressive text-to-speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles such as intonation are neglected. In this paper, we propose QI-TTS, which aims to better transfer and control intonation to further deliver the speaker's questioning intention while transferring emotion from reference speech. We propose a multi-style extractor to extract style embeddings at two different levels. While the sentence level represents emotion, the final syllable level represents intonation. For fine-grained intonation control, we use relative attributes to represent intonation intensity at the syllable level. Experiments have validated the effectiveness of QI-TTS for improving intonation expressiveness in emotional speech synthesis. Comment: Accepted by ICASSP 2023
Keywords	Computer Science - Sound ; Computer Science - Computation and Language ; Electrical Engineering and Systems Science - Audio and Speech Processing
Subject code	430
Publishing date	2023-03-14
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Book ; Online: EmoMix

Tang, Haobin / Zhang, Xulong / Wang, Jianzong / Cheng, Ning / Xiao, Jing

Emotion Mixing via Diffusion Models for Emotional Speech Synthesis

2023

Abstract	There has been significant progress in emotional Text-To-Speech (TTS) synthesis technology in recent years. However, existing methods primarily focus on the synthesis of a limited number of emotion types and have achieved unsatisfactory performance in intensity control. To address these limitations, we propose EmoMix, which can generate emotional speech with a specified intensity or a mixture of emotions. Specifically, EmoMix is a controllable emotional TTS model based on a diffusion probabilistic model and a pre-trained speech emotion recognition (SER) model used to extract emotion embeddings. Mixed emotion synthesis is achieved by combining the noises predicted by the diffusion model conditioned on different emotions within a single sampling process at run time. We further mix Neutral with a specific primary emotion in varying degrees to control intensity. Experimental results validate the effectiveness of EmoMix for mixed emotion synthesis and intensity control. Comment: Accepted by 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023)
Keywords	Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
Subject code	410
Publishing date	2023-06-01
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)
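
The noise-mixing idea described in the EmoMix abstract can be illustrated with a minimal sketch. It assumes a convex (normalised, weighted) combination of the per-emotion noise predictions at a single sampling step; the abstract does not state the exact weighting, and the names mix_noise and intensity_mix are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def mix_noise(eps_by_emotion: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Combine per-emotion noise predictions for one sampling step.

    Assumption: a normalised weighted sum. The abstract only says the
    noises are "combined"; the exact rule is not given there.
    """
    if len(eps_by_emotion) != len(weights) or not weights:
        raise ValueError("need one weight per noise prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]
    dim = len(eps_by_emotion[0])
    return [sum(w * eps[k] for w, eps in zip(norm, eps_by_emotion))
            for k in range(dim)]


def intensity_mix(eps_neutral: Sequence[float],
                  eps_emotion: Sequence[float],
                  intensity: float) -> List[float]:
    """Hypothetical intensity control: blend Neutral with one primary emotion."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    return mix_noise([eps_neutral, eps_emotion], [1.0 - intensity, intensity])
```

In this sketch, an intensity of 0 returns the Neutral-conditioned noise unchanged and an intensity of 1 returns the emotion-conditioned noise; in a real sampler the mixed noise would replace the single-condition prediction at each reverse diffusion step.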

Article ; Online: Dilatation Eustachian tuboplasty with a Eustachian tube video endoscope and supporting balloon.

Zhang, Huasong / Zhang, Qing / He, Kunwu / Chen, Minqi / Chen, Yucheng / Su, Dongliang / Tang, Haobin / Lin, Weifen / Chen, Shuhua

The Journal of laryngology and otology

2023 Volume 138, Issue 3, Page(s) 246–252

Abstract	Objective: To evaluate the feasibility and safety of employing a Eustachian tube video endoscope with a supporting balloon as a viable treatment and examination option for patients with Eustachian tube dysfunction. Methods: A study involving nine fresh human cadaver heads was conducted to investigate the potential of balloon dilatation Eustachian tuboplasty using a Eustachian tube video endoscope and a supporting balloon catheter. The Eustachian tube cavity was examined with the Eustachian tube video endoscope during the procedure, which involved the dilatation of the cartilaginous portion of the Eustachian tube with the supporting balloon catheter. Results: The utilisation of the Eustachian tube video endoscope in conjunction with the supporting balloon catheter demonstrated technical ease during the procedure, with no observed damage to essential structures, particularly the Eustachian tube cavity. Conclusion: This newly introduced method of dilatation and examination of the Eustachian tube cavity using a Eustachian tube video endoscope and the supporting balloon is a feasible, safe procedure.
MeSH term(s)	Humans ; Eustachian Tube/surgery ; Dilatation/methods ; Tympanoplasty ; Ear Diseases/diagnosis ; Endoscopes ; Treatment Outcome
Language	English
Publishing date	2023-07-26
Publishing country	England
Document type	Journal Article
ZDB-ID	218299-3
ISSN (online)	1748-5460
ISSN (print)	0022-2151
DOI	10.1017/S0022215123001202
Database	MEDical Literature Analysis and Retrieval System OnLINE

In stock of ZB MED Cologne/Königswinter

Ul II Zs.64: Show issues

Location:
Depending on availability (see the holdings information)
up to volume year 2021: articles are ordered via the online order form
from volume year 2022: reading room (ground floor)

Book ; Online: Dynamic Alignment Mask CTC

Zhang, Xulong / Tang, Haobin / Wang, Jianzong / Cheng, Ning / Luo, Jian / Xiao, Jing

Improved Mask-CTC with Aligned Cross Entropy

2023

Abstract	Because they predict all target tokens in parallel, non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models. In this work, we present dynamic alignment Mask CTC, introducing two methods: (1) Aligned Cross Entropy (AXE), which finds the monotonic alignment that minimizes the cross-entropy loss through dynamic programming, and (2) Dynamic Rectification, which creates new training samples by replacing some masks with model-predicted tokens. AXE ignores the absolute position alignment between the prediction and the ground-truth sentence and focuses on matching tokens in relative order. The dynamic rectification method lets the model simulate unmasked but possibly wrong tokens, even if they have high confidence. Our experiments on the WSJ dataset demonstrated that both the AXE loss and the rectification method improve the WER performance of Mask CTC. Comment: Accepted by ICASSP 2023
Keywords	Computer Science - Sound ; Computer Science - Computation and Language ; Electrical Engineering and Systems Science - Audio and Speech Processing
Subject code	501
Publishing date	2023-03-14
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)
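
The dynamic-programming alignment mentioned in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the three operations (align, skip prediction, skip target), the blank index, the delta penalty and the function name aligned_cross_entropy are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def aligned_cross_entropy(log_probs: Sequence[Sequence[float]],
                          targets: Sequence[int],
                          blank: int = 0,
                          delta: float = 1.0) -> float:
    """Minimum cross-entropy over monotonic alignments (simplified sketch).

    log_probs[j][v] is the log-probability of token v at prediction slot j.
    cost[i][j] is the best loss after consuming i targets and j predictions.
    """
    n_pred, n_tgt = len(log_probs), len(targets)
    if n_pred == 0:
        raise ValueError("need at least one prediction slot")
    cost: List[List[float]] = [[math.inf] * (n_pred + 1)
                               for _ in range(n_tgt + 1)]
    cost[0][0] = 0.0
    for i in range(n_tgt + 1):
        for j in range(n_pred + 1):
            if i == 0 and j == 0:
                continue
            best = math.inf
            if i > 0 and j > 0:
                # Align: target i-1 is emitted by prediction slot j-1.
                best = min(best,
                           cost[i - 1][j - 1] - log_probs[j - 1][targets[i - 1]])
            if j > 0:
                # Skip prediction: slot j-1 is asked to emit the blank token.
                best = min(best, cost[i][j - 1] - log_probs[j - 1][blank])
            if i > 0:
                # Skip target: penalised by delta, scored at the current slot.
                cur = min(j, n_pred - 1)
                best = min(best,
                           cost[i - 1][j] - delta * log_probs[cur][targets[i - 1]])
            cost[i][j] = best
    return cost[n_tgt][n_pred]
```

When the predictions contain the right tokens shifted by one position, this aligned loss stays small, whereas a position-by-position cross-entropy penalises every token; that is the relative-order behaviour the abstract describes.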

Book ; Online: Speech Augmentation Based Unsupervised Learning for Keyword Spotting

Luo, Jian / Wang, Jianzong / Cheng, Ning / Tang, Haobin / Xiao, Jing

2022

Abstract	In this paper, we investigated a speech augmentation based unsupervised learning approach for the keyword spotting (KWS) task. KWS is a useful speech application, yet it depends heavily on labeled data. We designed a CNN-Attention architecture to perform the KWS task. CNN layers focus on local acoustic features, and attention layers model long-term dependencies. To improve the robustness of the KWS model, we also proposed an unsupervised learning method. The unsupervised loss is based on the similarity between the original and augmented speech features, as well as on audio reconstruction information. Two speech augmentation methods are explored in the unsupervised learning: speed and intensity. The experiments on the Google Speech Commands V2 Dataset demonstrated that our CNN-Attention model achieves competitive results. Moreover, the augmentation based unsupervised learning could further improve the classification accuracy of the KWS task. In our experiments, with augmentation based unsupervised learning, our KWS model achieves better performance than other unsupervised methods, such as CPC, APC, and MPC. Comment: accepted by WCCI 2022
Keywords	Computer Science - Sound ; Computer Science - Computation and Language ; Electrical Engineering and Systems Science - Audio and Speech Processing
Subject code	004
Publishing date	2022-05-28
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)

Book ; Online: SAR

Wang, Jianzong / Zhang, Xulong / Tang, Haobin / Sun, Aolan / Cheng, Ning / Xiao, Jing

Self-Supervised Anti-Distortion Representation for End-To-End Speech Model

2023

Abstract	In recent Text-to-Speech (TTS) systems, a neural vocoder often generates speech samples by conditioning solely on acoustic features predicted by an acoustic model. However, the predicted acoustic features always contain distortions compared to the ground truth, especially in the common case of poor acoustic modeling due to low-quality training data. To overcome such limits, we propose a Self-supervised learning framework to learn an Anti-distortion acoustic Representation (SAR) to replace human-crafted acoustic features by introducing distortion prior to an auto-encoder pre-training process. The acoustic representation learned by the proposed framework is shown to be more resistant to distortion than the most commonly used mel-spectrogram through both objective and subjective evaluation. Comment: Accepted by IJCNN2023. 2023 International Joint Conference on Neural Networks (IJCNN2023)
Keywords	Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
Publishing date	2023-04-23
Publishing country	us
Document type	Book ; Online
Database	BASE - Bielefeld Academic Search Engine (life sciences selection)
