LIVIVO - Search results

Results 1 - 7 of 7

Book ; Online: ED-TTS

Tang, Haobin / Zhang, Xulong / Cheng, Ning / Xiao, Jing / Wang, Jianzong

Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis

2024

Abstract	Existing emotional speech synthesis methods often utilize an utterance-level style embedding extracted from reference audio, neglecting the inherent multi-scale property of speech prosody. We introduce ED-TTS, a multi-scale emotional speech synthesis model that leverages Speech Emotion Diarization (SED) and Speech Emotion Recognition (SER) to model emotions at different levels. Specifically, our proposed approach integrates the utterance-level emotion embedding extracted by SER with fine-grained frame-level emotion embedding obtained from SED. These embeddings are used to condition the reverse process of the denoising diffusion probabilistic model (DDPM). Additionally, we employ cross-domain SED to accurately predict soft labels, addressing the challenge of a scarcity of fine-grained emotion-annotated datasets for supervising emotional TTS training. Comment: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024)
Keywords	Electrical Engineering and Systems Science - Audio and Speech Processing ; Computer Science - Sound
Subject/Category (Code)	004
Publication date	2024-01-16
Country of publication	us
Document type	Book ; Online
Data source	BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
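
As a rough illustration of the multi-scale conditioning described in the ED-TTS abstract, the sketch below repeats one utterance-level emotion embedding across time and joins it with frame-level embeddings. The function name `combine_emotion_embeddings`, the array shapes, and the choice of concatenation are assumptions made for illustration; the abstract does not say how the two embeddings are fused before they condition the DDPM.

```python
import numpy as np

def combine_emotion_embeddings(utt_emb, frame_emb):
    """Hypothetical helper: fuse one utterance-level vector with per-frame vectors.

    utt_emb:   shape (d_utt,), e.g. taken from an SER model
    frame_emb: shape (n_frames, d_frame), e.g. taken from an SED model
    returns:   shape (n_frames, d_utt + d_frame)
    """
    utt_emb = np.asarray(utt_emb, dtype=float)
    frame_emb = np.asarray(frame_emb, dtype=float)
    n_frames = frame_emb.shape[0]
    # Repeat the utterance-level vector for every frame (assumed fusion scheme).
    tiled = np.broadcast_to(utt_emb, (n_frames, utt_emb.shape[0]))
    # Concatenate along the feature axis to get one conditioning vector per frame.
    return np.concatenate([tiled, frame_emb], axis=1)

rng = np.random.default_rng(0)
cond = combine_emotion_embeddings(rng.normal(size=4), rng.normal(size=(10, 3)))
print(cond.shape)
```

Concatenation keeps both scales visible to whatever network consumes the result; addition after a projection would be an equally plausible choice.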

Book ; Online: QI-TTS

Tang, Haobin / Zhang, Xulong / Wang, Jianzong / Cheng, Ning / Xiao, Jing

Questioning Intonation Control for Emotional Speech Synthesis

2023

Abstract	Recent expressive text-to-speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles such as intonation are neglected. In this paper, we propose QI-TTS which aims to better transfer and control intonation to further deliver the speaker's questioning intention while transferring emotion from reference speech. We propose a multi-style extractor to extract style embedding from two different levels. While the sentence level represents emotion, the final syllable level represents intonation. For fine-grained intonation control, we use relative attributes to represent intonation intensity at the syllable level. Experiments have validated the effectiveness of QI-TTS for improving intonation expressiveness in emotional speech synthesis. Comment: Accepted by ICASSP 2023
Keywords	Computer Science - Sound ; Computer Science - Computation and Language ; Electrical Engineering and Systems Science - Audio and Speech Processing
Subject/Category (Code)	430
Publication date	2023-03-14
Country of publication	us
Document type	Book ; Online
Data source	BASE - Bielefeld Academic Search Engine (Life Sciences Selection)

Book ; Online: EmoMix

Tang, Haobin / Zhang, Xulong / Wang, Jianzong / Cheng, Ning / Xiao, Jing

Emotion Mixing via Diffusion Models for Emotional Speech Synthesis

2023

Abstract	There has been significant progress in emotional Text-To-Speech (TTS) synthesis technology in recent years. However, existing methods primarily focus on the synthesis of a limited number of emotion types and have achieved unsatisfactory performance in intensity control. To address these limitations, we propose EmoMix, which can generate emotional speech with specified intensity or a mixture of emotions. Specifically, EmoMix is a controllable emotional TTS model based on a diffusion probabilistic model and a pre-trained speech emotion recognition (SER) model used to extract emotion embedding. Mixed emotion synthesis is achieved by combining the noises predicted by diffusion model conditioned on different emotions during only one sampling process at the run-time. We further apply the Neutral and specific primary emotion mixed in varying degrees to control intensity. Experimental results validate the effectiveness of EmoMix for synthesizing mixed emotion and intensity control. Comment: Accepted by 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023)
Keywords	Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
Subject/Category (Code)	410
Publication date	2023-06-01
Country of publication	us
Document type	Book ; Online
Data source	BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
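
The EmoMix abstract describes combining the noises that a diffusion model predicts under different emotion conditions within a single sampling process. A minimal sketch of one such combination step follows. The function name `mix_predicted_noise`, the dictionary interface, and the normalised weighted average are assumptions; the abstract only says the noises are "combined", so the actual rule may differ.

```python
import numpy as np

def mix_predicted_noise(noise_by_emotion, weights):
    """Hypothetical helper: blend per-emotion noise predictions for one denoising step.

    noise_by_emotion: dict mapping emotion name -> noise array (all the same shape)
    weights:          dict mapping emotion name -> non-negative mixing weight
    """
    names = list(noise_by_emotion)
    w = np.array([weights[name] for name in names], dtype=float)
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()  # normalise so the weights sum to 1 (assumed convention)
    stacked = np.stack([np.asarray(noise_by_emotion[name], dtype=float) for name in names])
    # Weighted sum over the emotion axis.
    return np.tensordot(w, stacked, axes=1)

# Intensity control in the spirit of the abstract: mix Neutral with one primary emotion.
eps = {"neutral": np.zeros((2, 3)), "happy": np.ones((2, 3))}
mixed = mix_predicted_noise(eps, {"neutral": 0.25, "happy": 0.75})
print(mixed)
```

Raising the weight of the primary emotion relative to Neutral would, under this assumed scheme, correspond to a stronger emotional intensity.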

Article ; Online: Dilatation Eustachian tuboplasty with a Eustachian tube video endoscope and supporting balloon.

Zhang, Huasong / Zhang, Qing / He, Kunwu / Chen, Minqi / Chen, Yucheng / Su, Dongliang / Tang, Haobin / Lin, Weifen / Chen, Shuhua

The Journal of laryngology and otology

2023 Volume 138, Issue 3, Page(s) 246–252

Abstract	Objective: To evaluate the feasibility and safety of employing a Eustachian tube video endoscope with a supporting balloon as a viable treatment and examination option for patients with Eustachian tube dysfunction. Methods: A study involving nine fresh human cadaver heads was conducted to investigate the potential of balloon dilatation Eustachian tuboplasty using a Eustachian tube video endoscope and a supporting balloon catheter. The Eustachian tube cavity was examined with the Eustachian tube video endoscope during the procedure, which involved the dilatation of the cartilaginous portion of the Eustachian tube with the supporting balloon catheter. Results: The utilisation of the Eustachian tube video endoscope in conjunction with the supporting balloon catheter demonstrated technical ease during the procedure, with no observed damage to essential structures, particularly the Eustachian tube cavity. Conclusion: This newly introduced method of dilatation and examination of the Eustachian tube cavity using a Eustachian tube video endoscope and the supporting balloon is a feasible, safe procedure.
MeSH term(s)	Humans ; Eustachian Tube/surgery ; Dilatation/methods ; Tympanoplasty ; Ear Diseases/diagnosis ; Endoscopes ; Treatment Outcome
Language	English
Publication date	2023-07-26
Country of publication	England
Document type	Journal Article
ZDB-ID	218299-3
ISSN	1748-5460 ; 0022-2151
ISSN (online)	1748-5460
DOI	10.1017/S0022215123001202
Data source	MEDical Literature Analysis and Retrieval System OnLINE (MEDLINE)

Book ; Online: Dynamic Alignment Mask CTC

Zhang, Xulong / Tang, Haobin / Wang, Jianzong / Cheng, Ning / Luo, Jian / Xiao, Jing

Improved Mask-CTC with Aligned Cross Entropy

2023

Abstract	Because of predicting all the target tokens in parallel, the non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models. In this work, we present dynamic alignment Mask CTC, introducing two methods: (1) Aligned Cross Entropy (AXE), finding the monotonic alignment that minimizes the cross-entropy loss through dynamic programming, (2) Dynamic Rectification, creating new training samples by replacing some masks with model predicted tokens. The AXE ignores the absolute position alignment between prediction and ground truth sentence and focuses on tokens matching in relative order. The dynamic rectification method makes the model capable of simulating the non-mask but possible wrong tokens, even if they have high confidence. Our experiments on WSJ dataset demonstrated that not only AXE loss but also the rectification method could improve the WER performance of Mask CTC. Comment: Accepted by ICASSP 2023
Keywords	Computer Science - Sound ; Computer Science - Computation and Language ; Electrical Engineering and Systems Science - Audio and Speech Processing
Subject/Category (Code)	501
Publication date	2023-03-14
Country of publication	us
Document type	Book ; Online
Data source	BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
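
The abstract above describes AXE as finding, by dynamic programming, the monotonic alignment that minimises the cross-entropy loss. The sketch below is a simplified version of that idea, not the paper's formulation: each target token is matched to one prediction position in increasing order, and every unmatched position is charged the cost of a blank token. The function name `aligned_cross_entropy`, the blank index, and the restriction to targets no longer than the prediction sequence are assumptions.

```python
import math

def aligned_cross_entropy(log_probs, target, blank=0):
    """Simplified monotonic-alignment loss (illustrative, not the paper's exact AXE).

    log_probs: list of m rows, each a list of log-probabilities over the vocabulary
    target:    list of n token ids, with n <= m
    blank:     id of the token charged at prediction positions left unaligned
    """
    m, n = len(log_probs), len(target)
    if n > m:
        raise ValueError("this sketch assumes len(target) <= len(log_probs)")
    inf = math.inf
    # cost[i][j]: minimum loss for placing the first i targets within the first j positions
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] - log_probs[j - 1][blank]
    for i in range(1, n + 1):
        for j in range(i, m + 1):
            # Either align target i with position j, or leave position j as blank.
            align = cost[i - 1][j - 1] - log_probs[j - 1][target[i - 1]]
            skip = cost[i][j - 1] - log_probs[j - 1][blank]
            cost[i][j] = min(align, skip)
    return cost[n][m]

# Predictions that are correct in relative order but shifted right by one position.
hi, lo = math.log(0.9), math.log(0.05)
log_probs = [[hi, lo, lo], [lo, hi, lo], [lo, lo, hi]]  # vocab: 0=blank, 1=a, 2=b
target = [1, 2]
aligned = aligned_cross_entropy(log_probs, target)
positional = -log_probs[0][1] - log_probs[1][2]  # plain position-wise cross-entropy
print(aligned, positional)
```

In this toy case the aligned loss is much smaller than the position-wise loss, which matches the abstract's point that AXE focuses on tokens matching in relative order rather than absolute position.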

Book ; Online: Speech Augmentation Based Unsupervised Learning for Keyword Spotting

Luo, Jian / Wang, Jianzong / Cheng, Ning / Tang, Haobin / Xiao, Jing

2022

Abstract	In this paper, we investigated a speech augmentation based unsupervised learning approach for keyword spotting (KWS) task. KWS is a useful speech application, yet also heavily depends on the labeled data. We designed a CNN-Attention architecture to conduct the KWS task. CNN layers focus on the local acoustic features, and attention layers model the long-time dependency. To improve the robustness of KWS model, we also proposed an unsupervised learning method. The unsupervised loss is based on the similarity between the original and augmented speech features, as well as the audio reconstructing information. Two speech augmentation methods are explored in the unsupervised learning: speed and intensity. The experiments on Google Speech Commands V2 Dataset demonstrated that our CNN-Attention model has competitive results. Moreover, the augmentation based unsupervised learning could further improve the classification accuracy of KWS task. In our experiments, with augmentation based unsupervised learning, our KWS model achieves better performance than other unsupervised methods, such as CPC, APC, and MPC. Comment: accepted by WCCI 2022
Keywords	Computer Science - Sound ; Computer Science - Computation and Language ; Electrical Engineering and Systems Science - Audio and Speech Processing
Subject/Category (Code)	004
Publication date	2022-05-28
Country of publication	us
Document type	Book ; Online
Data source	BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
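
The two augmentations named in the abstract above, speed and intensity, together with a similarity measure between original and augmented signals, can be sketched as follows. All function names, the linear-interpolation resampling, and the use of cosine similarity on raw waveforms are assumptions for illustration; the abstract compares speech features rather than raw samples and does not specify the similarity function.

```python
import numpy as np

def change_intensity(wave, gain):
    """Scale the amplitude of a waveform by a constant gain (hypothetical helper)."""
    return np.asarray(wave, dtype=float) * gain

def change_speed(wave, rate):
    """Resample by linear interpolation; rate > 1 shortens (speeds up) the signal."""
    wave = np.asarray(wave, dtype=float)
    n_out = max(1, int(round(len(wave) / rate)))
    src_pos = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(src_pos, np.arange(len(wave)), wave)

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# One second of a 440 Hz tone at an assumed 16 kHz sampling rate.
t = np.arange(16000) / 16000.0
wave = np.sin(2 * np.pi * 440.0 * t)
louder = change_intensity(wave, 2.0)
faster = change_speed(wave, 2.0)
print(len(faster), cosine_similarity(wave, louder))
```

A similarity-based unsupervised loss would push the representations of `wave` and its augmented copies together; here the similarity is only computed, not optimised.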

Book ; Online: SAR

Wang, Jianzong / Zhang, Xulong / Tang, Haobin / Sun, Aolan / Cheng, Ning / Xiao, Jing

Self-Supervised Anti-Distortion Representation for End-To-End Speech Model

2023

Abstract	In recent Text-to-Speech (TTS) systems, a neural vocoder often generates speech samples by solely conditioning on acoustic features predicted from an acoustic model. However, there are always distortions existing in the predicted acoustic features, compared to those of the ground truth, especially in the common case of poor acoustic modeling due to low-quality training data. To overcome such limits, we propose a Self-supervised learning framework to learn an Anti-distortion acoustic Representation (SAR) to replace human-crafted acoustic features by introducing distortion prior to an auto-encoder pre-training process. The learned acoustic representation from the proposed framework is shown to be more robust to distortion than the most commonly used mel-spectrogram through both objective and subjective evaluation. Comment: Accepted by IJCNN2023. 2023 International Joint Conference on Neural Networks (IJCNN2023)
Keywords	Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
Publication date	2023-04-23
Country of publication	us
Document type	Book ; Online
Data source	BASE - Bielefeld Academic Search Engine (Life Sciences Selection)
