LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 12

  1. Book ; Online: SQuId

    Sellam, Thibault / Bapna, Ankur / Camp, Joshua / Mackinnon, Diana / Parikh, Ankur P. / Riesa, Jason

    Measuring Speech Naturalness in Many Languages

    2022  

    Abstract Much of text-to-speech research relies on human evaluation, which incurs heavy costs and slows down the development process. The problem is particularly acute in heavily multilingual applications, where recruiting and polling judges can take weeks. We introduce SQuId (Speech Quality Identification), a multilingual naturalness prediction model trained on over a million ratings and tested in 65 locales, the largest effort of this type to date. The main insight is that training one model on many locales consistently outperforms mono-locale baselines. We present our task and model, and show that the model outperforms a competitive baseline based on w2v-BERT and VoiceMOS by 50.0%. We then demonstrate the effectiveness of cross-locale transfer during fine-tuning and highlight its effect on zero-shot locales, i.e., locales for which there is no fine-tuning data. Through a series of analyses, we highlight the role of non-linguistic effects such as sound artifacts in cross-locale transfer. Finally, we examine the effect of our design decisions, e.g., model size, pre-training diversity, and language rebalancing, through several ablation experiments.

    Comment: Accepted at ICASSP 2023, with additional material in the appendix
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning ; Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Subject code 410
    Publishing date 2022-10-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

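At its core, the naturalness model described in record 1 is a regression head over a pooled multilingual speech representation, trained against human ratings. Below is a minimal sketch of that setup, assuming a frozen encoder (simulated here with random embeddings) and MOS-style scores on a 1-5 scale; it is illustrative only, not the SQuId architecture.

```python
# Hedged sketch: a naturalness-rating regressor in the spirit of record 1.
# The real model fine-tunes a multilingual speech encoder on >1M ratings;
# random tensors stand in for encoder outputs and judge scores here.
import torch
import torch.nn as nn

class NaturalnessHead(nn.Module):
    """Maps a pooled utterance embedding to a scalar naturalness score."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.mlp(pooled).squeeze(-1)

embeddings = torch.randn(64, 512)   # stand-in for pooled encoder outputs
ratings = 1 + 4 * torch.rand(64)    # simulated MOS-style ratings in [1, 5]

model = NaturalnessHead()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(embeddings), ratings)
    loss.backward()
    opt.step()
```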

  2. Book ; Online: Improving Multilingual Models with Language-Clustered Vocabularies

    Chung, Hyung Won / Garrette, Dan / Tan, Kiat Chuan / Riesa, Jason

    2020  

    Abstract State-of-the-art multilingual models depend on vocabularies that cover all of the languages the model will expect to see at inference time, but the standard methods for generating those vocabularies are not ideal for massively multilingual applications. In this work, we introduce a novel procedure for multilingual vocabulary generation that combines the separately trained vocabularies of several automatically derived language clusters, thus balancing the trade-off between cross-lingual subword sharing and language-specific vocabularies. Our experiments show improvements across languages on the key multilingual benchmark tasks TyDi QA (+2.9 F1), XNLI (+2.1%), and WikiAnn NER (+2.8 F1), and a factor-of-8 reduction in out-of-vocabulary rate, all without increasing the size of the model or data.

    Comment: Published in the main conference of EMNLP 2020
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject code 410
    Publishing date 2020-10-24
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

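The procedure in record 2 has three steps: represent each language as a vector, cluster the languages automatically, then train one vocabulary per cluster and take the union. A toy sketch below, assuming character-distribution vectors and scikit-learn's KMeans; a character-bigram counter stands in for real subword training (a paper-scale version would train something like a SentencePiece model per cluster).

```python
# Hedged sketch of language-clustered vocabulary generation (record 2).
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

corpora = {                       # toy monolingual corpora (assumption)
    "en": "the cat sat on the mat",
    "de": "die katze sass auf der matte",
    "fi": "kissa istui matolla",
    "et": "kass istus matil",
}

def char_distribution(text: str, alphabet: list) -> np.ndarray:
    counts = Counter(text)
    vec = np.array([counts[c] for c in alphabet], dtype=float)
    return vec / max(vec.sum(), 1.0)

alphabet = sorted(set("".join(corpora.values())))
vectors = np.stack([char_distribution(t, alphabet) for t in corpora.values()])

# Step 1-2: cluster languages by distributional similarity.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

def cluster_vocab(texts, size=20):
    """Stand-in for per-cluster subword training: top character bigrams."""
    grams = Counter(g for t in texts for g in zip(t, t[1:]))
    return {"".join(g) for g, _ in grams.most_common(size)}

# Step 3: union of the separately trained cluster vocabularies.
final_vocab = set()
for k in set(labels):
    members = [t for t, l in zip(corpora.values(), labels) if l == k]
    final_vocab |= cluster_vocab(members)
print(len(final_vocab), sorted(final_vocab)[:10])
```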

  3. Book ; Online: FLEURS

    Conneau, Alexis / Ma, Min / Khanuja, Simran / Zhang, Yu / Axelrod, Vera / Dalmia, Siddharth / Riesa, Jason / Rivera, Clara / Bapna, Ankur

    Few-shot Learning Evaluation of Universal Representations of Speech

    2022  

    Abstract We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation and Retrieval. In this paper, we provide baselines for the tasks based on multilingual pre-trained models like mSLAM. The goal of FLEURS is to enable speech technology in more languages and catalyze research in low-resource speech understanding.
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning ; Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Publishing date 2022-05-24
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

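For readers who want to try the benchmark in record 3, a hedged loading sketch follows. It assumes the dataset is mirrored on the Hugging Face Hub as google/fleurs with per-locale configuration names such as "fi_fi"; check the dataset card for the exact identifiers and field names.

```python
# Hedged sketch: loading FLEURS via the Hugging Face `datasets` library.
# The repository id, config name, and field names are assumptions based on
# the public mirror; consult the dataset card before relying on them.
from datasets import load_dataset

fleurs = load_dataset("google/fleurs", "fi_fi", split="train")
example = fleurs[0]
print(example["transcription"])    # reference text for ASR
audio = example["audio"]           # dict with "array" and "sampling_rate"
print(audio["sampling_rate"], len(audio["array"]))
```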

  4. Book ; Online: FRMT

    Riley, Parker / Dozat, Timothy / Botha, Jan A. / Garcia, Xavier / Garrette, Dan / Riesa, Jason / Firat, Orhan / Constant, Noah

    A Benchmark for Few-Shot Region-Aware Machine Translation

    2022  

    Abstract We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation. The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese. Source documents are selected to enable detailed analysis of phenomena of interest, including lexically distinct terms and distractor terms. We explore automatic evaluation metrics for FRMT and validate their correlation with expert human evaluation across both region-matched and mismatched rating scenarios. Finally, we present a number of baseline models for this task, and offer guidelines for how researchers can train, evaluate, and compare their own models. Our dataset and evaluation code are publicly available: https://bit.ly/frmt-task

    Comment: Published in TACL Vol. 11 (2023)
    Keywords Computer Science - Computation and Language
    Publishing date 2022-10-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  5. Book ; Online: Finding Fast Transformers

    Tsai, Henry / Ooi, Jayden / Ferng, Chun-Sung / Chung, Hyung Won / Riesa, Jason

    One-Shot Neural Architecture Search by Component Composition

    2020  

    Abstract Transformer-based models have achieved state-of-the-art results in many tasks in natural language processing. However, such models are usually slow at inference time, making deployment difficult. In this paper, we develop an efficient algorithm to search for fast models while maintaining model quality. We describe a novel approach to decompose the Transformer architecture into smaller components, and propose a sampling-based one-shot architecture search method to find an optimal model for inference. The model search process is more efficient than alternatives, adding only a small overhead to training time. By applying our methods to BERT-base architectures, we achieve a 10% to 30% speedup for pre-trained BERT and a 70% speedup on top of a previous state-of-the-art distilled BERT model on Cloud TPU-v2, with a generally acceptable drop in performance.
    Keywords Computer Science - Machine Learning ; Statistics - Machine Learning
    Subject code 006
    Publishing date 2020-08-15
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

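The search method in record 5 can be pictured as cheap sampling from a component search space, with quality and latency estimated per sample rather than retrained from scratch. The sketch below is a schematic of that loop; both estimator functions are stand-in assumptions, whereas the paper would derive such scores from its one-shot, weight-shared training setup.

```python
# Hedged sketch of sampling-based architecture search over components.
import random

SEARCH_SPACE = {                  # illustrative component choices (assumption)
    "num_heads": [4, 8, 12],
    "ffn_dim": [512, 1024, 2048, 3072],
    "num_layers": [4, 6, 12],
}

def estimate_quality(cfg):        # assumption: proxy score for a sample
    return 0.80 + 0.01 * (cfg["num_layers"] + cfg["num_heads"]) / 4

def estimate_latency(cfg):        # assumption: simple cost model, lower = faster
    return cfg["num_layers"] * (cfg["num_heads"] * 8 + cfg["ffn_dim"] / 256)

random.seed(0)
best = None
for _ in range(200):              # sampling is cheap: no retraining per sample
    cfg = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
    q, lat = estimate_quality(cfg), estimate_latency(cfg)
    # Keep the fastest configuration that clears a quality floor.
    if q >= 0.85 and (best is None or lat < best[1]):
        best = (cfg, lat, q)
print(best)
```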

  6. Book ; Online: Small and Practical BERT Models for Sequence Labeling

    Tsai, Henry / Riesa, Jason / Johnson, Melvin / Arivazhagan, Naveen / Li, Xin / Archer, Amelia

    2019  

    Abstract We propose a practical scheme to train a single multilingual sequence labeling model that yields state-of-the-art results and is small and fast enough to run on a single CPU. Starting from a public multilingual BERT checkpoint, our final model is 6x smaller and 27x faster, and has higher accuracy than a state-of-the-art multilingual baseline. We show that our model performs especially well on low-resource languages, and handles codemixed input text without being explicitly trained on codemixed examples. We showcase the effectiveness of our method by reporting on part-of-speech tagging and morphological prediction across 70 treebanks and 48 languages.

    Comment: 11 pages including appendices; accepted to appear at EMNLP-IJCNLP 2019
    Keywords Computer Science - Computation and Language
    Publishing date 2019-08-30
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

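The abstract of record 6 does not spell out the compression recipe, but a common way to obtain a model several times smaller and faster than its mBERT starting point is teacher-student distillation. A hedged sketch of that generic recipe follows; it is not necessarily the authors' exact setup, and all tensors are toy stand-ins.

```python
# Hedged sketch: distilling a large sequence labeler into a small student.
import torch
import torch.nn.functional as F

vocab, labels, hidden = 1000, 17, 128        # toy sizes (assumption)
student = torch.nn.Sequential(               # tiny stand-in student model
    torch.nn.Embedding(vocab, hidden),
    torch.nn.Linear(hidden, labels),
)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab, (32, 20))   # batch of token ids
teacher_logits = torch.randn(32, 20, labels) # stand-in for mBERT teacher
gold = torch.randint(0, labels, (32, 20))    # gold tags, e.g. POS labels

for step in range(50):
    opt.zero_grad()
    logits = student(tokens)
    # Blend soft teacher targets with hard gold labels.
    kd = F.kl_div(F.log_softmax(logits, -1),
                  F.softmax(teacher_logits, -1), reduction="batchmean")
    ce = F.cross_entropy(logits.reshape(-1, labels), gold.reshape(-1))
    (0.5 * kd + 0.5 * ce).backward()
    opt.step()
```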

  7. Book ; Online: Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

    Siddhant, Aditya / Johnson, Melvin / Tsai, Henry / Arivazhagan, Naveen / Riesa, Jason / Bapna, Ankur / Firat, Orhan / Raman, Karthik

    2019  

    Abstract The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model. Its improved translation performance on low resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages. We compare against a strong baseline, multilingual BERT (mBERT), in different cross-lingual transfer learning scenarios and show gains in zero-shot transfer in 4 out of these 5 tasks.
    Keywords Computer Science - Computation and Language
    Publishing date 2019-09-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

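The zero-shot protocol evaluated in record 7 can be made concrete: freeze the multilingual encoder, train a light classifier on English labeled data only, then apply it unchanged to other languages. A minimal sketch below, with a random-feature stub standing in for the NMT encoder.

```python
# Hedged sketch of zero-shot cross-lingual transfer from a frozen encoder.
import torch
import torch.nn as nn

def encode(sentences):            # assumption: stub for the frozen NMT encoder
    torch.manual_seed(len(sentences))
    return torch.randn(len(sentences), 512)

clf = nn.Linear(512, 3)           # light head for a 3-way classification task
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)

en_x, en_y = encode(["..."] * 256), torch.randint(0, 3, (256,))
for _ in range(100):              # train on English data only
    opt.zero_grad()
    nn.functional.cross_entropy(clf(en_x), en_y).backward()
    opt.step()

# Zero-shot: apply the English-trained head to another language's encodings,
# with no target-language fine-tuning data used at any point.
de_x = encode(["..."] * 64)
pred = clf(de_x).argmax(-1)
print(pred[:10])
```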

  8. Book ; Online: SLAM

    Bapna, Ankur / Chung, Yu-an / Wu, Nan / Gulati, Anmol / Jia, Ye / Clark, Jonathan H. / Johnson, Melvin / Riesa, Jason / Conneau, Alexis / Zhang, Yu

    A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training

    2021  

    Abstract Unsupervised pre-training is now the predominant approach for both text and speech understanding. Self-attention models pre-trained on large amounts of unannotated data have been hugely successful when fine-tuned on downstream tasks from a variety of domains and languages. This paper takes the universality of unsupervised language pre-training one step further, by unifying speech and text pre-training within a single model. We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech. To further align our model representations across modalities, we leverage alignment losses, specifically Translation Language Modeling (TLM) and Speech Text Matching (STM) that make use of supervised speech-text recognition data. We demonstrate that incorporating both speech and text data during pre-training can significantly improve downstream quality on CoVoST 2 speech translation, by around 1 BLEU compared to single-modality pre-trained models, while retaining close to SotA performance on LibriSpeech and SpeechStew ASR tasks. On four GLUE tasks and text-normalization, we observe evidence of capacity limitations and interference between the two modalities, leading to degraded performance compared to an equivalent text-only model, while still being competitive with BERT. Through extensive empirical analysis we also demonstrate the importance of the choice of objective function for speech pre-training, and the beneficial effect of adding additional supervised signals on the quality of the learned representations.
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject code 410
    Publishing date 2021-10-19
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

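Record 8 describes a single encoder trained with four objectives: BERT-style MLM on unlabeled text, w2v-BERT on unlabeled speech, and the TLM and STM alignment losses on paired speech-text data. The sketch below shows only how such a multi-task loss is assembled; every loss term is a placeholder, and the mixing weights are assumptions, not the paper's values.

```python
# Hedged sketch: assembling a joint speech-text pre-training objective.
import torch

encoder = torch.nn.Linear(256, 256)      # stand-in for the shared encoder
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

def mlm_loss(h): return h.pow(2).mean()            # placeholder objectives
def w2v_bert_loss(h): return h.abs().mean()
def tlm_loss(hs, ht): return (hs - ht).pow(2).mean()
def stm_loss(hs, ht): return (hs * ht).mean().abs()

text = torch.randn(32, 256)              # unlabeled text features
speech = torch.randn(32, 256)            # unlabeled speech features
p_speech, p_text = torch.randn(16, 256), torch.randn(16, 256)  # paired data

opt.zero_grad()
loss = (mlm_loss(encoder(text))                          # text objective
        + w2v_bert_loss(encoder(speech))                 # speech objective
        + 0.1 * tlm_loss(encoder(p_speech), encoder(p_text))   # alignment
        + 0.1 * stm_loss(encoder(p_speech), encoder(p_text)))  # alignment
loss.backward()
opt.step()
```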

  9. Book ; Online: XTREME-S

    Conneau, Alexis / Bapna, Ankur / Zhang, Yu / Ma, Min / von Platen, Patrick / Lozhkov, Anton / Cherry, Colin / Jia, Ye / Rivera, Clara / Kale, Mihir / Van Esch, Daan / Axelrod, Vera / Khanuja, Simran / Clark, Jonathan H. / Firat, Orhan / Auli, Michael / Ruder, Sebastian / Riesa, Jason / Johnson, Melvin

    Evaluating Cross-lingual Speech Representations

    2022  

    Abstract We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation and retrieval. Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines using XLS-R and mSLAM on all downstream tasks. We motivate the design choices and detail how to use the benchmark. Datasets and fine-tuning scripts are made easily accessible at https://hf.co/datasets/google/xtreme_s.

    Comment: Minor fix: language code for Filipino (Tagalog), "tg" -> "tl"
    Keywords Computer Science - Computation and Language
    Publishing date 2022-03-21
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

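Record 9 points to https://hf.co/datasets/google/xtreme_s. A hedged loading sketch follows, assuming the Hugging Face `datasets` library and a configuration name of the form "fleurs.fi_fi"; consult the dataset card for the exact configuration identifiers.

```python
# Hedged sketch: loading one XTREME-S configuration. The config name and
# field layout are assumptions; verify against the dataset card.
from datasets import load_dataset

xtreme_s = load_dataset("google/xtreme_s", "fleurs.fi_fi", split="test")
sample = xtreme_s[0]
print(sample.keys())   # inspect available fields (audio, transcription, ...)
```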

  10. Book ; Online: Google USM

    Zhang, Yu / Han, Wei / Qin, James / Wang, Yongqiang / Bapna, Ankur / Chen, Zhehuai / Chen, Nanxin / Li, Bo / Axelrod, Vera / Wang, Gary / Meng, Zhong / Hu, Ke / Rosenberg, Andrew / Prabhavalkar, Rohit / Park, Daniel S. / Haghani, Parisa / Riesa, Jason / Perng, Ginger / Soltau, Hagen / Strohman, Trevor / Ramabhadran, Bhuvana / Sainath, Tara / Moreno, Pedro / Chiu, Chung-Cheng / Schalkwyk, Johan / Beaufays, Françoise / Wu, Yonghui

    Scaling Automatic Speech Recognition Beyond 100 Languages

    2023  

    Abstract We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model, our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.

    Comment: 20 pages, 7 figures, 8 tables
    Keywords Computer Science - Computation and Language ; Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Publishing date 2023-03-02
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

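Record 10 names random-projection quantization as a pre-training ingredient. The idea, in the style of BEST-RQ, is to project each speech frame with a fixed random matrix and use the nearest entry of a fixed random codebook as the frame's discrete prediction target. A minimal sketch with illustrative sizes:

```python
# Hedged sketch of random-projection quantization for speech targets.
# Sizes are illustrative assumptions, e.g. 80-dim log-mel input frames.
import torch

d_in, d_proj, codebook_size = 80, 16, 1024
torch.manual_seed(0)
proj = torch.randn(d_in, d_proj)              # frozen random projection
codebook = torch.nn.functional.normalize(
    torch.randn(codebook_size, d_proj), dim=-1)  # frozen random codebook

def quantize(frames: torch.Tensor) -> torch.Tensor:
    """Map (batch, time, d_in) speech frames to discrete target ids."""
    z = torch.nn.functional.normalize(frames @ proj, dim=-1)
    return (z @ codebook.T).argmax(-1)        # nearest codeword by cosine

targets = quantize(torch.randn(4, 100, d_in))
print(targets.shape)                           # torch.Size([4, 100])
```

The encoder is then pre-trained to predict these ids from masked speech, which avoids learning a quantizer jointly with the model.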
