LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 12

  1. Book ; Online: SQuId

    Sellam, Thibault / Bapna, Ankur / Camp, Joshua / Mackinnon, Diana / Parikh, Ankur P. / Riesa, Jason

    Measuring Speech Naturalness in Many Languages

    2022  

    Abstract Much of text-to-speech research relies on human evaluation, which incurs heavy costs and slows down the development process. The problem is particularly acute in heavily multilingual applications, where recruiting and polling judges can take weeks. We introduce SQuId (Speech Quality Identification), a multilingual naturalness prediction model trained on over a million ratings and tested in 65 locales, the largest effort of this type to date. The main insight is that training one model on many locales consistently outperforms mono-locale baselines. We present our task and model, and show that the model outperforms a competitive baseline based on w2v-BERT and VoiceMOS by 50.0%. We then demonstrate the effectiveness of cross-locale transfer during fine-tuning and highlight its effect on zero-shot locales, i.e., locales for which there is no fine-tuning data. Through a series of analyses, we highlight the role of non-linguistic effects such as sound artifacts in cross-locale transfer. Finally, we examine the effect of our design decisions, e.g., model size, pre-training diversity, and language rebalancing, through several ablation experiments.

    Comment: Accepted at ICASSP 2023, with additional material in the appendix
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning ; Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Subject code 410
    Publishing date 2022-10-12
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

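At its core, the naturalness model described in record 1 is a regression head over a pooled multilingual speech representation, trained against human ratings. Below is a minimal sketch of that setup, assuming a frozen encoder (simulated here with random embeddings) and MOS-style scores on a 1-5 scale; it is illustrative only, not the SQuId architecture.

```python
# Hedged sketch: a naturalness-rating regressor in the spirit of record 1.
# The real model fine-tunes a multilingual speech encoder on >1M ratings;
# random tensors stand in for encoder outputs and judge scores here.
import torch
import torch.nn as nn

class NaturalnessHead(nn.Module):
    """Maps a pooled utterance embedding to a scalar naturalness score."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.mlp(pooled).squeeze(-1)

embeddings = torch.randn(64, 512)   # stand-in for pooled encoder outputs
ratings = 1 + 4 * torch.rand(64)    # simulated MOS-style ratings in [1, 5]

model = NaturalnessHead()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(embeddings), ratings)
    loss.backward()
    opt.step()
```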

  2. Book ; Online: Improving Multilingual Models with Language-Clustered Vocabularies

    Chung, Hyung Won / Garrette, Dan / Tan, Kiat Chuan / Riesa, Jason

    2020  

    Abstract State-of-the-art multilingual models depend on vocabularies that cover all of the languages the model will expect to see at inference time, but the standard methods for generating those vocabularies are not ideal for massively multilingual applications. In this work, we introduce a novel procedure for multilingual vocabulary generation that combines the separately trained vocabularies of several automatically derived language clusters, thus balancing the trade-off between cross-lingual subword sharing and language-specific vocabularies. Our experiments show improvements across languages on the key multilingual benchmark tasks TyDi QA (+2.9 F1), XNLI (+2.1%), and WikiAnn NER (+2.8 F1), and a factor-of-8 reduction in out-of-vocabulary rate, all without increasing the size of the model or data.

    Comment: Published in the main conference of EMNLP 2020
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject code 410
    Publishing date 2020-10-24
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

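The procedure in record 2 has three steps: represent each language as a vector, cluster the languages automatically, then train one vocabulary per cluster and take the union. A toy sketch below, assuming character-distribution vectors and scikit-learn's KMeans; a character-bigram counter stands in for real subword training (a paper-scale version would train something like a SentencePiece model per cluster).

```python
# Hedged sketch of language-clustered vocabulary generation (record 2).
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

corpora = {                       # toy monolingual corpora (assumption)
    "en": "the cat sat on the mat",
    "de": "die katze sass auf der matte",
    "fi": "kissa istui matolla",
    "et": "kass istus matil",
}

def char_distribution(text: str, alphabet: list) -> np.ndarray:
    counts = Counter(text)
    vec = np.array([counts[c] for c in alphabet], dtype=float)
    return vec / max(vec.sum(), 1.0)

alphabet = sorted(set("".join(corpora.values())))
vectors = np.stack([char_distribution(t, alphabet) for t in corpora.values()])

# Step 1-2: cluster languages by distributional similarity.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

def cluster_vocab(texts, size=20):
    """Stand-in for per-cluster subword training: top character bigrams."""
    grams = Counter(g for t in texts for g in zip(t, t[1:]))
    return {"".join(g) for g, _ in grams.most_common(size)}

# Step 3: union of the separately trained cluster vocabularies.
final_vocab = set()
for k in set(labels):
    members = [t for t, l in zip(corpora.values(), labels) if l == k]
    final_vocab |= cluster_vocab(members)
print(len(final_vocab), sorted(final_vocab)[:10])
```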

  3. Book ; Online: FLEURS

    Conneau, Alexis / Ma, Min / Khanuja, Simran / Zhang, Yu / Axelrod, Vera / Dalmia, Siddharth / Riesa, Jason / Rivera, Clara / Bapna, Ankur

    Few-shot Learning Evaluation of Universal Representations of Speech

    2022  

    Abstract We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation and Retrieval. In this paper, we provide baselines for the tasks based on multilingual pre-trained models like mSLAM. The goal of FLEURS is to enable speech technology in more languages and catalyze research in low-resource speech understanding.
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning ; Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Publishing date 2022-05-24
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

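For readers who want to try the benchmark in record 3, a hedged loading sketch follows. It assumes the dataset is mirrored on the Hugging Face Hub as google/fleurs with per-locale configuration names such as "fi_fi"; check the dataset card for the exact identifiers and field names.

```python
# Hedged sketch: loading FLEURS via the Hugging Face `datasets` library.
# The repository id, config name, and field names are assumptions based on
# the public mirror; consult the dataset card before relying on them.
from datasets import load_dataset

fleurs = load_dataset("google/fleurs", "fi_fi", split="train")
example = fleurs[0]
print(example["transcription"])    # reference text for ASR
audio = example["audio"]           # dict with "array" and "sampling_rate"
print(audio["sampling_rate"], len(audio["array"]))
```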

  4. Book ; Online: FRMT

    Riley, Parker / Dozat, Timothy / Botha, Jan A. / Garcia, Xavier / Garrette, Dan / Riesa, Jason / Firat, Orhan / Constant, Noah

    A Benchmark for Few-Shot Region-Aware Machine Translation

    2022  

    Abstract We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation. The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese. Source documents are selected to enable detailed analysis of phenomena of interest, including lexically distinct terms and distractor terms. We explore automatic evaluation metrics for FRMT and validate their correlation with expert human evaluation across both region-matched and mismatched rating scenarios. Finally, we present a number of baseline models for this task, and offer guidelines for how researchers can train, evaluate, and compare their own models. Our dataset and evaluation code are publicly available: https://bit.ly/frmt-task

    Comment: Published in TACL Vol. 11 (2023)
    Keywords Computer Science - Computation and Language
    Publishing date 2022-10-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)


  5. Book ; Online: Finding Fast Transformers

    Tsai, Henry / Ooi, Jayden / Ferng, Chun-Sung / Chung, Hyung Won / Riesa, Jason

    One-Shot Neural Architecture Search by Component Composition

    2020  

    Abstract Transformer-based models have achieved state-of-the-art results in many tasks in natural language processing. However, such models are usually slow at inference time, making deployment difficult. In this paper, we develop an efficient algorithm to search for fast models while maintaining model quality. We describe a novel approach to decompose the Transformer architecture into smaller components, and propose a sampling-based one-shot architecture search method to find an optimal model for inference. The model search process is more efficient than alternatives, adding only a small overhead to training time. By applying our methods to BERT-base architectures, we achieve a 10% to 30% speedup for pre-trained BERT and a 70% speedup on top of a previous state-of-the-art distilled BERT model on Cloud TPU-v2, with a generally acceptable drop in performance.
    Keywords Computer Science - Machine Learning ; Statistics - Machine Learning
    Subject code 006
    Publishing date 2020-08-15
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

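The search method in record 5 can be pictured as cheap sampling from a component search space, with quality and latency estimated per sample rather than retrained from scratch. The sketch below is a schematic of that loop; both estimator functions are stand-in assumptions, whereas the paper would derive such scores from its one-shot, weight-shared training setup.

```python
# Hedged sketch of sampling-based architecture search over components.
import random

SEARCH_SPACE = {                  # illustrative component choices (assumption)
    "num_heads": [4, 8, 12],
    "ffn_dim": [512, 1024, 2048, 3072],
    "num_layers": [4, 6, 12],
}

def estimate_quality(cfg):        # assumption: proxy score for a sample
    return 0.80 + 0.01 * (cfg["num_layers"] + cfg["num_heads"]) / 4

def estimate_latency(cfg):        # assumption: simple cost model, lower = faster
    return cfg["num_layers"] * (cfg["num_heads"] * 8 + cfg["ffn_dim"] / 256)

random.seed(0)
best = None
for _ in range(200):              # sampling is cheap: no retraining per sample
    cfg = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
    q, lat = estimate_quality(cfg), estimate_latency(cfg)
    # Keep the fastest configuration that clears a quality floor.
    if q >= 0.85 and (best is None or lat < best[1]):
        best = (cfg, lat, q)
print(best)
```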

  6. Book ; Online: Small and Practical BERT Models for Sequence Labeling

    Tsai, Henry / Riesa, Jason / Johnson, Melvin / Arivazhagan, Naveen / Li, Xin / Archer, Amelia

    2019  

    Abstract We propose a practical scheme to train a single multilingual sequence labeling model that yields state-of-the-art results and is small and fast enough to run on a single CPU. Starting from a public multilingual BERT checkpoint, our final model is 6x smaller and 27x faster, and has higher accuracy than a state-of-the-art multilingual baseline. We show that our model performs especially well on low-resource languages, and handles codemixed input text without being explicitly trained on codemixed examples. We showcase the effectiveness of our method by reporting on part-of-speech tagging and morphological prediction across 70 treebanks and 48 languages.

    Comment: 11 pages including appendices; accepted to appear at EMNLP-IJCNLP 2019
    Keywords Computer Science - Computation and Language
    Publishing date 2019-08-30
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

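The abstract of record 6 does not spell out the compression recipe, but a common way to obtain a model several times smaller and faster than its mBERT starting point is teacher-student distillation. A hedged sketch of that generic recipe follows; it is not necessarily the authors' exact setup, and all tensors are toy stand-ins.

```python
# Hedged sketch: distilling a large sequence labeler into a small student.
import torch
import torch.nn.functional as F

vocab, labels, hidden = 1000, 17, 128        # toy sizes (assumption)
student = torch.nn.Sequential(               # tiny stand-in student model
    torch.nn.Embedding(vocab, hidden),
    torch.nn.Linear(hidden, labels),
)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab, (32, 20))   # batch of token ids
teacher_logits = torch.randn(32, 20, labels) # stand-in for mBERT teacher
gold = torch.randint(0, labels, (32, 20))    # gold tags, e.g. POS labels

for step in range(50):
    opt.zero_grad()
    logits = student(tokens)
    # Blend soft teacher targets with hard gold labels.
    kd = F.kl_div(F.log_softmax(logits, -1),
                  F.softmax(teacher_logits, -1), reduction="batchmean")
    ce = F.cross_entropy(logits.reshape(-1, labels), gold.reshape(-1))
    (0.5 * kd + 0.5 * ce).backward()
    opt.step()
```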

  7. Book ; Online: Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

    Siddhant, Aditya / Johnson, Melvin / Tsai, Henry / Arivazhagan, Naveen / Riesa, Jason / Bapna, Ankur / Firat, Orhan / Raman, Karthik

    2019  

    Abstract The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model. Its improved translation performance on low resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream classification and sequence labeling tasks covering a diverse set of over 50 languages. We compare against a strong baseline, multilingual BERT (mBERT), in different cross-lingual transfer learning scenarios and show gains in zero-shot transfer in 4 out of these 5 tasks.
    Keywords Computer Science - Computation and Language
    Publishing date 2019-09-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

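The zero-shot protocol evaluated in record 7 can be made concrete: freeze the multilingual encoder, train a light classifier on English labeled data only, then apply it unchanged to other languages. A minimal sketch below, with a random-feature stub standing in for the NMT encoder.

```python
# Hedged sketch of zero-shot cross-lingual transfer from a frozen encoder.
import torch
import torch.nn as nn

def encode(sentences):            # assumption: stub for the frozen NMT encoder
    torch.manual_seed(len(sentences))
    return torch.randn(len(sentences), 512)

clf = nn.Linear(512, 3)           # light head for a 3-way classification task
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)

en_x, en_y = encode(["..."] * 256), torch.randint(0, 3, (256,))
for _ in range(100):              # train on English data only
    opt.zero_grad()
    nn.functional.cross_entropy(clf(en_x), en_y).backward()
    opt.step()

# Zero-shot: apply the English-trained head to another language's encodings,
# with no target-language fine-tuning data used at any point.
de_x = encode(["..."] * 64)
pred = clf(de_x).argmax(-1)
print(pred[:10])
```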

  8. Book ; Online: SLAM

    Bapna, Ankur / Chung, Yu-an / Wu, Nan / Gulati, Anmol / Jia, Ye / Clark, Jonathan H. / Johnson, Melvin / Riesa, Jason / Conneau, Alexis / Zhang, Yu

    A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training

    2021  

    Abstract Unsupervised pre-training is now the predominant approach for both text and speech understanding. Self-attention models pre-trained on large amounts of unannotated data have been hugely successful when fine-tuned on downstream tasks from a variety of domains and languages. This paper takes the universality of unsupervised language pre-training one step further, by unifying speech and text pre-training within a single model. We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech. To further align our model representations across modalities, we leverage alignment losses, specifically Translation Language Modeling (TLM) and Speech Text Matching (STM) that make use of supervised speech-text recognition data. We demonstrate that incorporating both speech and text data during pre-training can significantly improve downstream quality on CoVoST 2 speech translation, by around 1 BLEU compared to single-modality pre-trained models, while retaining close to SotA performance on LibriSpeech and SpeechStew ASR tasks. On four GLUE tasks and text-normalization, we observe evidence of capacity limitations and interference between the two modalities, leading to degraded performance compared to an equivalent text-only model, while still being competitive with BERT. Through extensive empirical analysis we also demonstrate the importance of the choice of objective function for speech pre-training, and the beneficial effect of adding additional supervised signals on the quality of the learned representations.
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject code 410
    Publishing date 2021-10-19
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

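Record 8 describes a single encoder trained with four objectives: BERT-style MLM on unlabeled text, w2v-BERT on unlabeled speech, and the TLM and STM alignment losses on paired speech-text data. The sketch below shows only how such a multi-task loss is assembled; every loss term is a placeholder, and the mixing weights are assumptions, not the paper's values.

```python
# Hedged sketch: assembling a joint speech-text pre-training objective.
import torch

encoder = torch.nn.Linear(256, 256)      # stand-in for the shared encoder
opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)

def mlm_loss(h): return h.pow(2).mean()            # placeholder objectives
def w2v_bert_loss(h): return h.abs().mean()
def tlm_loss(hs, ht): return (hs - ht).pow(2).mean()
def stm_loss(hs, ht): return (hs * ht).mean().abs()

text = torch.randn(32, 256)              # unlabeled text features
speech = torch.randn(32, 256)            # unlabeled speech features
p_speech, p_text = torch.randn(16, 256), torch.randn(16, 256)  # paired data

opt.zero_grad()
loss = (mlm_loss(encoder(text))                          # text objective
        + w2v_bert_loss(encoder(speech))                 # speech objective
        + 0.1 * tlm_loss(encoder(p_speech), encoder(p_text))   # alignment
        + 0.1 * stm_loss(encoder(p_speech), encoder(p_text)))  # alignment
loss.backward()
opt.step()
```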

  9. Book ; Online: XTREME-S

    Conneau, Alexis / Bapna, Ankur / Zhang, Yu / Ma, Min / von Platen, Patrick / Lozhkov, Anton / Cherry, Colin / Jia, Ye / Rivera, Clara / Kale, Mihir / Van Esch, Daan / Axelrod, Vera / Khanuja, Simran / Clark, Jonathan H. / Firat, Orhan / Auli, Michael / Ruder, Sebastian / Riesa, Jason / Johnson, Melvin

    Evaluating Cross-lingual Speech Representations

    2022  

    Abstract We introduce XTREME-S, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, speech-to-text translation and retrieval. Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines using XLS-R and mSLAM on all downstream tasks. We motivate the design choices and detail how to use the benchmark. Datasets and fine-tuning scripts are made easily accessible at https://hf.co/datasets/google/xtreme_s.

    Comment: Minor fix: language code for Filipino (Tagalog), "tg" -> "tl"
    Keywords Computer Science - Computation and Language
    Publishing date 2022-03-21
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

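Record 9 points to https://hf.co/datasets/google/xtreme_s. A hedged loading sketch follows, assuming the Hugging Face `datasets` library and a configuration name of the form "fleurs.fi_fi"; consult the dataset card for the exact configuration identifiers.

```python
# Hedged sketch: loading one XTREME-S configuration. The config name and
# field layout are assumptions; verify against the dataset card.
from datasets import load_dataset

xtreme_s = load_dataset("google/xtreme_s", "fleurs.fi_fi", split="test")
sample = xtreme_s[0]
print(sample.keys())   # inspect available fields (audio, transcription, ...)
```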

  10. Book ; Online: Google USM

    Zhang, Yu / Han, Wei / Qin, James / Wang, Yongqiang / Bapna, Ankur / Chen, Zhehuai / Chen, Nanxin / Li, Bo / Axelrod, Vera / Wang, Gary / Meng, Zhong / Hu, Ke / Rosenberg, Andrew / Prabhavalkar, Rohit / Park, Daniel S. / Haghani, Parisa / Riesa, Jason / Perng, Ginger / Soltau, Hagen / Strohman, Trevor / Ramabhadran, Bhuvana / Sainath, Tara / Moreno, Pedro / Chiu, Chung-Cheng / Schalkwyk, Johan / Beaufays, Françoise / Wu, Yonghui

    Scaling Automatic Speech Recognition Beyond 100 Languages

    2023  

    Abstract We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model, our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages.

    Comment: 20 pages, 7 figures, 8 tables
    Keywords Computer Science - Computation and Language ; Computer Science - Sound ; Electrical Engineering and Systems Science - Audio and Speech Processing
    Publishing date 2023-03-02
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

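Record 10 names random-projection quantization as a pre-training ingredient. The idea, in the style of BEST-RQ, is to project each speech frame with a fixed random matrix and use the nearest entry of a fixed random codebook as the frame's discrete prediction target. A minimal sketch with illustrative sizes:

```python
# Hedged sketch of random-projection quantization for speech targets.
# Sizes are illustrative assumptions, e.g. 80-dim log-mel input frames.
import torch

d_in, d_proj, codebook_size = 80, 16, 1024
torch.manual_seed(0)
proj = torch.randn(d_in, d_proj)              # frozen random projection
codebook = torch.nn.functional.normalize(
    torch.randn(codebook_size, d_proj), dim=-1)  # frozen random codebook

def quantize(frames: torch.Tensor) -> torch.Tensor:
    """Map (batch, time, d_in) speech frames to discrete target ids."""
    z = torch.nn.functional.normalize(frames @ proj, dim=-1)
    return (z @ codebook.T).argmax(-1)        # nearest codeword by cosine

targets = quantize(torch.randn(4, 100, d_in))
print(targets.shape)                           # torch.Size([4, 100])
```

The encoder is then pre-trained to predict these ids from masked speech, which avoids learning a quantizer jointly with the model.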
