LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 32

  1. Book ; Online: RMBR

    Zhang, Yidan / Wan, Yu / Liu, Dayiheng / Yang, Baosong / He, Zhenan

    A Regularized Minimum Bayes Risk Reranking Framework for Machine Translation

    2022  

    Abstract Beam search is the most widely used decoding method for neural machine translation (NMT). In practice, the top-1 candidate with the highest log-probability among the n candidates is selected as the preferred one. However, this top-1 candidate may not be the best overall translation in the n-best list. Recently, Minimum Bayes Risk (MBR) decoding has been proposed to improve translation quality for NMT; it seeks a consensus translation that is closest on average to the other candidates in the n-best list. We argue that MBR still suffers from the following problems: the utility function only considers lexical-level similarity between candidates; computing the expected utility over the entire n-best list is time-consuming, and inadequate candidates in the tail of the list may hurt performance; and only the relationships between candidates are considered. To solve these issues, we design a regularized MBR reranking framework (RMBR), which considers semantic-based similarity and computes the expected utility for each candidate over a truncated list. We further expect the framework to consider the translation quality and model uncertainty of each candidate, so a quality regularizer and an uncertainty regularizer are incorporated into the framework. Extensive experiments on multiple translation tasks demonstrate the effectiveness of our method.
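
    A minimal sketch of the reranking idea in Python, assuming a generic similarity(a, b) utility and precomputed per-candidate quality and uncertainty scores; the function names, regularizer weights, and the additive form of the score are illustrative assumptions, not the paper's exact formulation:

        # Regularized MBR reranking over a truncated n-best list (sketch).
        def rmbr_rerank(candidates, similarity, quality, uncertainty,
                        top_k=5, lam_q=0.1, lam_u=0.1):
            truncated = candidates[:top_k]
            scores = []
            for i, cand in enumerate(candidates):
                # Expected utility against the truncated list, skipping self.
                others = [c for c in truncated if c is not cand]
                eu = sum(similarity(cand, o) for o in others) / max(len(others), 1)
                # Quality and uncertainty regularizers (assumed signs).
                scores.append(eu + lam_q * quality[i] - lam_u * uncertainty[i])
            return candidates[max(range(len(candidates)), key=scores.__getitem__)]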

    Comment: 10 pages
    Keywords Computer Science - Computation and Language
    Subject code 410
    Publishing date 2022-02-28
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Book ; Online: Noisy Pair Corrector for Dense Retrieval

    Zhang, Hang / Gong, Yeyun / He, Xingwei / Liu, Dayiheng / Guo, Daya / Lv, Jiancheng / Guo, Jian

    2023  

    Abstract Most dense retrieval models contain an implicit assumption: the training query-document pairs are exactly matched. Since it is expensive to annotate the corpus manually, training pairs in real-world applications are usually collected automatically, which inevitably introduces mismatched-pair noise. In this paper, we explore an interesting and challenging problem in dense retrieval: how to train an effective model with mismatched-pair noise. To solve this problem, we propose a novel approach called Noisy Pair Corrector (NPC), which consists of a detection module and a correction module. The detection module estimates noise pairs by calculating the perplexity between annotated positive and easy negative documents. The correction module utilizes an exponential moving average (EMA) model to provide a soft supervised signal, aiding in mitigating the effects of noise. We conduct experiments on the text-retrieval benchmarks Natural Questions and TriviaQA and on the code-search benchmarks StaQC and SO-DS. Experimental results show that NPC achieves excellent performance in handling both synthetic and realistic noise.
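
    A minimal sketch of the correction module's EMA teacher together with an assumed form of the perplexity-based detection test; the decay value, margin, and helper names are illustrative assumptions:

        import torch

        @torch.no_grad()
        def ema_update(teacher, student, decay=0.999):
            # Exponential moving average of student weights into the teacher,
            # which then provides the soft supervised signal.
            for t, s in zip(teacher.parameters(), student.parameters()):
                t.mul_(decay).add_(s, alpha=1.0 - decay)

        def looks_noisy(ppl_positive, ppl_easy_negative, margin=0.0):
            # Flag a training pair as mismatched when the annotated positive is
            # no easier to explain than an easy negative (assumed decision rule).
            return ppl_positive + margin >= ppl_easy_negative

        # Usage: teacher = copy.deepcopy(student), then call ema_update(...)
        # after every optimizer step.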

    Comment: Findings of EMNLP 2023
    Keywords Computer Science - Computation and Language
    Subject code 006
    Publishing date 2023-11-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Book ; Online: Let's be Humorous

    Zhang, Hang / Liu, Dayiheng / Lv, Jiancheng / Luo, Cheng

    Knowledge Enhanced Humor Generation

    2020  

    Abstract The generation of humor is an under-explored and challenging problem. Previous works mainly utilize templates or replace phrases to generate humor. However, few works focus on freer forms and the background knowledge of humor. The linguistic theory of humor defines the structure of a humorous sentence as a set-up and a punchline. In this paper, we explore how to generate a punchline given the set-up and the relevant knowledge. We propose a framework that can fuse the knowledge into end-to-end models. To our knowledge, this is the first attempt to generate punchlines with a knowledge-enhanced model. Furthermore, we create the first humor-knowledge dataset. The experimental results demonstrate that our method can make use of knowledge to generate fluent, funny punchlines and outperforms several baselines.
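
    A minimal sketch of fusing retrieved knowledge into an end-to-end generator by serializing the set-up and its relevant knowledge into one encoder input; the separator tokens and field tags are illustrative assumptions, not the paper's format:

        def build_punchline_input(set_up: str, knowledge: list[str]) -> str:
            # One flat input string for a seq2seq punchline generator.
            facts = " <sep> ".join(knowledge)
            return f"<set-up> {set_up} <knowledge> {facts}"
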
    Keywords Computer Science - Computation and Language
    Publishing date 2020-04-28
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  4. Book ; Online: Prediction, Selection, and Generation

    Luo, Cheng / Liu, Dayiheng / Li, Chanjuan / Lu, Li / Lv, Jiancheng

    Exploration of Knowledge-Driven Conversation System

    2021  

    Abstract In open-domain conversational systems, it is important but challenging to leverage background knowledge. Incorporating knowledge makes dialogue generation controllable and allows the system to generate more diverse sentences that contain real knowledge. In this paper, we combine knowledge bases and a pre-training model to propose a knowledge-driven conversation system. The system includes modules for dialogue topic prediction, knowledge matching, and dialogue generation. Based on this system, we study the factors that may affect the generation of knowledge-driven dialogue: the topic coarse-recall algorithm, the number of knowledge choices, the choice of generation model, etc., and finally bring the system to state-of-the-art performance. These experimental results provide guidance for future research on this task. As far as we know, this is the first work to study and analyze the effects of these factors.
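
    A minimal sketch of the three-stage pipeline the abstract describes (topic prediction, knowledge matching, dialogue generation); the component interfaces and the number of knowledge choices are assumptions:

        def respond(history, predict_topic, match_knowledge, generate,
                    num_knowledge=3):
            topic = predict_topic(history)           # dialogue topic prediction
            facts = match_knowledge(topic, history)  # coarse recall + matching
            return generate(history, facts[:num_knowledge])  # grounded generation
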
    Keywords Computer Science - Artificial Intelligence
    Subject code 004
    Publishing date 2021-04-23
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Book ; Online: Competency-Aware Neural Machine Translation

    Zhang, Pei / Yang, Baosong / Wei, Haoran / Liu, Dayiheng / Fan, Kai / Si, Luo / Xie, Jun

    Can Machine Translation Know its Own Translation Quality?

    2022  

    Abstract Neural machine translation (NMT) is often criticized for failures that happen without awareness. The lack of competency awareness makes NMT untrustworthy. This is in sharp contrast to human translators who give feedback or conduct further investigations whenever they are in doubt about predictions. To fill this gap, we propose a novel competency-aware NMT by extending conventional NMT with a self-estimator, offering abilities to translate a source sentence and estimate its competency. The self-estimator encodes the information of the decoding procedure and then examines whether it can reconstruct the original semantics of the source sentence. Experimental results on four translation tasks demonstrate that the proposed method not only carries out translation tasks intact but also delivers outstanding performance on quality estimation. Without depending on any reference or annotated data typically required by state-of-the-art metric and quality estimation methods, our model yields an even higher correlation with human quality judgments than a variety of aforementioned methods, such as BLEURT, COMET, and BERTScore. Quantitative and qualitative analyses show better robustness of competency awareness in our model.
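
    A minimal sketch of the self-estimation idea: score competency by how well a reconstructor recovers the source-side representation from the decoder's states; the mean pooling and cosine scoring are illustrative assumptions:

        import torch.nn.functional as F

        def competency_score(src_repr, dec_states, reconstructor):
            # Reconstruct the source semantics from the decoding procedure,
            # then compare with the encoded source sentence.
            recon = reconstructor(dec_states)        # (batch, len, hidden)
            return F.cosine_similarity(recon.mean(dim=1),
                                       src_repr.mean(dim=1), dim=-1)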

    Comment: accepted to EMNLP 2022
    Keywords Computer Science - Computation and Language ; Computer Science - Artificial Intelligence
    Subject code 410
    Publishing date 2022-11-24
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  6. Book ; Online: Tailor

    Yang, Kexin / Liu, Dayiheng / Lei, Wenqiang / Yang, Baosong / Xue, Mingfeng / Chen, Boxing / Xie, Jun

    A Prompt-Based Approach to Attribute-Based Controlled Text Generation

    2022  

    Abstract Attribute-based Controlled Text Generation (CTG) refers to generating sentences that satisfy desirable attributes (e.g., emotions and topics). Existing works often utilize fine-tuning or resort to extra attribute classifiers, yet suffer from increases in storage and inference time. To address these concerns, we explore attribute-based CTG in a prompt-based manner. In short, the proposed Tailor represents each attribute as a pre-trained continuous vector (i.e., a single-attribute prompt) and guides the generation of a fixed pre-trained language model (PLM) to switch to a pre-specified attribute. We experimentally find that these prompts can simply be concatenated as a whole for multi-attribute CTG without any re-training, yet this raises problems of fluency decrease and position sensitivity. To this end, Tailor provides a multi-attribute prompt mask and a re-indexed position-ids sequence to bridge the gap between the training stage (one prompt for each task) and the testing stage (concatenating more than one prompt). To further enhance such single-attribute prompt combinations, Tailor also introduces a trainable prompt connector, which can be concatenated with any two single-attribute prompts for multi-attribute text generation. Experiments on 11 attribute-specific generation tasks demonstrate strong performance of Tailor on both single-attribute and multi-attribute CTG, with only 0.08% of the training parameters of GPT-2.
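
    A minimal sketch of the re-indexed position-ids trick: when single-attribute prompts are concatenated for multi-attribute generation, each prompt keeps the positions it saw during single-prompt training; shapes and names are illustrative assumptions:

        import torch

        def concat_prompts_with_positions(prompts):
            # prompts: list of (prompt_len, hidden) tensors, one per attribute.
            stacked = torch.cat(prompts, dim=0)
            # Each prompt reuses positions 0..len-1 instead of the naive
            # running positions of the concatenation.
            position_ids = torch.cat([torch.arange(p.size(0)) for p in prompts])
            return stacked, position_ids
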
    Keywords Computer Science - Computation and Language
    Publishing date 2022-04-28
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Book ; Online: Draft, Command, and Edit

    Yang, Kexin / Liu, Dayiheng / Lei, Wenqiang / Yang, Baosong / Qu, Qian / Lv, Jiancheng

    Controllable Text Editing in E-Commerce

    2022  

    Abstract Product description generation is a challenging and under-explored task. Most such work takes a set of product attributes as input and then generates a description from scratch in a single pass. However, this widespread paradigm may be limited when facing users' dynamic wishes for constraining the description, such as deleting or adding the content of a user-specified attribute based on the previous version. To address this challenge, we explore a new draft-command-edit manner of description generation, leading to the proposed new task: controllable text editing in E-commerce. More specifically, we allow systems to receive a command (deleting or adding) from the user and then generate a description by flexibly modifying the content based on the previous version. It is easier and more practical to meet new needs by modifying previous versions than by generating from scratch. Furthermore, we design a data augmentation method to remedy the low-resource challenge in this task, which contains a model-based and a rule-based strategy to imitate editing by humans. To accompany this new task, we present a human-written draft-command-edit dataset called E-cEdits and a new metric, "Attribute Edit". Our experimental results show that models trained with the new data augmentation method outperform baselines in both automatic and human evaluations.
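
    A minimal sketch of serializing the previous draft and a user command into one model input for the editing system; the tags and the two-command vocabulary are illustrative assumptions:

        def build_edit_input(draft: str, command: str, attribute: str,
                             value: str = "") -> str:
            # The user deletes or adds the content of a specified attribute.
            assert command in {"add", "delete"}
            return f"<draft> {draft} <command> {command} <attr> {attribute} {value}".strip()
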
    Keywords Computer Science - Computation and Language
    Subject code 004
    Publishing date 2022-08-10
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Book ; Online: Towards Fine-Grained Information

    Bao, Keqin / Wan, Yu / Liu, Dayiheng / Yang, Baosong / Lei, Wenqiang / He, Xiangnan / Wong, Derek F. / Xie, Jun

    Identifying the Type and Location of Translation Errors

    2023  

    Abstract Fine-grained information on translation errors is helpful for the translation evaluation community. Existing approaches cannot simultaneously consider error position and type, failing to integrate the error information of both. In this paper, we propose the Fine-Grained Translation Error Detection (FG-TED) task, which aims to identify both the position and the type of translation errors in given source-hypothesis sentence pairs. Besides, we build an FG-TED model to predict addition and omission errors, two typical translation accuracy errors. First, we use a word-level classification paradigm to form our model and apply shortcut-learning reduction to relieve the influence of monolingual features. Besides, we construct synthetic datasets for model training and resolve disagreements in the data labeling of authoritative datasets, making the experimental benchmark concordant. Experiments show that our model can identify both error type and position concurrently, and it gives state-of-the-art results on the restored dataset. Our model also delivers more reliable predictions in low-resource and transfer scenarios than existing baselines. The related datasets and the source code will be released in the future.
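
    A minimal sketch of the word-level classification paradigm: a linear head tags every token with an error label; the three-way label set and single-layer head are illustrative assumptions:

        import torch.nn as nn

        class TokenErrorTagger(nn.Module):
            # Labels (assumed): 0 = OK, 1 = addition error, 2 = omission error.
            def __init__(self, hidden=768, num_labels=3):
                super().__init__()
                self.head = nn.Linear(hidden, num_labels)

            def forward(self, token_states):    # (batch, seq_len, hidden)
                return self.head(token_states)  # per-token error logits
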
    Keywords Computer Science - Computation and Language
    Subject code 410
    Publishing date 2023-02-17
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  9. Book ; Online: How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition

    Dong, Guanting / Yuan, Hongyi / Lu, Keming / Li, Chengpeng / Xue, Mingfeng / Liu, Dayiheng / Wang, Wei / Yuan, Zheng / Zhou, Chang / Zhou, Jingren

    2023  

    Abstract Large language models (LLMs) with enormous numbers of pre-training tokens and parameters exhibit emergent abilities, including mathematical reasoning, code generation, and instruction following. These abilities are further enhanced by supervised fine-tuning (SFT). While the open-source community has studied ad-hoc SFT for each ability separately, proprietary LLMs are versatile across all of them, so it is important to investigate how to unlock multiple abilities via SFT. In this study, we specifically focus on the data composition between mathematical reasoning, code generation, and general human-aligning abilities during SFT. From a scaling perspective, we investigate the relationship between model abilities and various factors including data amount, data composition ratio, model parameters, and SFT strategies. Our experiments reveal that different abilities exhibit different scaling patterns, and larger models generally show superior performance with the same amount of data. Mathematical reasoning and code generation improve consistently as the data amount increases, while general ability is enhanced with about a thousand samples and then improves slowly. We find that data composition improves various abilities with low data amounts, but causes ability conflicts with high data amounts. Our experiments further show that the amount of composition data impacts performance, while the influence of the composition ratio is insignificant. Regarding SFT strategies, we find that sequentially learning multiple abilities is prone to catastrophic forgetting. Our proposed Dual-stage Mixed Fine-tuning (DMT) strategy learns specialized abilities first and then learns general abilities with a small amount of specialized data to prevent forgetting, offering a promising solution for learning multiple abilities with different scaling patterns.
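
    A minimal sketch of the Dual-stage Mixed Fine-tuning (DMT) schedule as described: specialized data first, then general data mixed with a small amount of specialized data; the mixing fraction and the train_step interface are assumptions:

        import random

        def dmt(train_step, specialized_data, general_data, spec_fraction=0.05):
            # Stage 1: learn specialized abilities (e.g., math, code) first.
            for batch in specialized_data:
                train_step(batch)
            # Stage 2: learn general abilities, replaying a small amount of
            # specialized data to prevent catastrophic forgetting.
            k = min(len(specialized_data), int(spec_fraction * len(general_data)))
            mixed = general_data + random.sample(specialized_data, k)
            random.shuffle(mixed)
            for batch in mixed:
                train_step(batch)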

    Comment: 16 pages, 8 figures
    Keywords Computer Science - Computation and Language ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2023-10-09
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Book ; Online: Bridging the Domain Gaps in Context Representations for k-Nearest Neighbor Neural Machine Translation

    Cao, Zhiwei / Yang, Baosong / Lin, Huan / Wu, Suhang / Wei, Xiangpeng / Liu, Dayiheng / Xie, Jun / Zhang, Min / Su, Jinsong

    2023  

    Abstract k-Nearest-neighbor machine translation (kNN-MT) has attracted increasing attention due to its ability to adapt non-parametrically to new translation domains. By using an upstream NMT model to traverse the downstream training corpus, it is equipped with a datastore containing vectorized key-value pairs, which are retrieved during inference to benefit translation. However, there often exists a significant gap between the upstream and downstream domains, which hurts retrieval accuracy and the final translation quality. To deal with this issue, we propose a novel approach that boosts the datastore retrieval of kNN-MT by reconstructing the original datastore. Concretely, we design a reviser to revise the key representations, making them better fit the downstream domain. The reviser is trained on collected semantically related key-query pairs and optimized by two proposed losses: one is a key-query semantic distance ensuring that each revised key representation is semantically related to its corresponding queries, and the other is an L2-norm loss encouraging revised key representations to effectively retain the knowledge learned by the upstream NMT model. Extensive experiments on domain adaptation tasks demonstrate that our method can effectively boost the datastore retrieval and translation quality of kNN-MT. Our code is available at https://github.com/DeepLearnXMU/RevisedKey-knn-mt.
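
    A minimal sketch of the reviser's two training losses: a semantic distance pulling each revised key toward its related queries, and an L2 term keeping the revised key close to the original upstream key; the cosine form of the distance and the loss weight are assumptions:

        import torch.nn.functional as F

        def reviser_loss(reviser, key, queries, alpha=1.0):
            revised = reviser(key)            # revised key representation
            # Key-queries semantic distance (assumed cosine form).
            semantic = (1 - F.cosine_similarity(
                revised.unsqueeze(0), queries, dim=-1)).mean()
            # L2-norm loss: retain the upstream NMT model's knowledge.
            retain = F.mse_loss(revised, key)
            return semantic + alpha * retain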

    Comment: Accepted to ACL 2023
    Keywords Computer Science - Computation and Language
    Subject code 410
    Publishing date 2023-05-25
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
