LIVIVO - The Search Portal for Life Sciences

zur deutschen Oberfläche wechseln
Advanced search

Search results

Result 1 - 10 of total 53

Search options

  1. Book ; Online: Conformalizing Machine Translation Evaluation

    Zerva, Chrysoula / Martins, André F. T.

    2023  

    Abstract: Several uncertainty estimation methods have been recently proposed for machine translation evaluation. While these methods can provide a useful indication of when not to trust model predictions, we show in this paper that the majority of them tend to ... ...

    Abstract Several uncertainty estimation methods have been recently proposed for machine translation evaluation. While these methods can provide a useful indication of when not to trust model predictions, we show in this paper that the majority of them tend to underestimate model uncertainty, and as a result they often produce misleading confidence intervals that do not cover the ground truth. We propose as an alternative the use of conformal prediction, a distribution-free method to obtain confidence intervals with a theoretically established guarantee on coverage. First, we demonstrate that split conformal prediction can ``correct'' the confidence intervals of previous methods to yield a desired coverage level. Then, we highlight biases in estimated confidence intervals, both in terms of the translation language pairs and the quality of translations. We apply conditional conformal prediction techniques to obtain calibration subsets for each data subgroup, leading to equalized coverage.
    Keywords Computer Science - Computation and Language
    Subject code 310
    Publishing date 2023-06-09
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  2. Book ; Online: Non-Exchangeable Conformal Language Generation with Nearest Neighbors

    Ulmer, Dennis / Zerva, Chrysoula / Martins, André F. T.

    2024  

    Abstract: Quantifying uncertainty in automatically generated text is important for letting humans check potential hallucinations and making systems more reliable. Conformal prediction is an attractive framework to provide predictions imbued with statistical ... ...

    Abstract Quantifying uncertainty in automatically generated text is important for letting humans check potential hallucinations and making systems more reliable. Conformal prediction is an attractive framework to provide predictions imbued with statistical guarantees, however, its application to text generation is challenging since any i.i.d. assumptions are not realistic. In this paper, we bridge this gap by leveraging recent results on non-exchangeable conformal prediction, which still ensures bounds on coverage. The result, non-exchangeable conformal nucleus sampling, is a novel extension of the conformal prediction framework to generation based on nearest neighbors. Our method can be used post-hoc for an arbitrary model without extra training and supplies token-level, calibrated prediction sets equipped with statistical guarantees. Experiments in machine translation and language modeling show encouraging results in generation quality. By also producing tighter prediction sets with good coverage, we thus give a more theoretically principled way to perform sampling with conformal guarantees.
    Keywords Computer Science - Computation and Language ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 004
    Publishing date 2024-02-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  3. Book ; Online: Reconciling the Discrete-Continuous Divide

    Martins, André F. T.

    Towards a Mathematical Theory of Sparse Communication

    2021  

    Abstract: Neural networks and other machine learning models compute continuous representations, while humans communicate with discrete symbols. Reconciling these two forms of communication is desirable to generate human-readable interpretations or to learn ... ...

    Abstract Neural networks and other machine learning models compute continuous representations, while humans communicate with discrete symbols. Reconciling these two forms of communication is desirable to generate human-readable interpretations or to learn discrete latent variable models, while maintaining end-to-end differentiability. Some existing approaches (such as the Gumbel-softmax transformation) build continuous relaxations that are discrete approximations in the zero-temperature limit, while others (such as sparsemax transformations and the hard concrete distribution) produce discrete/continuous hybrids. In this paper, we build rigorous theoretical foundations for these hybrids. Our starting point is a new "direct sum" base measure defined on the face lattice of the probability simplex. From this measure, we introduce a new entropy function that includes the discrete and differential entropies as particular cases, and has an interpretation in terms of code optimality, as well as two other information-theoretic counterparts that generalize the mutual information and Kullback-Leibler divergences. Finally, we introduce "mixed languages" as strings of hybrid symbols and a new mixed weighted finite state automaton that recognizes a class of regular mixed languages, generalizing closure properties of regular languages.
    Keywords Computer Science - Machine Learning
    Subject code 519
    Publishing date 2021-04-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  4. Book ; Online: BLEU Meets COMET

    Glushkova, Taisiya / Zerva, Chrysoula / Martins, André F. T.

    Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation

    2023  

    Abstract: Although neural-based machine translation evaluation metrics, such as COMET or BLEURT, have achieved strong correlations with human judgements, they are sometimes unreliable in detecting certain phenomena that can be considered as critical errors, such ... ...

    Abstract Although neural-based machine translation evaluation metrics, such as COMET or BLEURT, have achieved strong correlations with human judgements, they are sometimes unreliable in detecting certain phenomena that can be considered as critical errors, such as deviations in entities and numbers. In contrast, traditional evaluation metrics, such as BLEU or chrF, which measure lexical or character overlap between translation hypotheses and human references, have lower correlations with human judgements but are sensitive to such deviations. In this paper, we investigate several ways of combining the two approaches in order to increase robustness of state-of-the-art evaluation methods to translations with critical errors. We show that by using additional information during training, such as sentence-level features and word-level tags, the trained metrics improve their capability to penalize translations with specific troublesome phenomena, which leads to gains in correlation with human judgments and on recent challenge sets on several language pairs.

    Comment: Accepted at EAMT 2023
    Keywords Computer Science - Computation and Language
    Subject code 400
    Publishing date 2023-05-30
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  5. Book ; Online: Smoothing and Shrinking the Sparse Seq2Seq Search Space

    Peters, Ben / Martins, André F. T.

    2021  

    Abstract: Current sequence-to-sequence models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over target sequences. While this setup has led to strong results in a variety of tasks, one unsatisfying aspect is ... ...

    Abstract Current sequence-to-sequence models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over target sequences. While this setup has led to strong results in a variety of tasks, one unsatisfying aspect is its length bias: models give high scores to short, inadequate hypotheses and often make the empty string the argmax -- the so-called cat got your tongue problem. Recently proposed entmax-based sparse sequence-to-sequence models present a possible solution, since they can shrink the search space by assigning zero probability to bad hypotheses, but their ability to handle word-level tasks with transformers has never been tested. In this work, we show that entmax-based models effectively solve the cat got your tongue problem, removing a major source of model error for neural machine translation. In addition, we generalize label smoothing, a critical regularization technique, to the broader family of Fenchel-Young losses, which includes both cross-entropy and the entmax losses. Our resulting label-smoothed entmax loss models set a new state of the art on multilingual grapheme-to-phoneme conversion and deliver improvements and better calibration properties on cross-lingual morphological inflection and machine translation for 6 language pairs.

    Comment: NAACL 2021
    Keywords Computer Science - Computation and Language
    Subject code 410
    Publishing date 2021-03-18
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  6. Book ; Online: Aligning Neural Machine Translation Models

    Ramos, Miguel Moura / Fernandes, Patrick / Farinhas, António / Martins, André F. T.

    Human Feedback in Training and Inference

    2023  

    Abstract: Reinforcement learning from human feedback (RLHF) is a recent technique to improve the quality of the text generated by a language model, making it closer to what humans would generate. A core ingredient in RLHF's success in aligning and improving large ... ...

    Abstract Reinforcement learning from human feedback (RLHF) is a recent technique to improve the quality of the text generated by a language model, making it closer to what humans would generate. A core ingredient in RLHF's success in aligning and improving large language models (LLMs) is its reward model, trained using human feedback on model outputs. In machine translation (MT), where metrics trained from human annotations can readily be used as reward models, recent methods using minimum Bayes risk decoding and reranking have succeeded in improving the final quality of translation. In this study, we comprehensively explore and compare techniques for integrating quality metrics as reward models into the MT pipeline. This includes using the reward model for data filtering, during the training phase through RL, and at inference time by employing reranking techniques, and we assess the effects of combining these in a unified approach. Our experimental results, conducted across multiple translation tasks, underscore the crucial role of effective data filtering, based on estimated quality, in harnessing the full potential of RL in enhancing MT quality. Furthermore, our findings demonstrate the effectiveness of combining RL training with reranking techniques, showcasing substantial improvements in translation quality.

    Comment: 14 pages, work-in-progress
    Keywords Computer Science - Computation and Language
    Subject code 410
    Publishing date 2023-11-15
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  7. Book ; Online: CREST

    Treviso, Marcos / Ross, Alexis / Guerreiro, Nuno M. / Martins, André F. T.

    A Joint Framework for Rationalization and Counterfactual Text Generation

    2023  

    Abstract: Selective rationales and counterfactual examples have emerged as two effective, complementary classes of interpretability methods for analyzing and training NLP models. However, prior work has not explored how these methods can be integrated to combine ... ...

    Abstract Selective rationales and counterfactual examples have emerged as two effective, complementary classes of interpretability methods for analyzing and training NLP models. However, prior work has not explored how these methods can be integrated to combine their complementary advantages. We overcome this limitation by introducing CREST (ContRastive Edits with Sparse raTionalization), a joint framework for selective rationalization and counterfactual text generation, and show that this framework leads to improvements in counterfactual quality, model robustness, and interpretability. First, CREST generates valid counterfactuals that are more natural than those produced by previous methods, and subsequently can be used for data augmentation at scale, reducing the need for human-generated examples. Second, we introduce a new loss function that leverages CREST counterfactuals to regularize selective rationales and show that this regularization improves both model robustness and rationale quality, compared to methods that do not leverage CREST counterfactuals. Our results demonstrate that CREST successfully bridges the gap between selective rationales and counterfactual examples, addressing the limitations of existing methods and providing a more comprehensive view of a model's predictions.

    Comment: Accepted at ACL 2023 (main)
    Keywords Computer Science - Computation and Language
    Subject code 006
    Publishing date 2023-05-26
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  8. Book ; Online: An Empirical Study of Translation Hypothesis Ensembling with Large Language Models

    Farinhas, António / de Souza, José G. C. / Martins, André F. T.

    2023  

    Abstract: Large language models (LLMs) are becoming a one-fits-many solution, but they sometimes hallucinate or produce unreliable output. In this paper, we investigate how hypothesis ensembling can improve the quality of the generated text for the specific ... ...

    Abstract Large language models (LLMs) are becoming a one-fits-many solution, but they sometimes hallucinate or produce unreliable output. In this paper, we investigate how hypothesis ensembling can improve the quality of the generated text for the specific problem of LLM-based machine translation. We experiment with several techniques for ensembling hypotheses produced by LLMs such as ChatGPT, LLaMA, and Alpaca. We provide a comprehensive study along multiple dimensions, including the method to generate hypotheses (multiple prompts, temperature-based sampling, and beam search) and the strategy to produce the final translation (instruction-based, quality-based reranking, and minimum Bayes risk (MBR) decoding). Our results show that MBR decoding is a very effective method, that translation quality can be improved using a small number of samples, and that instruction tuning has a strong impact on the relation between the diversity of the hypotheses and the sampling temperature.

    Comment: EMNLP 2023 (main conference)
    Keywords Computer Science - Computation and Language
    Subject code 410
    Publishing date 2023-10-17
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  9. Book ; Online: Non-Exchangeable Conformal Risk Control

    Farinhas, António / Zerva, Chrysoula / Ulmer, Dennis / Martins, André F. T.

    2023  

    Abstract: Split conformal prediction has recently sparked great interest due to its ability to provide formally guaranteed uncertainty sets or intervals for predictions made by black-box neural models, ensuring a predefined probability of containing the actual ... ...

    Abstract Split conformal prediction has recently sparked great interest due to its ability to provide formally guaranteed uncertainty sets or intervals for predictions made by black-box neural models, ensuring a predefined probability of containing the actual ground truth. While the original formulation assumes data exchangeability, some extensions handle non-exchangeable data, which is often the case in many real-world scenarios. In parallel, some progress has been made in conformal methods that provide statistical guarantees for a broader range of objectives, such as bounding the best $F_1$-score or minimizing the false negative rate in expectation. In this paper, we leverage and extend these two lines of work by proposing non-exchangeable conformal risk control, which allows controlling the expected value of any monotone loss function when the data is not exchangeable. Our framework is flexible, makes very few assumptions, and allows weighting the data based on its relevance for a given test example; a careful choice of weights may result on tighter bounds, making our framework useful in the presence of change points, time series, or other forms of distribution drift. Experiments with both synthetic and real world data show the usefulness of our method.

    Comment: ICLR 2024
    Keywords Computer Science - Machine Learning ; Statistics - Machine Learning
    Subject code 310
    Publishing date 2023-10-02
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

  10. Book ; Online: Looking for a Needle in a Haystack

    Guerreiro, Nuno M. / Voita, Elena / Martins, André F. T.

    A Comprehensive Study of Hallucinations in Neural Machine Translation

    2022  

    Abstract: Although the problem of hallucinations in neural machine translation (NMT) has received some attention, research on this highly pathological phenomenon lacks solid ground. Previous work has been limited in several ways: it often resorts to artificial ... ...

    Abstract Although the problem of hallucinations in neural machine translation (NMT) has received some attention, research on this highly pathological phenomenon lacks solid ground. Previous work has been limited in several ways: it often resorts to artificial settings where the problem is amplified, it disregards some (common) types of hallucinations, and it does not validate adequacy of detection heuristics. In this paper, we set foundations for the study of NMT hallucinations. First, we work in a natural setting, i.e., in-domain data without artificial noise neither in training nor in inference. Next, we annotate a dataset of over 3.4k sentences indicating different kinds of critical errors and hallucinations. Then, we turn to detection methods and both revisit methods used previously and propose using glass-box uncertainty-based detectors. Overall, we show that for preventive settings, (i) previously used methods are largely inadequate, (ii) sequence log-probability works best and performs on par with reference-based methods. Finally, we propose DeHallucinator, a simple method for alleviating hallucinations at test time that significantly reduces the hallucinatory rate. To ease future research, we release our annotated dataset for WMT18 German-English data, along with the model, training data, and code.

    Comment: Accepted at EACL23 (main)
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2022-08-10
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

    More links

    Kategorien

To top