LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 20

  1. Article ; Online: Exhaustivity and Anti-Exhaustivity in the RSA Framework: Testing the Effect of Prior Beliefs.

    Cremers, Alexandre / Wilcox, Ethan G / Spector, Benjamin

    Cognitive Science

    2023  Volume 47, Issue 5, Page(s) e13286

    Abstract During communication, the interpretation of utterances is sensitive to a listener's probabilistic prior beliefs. In this paper, we focus on the influence of prior beliefs on so-called exhaustivity interpretations, whereby a sentence such as Mary came is understood to mean that only Mary came. Two theoretical origins for exhaustivity effects have been proposed in the previous literature. On the one hand are perspectives that view these inferences as the result of a purely pragmatic process (as in the classical Gricean view, and more recent Bayesian approaches); on the other hand are proposals that treat them as the result of an encapsulated semantic mechanism (Chierchia, Fox & Spector 2012). We gain traction on adjudicating between these two approaches with new theoretical and experimental evidence, focusing on the behavior of different models for exhaustivity effects, all of which fit under the Rational Speech Act modeling framework (RSA, Frank & Goodman, 2012). Some (but not all!) of these models include an encapsulated semantic mechanism. Theoretically, we demonstrate that many RSA models predict not only exhaustivity, but also anti-exhaustivity, whereby "Mary came" would convey that Mary and someone else came. We evaluate these models against data obtained in a new study which tested the effects of prior beliefs on both production and comprehension, improving on previous empirical work. We find that the models which have the best fit to human behavior include an encapsulated exhaustivity mechanism. We conclude that, on the one hand, in the division of labor between semantics and pragmatics, semantics plays a larger role than is often thought, but, on the other hand, the tradeoff between informativity and cost which characterizes all RSA models does play a central role for genuine pragmatic effects.
    MeSH term(s) Humans ; Bayes Theorem ; Language ; Semantics ; Communication ; Speech ; Comprehension
    Language English
    Publishing date 2023-05-05
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2002940-8
    ISSN (online) 1551-6709
    ISSN 0364-0213
    DOI 10.1111/cogs.13286
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)
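
    A minimal sketch of a vanilla Rational Speech Act model for the exhaustivity example in the abstract above, assuming a toy two-world, two-utterance scenario with illustrative cost and rationality values; none of these numbers, nor the paper's model variants with an encapsulated exhaustivity mechanism, come from this record.

      # Toy RSA model (Frank & Goodman, 2012). Worlds: "only Mary came" (M) and
      # "Mary and Sue came" (MS). Utterances: "Mary came" (true in both worlds)
      # and "Mary and Sue came" (true only in MS).
      import numpy as np

      truth = np.array([[1.0, 1.0],    # "Mary came"
                        [0.0, 1.0]])   # "Mary and Sue came"
      cost = np.array([0.0, 0.3])      # illustrative utterance costs
      alpha = 4.0                      # illustrative speaker rationality

      def rsa_listener(prior):
          prior = np.asarray(prior, dtype=float)
          L0 = truth * prior                         # literal listener: truth * prior, renormalized
          L0 /= L0.sum(axis=1, keepdims=True)
          with np.errstate(divide="ignore"):
              S1 = np.exp(alpha * (np.log(L0) - cost[:, None]))   # pragmatic speaker
          S1 /= S1.sum(axis=0, keepdims=True)
          L1 = S1 * prior                            # pragmatic listener: speaker * prior, renormalized
          return L1 / L1.sum(axis=1, keepdims=True)

      for prior in ([0.5, 0.5], [0.2, 0.8]):
          L1 = rsa_listener(prior)
          print(f"prior={prior}: P(only Mary came | 'Mary came') = {L1[0, 0]:.2f}")

    In this toy setup the exhaustive reading dominates under a uniform prior and weakens when the prior favors the Mary-and-Sue world, which is the kind of prior sensitivity the study manipulates.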

  2. Book ; Online: Call for Papers -- The BabyLM Challenge

    Warstadt, Alex / Choshen, Leshem / Mueller, Aaron / Williams, Adina / Wilcox, Ethan / Zhuang, Chengxu

    Sample-efficient pretraining on a developmentally plausible corpus

    2023  

    Abstract We present the call for papers for the BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus. This shared task is intended for participants with an interest in small scale language modeling, human language acquisition, low-resource NLP, and cognitive modeling. In partnership with CoNLL and CMCL, we provide a platform for approaches to pretraining with a limited-size corpus sourced from data inspired by the input to children. The task has three tracks, two of which restrict the training data to pre-released datasets of 10M and 100M words and are dedicated to explorations of approaches such as architectural variations, self-supervised objectives, or curriculum learning. The final track only restricts the amount of text used, allowing innovation in the choice of the data, its domain, and even its modality (i.e., data from sources other than text is welcome). We will release a shared evaluation pipeline which scores models on a variety of benchmarks and tasks, including targeted syntactic evaluations and natural language understanding.
    Keywords Computer Science - Computation and Language
    Subject code 420
    Publishing date 2023-01-27
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Book ; Online: Controlled Text Generation with Natural Language Instructions

    Zhou, Wangchunshu / Jiang, Yuchen Eleanor / Wilcox, Ethan / Cotterell, Ryan / Sachan, Mrinmaya

    2023  

    Abstract Large language models generate fluent texts and can follow natural language instructions to solve a wide range of tasks without task-specific training. Nevertheless, it is notoriously difficult to control their generation to satisfy the various constraints required by different applications. In this work, we present InstructCTG, a controlled text generation framework that incorporates different constraints by conditioning on natural language descriptions and demonstrations of the constraints. In particular, we first extract the underlying constraints of natural texts through a combination of off-the-shelf NLP tools and simple heuristics. We then verbalize the constraints into natural language instructions to form weakly supervised training data. By prepending natural language descriptions of the constraints and a few demonstrations, we fine-tune a pre-trained language model to incorporate various types of constraints. Compared to existing search-based or score-based methods, InstructCTG is more flexible to different constraint types and has a much smaller impact on the generation quality and speed because it does not modify the decoding procedure. Additionally, InstructCTG allows the model to adapt to new constraints without re-training through the use of few-shot task generalization and in-context learning abilities of instruction-tuned language models.

    Comment: ICML 2023
    Keywords Computer Science - Computation and Language ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 004
    Publishing date 2023-04-27
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
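
    The paper's actual constraint extractors and prompt templates are not reproduced in this record; the following is a hypothetical sketch of the general recipe the abstract describes (extract a constraint heuristically, verbalize it as an instruction, prepend a demonstration, and fine-tune on the resulting prompt-target pairs). The keyword heuristic and the template wording below are invented for illustration.

      # Hypothetical InstructCTG-style data construction: a lexical constraint is
      # extracted from the target text with a simple heuristic, verbalized as a
      # natural-language instruction, and prefixed with one demonstration.
      def extract_keywords(sentence, vocabulary):
          """Toy 'constraint extractor': which vocabulary items appear in the text?"""
          return [w for w in vocabulary if w.lower() in sentence.lower()]

      def verbalize(words):
          return "Write a sentence that contains the words: " + ", ".join(words) + "."

      def build_example(target, vocabulary, demonstrations):
          instruction = verbalize(extract_keywords(target, vocabulary))
          demo_block = "\n".join(f"{verbalize(w)}\n{s}" for w, s in demonstrations)
          # A pre-trained LM would be fine-tuned to generate `target` given this prompt.
          return f"{demo_block}\n{instruction}\n", target

      demos = [(["prosody", "speech"], "Prosody shapes how speech is understood.")]
      prompt, target = build_example(
          "Language models can follow natural language instructions.",
          vocabulary=["language", "instructions"],
          demonstrations=demos,
      )
      print(prompt + target)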

  4. Book ; Online: Quantifying the redundancy between prosody and text

    Wolf, Lukas / Pimentel, Tiago / Fedorenko, Evelina / Cotterell, Ryan / Warstadt, Alex / Wilcox, Ethan / Regev, Tamar

    2023  

    Abstract Prosody -- the suprasegmental component of speech, including pitch, loudness, and tempo -- carries critical aspects of meaning. However, the relationship between the information conveyed by prosody vs. by the words themselves remains poorly understood. We use large language models (LLMs) to estimate how much information is redundant between prosody and the words themselves. Using a large spoken corpus of English audiobooks, we extract prosodic features aligned to individual words and test how well they can be predicted from LLM embeddings, compared to non-contextual word embeddings. We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features, including intensity, duration, pauses, and pitch contours. Furthermore, a word's prosodic information is redundant with both the word itself and the context preceding as well as following it. Still, we observe that prosodic features can not be fully predicted from text, suggesting that prosody carries information above and beyond the words. Along with this paper, we release a general-purpose data processing pipeline for quantifying the relationship between linguistic information and extra-linguistic features.

    Comment: Published at The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)
    Keywords Computer Science - Computation and Language ; Computer Science - Artificial Intelligence ; Computer Science - Information Theory ; Computer Science - Machine Learning
    Subject code 400
    Publishing date 2023-11-28
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
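
    A schematic version of the redundancy test described above, assuming random placeholder arrays in place of the aligned audiobook features and LLM embeddings used in the paper; the comparison of held-out R^2 values is a simplified stand-in for its information-theoretic estimates.

      # Predict a word-level prosodic feature from contextual vs. non-contextual embeddings
      # and compare held-out R^2. All arrays here are random placeholders.
      import numpy as np
      from sklearn.linear_model import Ridge
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      n_words = 2000
      contextual = rng.normal(size=(n_words, 768))   # e.g., LLM hidden states for each word
      static = rng.normal(size=(n_words, 300))       # e.g., non-contextual word vectors
      duration = rng.normal(size=n_words)            # prosodic target (placeholder)

      def heldout_r2(X, y):
          return cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2").mean()

      # A large gap between the two scores would indicate that the surrounding context
      # carries prosody-relevant information beyond the word's identity alone.
      print(f"contextual R^2: {heldout_r2(contextual, duration):.3f}")
      print(f"static R^2:     {heldout_r2(static, duration):.3f}")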

  5. Book ; Online: Testing the Predictions of Surprisal Theory in 11 Languages

    Wilcox, Ethan Gotlieb / Pimentel, Tiago / Meister, Clara / Cotterell, Ryan / Levy, Roger P.

    2023  

    Abstract A fundamental result in psycholinguistics is that less predictable words take a longer time to process. One theoretical explanation for this finding is Surprisal Theory (Hale, 2001; Levy, 2008), which quantifies a word's predictability as its surprisal, i.e., its negative log-probability given a context. While evidence supporting the predictions of Surprisal Theory has been replicated widely, most studies have focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with surprisal theory: (i) whether surprisal is predictive of reading times; (ii) whether expected surprisal, i.e., contextual entropy, is predictive of reading times; and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.

    Comment: This is a pre-MIT Press publication version of the paper
    Keywords Computer Science - Computation and Language
    Subject code 410
    Publishing date 2023-07-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
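
    A minimal sketch of the two quantities tested above, computed here with GPT-2 via the Hugging Face transformers API as a stand-in for the paper's monolingual and multilingual models; token-level values would still need to be aggregated to words and aligned with reading times.

      # Per-token surprisal and contextual entropy from an autoregressive LM.
      import torch
      from transformers import GPT2LMHeadModel, GPT2TokenizerFast

      tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")
      model.eval()

      ids = tokenizer("The children went outside to play.", return_tensors="pt").input_ids
      with torch.no_grad():
          log_probs = torch.log_softmax(model(ids).logits, dim=-1)   # (1, seq_len, vocab)

      for i in range(1, ids.shape[1]):
          dist = log_probs[0, i - 1]                     # next-token distribution given tokens < i
          surprisal = -dist[ids[0, i]].item()            # -log p(token_i | context), in nats
          entropy = -(dist.exp() * dist).sum().item()    # contextual entropy of that distribution
          token = tokenizer.decode([ids[0, i].item()])
          print(f"{token!r:>12}  surprisal={surprisal:5.2f}  entropy={entropy:5.2f}")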

  6. Book ; Online: WhisBERT

    Wolf, Lukas / Tuckute, Greta / Kotar, Klemen / Hosseini, Eghbal / Regev, Tamar / Wilcox, Ethan / Warstadt, Alex

    Multimodal Text-Audio Language Modeling on 100M Words

    2023  

    Abstract Training on multiple modalities of input can augment the capabilities of a language model. Here, we ask whether such a training regime can improve the quality and efficiency of these systems as well. We focus on text-audio and introduce WhisBERT, which is inspired by the text-image approach of FLAVA (Singh et al., 2022). In accordance with BabyLM guidelines (Warstadt et al., 2023), we pretrain WhisBERT on a dataset comprising only 100 million words plus their corresponding speech from the word-aligned version of the People's Speech dataset (Galvez et al., 2021). To assess the impact of multimodality, we compare versions of the model that are trained on text only and on both audio and text simultaneously. We find that while WhisBERT is able to perform well on multimodal masked modeling and surpasses the BabyLM baselines in most benchmark tasks, it struggles to optimize its complex objective and outperform its text-only WhisBERT baseline.

    Comment: Published at the BabyLM Challenge, a shared task co-sponsored by CMCL 2023 and CoNLL 2023, hosted by EMNLP 2023
    Keywords Computer Science - Computation and Language ; Computer Science - Artificial Intelligence
    Subject code 410
    Publishing date 2023-12-05
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Book ; Online: Revisiting the Optimality of Word Lengths

    Pimentel, Tiago / Meister, Clara / Wilcox, Ethan Gotlieb / Mahowald, Kyle / Cotterell, Ryan

    2023  

    Abstract Zipf (1935) posited that wordforms are optimized to minimize utterances' communicative costs. Under the assumption that cost is given by an utterance's length, he supported this claim by showing that words' lengths are inversely correlated with their frequencies. Communicative cost, however, can be operationalized in different ways. Piantadosi et al. (2011) claim that cost should be measured as the distance between an utterance's information rate and channel capacity, which we dub the channel capacity hypothesis (CCH) here. Following this logic, they then proposed that a word's length should be proportional to the expected value of its surprisal (negative log-probability in context). In this work, we show that Piantadosi et al.'s derivation does not minimize CCH's cost, but rather a lower bound, which we term CCH-lower. We propose a novel derivation, suggesting an improved way to minimize CCH's cost. Under this method, we find that a language's word lengths should instead be proportional to the surprisal's expectation plus its variance-to-mean ratio. Experimentally, we compare these three communicative cost functions: Zipf's, CCH-lower , and CCH. Across 13 languages and several experimental settings, we find that length is better predicted by frequency than either of the other hypotheses. In fact, when surprisal's expectation, or expectation plus variance-to-mean ratio, is estimated using better language models, it leads to worse word length predictions. We take these results as evidence that Zipf's longstanding hypothesis holds.

    Comment: Published at EMNLP 2023
    Keywords Computer Science - Computation and Language
    Subject code 400
    Publishing date 2023-12-06
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
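
    The three word-length predictors compared above, illustrated on invented per-occurrence surprisal values and placeholder unigram counts; in the paper these estimates come from language models over large corpora in 13 languages.

      # Three candidate predictors of a word's length, computed from per-occurrence
      # surprisal estimates for each word type (the numbers below are invented).
      import numpy as np

      surprisals = {                       # word -> surprisal (nats) in its observed contexts
          "the": [1.1, 0.9, 1.3, 1.0],
          "about": [4.0, 5.5, 3.8, 4.7],
          "serendipity": [9.5, 11.2, 10.1],
      }
      counts = {"the": 50000, "about": 4000, "serendipity": 12}   # placeholder unigram counts
      total = sum(counts.values())

      for word, s in surprisals.items():
          s = np.array(s)
          zipf = -np.log(counts[word] / total)      # Zipf: length tracks negative log frequency
          cch_lower = s.mean()                      # Piantadosi et al.: expected surprisal
          cch = s.mean() + s.var() / s.mean()       # this paper: expectation plus variance-to-mean ratio
          print(f"{word:>12}  len={len(word):2d}  Zipf={zipf:5.2f}  "
                f"E[s]={cch_lower:5.2f}  E[s]+Var/E[s]={cch:5.2f}")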

  8. Book ; Online: On the Efficacy of Sampling Adapters

    Meister, Clara / Pimentel, Tiago / Malagutti, Luca / Wilcox, Ethan G. / Cotterell, Ryan

    2023  

    Abstract Sampling is a common strategy for generating text from probabilistic models, yet standard ancestral sampling often results in text that is incoherent or ungrammatical. To alleviate this issue, various modifications to a model's sampling distribution, such as nucleus or top-k sampling, have been introduced and are now ubiquitously used in language generation systems. We propose a unified framework for understanding these techniques, which we term sampling adapters. Sampling adapters often lead to qualitatively better text, which raises the question: From a formal perspective, how are they changing the (sub)word-level distributions of language generation models? And why do these local changes lead to higher-quality text? We argue that the shift they enforce can be viewed as a trade-off between precision and recall: while the model loses its ability to produce certain strings, its precision rate on desirable text increases. While this trade-off is not reflected in standard metrics of distribution quality (such as perplexity), we find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution. Further, these measures correlate with higher sequence-level quality scores, specifically, Mauve.

    Comment: ACL 2023 Main Conference Proceedings
    Keywords Computer Science - Computation and Language
    Subject code 005
    Publishing date 2023-07-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
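
    Two widely used adapters of the kind the paper unifies, written as functions that truncate and renormalize a next-token distribution; this shows the generic form of top-k and nucleus (top-p) sampling, not the paper's formal framework or its precision-recall analysis.

      # Top-k and nucleus (top-p) truncation as "adapters" over a next-token distribution.
      import numpy as np

      def top_k_adapter(p, k):
          """Keep the k most probable tokens, zero out the rest, renormalize."""
          q = np.zeros_like(p)
          keep = np.argsort(p)[-k:]
          q[keep] = p[keep]
          return q / q.sum()

      def nucleus_adapter(p, top_p):
          """Keep the smallest set of most-probable tokens whose cumulative mass reaches top_p."""
          order = np.argsort(p)[::-1]
          cutoff = np.searchsorted(np.cumsum(p[order]), top_p) + 1   # number of tokens kept
          q = np.zeros_like(p)
          q[order[:cutoff]] = p[order[:cutoff]]
          return q / q.sum()

      p = np.array([0.45, 0.25, 0.15, 0.10, 0.05])
      print(top_k_adapter(p, k=2))          # mass concentrated on the two most likely tokens
      print(nucleus_adapter(p, top_p=0.8))  # keeps tokens until 80% cumulative mass is covered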

  9. Book ; Online: Investigating Novel Verb Learning in BERT

    Thrush, Tristan / Wilcox, Ethan / Levy, Roger

    Selectional Preference Classes and Alternation-Based Syntactic Generalization

    2020  

    Abstract Previous studies investigating the syntactic abilities of deep learning models have not targeted the relationship between the strength of the grammatical generalization and the amount of evidence to which the model is exposed during training. We address this issue by deploying a novel word-learning paradigm to test BERT's few-shot learning capabilities for two aspects of English verbs: alternations and classes of selectional preferences. For the former, we fine-tune BERT on a single frame in a verbal-alternation pair and ask whether the model expects the novel verb to occur in its sister frame. For the latter, we fine-tune BERT on an incomplete selectional network of verbal objects and ask whether it expects unattested but plausible verb/object pairs. We find that BERT makes robust grammatical generalizations after just one or two instances of a novel word in fine-tuning. For the verbal alternation tests, we find that the model displays behavior that is consistent with a transitivity bias: verbs seen few times are expected to take direct objects, but verbs seen with direct objects are not expected to occur intransitively.

    Comment: Accepted to BlackboxNLP 2020
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject code 430
    Publishing date 2020-11-04
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
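
    A rough sketch of the few-shot paradigm described above, using the Hugging Face masked-LM interface; the novel verb "daxed", the single double-object/prepositional frame pair, and the one-step fine-tuning below are illustrative simplifications, not the paper's stimuli or training setup.

      # Fine-tune BERT on one masked instance of a novel verb in a double-object frame,
      # then check how strongly it expects that verb in the unattested prepositional frame.
      import torch
      from transformers import BertForMaskedLM, BertTokenizerFast

      tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
      model = BertForMaskedLM.from_pretrained("bert-base-uncased")
      tokenizer.add_tokens(["daxed"])                       # introduce the novel verb form
      model.resize_token_embeddings(len(tokenizer))
      dax_id = tokenizer.convert_tokens_to_ids("daxed")

      # Single exposure: mask the novel verb and train the model to recover it.
      train = tokenizer("the teacher daxed the student a book .", return_tensors="pt")
      labels = train.input_ids.clone()
      masked = train.input_ids.clone()
      verb_pos = (train.input_ids == dax_id).nonzero()[0, 1]
      masked[0, verb_pos] = tokenizer.mask_token_id
      labels[masked != tokenizer.mask_token_id] = -100      # loss only on the masked verb
      optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
      model.train()
      loss = model(input_ids=masked, attention_mask=train.attention_mask, labels=labels).loss
      loss.backward()
      optimizer.step()

      # Probe the sister frame: how probable is "daxed" at the masked verb position?
      model.eval()
      probe = tokenizer("the teacher [MASK] a book to the student .", return_tensors="pt")
      mask_pos = (probe.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
      with torch.no_grad():
          probs = torch.softmax(model(**probe).logits[0, mask_pos], dim=-1)
      print(f"P(daxed | prepositional frame) = {probs[dax_id].item():.2e}")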

  10. Book ; Online: On the Effect of Anticipation on Reading Times

    Pimentel, Tiago / Meister, Clara / Wilcox, Ethan G. / Levy, Roger / Cotterell, Ryan

    2022  

    Abstract Over the past two decades, numerous studies have demonstrated how less predictable (i.e., higher surprisal) words take more time to read. In general, these studies have implicitly assumed the reading process is purely responsive: Readers observe a new word and allocate time to process it as required. We argue that prior results are also compatible with a reading process that is at least partially anticipatory: Readers could make predictions about a future word and allocate time to process it based on their expectation. In this work, we operationalize this anticipation as a word's contextual entropy. We assess the effect of anticipation on reading by comparing how well surprisal and contextual entropy predict reading times on four naturalistic reading datasets: two self-paced and two eye-tracking. Experimentally, across datasets and analyses, we find substantial evidence for effects of contextual entropy over surprisal on a word's reading time (RT): in fact, entropy is sometimes better than surprisal in predicting a word's RT. Spillover effects, however, are generally not captured by entropy, but only by surprisal. Further, we hypothesize four cognitive mechanisms through which contextual entropy could impact RTs -- three of which we are able to design experiments to analyze. Overall, our results support a view of reading that is not just responsive, but also anticipatory.

    Comment: This is a pre-MIT Press publication version of the paper. Code is available in https://github.com/rycolab/anticipation-on-reading-times
    Keywords Computer Science - Computation and Language
    Subject code 028
    Publishing date 2022-11-25
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
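
    A schematic of the model comparison described above, regressing per-word reading times on surprisal with and without contextual entropy; all arrays are random placeholders standing in for aligned corpus estimates from the two self-paced reading and two eye-tracking datasets, and cross-validated R^2 replaces the paper's likelihood-based tests.

      # Regress reading times on surprisal alone vs. surprisal plus contextual entropy
      # and compare cross-validated fit. All arrays are random placeholders.
      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(1)
      n = 5000
      surprisal = rng.gamma(shape=2.0, scale=2.0, size=n)
      entropy = 0.6 * surprisal + rng.gamma(shape=2.0, scale=1.0, size=n)
      length = rng.integers(2, 12, size=n).astype(float)       # baseline predictor: word length
      rt = 200 + 8 * surprisal + 5 * entropy + 3 * length + rng.normal(scale=30, size=n)

      def heldout_r2(X):
          return cross_val_score(LinearRegression(), X, rt, cv=5, scoring="r2").mean()

      for name, X in [("length only", np.column_stack([length])),
                      ("+ surprisal", np.column_stack([length, surprisal])),
                      ("+ entropy",   np.column_stack([length, surprisal, entropy]))]:
          print(f"{name:>12}: held-out R^2 = {heldout_r2(X):.3f}")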
