LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 20

  1. Article ; Online: Exhaustivity and Anti-Exhaustivity in the RSA Framework: Testing the Effect of Prior Beliefs.

    Cremers, Alexandre / Wilcox, Ethan G / Spector, Benjamin

    Cognitive Science

    2023  Volume 47, Issue 5, Page(s) e13286

    Abstract During communication, the interpretation of utterances is sensitive to a listener's probabilistic prior beliefs. In this paper, we focus on the influence of prior beliefs on so-called exhaustivity interpretations, whereby a sentence such as Mary came is understood to mean that only Mary came. Two theoretical origins for exhaustivity effects have been proposed in the previous literature. On the one hand are perspectives that view these inferences as the result of a purely pragmatic process (as in the classical Gricean view, and more recent Bayesian approaches); on the other hand are proposals that treat them as the result of an encapsulated semantic mechanism (Chierchia, Fox & Spector 2012). We gain traction on adjudicating between these two approaches with new theoretical and experimental evidence, focusing on the behavior of different models for exhaustivity effects, all of which fit under the Rational Speech Act modeling framework (RSA, Frank & Goodman, 2012). Some (but not all!) of these models include an encapsulated semantic mechanism. Theoretically, we demonstrate that many RSA models predict not only exhaustivity, but also anti-exhaustivity, whereby "Mary came" would convey that Mary and someone else came. We evaluate these models against data obtained in a new study which tested the effects of prior beliefs on both production and comprehension, improving on previous empirical work. We find that the models which have the best fit to human behavior include an encapsulated exhaustivity mechanism. We conclude that, on the one hand, in the division of labor between semantics and pragmatics, semantics plays a larger role than is often thought, but, on the other hand, the tradeoff between informativity and cost which characterizes all RSA models does play a central role for genuine pragmatic effects.
    MeSH term(s) Humans ; Bayes Theorem ; Language ; Semantics ; Communication ; Speech ; Comprehension
    Language English
    Publishing date 2023-05-05
    Publishing country United States
    Document type Journal Article ; Research Support, Non-U.S. Gov't
    ZDB-ID 2002940-8
    ISSN (online) 1551-6709
    ISSN 0364-0213
    DOI 10.1111/cogs.13286
    Database MEDLINE (Medical Literature Analysis and Retrieval System Online)
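
    A minimal sketch of a vanilla Rational Speech Act model for the exhaustivity example in the abstract above, assuming a toy two-world, two-utterance scenario with illustrative cost and rationality values; none of these numbers, nor the paper's model variants with an encapsulated exhaustivity mechanism, come from this record.

      # Toy RSA model (Frank & Goodman, 2012). Worlds: "only Mary came" (M) and
      # "Mary and Sue came" (MS). Utterances: "Mary came" (true in both worlds)
      # and "Mary and Sue came" (true only in MS).
      import numpy as np

      truth = np.array([[1.0, 1.0],    # "Mary came"
                        [0.0, 1.0]])   # "Mary and Sue came"
      cost = np.array([0.0, 0.3])      # illustrative utterance costs
      alpha = 4.0                      # illustrative speaker rationality

      def rsa_listener(prior):
          prior = np.asarray(prior, dtype=float)
          L0 = truth * prior                         # literal listener: truth * prior, renormalized
          L0 /= L0.sum(axis=1, keepdims=True)
          with np.errstate(divide="ignore"):
              S1 = np.exp(alpha * (np.log(L0) - cost[:, None]))   # pragmatic speaker
          S1 /= S1.sum(axis=0, keepdims=True)
          L1 = S1 * prior                            # pragmatic listener: speaker * prior, renormalized
          return L1 / L1.sum(axis=1, keepdims=True)

      for prior in ([0.5, 0.5], [0.2, 0.8]):
          L1 = rsa_listener(prior)
          print(f"prior={prior}: P(only Mary came | 'Mary came') = {L1[0, 0]:.2f}")

    In this toy setup the exhaustive reading dominates under a uniform prior and weakens when the prior favors the Mary-and-Sue world, which is the kind of prior sensitivity the study manipulates.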

  2. Book ; Online: Call for Papers -- The BabyLM Challenge

    Warstadt, Alex / Choshen, Leshem / Mueller, Aaron / Williams, Adina / Wilcox, Ethan / Zhuang, Chengxu

    Sample-efficient pretraining on a developmentally plausible corpus

    2023  

    Abstract We present the call for papers for the BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus. This shared task is intended for participants with an interest in small scale language modeling, human language acquisition, low-resource NLP, and cognitive modeling. In partnership with CoNLL and CMCL, we provide a platform for approaches to pretraining with a limited-size corpus sourced from data inspired by the input to children. The task has three tracks, two of which restrict the training data to pre-released datasets of 10M and 100M words and are dedicated to explorations of approaches such as architectural variations, self-supervised objectives, or curriculum learning. The final track only restricts the amount of text used, allowing innovation in the choice of the data, its domain, and even its modality (i.e., data from sources other than text is welcome). We will release a shared evaluation pipeline which scores models on a variety of benchmarks and tasks, including targeted syntactic evaluations and natural language understanding.
    Keywords Computer Science - Computation and Language
    Subject code 420
    Publishing date 2023-01-27
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Book ; Online: Controlled Text Generation with Natural Language Instructions

    Zhou, Wangchunshu / Jiang, Yuchen Eleanor / Wilcox, Ethan / Cotterell, Ryan / Sachan, Mrinmaya

    2023  

    Abstract Large language models generate fluent texts and can follow natural language instructions to solve a wide range of tasks without task-specific training. Nevertheless, it is notoriously difficult to control their generation to satisfy the various constraints required by different applications. In this work, we present InstructCTG, a controlled text generation framework that incorporates different constraints by conditioning on natural language descriptions and demonstrations of the constraints. In particular, we first extract the underlying constraints of natural texts through a combination of off-the-shelf NLP tools and simple heuristics. We then verbalize the constraints into natural language instructions to form weakly supervised training data. By prepending natural language descriptions of the constraints and a few demonstrations, we fine-tune a pre-trained language model to incorporate various types of constraints. Compared to existing search-based or score-based methods, InstructCTG is more flexible to different constraint types and has a much smaller impact on the generation quality and speed because it does not modify the decoding procedure. Additionally, InstructCTG allows the model to adapt to new constraints without re-training through the use of few-shot task generalization and in-context learning abilities of instruction-tuned language models.

    Comment: ICML 2023
    Keywords Computer Science - Computation and Language ; Computer Science - Artificial Intelligence ; Computer Science - Machine Learning
    Subject code 004
    Publishing date 2023-04-27
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
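
    The paper's actual constraint extractors and prompt templates are not reproduced in this record; the following is a hypothetical sketch of the general recipe the abstract describes (extract a constraint heuristically, verbalize it as an instruction, prepend a demonstration, and fine-tune on the resulting prompt-target pairs). The keyword heuristic and the template wording below are invented for illustration.

      # Hypothetical InstructCTG-style data construction: a lexical constraint is
      # extracted from the target text with a simple heuristic, verbalized as a
      # natural-language instruction, and prefixed with one demonstration.
      def extract_keywords(sentence, vocabulary):
          """Toy 'constraint extractor': which vocabulary items appear in the text?"""
          return [w for w in vocabulary if w.lower() in sentence.lower()]

      def verbalize(words):
          return "Write a sentence that contains the words: " + ", ".join(words) + "."

      def build_example(target, vocabulary, demonstrations):
          instruction = verbalize(extract_keywords(target, vocabulary))
          demo_block = "\n".join(f"{verbalize(w)}\n{s}" for w, s in demonstrations)
          # A pre-trained LM would be fine-tuned to generate `target` given this prompt.
          return f"{demo_block}\n{instruction}\n", target

      demos = [(["prosody", "speech"], "Prosody shapes how speech is understood.")]
      prompt, target = build_example(
          "Language models can follow natural language instructions.",
          vocabulary=["language", "instructions"],
          demonstrations=demos,
      )
      print(prompt + target)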

  4. Book ; Online: Quantifying the redundancy between prosody and text

    Wolf, Lukas / Pimentel, Tiago / Fedorenko, Evelina / Cotterell, Ryan / Warstadt, Alex / Wilcox, Ethan / Regev, Tamar

    2023  

    Abstract Prosody -- the suprasegmental component of speech, including pitch, loudness, and tempo -- carries critical aspects of meaning. However, the relationship between the information conveyed by prosody vs. by the words themselves remains poorly understood. We use large language models (LLMs) to estimate how much information is redundant between prosody and the words themselves. Using a large spoken corpus of English audiobooks, we extract prosodic features aligned to individual words and test how well they can be predicted from LLM embeddings, compared to non-contextual word embeddings. We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features, including intensity, duration, pauses, and pitch contours. Furthermore, a word's prosodic information is redundant with both the word itself and the context preceding as well as following it. Still, we observe that prosodic features can not be fully predicted from text, suggesting that prosody carries information above and beyond the words. Along with this paper, we release a general-purpose data processing pipeline for quantifying the relationship between linguistic information and extra-linguistic features.

    Comment: Published at The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)
    Keywords Computer Science - Computation and Language ; Computer Science - Artificial Intelligence ; Computer Science - Information Theory ; Computer Science - Machine Learning
    Subject code 400
    Publishing date 2023-11-28
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
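
    A schematic version of the redundancy test described above, assuming random placeholder arrays in place of the aligned audiobook features and LLM embeddings used in the paper; the comparison of held-out R^2 values is a simplified stand-in for its information-theoretic estimates.

      # Predict a word-level prosodic feature from contextual vs. non-contextual embeddings
      # and compare held-out R^2. All arrays here are random placeholders.
      import numpy as np
      from sklearn.linear_model import Ridge
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      n_words = 2000
      contextual = rng.normal(size=(n_words, 768))   # e.g., LLM hidden states for each word
      static = rng.normal(size=(n_words, 300))       # e.g., non-contextual word vectors
      duration = rng.normal(size=n_words)            # prosodic target (placeholder)

      def heldout_r2(X, y):
          return cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2").mean()

      # A large gap between the two scores would indicate that the surrounding context
      # carries prosody-relevant information beyond the word's identity alone.
      print(f"contextual R^2: {heldout_r2(contextual, duration):.3f}")
      print(f"static R^2:     {heldout_r2(static, duration):.3f}")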

  5. Book ; Online: Testing the Predictions of Surprisal Theory in 11 Languages

    Wilcox, Ethan Gotlieb / Pimentel, Tiago / Meister, Clara / Cotterell, Ryan / Levy, Roger P.

    2023  

    Abstract A fundamental result in psycholinguistics is that less predictable words take a longer time to process. One theoretical explanation for this finding is Surprisal Theory (Hale, 2001; Levy, 2008), which quantifies a word's predictability as its surprisal, i.e., its negative log-probability given a context. While evidence supporting the predictions of Surprisal Theory has been replicated widely, most studies have focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with surprisal theory: (i) whether surprisal is predictive of reading times; (ii) whether expected surprisal, i.e., contextual entropy, is predictive of reading times; and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.

    Comment: This is a pre-MIT Press publication version of the paper
    Keywords Computer Science - Computation and Language
    Subject code 410
    Publishing date 2023-07-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
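
    A minimal sketch of the two quantities tested above, computed here with GPT-2 via the Hugging Face transformers API as a stand-in for the paper's monolingual and multilingual models; token-level values would still need to be aggregated to words and aligned with reading times.

      # Per-token surprisal and contextual entropy from an autoregressive LM.
      import torch
      from transformers import GPT2LMHeadModel, GPT2TokenizerFast

      tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")
      model.eval()

      ids = tokenizer("The children went outside to play.", return_tensors="pt").input_ids
      with torch.no_grad():
          log_probs = torch.log_softmax(model(ids).logits, dim=-1)   # (1, seq_len, vocab)

      for i in range(1, ids.shape[1]):
          dist = log_probs[0, i - 1]                     # next-token distribution given tokens < i
          surprisal = -dist[ids[0, i]].item()            # -log p(token_i | context), in nats
          entropy = -(dist.exp() * dist).sum().item()    # contextual entropy of that distribution
          token = tokenizer.decode([ids[0, i].item()])
          print(f"{token!r:>12}  surprisal={surprisal:5.2f}  entropy={entropy:5.2f}")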

  6. Book ; Online: WhisBERT

    Wolf, Lukas / Tuckute, Greta / Kotar, Klemen / Hosseini, Eghbal / Regev, Tamar / Wilcox, Ethan / Warstadt, Alex

    Multimodal Text-Audio Language Modeling on 100M Words

    2023  

    Abstract Training on multiple modalities of input can augment the capabilities of a language model. Here, we ask whether such a training regime can improve the quality and efficiency of these systems as well. We focus on text-audio and introduce WhisBERT, which is inspired by the text-image approach of FLAVA (Singh et al., 2022). In accordance with BabyLM guidelines (Warstadt et al., 2023), we pretrain WhisBERT on a dataset comprising only 100 million words plus their corresponding speech from the word-aligned version of the People's Speech dataset (Galvez et al., 2021). To assess the impact of multimodality, we compare versions of the model that are trained on text only and on both audio and text simultaneously. We find that while WhisBERT is able to perform well on multimodal masked modeling and surpasses the BabyLM baselines in most benchmark tasks, it struggles to optimize its complex objective and outperform its text-only WhisBERT baseline.

    Comment: Published at the BabyLM Challenge, a shared task co-sponsored by CMCL 2023 and CoNLL 2023, hosted by EMNLP 2023
    Keywords Computer Science - Computation and Language ; Computer Science - Artificial Intelligence
    Subject code 410
    Publishing date 2023-12-05
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Book ; Online: Revisiting the Optimality of Word Lengths

    Pimentel, Tiago / Meister, Clara / Wilcox, Ethan Gotlieb / Mahowald, Kyle / Cotterell, Ryan

    2023  

    Abstract Zipf (1935) posited that wordforms are optimized to minimize utterances' communicative costs. Under the assumption that cost is given by an utterance's length, he supported this claim by showing that words' lengths are inversely correlated with their frequencies. Communicative cost, however, can be operationalized in different ways. Piantadosi et al. (2011) claim that cost should be measured as the distance between an utterance's information rate and channel capacity, which we dub the channel capacity hypothesis (CCH) here. Following this logic, they then proposed that a word's length should be proportional to the expected value of its surprisal (negative log-probability in context). In this work, we show that Piantadosi et al.'s derivation does not minimize CCH's cost, but rather a lower bound, which we term CCH-lower. We propose a novel derivation, suggesting an improved way to minimize CCH's cost. Under this method, we find that a language's word lengths should instead be proportional to the surprisal's expectation plus its variance-to-mean ratio. Experimentally, we compare these three communicative cost functions: Zipf's, CCH-lower , and CCH. Across 13 languages and several experimental settings, we find that length is better predicted by frequency than either of the other hypotheses. In fact, when surprisal's expectation, or expectation plus variance-to-mean ratio, is estimated using better language models, it leads to worse word length predictions. We take these results as evidence that Zipf's longstanding hypothesis holds.

    Comment: Published at EMNLP 2023
    Keywords Computer Science - Computation and Language
    Subject code 400
    Publishing date 2023-12-06
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
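
    The three word-length predictors compared above, illustrated on invented per-occurrence surprisal values and placeholder unigram counts; in the paper these estimates come from language models over large corpora in 13 languages.

      # Three candidate predictors of a word's length, computed from per-occurrence
      # surprisal estimates for each word type (the numbers below are invented).
      import numpy as np

      surprisals = {                       # word -> surprisal (nats) in its observed contexts
          "the": [1.1, 0.9, 1.3, 1.0],
          "about": [4.0, 5.5, 3.8, 4.7],
          "serendipity": [9.5, 11.2, 10.1],
      }
      counts = {"the": 50000, "about": 4000, "serendipity": 12}   # placeholder unigram counts
      total = sum(counts.values())

      for word, s in surprisals.items():
          s = np.array(s)
          zipf = -np.log(counts[word] / total)      # Zipf: length tracks negative log frequency
          cch_lower = s.mean()                      # Piantadosi et al.: expected surprisal
          cch = s.mean() + s.var() / s.mean()       # this paper: expectation plus variance-to-mean ratio
          print(f"{word:>12}  len={len(word):2d}  Zipf={zipf:5.2f}  "
                f"E[s]={cch_lower:5.2f}  E[s]+Var/E[s]={cch:5.2f}")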

  8. Book ; Online: On the Efficacy of Sampling Adapters

    Meister, Clara / Pimentel, Tiago / Malagutti, Luca / Wilcox, Ethan G. / Cotterell, Ryan

    2023  

    Abstract Sampling is a common strategy for generating text from probabilistic models, yet standard ancestral sampling often results in text that is incoherent or ungrammatical. To alleviate this issue, various modifications to a model's sampling distribution, such as nucleus or top-k sampling, have been introduced and are now ubiquitously used in language generation systems. We propose a unified framework for understanding these techniques, which we term sampling adapters. Sampling adapters often lead to qualitatively better text, which raises the question: From a formal perspective, how are they changing the (sub)word-level distributions of language generation models? And why do these local changes lead to higher-quality text? We argue that the shift they enforce can be viewed as a trade-off between precision and recall: while the model loses its ability to produce certain strings, its precision rate on desirable text increases. While this trade-off is not reflected in standard metrics of distribution quality (such as perplexity), we find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution. Further, these measures correlate with higher sequence-level quality scores, specifically, Mauve.

    Comment: ACL 2023 Main Conference Proceedings
    Keywords Computer Science - Computation and Language
    Subject code 005
    Publishing date 2023-07-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
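
    Two widely used adapters of the kind the paper unifies, written as functions that truncate and renormalize a next-token distribution; this shows the generic form of top-k and nucleus (top-p) sampling, not the paper's formal framework or its precision-recall analysis.

      # Top-k and nucleus (top-p) truncation as "adapters" over a next-token distribution.
      import numpy as np

      def top_k_adapter(p, k):
          """Keep the k most probable tokens, zero out the rest, renormalize."""
          q = np.zeros_like(p)
          keep = np.argsort(p)[-k:]
          q[keep] = p[keep]
          return q / q.sum()

      def nucleus_adapter(p, top_p):
          """Keep the smallest set of most-probable tokens whose cumulative mass reaches top_p."""
          order = np.argsort(p)[::-1]
          cutoff = np.searchsorted(np.cumsum(p[order]), top_p) + 1   # number of tokens kept
          q = np.zeros_like(p)
          q[order[:cutoff]] = p[order[:cutoff]]
          return q / q.sum()

      p = np.array([0.45, 0.25, 0.15, 0.10, 0.05])
      print(top_k_adapter(p, k=2))          # mass concentrated on the two most likely tokens
      print(nucleus_adapter(p, top_p=0.8))  # keeps tokens until 80% cumulative mass is covered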

  9. Book ; Online: Investigating Novel Verb Learning in BERT

    Thrush, Tristan / Wilcox, Ethan / Levy, Roger

    Selectional Preference Classes and Alternation-Based Syntactic Generalization

    2020  

    Abstract Previous studies investigating the syntactic abilities of deep learning models have not targeted the relationship between the strength of the grammatical generalization and the amount of evidence to which the model is exposed during training. We address this issue by deploying a novel word-learning paradigm to test BERT's few-shot learning capabilities for two aspects of English verbs: alternations and classes of selectional preferences. For the former, we fine-tune BERT on a single frame in a verbal-alternation pair and ask whether the model expects the novel verb to occur in its sister frame. For the latter, we fine-tune BERT on an incomplete selectional network of verbal objects and ask whether it expects unattested but plausible verb/object pairs. We find that BERT makes robust grammatical generalizations after just one or two instances of a novel word in fine-tuning. For the verbal alternation tests, we find that the model displays behavior that is consistent with a transitivity bias: verbs seen few times are expected to take direct objects, but verbs seen with direct objects are not expected to occur intransitively.

    Comment: Accepted to BlackboxNLP 2020
    Keywords Computer Science - Computation and Language ; Computer Science - Machine Learning
    Subject code 430
    Publishing date 2020-11-04
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
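
    A rough sketch of the few-shot paradigm described above, using the Hugging Face masked-LM interface; the novel verb "daxed", the single double-object/prepositional frame pair, and the one-step fine-tuning below are illustrative simplifications, not the paper's stimuli or training setup.

      # Fine-tune BERT on one masked instance of a novel verb in a double-object frame,
      # then check how strongly it expects that verb in the unattested prepositional frame.
      import torch
      from transformers import BertForMaskedLM, BertTokenizerFast

      tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
      model = BertForMaskedLM.from_pretrained("bert-base-uncased")
      tokenizer.add_tokens(["daxed"])                       # introduce the novel verb form
      model.resize_token_embeddings(len(tokenizer))
      dax_id = tokenizer.convert_tokens_to_ids("daxed")

      # Single exposure: mask the novel verb and train the model to recover it.
      train = tokenizer("the teacher daxed the student a book .", return_tensors="pt")
      labels = train.input_ids.clone()
      masked = train.input_ids.clone()
      verb_pos = (train.input_ids == dax_id).nonzero()[0, 1]
      masked[0, verb_pos] = tokenizer.mask_token_id
      labels[masked != tokenizer.mask_token_id] = -100      # loss only on the masked verb
      optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
      model.train()
      loss = model(input_ids=masked, attention_mask=train.attention_mask, labels=labels).loss
      loss.backward()
      optimizer.step()

      # Probe the sister frame: how probable is "daxed" at the masked verb position?
      model.eval()
      probe = tokenizer("the teacher [MASK] a book to the student .", return_tensors="pt")
      mask_pos = (probe.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
      with torch.no_grad():
          probs = torch.softmax(model(**probe).logits[0, mask_pos], dim=-1)
      print(f"P(daxed | prepositional frame) = {probs[dax_id].item():.2e}")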

  10. Book ; Online: On the Effect of Anticipation on Reading Times

    Pimentel, Tiago / Meister, Clara / Wilcox, Ethan G. / Levy, Roger / Cotterell, Ryan

    2022  

    Abstract Over the past two decades, numerous studies have demonstrated how less predictable (i.e., higher surprisal) words take more time to read. In general, these studies have implicitly assumed the reading process is purely responsive: Readers observe a new word and allocate time to process it as required. We argue that prior results are also compatible with a reading process that is at least partially anticipatory: Readers could make predictions about a future word and allocate time to process it based on their expectation. In this work, we operationalize this anticipation as a word's contextual entropy. We assess the effect of anticipation on reading by comparing how well surprisal and contextual entropy predict reading times on four naturalistic reading datasets: two self-paced and two eye-tracking. Experimentally, across datasets and analyses, we find substantial evidence for effects of contextual entropy over surprisal on a word's reading time (RT): in fact, entropy is sometimes better than surprisal in predicting a word's RT. Spillover effects, however, are generally not captured by entropy, but only by surprisal. Further, we hypothesize four cognitive mechanisms through which contextual entropy could impact RTs -- three of which we are able to design experiments to analyze. Overall, our results support a view of reading that is not just responsive, but also anticipatory.

    Comment: This is a pre-MIT Press publication version of the paper. Code is available in https://github.com/rycolab/anticipation-on-reading-times
    Keywords Computer Science - Computation and Language
    Subject code 028
    Publishing date 2022-11-25
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
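
    A schematic of the model comparison described above, regressing per-word reading times on surprisal with and without contextual entropy; all arrays are random placeholders standing in for aligned corpus estimates from the two self-paced reading and two eye-tracking datasets, and cross-validated R^2 replaces the paper's likelihood-based tests.

      # Regress reading times on surprisal alone vs. surprisal plus contextual entropy
      # and compare cross-validated fit. All arrays are random placeholders.
      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(1)
      n = 5000
      surprisal = rng.gamma(shape=2.0, scale=2.0, size=n)
      entropy = 0.6 * surprisal + rng.gamma(shape=2.0, scale=1.0, size=n)
      length = rng.integers(2, 12, size=n).astype(float)       # baseline predictor: word length
      rt = 200 + 8 * surprisal + 5 * entropy + 3 * length + rng.normal(scale=30, size=n)

      def heldout_r2(X):
          return cross_val_score(LinearRegression(), X, rt, cv=5, scoring="r2").mean()

      for name, X in [("length only", np.column_stack([length])),
                      ("+ surprisal", np.column_stack([length, surprisal])),
                      ("+ entropy",   np.column_stack([length, surprisal, entropy]))]:
          print(f"{name:>12}: held-out R^2 = {heldout_r2(X):.3f}")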
