LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 188

  1. Book ; Online: To Diverge or Not to Diverge

    Luo, Jiaming / Cherry, Colin / Foster, George

    A Morphosyntactic Perspective on Machine Translation vs Human Translation

    2024  

    Abstract We conduct a large-scale fine-grained comparative analysis of machine translations (MT) against human translations (HT) through the lens of morphosyntactic divergence. Across three language pairs and two types of divergence defined as the structural difference between the source and the target, MT is consistently more conservative than HT, with less morphosyntactic diversity, more convergent patterns, and more one-to-one alignments. Through analysis on different decoding algorithms, we attribute this discrepancy to the use of beam search that biases MT towards more convergent patterns. This bias is most amplified when the convergent pattern appears around 50% of the time in training data. Lastly, we show that for a majority of morphosyntactic divergences, their presence in HT is correlated with decreased MT performance, presenting a greater challenge for MT systems.

    Comment: TACL, pre-MIT Press publication version
    Keywords Computer Science - Computation and Language
    Publishing date 2024-01-02
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  2. Book ; Online: Searching for Needles in a Haystack

    Briakou, Eleftheria / Cherry, Colin / Foster, George

    On the Role of Incidental Bilingualism in PaLM's Translation Capability

    2023  

    Abstract Large, multilingual language models exhibit surprisingly good zero- or few-shot machine translation capabilities, despite having never seen the intentionally-included translation examples provided to typical neural translation systems. We investigate the role of incidental bilingualism -- the unintentional consumption of bilingual signals, including translation examples -- in explaining the translation capabilities of large language models, taking the Pathways Language Model (PaLM) as a case study. We introduce a mixed-method approach to measure and understand incidental bilingualism at scale. We show that PaLM is exposed to over 30 million translation pairs across at least 44 languages. Furthermore, the amount of incidental bilingual content is highly correlated with the amount of monolingual in-language content for non-English languages. We relate incidental bilingual content to zero-shot prompts and show that it can be used to mine new prompts to improve PaLM's out-of-English zero-shot translation quality. Finally, in a series of small-scale ablations, we show that its presence has a substantial impact on translation capabilities, although this impact diminishes with model scale.

    Comment: Accepted at ACL 2023
    Keywords Computer Science - Computation and Language
    Subject code 410
    Publishing date 2023-05-17
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  3. Article ; Online: P1' specificity of the S219V/R203G mutant tobacco etch virus protease.

    Golda, Mária / Hoffka, Gyula / Cherry, Scott / Tropea, Joseph E / Lountos, George T / Waugh, David S / Wlodawer, Alexander / Tőzsér, József / Mótyán, János András

    Proteins

    2024  

    Abstract Proteases that recognize linear amino acid sequences with high specificity became indispensable tools of recombinant protein technology for the removal of various fusion tags. Due to its stringent sequence specificity, the catalytic domain of the nuclear inclusion cysteine protease of tobacco etch virus (TEV PR) is also a widely applied reagent for enzymatic removal of fusion tags. For this reason, efforts have been made to improve its stability and modify its specificity. For example, P1' autoproteolytic cleavage-resistant mutant (S219V) TEV PR was found not only to be nearly impervious to self-inactivation, but also exhibited greater stability and catalytic efficiency than the wild-type enzyme. An R203G substitution has been reported to further relax the P1' specificity of the enzyme, however, these results were obtained from crude intracellular assays. Until now, there has been no rigorous comparison of the P1' specificity of the S219V and S219V/R203G mutants in vitro, under carefully controlled conditions. Here, we compare the P1' amino acid preferences of these single and double TEV PR mutants. The in vitro analysis was performed by using recombinant protein substrates representing 20 P1' variants of the consensus TENLYFQ*SGT cleavage site, and synthetic oligopeptide substrates were also applied to study a limited set of the most preferred variants. In addition, the enzyme-substrate interactions were analyzed in silico. The results indicate highly similar P1' preferences for both enzymes, many side-chains can be accommodated by the S1' binding sites, but the kinetic assays revealed lower catalytic efficiency for the S219V/R203G than for the S219V mutant.
    Language English
    Publishing date 2024-04-26
    Publishing country United States
    Document type Journal Article
    ZDB-ID 806683-8
    ISSN 1097-0134 ; 0887-3585
    ISSN (online) 1097-0134
    ISSN 0887-3585
    DOI 10.1002/prot.26693
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Article ; Online: Structural basis for cell type specific DNA binding of C/EBPβ: The case of cell cycle inhibitor p15INK4b promoter.

    Lountos, George T / Cherry, Scott / Tropea, Joseph E / Wlodawer, Alexander / Miller, Maria

    Journal of structural biology

    2022  Volume 214, Issue 4, Page(s) 107918

    Abstract C/EBPβ is a key regulator of numerous cellular processes, but it can also contribute to tumorigenesis and viral diseases. It binds to specific DNA sequences (C/EBP sites) and interacts with other transcription factors to control expression of multiple eukaryotic genes in a tissue and cell-type dependent manner. A body of evidence has established that cell-type-specific regulatory information is contained in the local DNA sequence of the binding motif. In human epithelial cells, C/EBPβ is an essential cofactor for TGFβ signaling in the case of Smad2/3/4 and FoxO-dependent induction of the cell cycle inhibitor, p15INK4b. In the TGFβ-responsive region 2 of the p15INK4b promoter, the Smad binding site is flanked by a C/EBP site, CTTAA•GAAAG, which differs from the canonical, palindromic ATTGC•GCAAT motif. The X-ray crystal structure of C/EBPβ bound to the p15INK4b promoter fragment shows how GCGC-to-AAGA substitution generates changes in the intermolecular interactions in the protein-DNA interface that enhances C/EBPβ binding specificity, limits possible epigenetic regulation of the promoter, and generates a DNA element with a unique pattern of methyl groups in the major groove. Significantly, CT/GA dinucleotides located at the 5'ends of the double stranded element maintain local narrowing of the DNA minor groove width that is necessary for DNA recognition. Our results suggest that C/EBPβ would accept all forms of modified cytosine in the context of the CpT site. This contrasts with the effect on the consensus motif, where C/EBPβ binding is modestly increased by cytosine methylation, but substantially decreased by hydroxymethylation.
    MeSH term(s) Humans ; CCAAT-Enhancer-Binding Protein-beta/genetics ; Epigenesis, Genetic ; Cell Cycle ; Cytosine ; DNA/genetics
    Chemical Substances CCAAT-Enhancer-Binding Protein-beta ; Cytosine (8J337D1HZY) ; DNA (9007-49-2)
    Language English
    Publishing date 2022-11-04
    Publishing country United States
    Document type Journal Article ; Research Support, N.I.H., Intramural ; Research Support, N.I.H., Extramural ; Research Support, U.S. Gov't, Non-P.H.S.
    ZDB-ID 1032718-6
    ISSN 1095-8657 ; 1047-8477
    ISSN (online) 1095-8657
    ISSN 1047-8477
    DOI 10.1016/j.jsb.2022.107918
    Database MEDical Literature Analysis and Retrieval System OnLINE

  5. Article ; Online: Assessment of Capsule Endoscopy Utilizing Capsocam Plus in Patients With Suspected Small Bowel Disease Including Pilot Study With Remote Access Patients During Pandemic.

    Enns, Chasyn / Galorport, Cherry / Ou, George / Enns, Robert

    Journal of the Canadian Association of Gastroenterology

    2021  Volume 4, Issue 6, Page(s) 269–273

    Abstract Background: CapsoCam Plus is a capsule endoscopy (CE) system that utilizes four cameras to capture a panoramic view. This has theoretical advantage over conventional forward-viewing CE with limited field of view. Its ease of administration without requiring any additional equipment during the recording also provides a unique opportunity for patients to self-administer the test. We aimed to evaluate real-life experience using this novel system and to determine feasibility of a remote access program.
    Methods: Retrospective chart review was conducted for consecutive adult outpatients who underwent CE using CapsoCam Plus. Patients with significant challenges for in-person procedures were selected for remote access through mail courier services. Gastric transit time, small bowel transit time, completion rate, diagnostic yield and adverse events were compared between remote access versus usual practice.
    Results: Ninety-four patients (52.1% male) were included, with 28 in remote access program. Most common indication was gastrointestinal bleeding (85.1%). Complete examination was achieved in 87 patients. Five (5.3%) patients' capsule remained in stomach during the recording, while two (2.1%) patients missed capsule retrieval. Median small bowel and gastric transit times were 231.9 (interquartile range [IQR] 169.5-308.2) and 27.6 (IQR 13.8-63.5) minutes, respectively. Diagnostic yield was 23.4%. There was no difference in completion rate or transit times between two groups, but diagnostic yield was higher in remote access group (odds ratio 3.80, 95% confidence interval 1.28-11.31). One patient required elective endoscopic retrieval of capsule.
    Conclusion: CapsoCam Plus can be safely administered remotely with a high degree of success, which may facilitate timely investigations while limiting nonessential physical interactions during pandemic.
    Language English
    Publishing date 2021-01-09
    Publishing country England
    Document type Journal Article
    ZDB-ID 2940642-0
    ISSN 2515-2092 ; 2515-2084
    ISSN (online) 2515-2092
    ISSN 2515-2084
    DOI 10.1093/jcag/gwaa042
    Database MEDical Literature Analysis and Retrieval System OnLINE

  6. Book ; Online: Assessing Reference-Free Peer Evaluation for Machine Translation

    Agrawal, Sweta / Foster, George / Freitag, Markus / Cherry, Colin

    2021  

    Abstract Reference-free evaluation has the potential to make machine translation evaluation substantially more scalable, allowing us to pivot easily to new languages or domains. It has been recently shown that the probabilities given by a large, multilingual model can achieve state of the art results when used as a reference-free metric. We experiment with various modifications to this model and demonstrate that by scaling it up we can match the performance of BLEU. We analyze various potential weaknesses of the approach and find that it is surprisingly robust and likely to offer reasonable performance across a broad spectrum of domains and different system qualities.

    Comment: NAACL 2021
    Keywords Computer Science - Computation and Language
    Publishing date 2021-04-11
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  7. Book ; Online: Thinking Slow about Latency Evaluation for Simultaneous Machine Translation

    Cherry, Colin / Foster, George

    2019  

    Abstract Simultaneous machine translation attempts to translate a source sentence before it is finished being spoken, with applications to translation of spoken language for live streaming and conversation. Since simultaneous systems trade quality to reduce latency, having an effective and interpretable latency metric is crucial. We introduce a variant of the recently proposed Average Lagging (AL) metric, which we call Differentiable Average Lagging (DAL). It distinguishes itself by being differentiable and internally consistent to its underlying mathematical model.
    Keywords Computer Science - Computation and Language
    Publishing date 2019-05-31
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  8. Book ; Online: Science and Practice of Pressure Ulcer Management

    Romanelli, Marco / Cherry, George / Clark, Michael / Colin, Denis / Defloor, Tom

    2006  

    Author's details edited by Marco Romanelli, Michael Clark, George Cherry, Denis Colin, Tom Defloor
    Keywords Dermatology ; Diabetes ; Nursing ; Surgery ; Vascular Surgery
    Language English
    Publisher Springer-Verlag London Limited
    Publishing place London
    Document type Book ; Online
    HBZ-ID TT050387307
    ISBN 978-1-85233-839-8 ; 978-1-8462-8134-1 ; 1-85233-839-3 ; 1-8462-8134-2
    DOI 10.1007/1-84628-134-2
    Database ZB MED Catalogue: Medicine, Health, Nutrition, Environment, Agriculture

  9. Book ; Online: Inference Strategies for Machine Translation with Conditional Masking

    Kreutzer, Julia / Foster, George / Cherry, Colin

    2020  

    Abstract Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation. Given a trained CMLM, however, it is not clear what the best inference strategy is. We formulate masked inference as a factorization of conditional probabilities of partial sequences, show that this does not harm performance, and investigate a number of simple heuristics motivated by this perspective. We identify a thresholding strategy that has advantages over the standard "mask-predict" algorithm, and provide analyses of its behavior on machine translation tasks.

    Comment: EMNLP 2020
    Keywords Computer Science - Computation and Language
    Publishing date 2020-10-05
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  10. Book ; Online: Re-translation versus Streaming for Simultaneous Translation

    Arivazhagan, Naveen / Cherry, Colin / Macherey, Wolfgang / Foster, George

    2020  

    Abstract There has been great progress in improving streaming machine translation, a simultaneous paradigm where the system appends to a growing hypothesis as more source content becomes available. We study a related problem in which revisions to the hypothesis beyond strictly appending words are permitted. This is suitable for applications such as live captioning an audio feed. In this setting, we compare custom streaming approaches to re-translation, a straightforward strategy where each new source token triggers a distinct translation from scratch. We find re-translation to be as good or better than state-of-the-art streaming systems, even when operating under constraints that allow very few revisions. We attribute much of this success to a previously proposed data-augmentation technique that adds prefix-pairs to the training data, which alongside wait-k inference forms a strong baseline for streaming translation. We also highlight re-translation's ability to wrap arbitrarily powerful MT systems with an experiment showing large improvements from an upgrade to its base model.

    Comment: IWSLT 2020
    Keywords Computer Science - Computation and Language
    Subject code 410
    Publishing date 2020-04-07
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
