LIVIVO - The Search Portal for Life Sciences

Search results

Results 1-10 of 33

  1. Book ; Online: Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models

    Lovenia, Holy / Dai, Wenliang / Cahyawijaya, Samuel / Ji, Ziwei / Fung, Pascale

    2023  

    Abstract Object hallucination poses a significant challenge in vision-language (VL) models, often leading to the generation of nonsensical or unfaithful responses with non-existent objects. However, the absence of a general measurement for evaluating object hallucination in VL models has hindered our understanding and ability to mitigate this issue. In this work, we present NOPE (Negative Object Presence Evaluation), a novel benchmark designed to assess object hallucination in VL models through visual question answering (VQA). We propose a cost-effective and scalable approach utilizing large language models to generate 29.5k synthetic negative pronoun (NegP) data of high quality for NOPE. We extensively investigate the performance of 10 state-of-the-art VL models in discerning the non-existence of objects in visual questions, where the ground truth answers are denoted as NegP (e.g., "none"). Additionally, we evaluate their standard performance on visual questions on 9 other VQA datasets. Through our experiments, we demonstrate that no VL model is immune to the vulnerability of object hallucination, as all models achieve accuracy below 10% on NegP. Furthermore, we uncover that lexically diverse visual questions, question types with large scopes, and scene-relevant objects capitalize the risk of object hallucination in VL models.
    Keywords Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Computation and Language
    Subject code 004
    Publishing date 2023-10-08
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
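    As a concrete illustration of the NegP evaluation described above: a model answers visual questions whose gold answer is a negative pronoun such as "none", and NegP accuracy is the share of exact matches. A minimal sketch in Python, assuming a simple string-match scorer; the pronoun list and example data are hypothetical, not drawn from the NOPE benchmark itself.

        # Hypothetical NegP scorer: counts answers matching a negative-pronoun
        # gold label. Illustrative only; not the NOPE evaluation code.
        NEGATIVE_PRONOUNS = {"none", "nothing", "nobody", "no one", "neither", "nowhere"}

        def negp_accuracy(predictions, gold_answers):
            """Fraction of questions answered with the correct negative pronoun."""
            correct = sum(
                1 for pred, gold in zip(predictions, gold_answers)
                if gold.strip().lower() in NEGATIVE_PRONOUNS
                and pred.strip().lower() == gold.strip().lower()
            )
            return correct / len(gold_answers)

        # A model that hallucinates objects rarely answers "none":
        preds = ["a red ball", "none", "two dogs", "nothing"]
        golds = ["none", "none", "none", "nothing"]
        print(negp_accuracy(preds, golds))  # 0.5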

  2. Book ; Online: Directional convergence and alignment in deep learning

    Ji, Ziwei / Telgarsky, Matus

    2020  

    Abstract In this paper, we show that although the minimizers of cross-entropy and related classification losses are off at infinity, network weights learned by gradient flow converge in direction, with an immediate corollary that network predictions, training errors, and the margin distribution also converge. This proof holds for deep homogeneous networks -- a broad class of networks allowing for ReLU, max-pooling, linear, and convolutional layers -- and we additionally provide empirical support not just close to the theory (e.g., the AlexNet), but also on non-homogeneous networks (e.g., the DenseNet). If the network further has locally Lipschitz gradients, we show that these gradients also converge in direction, and asymptotically align with the gradient flow path, with consequences on margin maximization, convergence of saliency maps, and a few other settings. Our analysis complements and is distinct from the well-known neural tangent and mean-field theories, and in particular makes no requirements on network width and initialization, instead merely requiring perfect classification accuracy. The proof proceeds by developing a theory of unbounded nonsmooth Kurdyka-Łojasiewicz inequalities for functions definable in an o-minimal structure, and is also applicable outside deep learning.

    Comment: To appear, NeurIPS 2020
    Keywords Computer Science - Machine Learning ; Computer Science - Neural and Evolutionary Computing ; Mathematics - Optimization and Control ; Statistics - Machine Learning
    Publishing date 2020-06-11
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
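    To unpack the abstract's central claim: directional convergence means that the gradient-flow weights w(t), which grow without bound, stabilize after normalization. A sketch of the statement in the abstract's notation (a paraphrase, not the paper's exact theorem):

        \[
          \dot{w}(t) \in -\partial \mathcal{L}\big(w(t)\big),
          \qquad
          \lim_{t \to \infty} \frac{w(t)}{\lVert w(t) \rVert} = \bar{w}
          \ \text{ exists},
        \]

    from which convergence of predictions, training errors, and the margin distribution follows as a corollary.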

  3. Article ; Online: Inhibition of thioredoxin-1 enhances the toxicity of glycolysis inhibitor 2-deoxyglucose by downregulating SLC1A5 expression in colorectal cancer cells.

    Tang, Tianbin / Fang, Daoquan / Ji, Ziwei / Zhong, Zuyue / Zhou, Baojian / Ye, Lechi / Jiang, Lei / Sun, Xuecheng

    Cellular oncology (Dordrecht)

    2023  Volume 47, Issue 2, Page(s) 607–621

    Abstract Background: Targeting glycolysis in cancer is an attractive approach for therapeutic intervention. 2-Deoxyglucose (2DG) is a synthetic glucose analog that inhibits glycolysis. However, its efficacy is limited by the systemic toxicity at high doses. Understanding the mechanism of 2DG resistance is important for further use of this drug in cancer treatment.
    Methods: The expression of thioredoxin-1 (Trx-1) in colorectal cancer (CRC) cells treated with 2DG was detected by Western blotting. The effect of Trx-1 on the cytotoxicity of 2DG in CRC cells was examined in vitro and in vivo. The molecular mechanism of Trx-1-mediated activation of the SLC1A5 promoter was elucidated using in vitro models.
    Results: Inhibition of glycolysis with 2DG increased the expression of Trx-1 in CRC cells. Overexpression of Trx-1 decreased the cytotoxicity of 2DG, whereas knockdown of Trx-1 by shRNA significantly increased the cytotoxicity of 2DG in CRC cells. The Trx-1 inhibitor PX-12 increased the cytotoxicity of 2DG in CRC cells both in vitro and in vivo. In addition, Trx-1 promoted SLC1A5 expression by binding to SP1 and thereby increasing the activity of the SLC1A5 gene promoter. We also found that SLC1A5 expression was upregulated in CRC tissues, and inhibition of SLC1A5 significantly enhanced the inhibitory effect of 2DG on the growth of CRC cells in vitro and in vivo. Overexpression of SLC1A5 reduced the cytotoxicity of 2DG in combination with PX-12 treatment in CRC cells.
    Conclusion: Our results demonstrate a novel adaptive mechanism of glycolytic inhibition in which Trx-1 increases GSH levels by regulating SLC1A5 to rescue cytotoxicity induced by 2DG in CRC cells. Inhibition of glycolysis in combination with inhibition of Trx-1 or SLC1A5 may be a promising strategy for the treatment of CRC.
    MeSH term(s) Humans ; Deoxyglucose/pharmacology ; Thioredoxins/metabolism ; Thioredoxins/genetics ; Colorectal Neoplasms/genetics ; Colorectal Neoplasms/pathology ; Colorectal Neoplasms/metabolism ; Colorectal Neoplasms/drug therapy ; Glycolysis/drug effects ; Down-Regulation/drug effects ; Cell Line, Tumor ; Animals ; Mice, Nude ; Gene Expression Regulation, Neoplastic/drug effects ; Promoter Regions, Genetic/genetics ; Mice ; Mice, Inbred BALB C ; Sp1 Transcription Factor/metabolism ; Xenograft Model Antitumor Assays ; Disulfides ; Imidazoles
    Chemical Substances Deoxyglucose (9G2MP84A8W) ; Thioredoxins (52500-60-4) ; Sp1 Transcription Factor ; 1-methylpropyl-2-imidazolyl disulfide (8PQ9CZ8BTJ) ; Disulfides ; Imidazoles
    Language English
    Publishing date 2023-10-23
    Publishing country Netherlands
    Document type Journal Article
    ZDB-ID 2595109-9
    ISSN (online) 2211-3436
    ISSN 1875-8606 ; 2211-3428
    DOI 10.1007/s13402-023-00887-6
    Database MEDical Literature Analysis and Retrieval System OnLINE

  4. Book ; Online: Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

    Dai, Wenliang / Liu, Zihan / Ji, Ziwei / Su, Dan / Fung, Pascale

    2022  

    Abstract Large-scale vision-language pre-trained (VLP) models are prone to hallucinate non-existent visual objects when generating text based on visual information. In this paper, we systematically study the object hallucination problem from three aspects. First, we examine recent state-of-the-art VLP models, showing that they still hallucinate frequently, and models achieving better scores on standard metrics (e.g., CIDEr) could be more unfaithful. Second, we investigate how different types of image encoding in VLP influence hallucination, including region-based, grid-based, and patch-based. Surprisingly, we find that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination. Third, we decouple various VLP objectives and demonstrate that token-level image-text alignment and controlled generation are crucial to reducing hallucination. Based on that, we propose a simple yet effective VLP loss named ObjMLM to further mitigate object hallucination. Results show that it reduces object hallucination by up to 17.4% when tested on two benchmarks (COCO Caption for in-domain and NoCaps for out-of-domain evaluation).

    Comment: Accepted at EACL 2023
    Keywords Computer Science - Computation and Language ; Computer Science - Computer Vision and Pattern Recognition
    Subject code 004
    Publishing date 2022-10-14
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
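    To make the object hallucination problem measurable in the spirit of the abstract, one common recipe (a CHAIR-style rate) scores the share of caption objects absent from the image's ground-truth objects. A minimal sketch, assuming object lists have already been extracted; this is illustrative, not the paper's code or its ObjMLM loss.

        # Hypothetical CHAIR-style hallucination rate over extracted object lists.
        def hallucination_rate(caption_objects, image_objects):
            """Share of mentioned objects that do not appear in the image."""
            mentioned = {o.lower() for o in caption_objects}
            present = {o.lower() for o in image_objects}
            if not mentioned:
                return 0.0
            return len(mentioned - present) / len(mentioned)

        # The caption mentions a "frisbee" that the image does not contain:
        print(hallucination_rate(["dog", "frisbee"], ["dog", "grass"]))  # 0.5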

  5. Book ; Online: Contrastive Learning for Inference in Dialogue

    Ishii, Etsuko / Xu, Yan / Wilie, Bryan / Ji, Ziwei / Lovenia, Holy / Chung, Willy / Fung, Pascale

    2023  

    Abstract Inference, especially that derived from inductive processes, is a crucial component of conversation, complementing the information implicitly or explicitly conveyed by a speaker. While recent large language models show remarkable advances in inference tasks, their performance in inductive reasoning, where not all information is present in the context, is far behind deductive reasoning. In this paper, we analyze the behavior of the models based on the task difficulty defined by the semantic information gap -- which distinguishes inductive and deductive reasoning (Johnson-Laird, 1988, 1993). Our analysis reveals that the disparity in information between dialogue contexts and desired inferences poses a significant challenge to the inductive inference process. To mitigate this information gap, we investigate a contrastive learning approach by feeding negative samples. Our experiments suggest that negative samples help models understand what is wrong and improve their inference generation.

    Comment: Accepted to EMNLP 2023
    Keywords Computer Science - Computation and Language
    Subject code 160
    Publishing date 2023-10-19
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
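    The contrastive approach mentioned above pushes the model toward the gold inference and away from negative samples. A minimal sketch of one standard instantiation (an InfoNCE-style loss over embedding similarities); the embeddings and the loss form are assumptions for illustration, not the paper's exact objective.

        # InfoNCE-style contrastive loss: the positive must outscore the negatives.
        import numpy as np

        def info_nce(context, positive, negatives, temperature=0.1):
            candidates = np.vstack([positive] + list(negatives))
            scores = candidates @ context / temperature   # similarity scores
            scores -= scores.max()                        # numerical stability
            probs = np.exp(scores) / np.exp(scores).sum()
            return -np.log(probs[0])                      # index 0 = positive

        ctx = np.array([1.0, 0.0])                            # dialogue-context embedding
        pos = np.array([0.9, 0.1])                            # gold inference
        negs = [np.array([-0.8, 0.6]), np.array([0.0, 1.0])]  # negative samples
        print(info_nce(ctx, pos, negs))                       # small loss: positive wins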

  6. Book ; Online: Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks

    Ji, Ziwei / Telgarsky, Matus

    2019  

    Abstract Recent theoretical work has guaranteed that overparameterized networks trained by gradient descent achieve arbitrarily low training error, and sometimes even low test error. The required width, however, is always polynomial in at least one of the sample size $n$, the (inverse) target error $1/\epsilon$, and the (inverse) failure probability $1/\delta$. This work shows that $\widetilde{O}(1/\epsilon)$ iterations of gradient descent with $\widetilde{\Omega}(1/\epsilon^2)$ training examples on two-layer ReLU networks of any width exceeding $\mathrm{polylog}(n,1/\epsilon,1/\delta)$ suffice to achieve a test misclassification error of $\epsilon$. The analysis further relies upon a margin property of the limiting kernel, which is guaranteed positive, and can distinguish between true labels and random labels.
    Keywords Computer Science - Machine Learning ; Mathematics - Optimization and Control ; Statistics - Machine Learning
    Subject code 512
    Publishing date 2019-09-26
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
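    In compact form, the guarantee stated in the abstract reads as follows (same notation as the abstract; a paraphrase, not the paper's formal theorem):

        \[
          \text{width} \ge \mathrm{polylog}(n,\, 1/\epsilon,\, 1/\delta),
          \quad
          \widetilde{O}(1/\epsilon) \text{ gradient descent steps},
          \quad
          n = \widetilde{\Omega}(1/\epsilon^{2}) \text{ samples}
          \;\Longrightarrow\;
          \Pr\big[\operatorname{sign} f(x) \neq y\big] \le \epsilon .
        \]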

  7. Book ; Online: Generalization bounds via distillation

    Hsu, Daniel / Ji, Ziwei / Telgarsky, Matus / Wang, Lan

    2021  

    Abstract This paper theoretically investigates the following empirical phenomenon: given a high-complexity network with poor generalization bounds, one can distill it into a network with nearly identical predictions but low complexity and vastly smaller generalization bounds. The main contribution is an analysis showing that the original network inherits this good generalization bound from its distillation, assuming the use of well-behaved data augmentation. This bound is presented both in an abstract and in a concrete form, the latter complemented by a reduction technique to handle modern computation graphs featuring convolutional layers, fully-connected layers, and skip connections, to name a few. To round out the story, a (looser) classical uniform convergence analysis of compression is also presented, as well as a variety of experiments on CIFAR and MNIST demonstrating similar generalization performance between the original network and its distillation.

    Comment: To appear, ICLR 2021
    Keywords Computer Science - Machine Learning
    Publishing date 2021-04-12
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
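    The phenomenon analyzed above rests on standard knowledge distillation: a student is trained to match the teacher's softened output distribution. A minimal sketch of that objective on toy logits; the paper's contribution is the generalization analysis, not this training recipe.

        # Distillation objective: KL divergence from the teacher's softened
        # distribution to the student's. Toy logits for illustration.
        import numpy as np

        def softmax(z, temperature=1.0):
            z = z / temperature
            z = z - z.max()          # numerical stability
            e = np.exp(z)
            return e / e.sum()

        def distillation_loss(student_logits, teacher_logits, temperature=2.0):
            p = softmax(teacher_logits, temperature)
            q = softmax(student_logits, temperature)
            return float(np.sum(p * (np.log(p) - np.log(q))))

        print(distillation_loss(np.array([2.0, 0.5, -1.0]),
                                np.array([1.8, 0.6, -0.9])))  # near 0: close match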

  8. Book ; Online: Characterizing the implicit bias via a primal-dual analysis

    Ji, Ziwei / Telgarsky, Matus

    2019  

    Abstract This paper shows that the implicit bias of gradient descent on linearly separable data is exactly characterized by the optimal solution of a dual optimization problem given by a smoothed margin, even for general losses. This is in contrast to prior results, which are often tailored to exponentially-tailed losses. For the exponential loss specifically, with $n$ training examples and $t$ gradient descent steps, our dual analysis further allows us to prove an $O(\ln(n)/\ln(t))$ convergence rate to the $\ell_2$ maximum margin direction, when a constant step size is used. This rate is tight in both $n$ and $t$, which has not been presented by prior work. On the other hand, with a properly chosen but aggressive step size schedule, we prove $O(1/t)$ rates for both $\ell_2$ margin maximization and implicit bias, whereas prior work (including all first-order methods for the general hard-margin linear SVM problem) proved $\widetilde{O}(1/\sqrt{t})$ margin rates, or $O(1/t)$ margin rates to a suboptimal margin, with an implied (slower) bias rate. Our key observations include that gradient descent on the primal variable naturally induces a mirror descent update on the dual variable, and that the dual objective in this setting is smooth enough to give a faster rate.
    Keywords Computer Science - Machine Learning ; Mathematics - Optimization and Control ; Statistics - Machine Learning
    Subject code 510
    Publishing date 2019-06-11
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
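    One way to write the two rates claimed in the abstract, for gradient descent on the exponential loss with n examples after t steps (a paraphrase of the abstract, with \bar{u} denoting the \ell_2 maximum margin direction):

        \[
          \text{constant step size:}\quad
          1 - \Big\langle \tfrac{w_t}{\lVert w_t \rVert_2},\, \bar{u} \Big\rangle
          = O\!\Big(\tfrac{\ln n}{\ln t}\Big),
          \qquad
          \text{aggressive schedule:}\quad
          \text{margin and bias gaps} = O(1/t).
        \]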

  9. Book ; Online: Actor-critic is implicitly biased towards high entropy optimal policies

    Hu, Yuzheng / Ji, Ziwei / Telgarsky, Matus

    2021  

    Abstract We show that the simplest actor-critic method -- a linear softmax policy updated with TD through interaction with a linear MDP, but featuring no explicit regularization or exploration -- does not merely find an optimal policy, but moreover prefers high entropy optimal policies. To demonstrate the strength of this bias, the algorithm not only has no regularization, no projections, and no exploration like $\epsilon$-greedy, but is moreover trained on a single trajectory with no resets. The key consequence of the high entropy bias is that uniform mixing assumptions on the MDP, which exist in some form in all prior work, can be dropped: the implicit regularization of the high entropy bias is enough to ensure that all chains mix and an optimal policy is reached with high probability. As auxiliary contributions, this work decouples concerns between the actor and critic by writing the actor update as an explicit mirror descent, provides tools to uniformly bound mixing times within KL balls of policy space, and provides a projection-free TD analysis with its own implicit bias which can be run from an unmixed starting distribution.

    Comment: v2 primarily improved the proofs, with minimal changes to the body
    Keywords Computer Science - Machine Learning
    Subject code 519
    Publishing date 2021-10-21
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
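    To see what "no regularization, no exploration" means concretely, here is a minimal actor-critic sketch in the spirit of the abstract: a softmax policy updated with scores from a TD(0) critic, on a single unresetting trajectory. The toy two-state MDP, step sizes, and tabular critic are assumptions for illustration, not the paper's construction.

        # Plain actor-critic: softmax policy + TD(0) critic, no entropy bonus,
        # no projections, no epsilon-greedy, single trajectory with no resets.
        import numpy as np

        rng = np.random.default_rng(0)
        P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # P[s][a] = next-state probs
                      [[0.8, 0.2], [0.2, 0.8]]])
        R = np.array([[1.0, 0.0], [0.0, 1.0]])    # R[s][a]: one good action per state
        gamma = 0.9
        theta = np.zeros((2, 2))                  # softmax policy parameters
        Q = np.zeros((2, 2))                      # tabular critic (a linear special case)

        def policy(s):
            z = theta[s] - theta[s].max()
            p = np.exp(z)
            return p / p.sum()

        s = 0
        for _ in range(5000):
            a = rng.choice(2, p=policy(s))
            s2 = rng.choice(2, p=P[s, a])
            a2 = rng.choice(2, p=policy(s2))
            Q[s, a] += 0.1 * (R[s, a] + gamma * Q[s2, a2] - Q[s, a])  # TD(0) update
            grad = -policy(s)
            grad[a] += 1.0                        # grad of log pi(a|s) for softmax
            theta[s] += 0.01 * Q[s, a] * grad     # actor step, no regularizer
            s = s2

        print(policy(0), policy(1))  # each concentrates on its rewarding action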

  10. Book ; Online: Early-stopped neural networks are consistent

    Ji, Ziwei / Li, Justin D. / Telgarsky, Matus

    2021  

    Abstract This work studies the behavior of shallow ReLU networks trained with the logistic loss via gradient descent on binary classification data where the underlying data distribution is general, and the (optimal) Bayes risk is not necessarily zero. In this setting, it is shown that gradient descent with early stopping achieves population risk arbitrarily close to optimal in terms of not just logistic and misclassification losses, but also in terms of calibration, meaning the sigmoid mapping of its outputs approximates the true underlying conditional distribution arbitrarily finely. Moreover, the necessary iteration, sample, and architectural complexities of this analysis all scale naturally with a certain complexity measure of the true conditional model. Lastly, while it is not shown that early stopping is necessary, it is shown that any univariate classifier satisfying a local interpolation property is inconsistent.
    Keywords Computer Science - Machine Learning ; Statistics - Machine Learning
    Publishing date 2021-06-10
    Publishing country US
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
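    The algorithmic core above is gradient descent on the logistic loss halted by a validation criterion. A minimal sketch with logistic regression standing in for the shallow ReLU network, on synthetic noisy-label data so that the Bayes risk is nonzero, matching the setting; all names and constants here are illustrative.

        # Early stopping: track held-out logistic loss, stop once it plateaus.
        import numpy as np

        rng = np.random.default_rng(1)
        X = rng.normal(size=(400, 2))
        y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(float)  # noisy labels
        Xtr, ytr, Xva, yva = X[:300], y[:300], X[300:], y[300:]

        def logistic_loss(w, X, y):
            z = X @ w
            return np.mean(np.logaddexp(0.0, -z) + (1 - y) * z)

        w = np.zeros(2)
        best_w, best_val, patience = w.copy(), np.inf, 0
        for t in range(10_000):
            p = 1 / (1 + np.exp(-(Xtr @ w)))
            w -= 0.1 * Xtr.T @ (p - ytr) / len(ytr)   # gradient step on train loss
            val = logistic_loss(w, Xva, yva)
            if val < best_val - 1e-6:
                best_val, best_w, patience = val, w.copy(), 0
            else:
                patience += 1
                if patience >= 50:                    # validation loss has plateaued
                    break

        print(t, best_val)  # stopping iteration and best held-out loss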
