LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 133

  1. Article ; Online: Designing Universally-Approximating Deep Neural Networks: A First-Order Optimization Approach.

    Wu, Zhoutong / Xiao, Mingqing / Fang, Cong / Lin, Zhouchen

    IEEE transactions on pattern analysis and machine intelligence

    2024  Volume PP

    Abstract Universal approximation capability, also referred to as universality, is an important property of deep neural networks, endowing them with the potency to accurately represent the underlying target function in learning tasks. In practice, the architecture of deep neural networks largely influences the performance of the models. However, most existing methodologies for designing neural architectures, such as heuristic manual design or neural architecture search, ignore the universal approximation property, thus losing a potential safeguard on performance. In this paper, we propose a unified framework to design the architectures of deep neural networks with a universality guarantee based on first-order optimization algorithms, where the forward pass is interpreted as the updates of an optimization algorithm. The (explicit or implicit) network is designed by replacing each gradient term in the algorithm with a learnable module similar to a two-layer network, or its derivatives. Specifically, we explore the realm of width-bounded neural networks, a common practical scenario, showcasing their universality. Moreover, adding operations of normalization, downsampling, and upsampling does not hurt the universality. To the best of our knowledge, this is the first work showing that width-bounded networks with a universal approximation guarantee can be designed in a principled way. Our framework can inspire a variety of neural architectures, including some renowned structures such as ResNet and DenseNet, as well as novel innovations. The experimental results on image classification problems demonstrate that the newly inspired networks are competitive and surpass the baselines of ResNet, DenseNet, as well as the advanced ConvNeXt and ViT, testifying to the effectiveness of our framework.
    Language English
    Publishing date 2024-03-25
    Publishing country United States
    Document type Journal Article
    ISSN 1939-3539
    ISSN (online) 1939-3539
    DOI 10.1109/TPAMI.2024.3380007
    Database MEDical Literature Analysis and Retrieval System OnLINE

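The design recipe sketched in record 1 above can be made concrete with a toy example. The following minimal numpy sketch unrolls plain gradient descent and swaps each gradient term for a learnable two-layer module; the module shapes, depth, and step size are illustrative assumptions, no training is shown, and this is not the paper's actual architecture.

```python
# A minimal sketch (numpy, untrained random weights) of the design idea in
# record 1: take a first-order method, here plain gradient descent
#   x_{k+1} = x_k - eta * grad f(x_k),
# and replace the gradient term with a learnable two-layer module.
import numpy as np

rng = np.random.default_rng(0)
d, hidden, depth, eta = 16, 32, 4, 0.1

def two_layer_module(x, W1, W2):
    """Learnable stand-in for a gradient term: W2 @ relu(W1 @ x)."""
    return W2 @ np.maximum(W1 @ x, 0.0)

# One (W1, W2) pair per unrolled "iteration" = one network layer.
params = [(rng.normal(scale=0.1, size=(hidden, d)),
           rng.normal(scale=0.1, size=(d, hidden))) for _ in range(depth)]

def forward(x):
    # The forward pass mirrors the optimization trajectory.
    for W1, W2 in params:
        x = x - eta * two_layer_module(x, W1, W2)  # "gradient" step
    return x

print(forward(rng.normal(size=d)).shape)  # (16,)
```

Note the residual form x - eta * module(x): unrolling a first-order update naturally yields skip connections of the kind found in ResNet.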

  2. Article ; Online: Towards Understanding Convergence and Generalization of AdamW.

    Zhou, Pan / Xie, Xingyu / Lin, Zhouchen / Yan, Shuicheng

    IEEE transactions on pattern analysis and machine intelligence

    2024  Volume PP

    Abstract AdamW modifies Adam by adding a decoupled weight decay to decay network weights per training iteration. For adaptive algorithms, this decoupled weight decay does not affect specific optimization steps, and differs from the widely used $\ell_2$-regularizer …
    Language English
    Publishing date 2024-03-27
    Publishing country United States
    Document type Journal Article
    ISSN 1939-3539
    ISSN (online) 1939-3539
    DOI 10.1109/TPAMI.2024.3382294
    Database MEDical Literature Analysis and Retrieval System OnLINE

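The decoupling described in record 2 above is easy to see in update-rule form. Below is a toy numpy sketch contrasting AdamW's decoupled weight decay with the $\ell_2$-regularized variant, where the decay term is folded into the gradient and hence rescaled by the adaptive moments; the hyperparameters are illustrative defaults, not the paper's settings.

```python
# Toy numpy sketch of one Adam-style step with the two weight-decay styles.
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              lam=1e-2, decoupled=True):
    if not decoupled:          # l2-Adam: decay is folded into the gradient,
        g = g + lam * theta    # so it gets rescaled by the adaptive step.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)            # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)            # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:              # AdamW: decay weights directly, per iteration.
        theta = theta - lr * lam * theta
    return theta, m, v

theta = np.ones(3); m = v = np.zeros(3)
theta, m, v = adam_step(theta, g=np.array([0.1, -0.2, 0.3]), m=m, v=v, t=1)
print(theta)
```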

  3. Book ; Online: On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm

    Li, Huan / Lin, Zhouchen

    Better Dependence on the Dimension

    2024  

    Abstract Although adaptive gradient methods have been extensively used in deep learning, their convergence rates have not been thoroughly studied, particularly with respect to their dependence on the dimension. This paper considers the classical RMSProp and its momentum extension and establishes the convergence rate of $\frac{1}{T}\sum_{k=1}^TE\left[\|\nabla f(x^k)\|_1\right]\leq O(\frac{\sqrt{d}}{T^{1/4}})$ measured by $\ell_1$ norm without the bounded gradient assumption, where $d$ is the dimension of the optimization variable and $T$ is the iteration number. Since $\|x\|_2\ll\|x\|_1\leq\sqrt{d}\|x\|_2$ for problems with extremely large $d$, our convergence rate can be considered to be analogous to the $\frac{1}{T}\sum_{k=1}^TE\left[\|\nabla f(x^k)\|_2\right]\leq O(\frac{1}{T^{1/4}})$ one of SGD measured by $\ell_1$ norm.
    Keywords Mathematics - Optimization and Control ; Computer Science - Artificial Intelligence
    Subject code 519
    Publishing date 2024-02-01
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

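For reference alongside record 3 above, here is a plain numpy sketch of classical RMSProp with a heavy-ball momentum extension, the algorithm family analyzed there. The hyperparameters and the toy quadratic objective are assumptions for illustration; the $\sqrt{d}$ factor in the $\ell_1$-norm rate enters via $\|x\|_2 \leq \|x\|_1 \leq \sqrt{d}\,\|x\|_2$.

```python
# Plain RMSProp with heavy-ball momentum on a toy objective.
import numpy as np

def rmsprop_momentum(grad, x0, lr=1e-2, beta=0.99, mu=0.9, eps=1e-8, T=500):
    x, v, buf = x0.copy(), np.zeros_like(x0), np.zeros_like(x0)
    for _ in range(T):
        g = grad(x)
        v = beta * v + (1 - beta) * g * g   # running second-moment estimate
        step = g / (np.sqrt(v) + eps)       # coordinate-wise adaptive scaling
        buf = mu * buf + step               # heavy-ball momentum buffer
        x = x - lr * buf
    return x

# Toy quadratic f(x) = 0.5 * ||x||^2, so grad f(x) = x.
x_last = rmsprop_momentum(lambda x: x, x0=np.full(10, 5.0))
print(np.linalg.norm(x_last, 1))  # l1 gradient norm at the last iterate
```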

  4. Article ; Online: Sampling complex topology structures for spiking neural networks.

    Yan, Shen / Meng, Qingyan / Xiao, Mingqing / Wang, Yisen / Lin, Zhouchen

    Neural networks : the official journal of the International Neural Network Society

    2024  Volume 172, Page(s) 106121

    Abstract Spiking Neural Networks (SNNs) have been considered a potential competitor to Artificial Neural Networks (ANNs) due to their high biological plausibility and energy efficiency. However, the architecture design of SNNs has not been well studied. Previous studies either use ANN architectures or directly search for SNN architectures under a highly constrained search space. In this paper, we aim to introduce much more complex connection topologies to SNNs to further exploit the potential of SNN architectures. To this end, we propose the topology-aware search space, which is the first search space that enables a more diverse and flexible design for both the spatial and temporal topology of the SNN architecture. Then, to efficiently obtain an architecture from our search space, we propose the spatio-temporal topology sampling (STTS) algorithm. By leveraging the benefits of random sampling, STTS can yield powerful architectures without the need for an exhaustive search process, making it significantly more efficient than alternative search strategies. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet demonstrate the effectiveness of our method. Notably, we obtain 70.79% top-1 accuracy on ImageNet with only 4 time steps, 1.79% higher than the second-best model. Our code is available at https://github.com/stiger1000/Random-Sampling-SNN.
    MeSH term(s) Neural Networks, Computer ; Algorithms
    Language English
    Publishing date 2024-01-10
    Publishing country United States
    Document type Journal Article
    ZDB-ID 740542-x
    ISSN 1879-2782 ; 0893-6080
    ISSN (online) 1879-2782
    ISSN 0893-6080
    DOI 10.1016/j.neunet.2024.106121
    Database MEDical Literature Analysis and Retrieval System OnLINE

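The random-sampling idea in record 4 above can be illustrated schematically: instead of searching exhaustively, draw a random feed-forward DAG for the spatial topology and random self-connections for the temporal topology. The sketch below is a schematic stand-in under assumed edge probabilities, not the paper's actual search space or the STTS algorithm.

```python
# Sample a random spatial DAG plus temporal self-connections.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, p_edge, p_temporal = 6, 0.5, 0.5

# Spatial topology: upper-triangular adjacency => a valid feed-forward DAG.
spatial = np.triu(rng.random((n_nodes, n_nodes)) < p_edge, k=1)
# Temporal topology: which nodes feed their own state to the next time step.
temporal = rng.random(n_nodes) < p_temporal

for j in range(n_nodes):
    preds = np.flatnonzero(spatial[:, j])
    print(f"node {j}: inputs from {preds.tolist()}, "
          f"recurrent over time: {bool(temporal[j])}")
```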

  5. Article ; Online: Efficient learning of Scale-Adaptive Nearly Affine Invariant Networks.

    Shen, Zhengyang / Qiu, Yeqing / Liu, Jialun / He, Lingshen / Lin, Zhouchen

    Neural networks : the official journal of the International Neural Network Society

    2024  Volume 174, Page(s) 106229

    Abstract Recent research has demonstrated the significance of incorporating invariance into neural networks. However, existing methods require direct sampling over the entire transformation set, notably computationally taxing for large groups like the affine group. In this study, we propose a more efficient approach by addressing the invariances of the subgroups within a larger group. For tackling affine invariance, we split it into the Euclidean group E(n) and the uni-axial scaling group US(n), handling each invariance individually. We employ an E(n)-invariant model for E(n)-invariance and average model outputs over data augmented from a US(n) distribution for US(n)-invariance. Our method maintains a favorable computational complexity of O(N…
    MeSH term(s) Neural Networks, Computer ; Learning
    Language English
    Publishing date 2024-03-11
    Publishing country United States
    Document type Journal Article
    ZDB-ID 740542-x
    ISSN 1879-2782 ; 0893-6080
    ISSN (online) 1879-2782
    ISSN 0893-6080
    DOI 10.1016/j.neunet.2024.106229
    Database MEDical Literature Analysis and Retrieval System OnLINE

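Record 5 above combines an E(n)-invariant model with output averaging over uni-axial scalings. The toy numpy sketch below illustrates that averaging trick on a point cloud, using mean pairwise distance as a stand-in E(n)-invariant model; the sampling distribution over scales is an assumption.

```python
# Approximate uni-axial scaling invariance by averaging an E(n)-invariant
# model over inputs augmented with random axis-wise scalings.
import numpy as np

rng = np.random.default_rng(0)

def en_invariant_model(points):
    """Toy E(n)-invariant readout: mean pairwise distance."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1).mean()

def nearly_scale_invariant(points, n_samples=64):
    """Average the E(n)-invariant model over sampled uni-axial scalings."""
    outs = []
    for _ in range(n_samples):
        s = np.exp(rng.normal(scale=0.3))    # random scale factor
        axis = rng.integers(points.shape[1])
        scaled = points.copy()
        scaled[:, axis] *= s                 # uni-axial scaling of one axis
        outs.append(en_invariant_model(scaled))
    return float(np.mean(outs))

pts = rng.normal(size=(8, 2))
print(nearly_scale_invariant(pts))
```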

  6. Article: Efficient and generalizable cross-patient epileptic seizure detection through a spiking neural network.

    Zhang, Zongpeng / Xiao, Mingqing / Ji, Taoyun / Jiang, Yuwu / Lin, Tong / Zhou, Xiaohua / Lin, Zhouchen

    Frontiers in neuroscience

    2024  Volume 17, Page(s) 1303564

    Abstract Introduction: Epilepsy is a global chronic disease that brings pain and inconvenience to patients, and the electroencephalogram (EEG) is the main analytical tool. For clinical aid that can be applied to any patient, an automatic cross-patient epileptic seizure detection algorithm is of great significance. Spiking neural networks (SNNs) are modeled on biological neurons and are energy-efficient on neuromorphic hardware, and can therefore be expected to better handle brain signals and benefit real-world, low-power applications. However, automatic epileptic seizure detection has rarely considered SNNs.
    Methods: In this article, we explore SNNs for cross-patient seizure detection and find that SNNs can achieve performance comparable to, or even better than, state-of-the-art artificial neural networks (ANNs). We propose an EEG-based spiking neural network (EESNN) with a recurrent spiking convolution structure, which may better take advantage of the temporal and biological characteristics of EEG signals.
    Results: We extensively evaluate the performance of different SNN structures, training methods, and time settings, which builds a solid basis for understanding and evaluation of SNNs in seizure detection. Moreover, we show that our EESNN model can achieve energy reduction by several orders of magnitude compared with ANNs according to the theoretical estimation.
    Discussion: These results show the potential for building high-performance, low-power neuromorphic systems for seizure detection and also broaden real-world application scenarios of SNNs.
    Language English
    Publishing date 2024-01-10
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2411902-7
    ISSN 1662-453X ; 1662-4548
    ISSN (online) 1662-453X
    ISSN 1662-4548
    DOI 10.3389/fnins.2023.1303564
    Database MEDical Literature Analysis and Retrieval System OnLINE

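Record 6 does not detail the EESNN architecture, but its recurrent spiking convolution builds on standard spiking dynamics. As background, here is a generic discrete-time leaky integrate-and-fire (LIF) neuron with a recurrent connection; all constants and the random input standing in for EEG features are illustrative assumptions, not the paper's model.

```python
# Generic discrete-time LIF dynamics with a recurrent connection.
import numpy as np

rng = np.random.default_rng(0)
T, n = 20, 4                       # time steps, neurons
tau, v_th = 2.0, 1.0               # leak constant and firing threshold
W_in = rng.normal(scale=0.5, size=(n, n))
W_rec = rng.normal(scale=0.3, size=(n, n))

v = np.zeros(n)                    # membrane potentials
s = np.zeros(n)                    # spikes from the previous step
for t in range(T):
    x = rng.normal(size=n)         # stand-in for an EEG-derived input
    v = v * (1 - 1 / tau) + W_in @ x + W_rec @ s   # leaky integration
    s = (v >= v_th).astype(float)  # fire where threshold is crossed
    v = np.where(s > 0, 0.0, v)    # hard reset after a spike
    print(t, s.astype(int))
```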

  7. Article ; Online: SPIDE: A purely spike-based method for training feedback spiking neural networks.

    Xiao, Mingqing / Meng, Qingyan / Zhang, Zongpeng / Wang, Yisen / Lin, Zhouchen

    Neural networks : the official journal of the International Neural Network Society

    2023  Volume 161, Page(s) 9–24

    Abstract Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware. However, most supervised SNN training methods, such as conversion from artificial neural networks or direct training with surrogate gradients, require complex computation rather than the spike-based operations of spiking neurons during training. In this paper, we study spike-based implicit differentiation on the equilibrium state (SPIDE), which extends the recently proposed training method, implicit differentiation on the equilibrium state (IDE), to supervised learning with purely spike-based computation, demonstrating the potential for energy-efficient training of SNNs. Specifically, we introduce ternary spiking neuron couples and prove that implicit differentiation can be solved by spikes based on this design, so the whole training procedure, including both forward and backward passes, is carried out as event-driven spike computation, and weights are updated locally with two-stage average firing rates. Then we propose to modify the reset membrane potential to reduce the approximation error of spikes. With these key components, we can train SNNs with flexible structures in a small number of time steps and with firing sparsity during training, and the theoretical estimation of energy costs demonstrates the potential for high efficiency. Meanwhile, experiments show that even with these constraints, our trained models can still achieve competitive results on MNIST, CIFAR-10, CIFAR-100, and CIFAR10-DVS.
    MeSH term(s) Feedback ; Action Potentials/physiology ; Neural Networks, Computer ; Membrane Potentials ; Computers
    Language English
    Publishing date 2023-01-24
    Publishing country United States
    Document type Journal Article
    ZDB-ID 740542-x
    ISSN 1879-2782 ; 0893-6080
    ISSN (online) 1879-2782
    ISSN 0893-6080
    DOI 10.1016/j.neunet.2023.01.026
    Database MEDical Literature Analysis and Retrieval System OnLINE

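SPIDE in record 7 above extends implicit differentiation on the equilibrium state (IDE) to spike-based computation. The numpy sketch below shows only the vanilla IDE idea it builds on: iterate to a fixed point, then backpropagate through the equilibrium by solving a linear system via the implicit function theorem. The spike-based solver of SPIDE itself is not reproduced here, and the toy loss is an assumption.

```python
# Implicit differentiation at a fixed point z* = f(z*, x).
import numpy as np

rng = np.random.default_rng(0)
n = 5
W = rng.normal(scale=0.3, size=(n, n))
U = rng.normal(scale=0.5, size=(n, n))
x = rng.normal(size=n)

f = lambda z: np.tanh(W @ z + U @ x)

z = np.zeros(n)
for _ in range(100):               # forward: fixed-point iteration
    z = f(z)

# Backward: dL/dz* for a toy loss L = 0.5 * ||z*||^2 is just z*.
dL_dz = z
# Jacobian of f at the equilibrium: diag(1 - tanh^2) @ W.
J = np.diag(1 - np.tanh(W @ z + U @ x) ** 2) @ W
# Implicit function theorem: solve (I - J)^T g = dL/dz instead of unrolling.
g = np.linalg.solve((np.eye(n) - J).T, dL_dz)
print(g)
```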

  8. Article ; Online: Learning Deep Sparse Regularizers With Applications to Multi-View Clustering and Semi-Supervised Classification.

    Wang, Shiping / Chen, Zhaoliang / Du, Shide / Lin, Zhouchen

    IEEE transactions on pattern analysis and machine intelligence

    2022  Volume 44, Issue 9, Page(s) 5042–5055

    Abstract Sparsity-constrained optimization problems are common in machine learning, such as sparse coding, low-rank minimization, and compressive sensing. However, most previous studies focused on constructing various hand-crafted sparse regularizers, while little work was devoted to learning adaptive sparse regularizers from given input data for specific tasks. In this paper, we propose a deep sparse regularizer learning model that learns data-driven sparse regularizers adaptively. Via the proximal gradient algorithm, we find that learning the sparse regularizer is equivalent to learning a parameterized activation function. This encourages us to learn sparse regularizers in the deep learning framework. Therefore, we build a neural network composed of multiple blocks, each being differentiable and reusable. All blocks contain learnable piecewise-linear activation functions, which correspond to the sparse regularizer to be learned. Furthermore, the proposed model is trained with back-propagation, and all parameters in this model are learned end-to-end. We apply our framework to multi-view clustering and semi-supervised classification tasks to learn a latent compact representation. Experimental results demonstrate the superiority of the proposed framework over state-of-the-art multi-view learning models.
    Language English
    Publishing date 2022-08-04
    Publishing country United States
    Document type Journal Article
    ISSN 1939-3539
    ISSN (online) 1939-3539
    DOI 10.1109/TPAMI.2021.3082632
    Database MEDical Literature Analysis and Retrieval System OnLINE

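The key observation in record 8 above, that one proximal gradient step turns the regularizer into an activation function, has a familiar special case: the prox of the $\ell_1$ norm is soft-thresholding. The sketch below shows that fixed activation and a learnable piecewise-linear stand-in initialized at it; the knot-based parameterization is an illustrative assumption, not necessarily the paper's.

```python
# One proximal gradient step reads z <- prox_R(z - eta * grad); for R = l1,
# prox is soft-thresholding, a fixed piecewise-linear "activation".
import numpy as np

def soft_threshold(z, lam):
    """prox of lam*||.||_1: the classic fixed sparse activation."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def learnable_pwl(z, knots, values):
    """A learnable piecewise-linear activation standing in for the prox."""
    return np.interp(z, knots, values)

z = np.linspace(-2, 2, 9)
print(soft_threshold(z, 0.5))
# Initialize the learnable activation at soft-thresholding, then let
# training adapt `values` end-to-end (training loop omitted).
knots = np.linspace(-2, 2, 21)
print(learnable_pwl(z, knots, soft_threshold(knots, 0.5)))
```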

  9. Book ; Online: Restarted Nonconvex Accelerated Gradient Descent

    Li, Huan / Lin, Zhouchen

    No More Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity

    2022  

    Abstract This paper studies accelerated gradient methods for nonconvex optimization with Lipschitz continuous gradient and Hessian. We propose two simple accelerated gradient methods, restarted accelerated gradient descent (AGD) and the restarted heavy ball (HB) method, and establish that our methods achieve an $\epsilon$-approximate first-order stationary point within $O(\epsilon^{-7/4})$ gradient evaluations by elementary proofs. Theoretically, our complexity does not hide any polylogarithmic factors, and thus it improves over the best known one by an $O(\log\frac{1}{\epsilon})$ factor. Our algorithms are simple in the sense that they only consist of Nesterov's classical AGD or Polyak's HB iterations, as well as a restart mechanism. They do not invoke negative curvature exploitation or minimization of regularized surrogate functions as subroutines. In contrast with existing analyses, our elementary proofs use less advanced techniques and do not invoke the analysis of strongly convex AGD or HB. Code is available at https://github.com/lihuanML/RestartAGD.
    Keywords Mathematics - Optimization and Control ; Computer Science - Machine Learning
    Subject code 510
    Publishing date 2022-01-27
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

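A minimal sketch of the restart mechanism described in record 9 above: run Nesterov's classical AGD and reset the momentum when a restart condition fires. The function-value test used below is a common heuristic stand-in, and the step size and toy nonconvex objective are assumptions; the paper's actual restart criterion and guarantees are in the source.

```python
# Restarted Nesterov AGD on a toy nonconvex objective.
import numpy as np

def restarted_agd(f, grad, x0, lr=0.1, T=200):
    x, y, k = x0.copy(), x0.copy(), 0
    for _ in range(T):
        x_new = y - lr * grad(y)             # gradient step at extrapolated y
        mom = k / (k + 3)                    # Nesterov-style momentum weight
        y = x_new + mom * (x_new - x)        # extrapolation
        if f(x_new) > f(x):                  # restart heuristic: kill momentum
            y, k = x_new, 0
        else:
            k += 1
        x = x_new
    return x

f = lambda x: 0.5 * np.sum(x ** 2) + np.sum(np.cos(x))   # nonconvex toy
grad = lambda x: x - np.sin(x)
print(restarted_agd(f, grad, np.full(3, 2.0)))           # approaches 0
```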

  10. Book ; Online: Code Prompting

    Hu, Yi / Yang, Haotong / Lin, Zhouchen / Zhang, Muhan

    a Neural Symbolic Method for Complex Reasoning in Large Language Models

    2023  

    Abstract Large language models (LLMs) have scaled up to unlock a wide range of complex reasoning tasks with the aid of various prompting methods. However, current prompting methods generate natural-language intermediate steps to help reasoning, which can cause imperfect task reduction and confusion. To mitigate such limitations, we explore code prompting, a neural symbolic prompting method with both zero-shot and few-shot versions, which triggers code as the intermediate steps. We conduct experiments on 7 widely used benchmarks involving symbolic reasoning and arithmetic reasoning. Code prompting generally outperforms chain-of-thought (CoT) prompting. To further understand the performance and limitations of code prompting, we perform extensive ablation studies and error analyses, and identify several exclusive advantages of using symbolic prompting compared to natural language. We also consider the ensemble of code prompting and CoT prompting to combine the strengths of both. Finally, we show through experiments how code annotations and their locations affect code prompting.
    Keywords Computer Science - Computation and Language ; Computer Science - Artificial Intelligence
    Subject code 401
    Publishing date 2023-05-29
    Publishing country us
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

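Finally, the zero-shot variant of code prompting in record 10 above can be pictured as a prompt template that asks for code rather than natural-language steps. The snippet below is a schematic illustration only: the prompt wording, the example problem, and the hard-coded candidate completion are all assumptions, and no actual LLM call is made.

```python
# Schematic zero-shot code prompt: code as the intermediate reasoning step.
QUESTION = "Tom has 3 boxes with 12 apples each. He gives away 7. How many remain?"

code_prompt = (
    "Solve the problem by writing Python code, then state the answer.\n"
    "Use code as your intermediate reasoning steps.\n\n"
    f"Problem: {QUESTION}\n"
    "# Python solution:\n"
)

# One would send `code_prompt` to an LLM and execute or read the returned
# code. A possible returned intermediate step might look like this:
candidate = "total = 3 * 12 - 7\nprint(total)"
exec(candidate)   # -> 29, the final answer read off the code's output
```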
