LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 38

  1. Article: Algorithm for Training Neural Networks on Resistive Device Arrays.

    Gokmen, Tayfun / Haensch, Wilfried

    Frontiers in neuroscience

    2020  Volume 14, Page(s) 103

    Abstract Hardware architectures composed of resistive cross-point device arrays can provide significant power and speed benefits for deep neural network training workloads using stochastic gradient descent (SGD) and backpropagation (BP) algorithm. The training accuracy on this imminent analog hardware, however, strongly depends on the switching characteristics of the cross-point elements. One of the key requirements is that these resistive devices must change conductance in a symmetrical fashion when subjected to positive or negative pulse stimuli. Here, we present a new training algorithm, so-called the "Tiki-Taka" algorithm, that eliminates this stringent symmetry requirement. We show that device asymmetry introduces an unintentional implicit cost term into the SGD algorithm, whereas in the "Tiki-Taka" algorithm a coupled dynamical system simultaneously minimizes the original objective function of the neural network and the unintentional cost term due to device asymmetry in a self-consistent fashion. We tested the validity of this new algorithm on a range of network architectures such as fully connected, convolutional and LSTM networks. Simulation results on these various networks show that the accuracy achieved using the conventional SGD algorithm with symmetric (ideal) device switching characteristics is matched in accuracy achieved using the "Tiki-Taka" algorithm with non-symmetric (non-ideal) device switching characteristics. Moreover, all the operations performed on the arrays are still parallel and therefore the implementation cost of this new algorithm on array architectures is minimal; and it maintains the aforementioned power and speed benefits. These algorithmic improvements are crucial to relax the material specification and to realize technologically viable resistive crossbar arrays that outperform digital accelerators for similar training tasks.
    Language English
    Publishing date 2020-02-26
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2411902-7
    ISSN (online) 1662-453X
    ISSN (print) 1662-4548
    DOI 10.3389/fnins.2020.00103
    Database MEDical Literature Analysis and Retrieval System OnLINE
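
    A rough sense of the coupled-system idea in this abstract can be given with a small numerical sketch. Below, gradient updates land on an auxiliary array A whose asymmetric response pulls it toward its symmetry point, and A is periodically transferred onto a second array C that holds the network weights; the forward pass uses C + A. The device model, the coupling W = C + A, and the transfer interval are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def asymmetric_step(A, dW, beta=0.3):
    """Apply an update dW on device array A with an asymmetric response.

    Hypothetical device model: the realized step shrinks or grows with the
    sign of the update relative to the current value, which makes alternating
    updates drift A toward its symmetry point at 0.
    """
    return A + dW * (1.0 - beta * np.sign(dW) * A)

# Toy problem: recover a target linear map W* from streaming data.
W_star = rng.normal(size=(4, 4))
C = np.zeros((4, 4))   # slow array holding the network weights
A = np.zeros((4, 4))   # fast array that receives the SGD updates
lr, transfer_lr, T = 0.1, 0.1, 10

for step in range(2000):
    x = rng.normal(size=4)
    err = (C + A) @ x - W_star @ x      # forward pass uses W = C + A (assumed coupling)
    grad = np.outer(err, x)
    A = asymmetric_step(A, -lr * grad)  # gradient lands on the asymmetric array
    if step % T == 0:                   # periodic transfer of A onto C
        C = C + transfer_lr * A

print("residual ||C + A - W*||:", np.linalg.norm(C + A - W_star))
```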

  2. Book ; Online: LRMP: Layer Replication with Mixed Precision for Spatial In-memory DNN Accelerators

    Nallathambi, Abinand / Bose, Christin David / Haensch, Wilfried / Raghunathan, Anand

    2023  

    Abstract In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, two challenges that arise in this approach are the highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method to jointly apply layer replication and mixed precision quantization to improve the performance of DNNs when mapped to area-constrained NVM-based IMC accelerators. LRMP uses a combination of reinforcement learning and integer linear programming to search the replication-quantization design space using a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves 2.8-9$\times$ latency and 11.8-19$\times$ throughput improvement at iso-accuracy.
    Keywords Computer Science - Hardware Architecture
    Publishing date 2023-12-05
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
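
    The core trade-off in the abstract, replicating slow layers to balance highly non-uniform layer processing times within an area budget, can be sketched with a toy cost model. The layer sizes, tile areas, and greedy allocation below are invented for illustration; LRMP itself searches the replication-quantization space with reinforcement learning and integer linear programming, which this sketch does not attempt.

```python
# Toy pipeline model: throughput is set by the slowest (bottleneck) layer,
# and replicating a layer divides its per-copy work at the cost of extra area.
layer_work = [8.0, 1.0, 4.0, 0.5, 2.0]   # hypothetical work units per layer
layer_area = [1.0, 1.0, 1.0, 1.0, 1.0]   # area of one copy of each layer (tiles)
area_budget = 10.0

replicas = [1] * len(layer_work)

def bottleneck(replicas):
    return max(w / r for w, r in zip(layer_work, replicas))

# Greedy heuristic: keep replicating the current bottleneck layer while area allows.
while True:
    used = sum(a * r for a, r in zip(layer_area, replicas))
    worst = max(range(len(layer_work)), key=lambda i: layer_work[i] / replicas[i])
    if used + layer_area[worst] > area_budget:
        break
    replicas[worst] += 1

print("replicas per layer:", replicas)
print("bottleneck latency:", bottleneck(replicas))
```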

  3. Article ; Online: Compute in-Memory with Non-Volatile Elements for Neural Networks: A Review from a Co-Design Perspective.

    Haensch, Wilfried / Raghunathan, Anand / Roy, Kaushik / Chakrabarti, Bhaswar / Phatak, Charudatta M / Wang, Cheng / Guha, Supratik

    Advanced materials (Deerfield Beach, Fla.)

    2023  Volume 35, Issue 37, Page(s) e2204944

    Abstract Deep learning has become ubiquitous, touching daily lives across the globe. Today, traditional computer architectures are stressed to their limits in efficiently executing the growing complexity of data and models. Compute-in-memory (CIM) can potentially play an important role in developing efficient hardware solutions that reduce data movement from compute-unit to memory, known as the von Neumann bottleneck. At its heart is a cross-bar architecture with nodal non-volatile-memory elements that performs an analog multiply-and-accumulate operation, enabling the matrix-vector-multiplications repeatedly used in all neural network workloads. The memory materials can significantly influence final system-level characteristics and chip performance, including speed, power, and classification accuracy. With an over-arching co-design viewpoint, this review assesses the use of cross-bar based CIM for neural networks, connecting the material properties and the associated design constraints and demands to application, architecture, and performance. Both digital and analog memory are considered, assessing the status for training and inference, and providing metrics for the collective set of properties non-volatile memory materials will need to demonstrate for a successful CIM technology.
    Language English
    Publishing date 2023-03-02
    Publishing country Germany
    Document type Journal Article ; Review
    ZDB-ID 1474949-X
    ISSN (online) 1521-4095
    ISSN (print) 0935-9648
    DOI 10.1002/adma.202204944
    Database MEDical Literature Analysis and Retrieval System OnLINE
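
    The analog multiply-and-accumulate at the heart of the review can be mimicked in a few lines: conductances hold the weights, column currents realize the dot products, and an ADC quantizes the result. The differential-pair encoding of signed weights and the 8-bit ADC below are common conventions assumed for illustration, not prescriptions from the review.

```python
import numpy as np

rng = np.random.default_rng(1)

def crossbar_mvm(W, x, adc_bits=8, g_max=1.0):
    """Idealized analog matrix-vector multiply on a resistive crossbar.

    Signed weights are assumed to be stored as a differential pair of
    non-negative conductances (G_plus - G_minus); column currents I = G @ V
    realize the multiply-accumulate, and the result is ADC-quantized.
    """
    scale = np.abs(W).max() / g_max
    G_plus = np.clip(W, 0, None) / scale       # positive part as conductance
    G_minus = np.clip(-W, 0, None) / scale     # negative part as conductance
    current = G_plus @ x - G_minus @ x         # analog accumulation on the columns
    # ADC: uniform quantization over the observed current range (assumed)
    levels = 2 ** adc_bits
    full_scale = np.abs(current).max() + 1e-12
    q = np.round(current / full_scale * (levels / 2)) / (levels / 2) * full_scale
    return q * scale

W = rng.normal(size=(64, 128))
x = rng.normal(size=128)
print("max |error| vs. float:", np.abs(crossbar_mvm(W, x) - W @ x).max())
```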

  4. Book ; Online: Algorithm for Training Neural Networks on Resistive Device Arrays

    Gokmen, Tayfun / Haensch, Wilfried

    2019  

    Abstract Hardware architectures composed of resistive cross-point device arrays can provide significant power and speed benefits for deep neural network training workloads using stochastic gradient descent (SGD) and backpropagation (BP) algorithm. The training accuracy on this imminent analog hardware however strongly depends on the switching characteristics of the cross-point elements. One of the key requirements is that these resistive devices must change conductance in a symmetrical fashion when subjected to positive or negative pulse stimuli. Here, we present a new training algorithm, so-called the "Tiki-Taka" algorithm, that eliminates this stringent symmetry requirement. We show that device asymmetry introduces an unintentional implicit cost term into the SGD algorithm, whereas in the "Tiki-Taka" algorithm a coupled dynamical system simultaneously minimizes the original objective function of the neural network and the unintentional cost term due to device asymmetry in a self-consistent fashion. We tested the validity of this new algorithm on a range of network architectures such as fully connected, convolutional and LSTM networks. Simulation results on these various networks show that whatever accuracy is achieved using the conventional SGD algorithm with symmetric (ideal) device switching characteristics the same accuracy is also achieved using the "Tiki-Taka" algorithm with non-symmetric (non-ideal) device switching characteristics. Moreover, all the operations performed on the arrays are still parallel and therefore the implementation cost of this new algorithm on array architectures is minimal; and it maintains the aforementioned power and speed benefits. These algorithmic improvements are crucial to relax the material specification and to realize technologically viable resistive crossbar arrays that outperform digital accelerators for similar training tasks.

    Comment: 26 pages, 7 figures
    Keywords Computer Science - Machine Learning ; Computer Science - Emerging Technologies ; Computer Science - Neural and Evolutionary Computing ; Statistics - Machine Learning
    Subject code 006
    Publishing date 2019-09-17
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Article: Training LSTM Networks With Resistive Cross-Point Devices.

    Gokmen, Tayfun / Rasch, Malte J / Haensch, Wilfried

    Frontiers in neuroscience

    2018  Volume 12, Page(s) 745

    Abstract In our previous work we have shown that resistive cross point devices, so called resistive processing unit (RPU) devices, can provide significant power and speed benefits when training deep fully connected networks as well as convolutional neural networks. In this work, we further extend the RPU concept for training recurrent neural networks (RNNs) namely LSTMs. We show that the mapping of recurrent layers is very similar to the mapping of fully connected layers and therefore the RPU concept can potentially provide large acceleration factors for RNNs as well. In addition, we study the effect of various device imperfections and system parameters on training performance. Symmetry of updates becomes even more crucial for RNNs; already a few percent asymmetry results in an increase in the test error compared to the ideal case trained with floating point numbers. Furthermore, the input signal resolution to the device arrays needs to be at least 7 bits for successful training. However, we show that a stochastic rounding scheme can reduce the input signal resolution back to 5 bits. Further, we find that RPU device variations and hardware noise are enough to mitigate overfitting, so that there is less need for using dropout. Here we attempt to study the validity of the RPU approach by simulating large scale networks. For instance, the models studied here are roughly 1500 times larger than the more often studied multilayer perceptron models trained on the MNIST dataset in terms of the total number of multiplication and summation operations performed per epoch.
    Language English
    Publishing date 2018-10-24
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2411902-7
    ISSN (online) 1662-453X
    ISSN (print) 1662-4548
    DOI 10.3389/fnins.2018.00745
    Database MEDical Literature Analysis and Retrieval System OnLINE
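
    The stochastic rounding scheme mentioned in the abstract, used to bring the required input resolution from 7 bits down to 5, can be sketched generically as follows. The exact placement of the quantizer in the RPU pipeline is not reproduced here; the signal range and bit width are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def stochastic_round(x, bits=5, x_max=1.0):
    """Quantize x onto 2**bits evenly spaced levels over [-x_max, x_max].

    Each value is rounded up or down to a neighboring level with probability
    proportional to its distance from that level, so the quantizer is
    unbiased in expectation (unlike round-to-nearest).
    """
    step = 2 * x_max / (2 ** bits - 1)
    shifted = (np.clip(x, -x_max, x_max) + x_max) / step   # in [0, 2**bits - 1]
    lower = np.floor(shifted)
    p_up = shifted - lower                                  # fractional part = P(round up)
    rounded = lower + (rng.random(x.shape) < p_up)
    return rounded * step - x_max

x = rng.uniform(-1, 1, size=100_000)
q = stochastic_round(x, bits=5)
print("mean rounding bias:", (q - x).mean())     # close to zero in expectation
print("distinct levels:   ", np.unique(q).size)  # at most 2**5
```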

  6. Article ; Online: Neural Network Training With Asymmetric Crosspoint Elements.

    Onen, Murat / Gokmen, Tayfun / Todorov, Teodor K / Nowicki, Tomasz / Del Alamo, Jesús A / Rozen, John / Haensch, Wilfried / Kim, Seyoung

    Frontiers in artificial intelligence

    2022  Volume 5, Page(s) 891624

    Abstract Analog crossbar arrays comprising programmable non-volatile resistors are under intense investigation for acceleration of deep neural network training. However, the ubiquitous asymmetric conductance modulation of practical resistive devices critically degrades the classification performance of networks trained with conventional algorithms. Here we first describe the fundamental reasons behind this incompatibility. Then, we explain the theoretical underpinnings of a novel fully-parallel training algorithm that is compatible with asymmetric crosspoint elements. By establishing a powerful analogy with classical mechanics, we explain how device asymmetry can be exploited as a useful feature for analog deep learning processors. Instead of conventionally tuning weights in the direction of the error function gradient, network parameters can be programmed to successfully minimize the total energy (Hamiltonian) of the system that incorporates the effects of device asymmetry. Our technique enables immediate realization of analog deep learning accelerators based on readily available device technologies.
    Language English
    Publishing date 2022-05-09
    Publishing country Switzerland
    Document type Journal Article
    ISSN (online) 2624-8212
    DOI 10.3389/frai.2022.891624
    Database MEDical Literature Analysis and Retrieval System OnLINE
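
    The asymmetric conductance modulation the abstract refers to can be made concrete with a generic soft-bounds device model, in which a programming pulse moves the conductance by an amount that depends on how close it already is to its bounds; the symmetry point is where up and down steps balance. The bounds and step size below are placeholders, not the devices characterized in the paper.

```python
def pulse(g, direction, dg=0.05, g_min=-1.0, g_max=1.0):
    """One programming pulse on a soft-bounds (asymmetric) device model.

    Potentiation steps shrink as g approaches g_max and depression steps
    shrink as g approaches g_min, so up and down pulses of nominally equal
    size are only balanced at the symmetry point (here g = 0).
    """
    if direction > 0:
        return g + dg * (g_max - g) / (g_max - g_min)
    return g - dg * (g - g_min) / (g_max - g_min)

g = 0.8                      # start near the upper bound
for _ in range(200):         # alternating +/- pulses, i.e. a zero-mean update stream
    g = pulse(g, +1)
    g = pulse(g, -1)
print("conductance after alternating pulses:", round(g, 3))  # settles near 0, not at 0.8
```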

  7. Article: RAPA-ConvNets: Modified Convolutional Networks for Accelerated Training on Architectures With Analog Arrays.

    Rasch, Malte J / Gokmen, Tayfun / Rigotti, Mattia / Haensch, Wilfried

    Frontiers in neuroscience

    2019  Volume 13, Page(s) 753

    Abstract Analog arrays are a promising emerging hardware technology with the potential to drastically speed up deep learning. Their main advantage is that they employ analog circuitry to compute matrix-vector products in constant time, irrespective of the size of the matrix. However, ConvNets map very unfavorably onto analog arrays when done in a straight-forward manner, because kernel matrices are typically small and the constant time operation needs to be sequentially iterated a large number of times. Here, we propose to parallelize the training by replicating the kernel matrix of a convolution layer on distinct analog arrays, and randomly divide parts of the compute among them. With this modification, analog arrays execute ConvNets with a large acceleration factor that is proportional to the number of kernel matrices used per layer (here tested 16-1024). Despite having more free parameters, we show analytically and in numerical experiments that this new convolution architecture is self-regularizing and implicitly learns similar filters across arrays. We also report superior performance on a number of datasets and increased robustness to adversarial attacks. Our investigation suggests to revise the notion that emerging hardware architectures that feature analog arrays for fast matrix-vector multiplication are not suitable for ConvNets.
    Language English
    Publishing date 2019-07-30
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2411902-7
    ISSN (online) 1662-453X
    ISSN (print) 1662-4548
    DOI 10.3389/fnins.2019.00753
    Database MEDical Literature Analysis and Retrieval System OnLINE
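
    The replication scheme described in this abstract, several copies of a convolution layer's kernel matrix on separate analog arrays with the image patches randomly divided among them, can be sketched as below. The im2col-style patch layout and the per-array weight noise are assumptions made to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(3)

def conv_with_replicated_arrays(patches, kernel_matrix, n_arrays=4, noise=0.01):
    """Convolution as a patch-times-kernel-matrix product, split across arrays.

    `patches` has shape (n_patches, k) (assumed im2col layout) and
    `kernel_matrix` has shape (k, n_filters). Each replica array holds its own
    noisy copy of the kernel matrix and processes a random subset of patches,
    so the replicas can run in parallel instead of being iterated sequentially.
    """
    n_patches = patches.shape[0]
    assignment = rng.integers(0, n_arrays, size=n_patches)  # random split of patches
    out = np.empty((n_patches, kernel_matrix.shape[1]))
    for a in range(n_arrays):
        replica = kernel_matrix + noise * rng.normal(size=kernel_matrix.shape)
        idx = np.where(assignment == a)[0]
        out[idx] = patches[idx] @ replica                    # each array works independently
    return out

patches = rng.normal(size=(1024, 27))        # e.g. 3x3x3 patches from an image
kernels = rng.normal(size=(27, 16))          # 16 filters
y = conv_with_replicated_arrays(patches, kernels)
print(y.shape)                               # (1024, 16)
```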

  8. Article: Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices.

    Gokmen, Tayfun / Onen, Murat / Haensch, Wilfried

    Frontiers in neuroscience

    2017  Volume 11, Page(s) 538

    Abstract In a previous work we have detailed the requirements for obtaining maximal deep learning performance benefit by implementing fully connected deep neural networks (DNN) in the form of arrays of resistive devices. Here we extend the concept of Resistive Processing Unit (RPU) devices to convolutional neural networks (CNNs). We show how to map the convolutional layers to fully connected RPU arrays such that the parallelism of the hardware can be fully utilized in all three cycles of the backpropagation algorithm. We find that the noise and bound limitations imposed by the analog nature of the computations performed on the arrays significantly affect the training accuracy of the CNNs. Noise and bound management techniques are presented that mitigate these problems without introducing any additional complexity in the analog circuits and that can be addressed by the digital circuits. In addition, we discuss digitally programmable update management and device variability reduction techniques that can be used selectively for some of the layers in a CNN. We show that a combination of all those techniques enables a successful application of the RPU concept for training CNNs. The techniques discussed here are more general and can be applied beyond CNN architectures and therefore enables applicability of the RPU approach to a large class of neural network architectures.
    Language English
    Publishing date 2017-10-10
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2411902-7
    ISSN (online) 1662-453X
    ISSN (print) 1662-4548
    DOI 10.3389/fnins.2017.00538
    Database MEDical Literature Analysis and Retrieval System OnLINE
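
    The noise and bound management techniques mentioned here amount to rescaling done in the digital periphery so the analog computation stays inside its limited signal range. The sketch below shows one plausible version: divide the input vector by its largest element before the analog multiply and multiply the result back afterwards. The noise level and output bound are placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)

def analog_mvm(W, x, out_bound=5.0, noise_sigma=0.05):
    """Crude model of an analog multiply: additive noise and a hard output bound."""
    y = W @ x + noise_sigma * rng.normal(size=W.shape[0])
    return np.clip(y, -out_bound, out_bound)

def managed_mvm(W, x, out_bound=5.0, noise_sigma=0.05):
    """Assumed form of bound/noise management: scale the input so the analog
    result stays within the bounded range, then undo the scaling digitally."""
    alpha = np.abs(x).max() + 1e-12
    y = analog_mvm(W, x / alpha, out_bound, noise_sigma)
    return alpha * y

W = rng.normal(size=(32, 32))
x = 10.0 * rng.normal(size=32)            # large activations would clip the raw analog MVM
ref = W @ x
print("unmanaged error:", np.abs(analog_mvm(W, x) - ref).max())
print("managed error:  ", np.abs(managed_mvm(W, x) - ref).max())
```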

  9. Book ; Online: Training large-scale ANNs on simulated resistive crossbar arrays

    Rasch, Malte J. / Gokmen, Tayfun / Haensch, Wilfried

    2019  

    Abstract Accelerating training of artificial neural networks (ANN) with analog resistive crossbar arrays is a promising idea. While the concept has been verified on very small ANNs and toy data sets (such as MNIST), more realistically sized ANNs and datasets have not yet been tackled. However, it is to be expected that device materials and hardware design constraints, such as noisy computations, finite number of resistive states of the device materials, saturating weight and activation ranges, and limited precision of analog-to-digital converters, will cause significant challenges to the successful training of state-of-the-art ANNs. By using analog hardware aware ANN training simulations, we here explore a number of simple algorithmic compensatory measures to cope with analog noise and limited weight and output ranges and resolutions, that dramatically improve the simulated training performances on RPU arrays on intermediately to large-scale ANNs.
    Keywords Computer Science - Neural and Evolutionary Computing ; Computer Science - Emerging Technologies ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2019-06-06
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
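
    "Analog hardware aware" training simulations of the kind described here inject the dominant non-idealities, a finite number of conductance states, a saturating weight range, and read noise, into an otherwise ordinary forward pass. The state count, bound, and noise level below are placeholders, and the paper's compensatory measures are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)

N_STATES = 1200      # finite number of conductance states per device (assumed)
W_BOUND = 0.6        # saturating weight range (assumed)
READ_NOISE = 0.02    # relative analog read noise (assumed)

def to_device(W):
    """Map ideal weights onto a bounded, discretized analog array."""
    step = 2 * W_BOUND / N_STATES
    return np.clip(np.round(W / step) * step, -W_BOUND, W_BOUND)

def noisy_forward(W_dev, x):
    """Analog forward pass: matrix-vector product with multiplicative read noise."""
    return (W_dev * (1 + READ_NOISE * rng.normal(size=W_dev.shape))) @ x

W = 0.1 * rng.normal(size=(16, 64))
x = rng.normal(size=64)
print("ideal:       ", (W @ x)[:3])
print("hardware-ish:", noisy_forward(to_device(W), x)[:3])
```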

  10. Article ; Online: ENGINEERING. Solar-powering the Internet of Things.

    Haight, Richard / Haensch, Wilfried / Friedman, Daniel

    Science (New York, N.Y.)

    2016  Volume 353, Issue 6295, Page(s) 124–125

    Language English
    Publishing date 2016-07-08
    Publishing country United States
    Document type Journal Article
    ZDB-ID 128410-1
    ISSN (online) 1095-9203
    ISSN (print) 0036-8075
    DOI 10.1126/science.aag0476
    Database MEDical Literature Analysis and Retrieval System OnLINE
