LIVIVO - The Search Portal for Life Sciences

Search results

Results 1 - 10 of 38

  1. Article: Algorithm for Training Neural Networks on Resistive Device Arrays.

    Gokmen, Tayfun / Haensch, Wilfried

    Frontiers in neuroscience

    2020  Volume 14, Page(s) 103

    Abstract Hardware architectures composed of resistive cross-point device arrays can provide significant power and speed benefits for deep neural network training workloads using stochastic gradient descent (SGD) and backpropagation (BP) algorithm. The training accuracy on this imminent analog hardware, however, strongly depends on the switching characteristics of the cross-point elements. One of the key requirements is that these resistive devices must change conductance in a symmetrical fashion when subjected to positive or negative pulse stimuli. Here, we present a new training algorithm, so-called the "Tiki-Taka" algorithm, that eliminates this stringent symmetry requirement. We show that device asymmetry introduces an unintentional implicit cost term into the SGD algorithm, whereas in the "Tiki-Taka" algorithm a coupled dynamical system simultaneously minimizes the original objective function of the neural network and the unintentional cost term due to device asymmetry in a self-consistent fashion. We tested the validity of this new algorithm on a range of network architectures such as fully connected, convolutional and LSTM networks. Simulation results on these various networks show that the accuracy achieved using the conventional SGD algorithm with symmetric (ideal) device switching characteristics is matched in accuracy achieved using the "Tiki-Taka" algorithm with non-symmetric (non-ideal) device switching characteristics. Moreover, all the operations performed on the arrays are still parallel and therefore the implementation cost of this new algorithm on array architectures is minimal; and it maintains the aforementioned power and speed benefits. These algorithmic improvements are crucial to relax the material specification and to realize technologically viable resistive crossbar arrays that outperform digital accelerators for similar training tasks.
    Language English
    Publishing date 2020-02-26
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2411902-7
    ISSN (online) 1662-453X
    ISSN (print) 1662-4548
    DOI 10.3389/fnins.2020.00103
    Database MEDical Literature Analysis and Retrieval System OnLINE
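
    A rough sense of the coupled-system idea in this abstract can be given with a small numerical sketch. Below, gradient updates land on an auxiliary array A whose asymmetric response pulls it toward its symmetry point, and A is periodically transferred onto a second array C that holds the network weights; the forward pass uses C + A. The device model, the coupling W = C + A, and the transfer interval are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def asymmetric_step(A, dW, beta=0.3):
    """Apply an update dW on device array A with an asymmetric response.

    Hypothetical device model: the realized step shrinks or grows with the
    sign of the update relative to the current value, which makes alternating
    updates drift A toward its symmetry point at 0.
    """
    return A + dW * (1.0 - beta * np.sign(dW) * A)

# Toy problem: recover a target linear map W* from streaming data.
W_star = rng.normal(size=(4, 4))
C = np.zeros((4, 4))   # slow array holding the network weights
A = np.zeros((4, 4))   # fast array that receives the SGD updates
lr, transfer_lr, T = 0.1, 0.1, 10

for step in range(2000):
    x = rng.normal(size=4)
    err = (C + A) @ x - W_star @ x      # forward pass uses W = C + A (assumed coupling)
    grad = np.outer(err, x)
    A = asymmetric_step(A, -lr * grad)  # gradient lands on the asymmetric array
    if step % T == 0:                   # periodic transfer of A onto C
        C = C + transfer_lr * A

print("residual ||C + A - W*||:", np.linalg.norm(C + A - W_star))
```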

  2. Book ; Online: LRMP: Layer Replication with Mixed Precision for Spatial In-memory DNN Accelerators

    Nallathambi, Abinand / Bose, Christin David / Haensch, Wilfried / Raghunathan, Anand

    2023  

    Abstract In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, two challenges that arise in this approach are the highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method to jointly apply layer replication and mixed precision quantization to improve the performance of DNNs when mapped to area-constrained NVM-based IMC accelerators. LRMP uses a combination of reinforcement learning and integer linear programming to search the replication-quantization design space using a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves 2.8-9$\times$ latency and 11.8-19$\times$ throughput improvement at iso-accuracy.
    Keywords Computer Science - Hardware Architecture
    Publishing date 2023-12-05
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
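
    The core trade-off in the abstract, replicating slow layers to balance highly non-uniform layer processing times within an area budget, can be sketched with a toy cost model. The layer sizes, tile areas, and greedy allocation below are invented for illustration; LRMP itself searches the replication-quantization space with reinforcement learning and integer linear programming, which this sketch does not attempt.

```python
# Toy pipeline model: throughput is set by the slowest (bottleneck) layer,
# and replicating a layer divides its per-copy work at the cost of extra area.
layer_work = [8.0, 1.0, 4.0, 0.5, 2.0]   # hypothetical work units per layer
layer_area = [1.0, 1.0, 1.0, 1.0, 1.0]   # area of one copy of each layer (tiles)
area_budget = 10.0

replicas = [1] * len(layer_work)

def bottleneck(replicas):
    return max(w / r for w, r in zip(layer_work, replicas))

# Greedy heuristic: keep replicating the current bottleneck layer while area allows.
while True:
    used = sum(a * r for a, r in zip(layer_area, replicas))
    worst = max(range(len(layer_work)), key=lambda i: layer_work[i] / replicas[i])
    if used + layer_area[worst] > area_budget:
        break
    replicas[worst] += 1

print("replicas per layer:", replicas)
print("bottleneck latency:", bottleneck(replicas))
```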

  3. Article ; Online: Compute in-Memory with Non-Volatile Elements for Neural Networks: A Review from a Co-Design Perspective.

    Haensch, Wilfried / Raghunathan, Anand / Roy, Kaushik / Chakrabarti, Bhaswar / Phatak, Charudatta M / Wang, Cheng / Guha, Supratik

    Advanced materials (Deerfield Beach, Fla.)

    2023  Volume 35, Issue 37, Page(s) e2204944

    Abstract Deep learning has become ubiquitous, touching daily lives across the globe. Today, traditional computer architectures are stressed to their limits in efficiently executing the growing complexity of data and models. Compute-in-memory (CIM) can potentially play an important role in developing efficient hardware solutions that reduce data movement from compute-unit to memory, known as the von Neumann bottleneck. At its heart is a cross-bar architecture with nodal non-volatile-memory elements that performs an analog multiply-and-accumulate operation, enabling the matrix-vector-multiplications repeatedly used in all neural network workloads. The memory materials can significantly influence final system-level characteristics and chip performance, including speed, power, and classification accuracy. With an over-arching co-design viewpoint, this review assesses the use of cross-bar based CIM for neural networks, connecting the material properties and the associated design constraints and demands to application, architecture, and performance. Both digital and analog memory are considered, assessing the status for training and inference, and providing metrics for the collective set of properties non-volatile memory materials will need to demonstrate for a successful CIM technology.
    Language English
    Publishing date 2023-03-02
    Publishing country Germany
    Document type Journal Article ; Review
    ZDB-ID 1474949-X
    ISSN (online) 1521-4095
    ISSN (print) 0935-9648
    DOI 10.1002/adma.202204944
    Database MEDical Literature Analysis and Retrieval System OnLINE
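
    The analog multiply-and-accumulate at the heart of the review can be mimicked in a few lines: conductances hold the weights, column currents realize the dot products, and an ADC quantizes the result. The differential-pair encoding of signed weights and the 8-bit ADC below are common conventions assumed for illustration, not prescriptions from the review.

```python
import numpy as np

rng = np.random.default_rng(1)

def crossbar_mvm(W, x, adc_bits=8, g_max=1.0):
    """Idealized analog matrix-vector multiply on a resistive crossbar.

    Signed weights are assumed to be stored as a differential pair of
    non-negative conductances (G_plus - G_minus); column currents I = G @ V
    realize the multiply-accumulate, and the result is ADC-quantized.
    """
    scale = np.abs(W).max() / g_max
    G_plus = np.clip(W, 0, None) / scale       # positive part as conductance
    G_minus = np.clip(-W, 0, None) / scale     # negative part as conductance
    current = G_plus @ x - G_minus @ x         # analog accumulation on the columns
    # ADC: uniform quantization over the observed current range (assumed)
    levels = 2 ** adc_bits
    full_scale = np.abs(current).max() + 1e-12
    q = np.round(current / full_scale * (levels / 2)) / (levels / 2) * full_scale
    return q * scale

W = rng.normal(size=(64, 128))
x = rng.normal(size=128)
print("max |error| vs. float:", np.abs(crossbar_mvm(W, x) - W @ x).max())
```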

  4. Book ; Online: Algorithm for Training Neural Networks on Resistive Device Arrays

    Gokmen, Tayfun / Haensch, Wilfried

    2019  

    Abstract Hardware architectures composed of resistive cross-point device arrays can provide significant power and speed benefits for deep neural network training workloads using stochastic gradient descent (SGD) and backpropagation (BP) algorithm. The training accuracy on this imminent analog hardware however strongly depends on the switching characteristics of the cross-point elements. One of the key requirements is that these resistive devices must change conductance in a symmetrical fashion when subjected to positive or negative pulse stimuli. Here, we present a new training algorithm, so-called the "Tiki-Taka" algorithm, that eliminates this stringent symmetry requirement. We show that device asymmetry introduces an unintentional implicit cost term into the SGD algorithm, whereas in the "Tiki-Taka" algorithm a coupled dynamical system simultaneously minimizes the original objective function of the neural network and the unintentional cost term due to device asymmetry in a self-consistent fashion. We tested the validity of this new algorithm on a range of network architectures such as fully connected, convolutional and LSTM networks. Simulation results on these various networks show that whatever accuracy is achieved using the conventional SGD algorithm with symmetric (ideal) device switching characteristics the same accuracy is also achieved using the "Tiki-Taka" algorithm with non-symmetric (non-ideal) device switching characteristics. Moreover, all the operations performed on the arrays are still parallel and therefore the implementation cost of this new algorithm on array architectures is minimal; and it maintains the aforementioned power and speed benefits. These algorithmic improvements are crucial to relax the material specification and to realize technologically viable resistive crossbar arrays that outperform digital accelerators for similar training tasks.

    Comment: 26 pages, 7 figures
    Keywords Computer Science - Machine Learning ; Computer Science - Emerging Technologies ; Computer Science - Neural and Evolutionary Computing ; Statistics - Machine Learning
    Subject code 006
    Publishing date 2019-09-17
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)

  5. Article: Training LSTM Networks With Resistive Cross-Point Devices.

    Gokmen, Tayfun / Rasch, Malte J / Haensch, Wilfried

    Frontiers in neuroscience

    2018  Volume 12, Page(s) 745

    Abstract In our previous work we have shown that resistive cross point devices, so called resistive processing unit (RPU) devices, can provide significant power and speed benefits when training deep fully connected networks as well as convolutional neural networks. In this work, we further extend the RPU concept for training recurrent neural networks (RNNs) namely LSTMs. We show that the mapping of recurrent layers is very similar to the mapping of fully connected layers and therefore the RPU concept can potentially provide large acceleration factors for RNNs as well. In addition, we study the effect of various device imperfections and system parameters on training performance. Symmetry of updates becomes even more crucial for RNNs; already a few percent asymmetry results in an increase in the test error compared to the ideal case trained with floating point numbers. Furthermore, the input signal resolution to the device arrays needs to be at least 7 bits for successful training. However, we show that a stochastic rounding scheme can reduce the input signal resolution back to 5 bits. Further, we find that RPU device variations and hardware noise are enough to mitigate overfitting, so that there is less need for using dropout. Here we attempt to study the validity of the RPU approach by simulating large scale networks. For instance, the models studied here are roughly 1500 times larger than the more often studied multilayer perceptron models trained on the MNIST dataset in terms of the total number of multiplication and summation operations performed per epoch.
    Language English
    Publishing date 2018-10-24
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2411902-7
    ISSN (online) 1662-453X
    ISSN (print) 1662-4548
    DOI 10.3389/fnins.2018.00745
    Database MEDical Literature Analysis and Retrieval System OnLINE
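
    The stochastic rounding scheme mentioned in the abstract, used to bring the required input resolution from 7 bits down to 5, can be sketched generically as follows. The exact placement of the quantizer in the RPU pipeline is not reproduced here; the signal range and bit width are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def stochastic_round(x, bits=5, x_max=1.0):
    """Quantize x onto 2**bits evenly spaced levels over [-x_max, x_max].

    Each value is rounded up or down to a neighboring level with probability
    proportional to its distance from that level, so the quantizer is
    unbiased in expectation (unlike round-to-nearest).
    """
    step = 2 * x_max / (2 ** bits - 1)
    shifted = (np.clip(x, -x_max, x_max) + x_max) / step   # in [0, 2**bits - 1]
    lower = np.floor(shifted)
    p_up = shifted - lower                                  # fractional part = P(round up)
    rounded = lower + (rng.random(x.shape) < p_up)
    return rounded * step - x_max

x = rng.uniform(-1, 1, size=100_000)
q = stochastic_round(x, bits=5)
print("mean rounding bias:", (q - x).mean())     # close to zero in expectation
print("distinct levels:   ", np.unique(q).size)  # at most 2**5
```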

  6. Article ; Online: Neural Network Training With Asymmetric Crosspoint Elements.

    Onen, Murat / Gokmen, Tayfun / Todorov, Teodor K / Nowicki, Tomasz / Del Alamo, Jesús A / Rozen, John / Haensch, Wilfried / Kim, Seyoung

    Frontiers in artificial intelligence

    2022  Volume 5, Page(s) 891624

    Abstract Analog crossbar arrays comprising programmable non-volatile resistors are under intense investigation for acceleration of deep neural network training. However, the ubiquitous asymmetric conductance modulation of practical resistive devices critically degrades the classification performance of networks trained with conventional algorithms. Here we first describe the fundamental reasons behind this incompatibility. Then, we explain the theoretical underpinnings of a novel fully-parallel training algorithm that is compatible with asymmetric crosspoint elements. By establishing a powerful analogy with classical mechanics, we explain how device asymmetry can be exploited as a useful feature for analog deep learning processors. Instead of conventionally tuning weights in the direction of the error function gradient, network parameters can be programmed to successfully minimize the total energy (Hamiltonian) of the system that incorporates the effects of device asymmetry. Our technique enables immediate realization of analog deep learning accelerators based on readily available device technologies.
    Language English
    Publishing date 2022-05-09
    Publishing country Switzerland
    Document type Journal Article
    ISSN (online) 2624-8212
    DOI 10.3389/frai.2022.891624
    Database MEDical Literature Analysis and Retrieval System OnLINE
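
    The asymmetric conductance modulation the abstract refers to can be made concrete with a generic soft-bounds device model, in which a programming pulse moves the conductance by an amount that depends on how close it already is to its bounds; the symmetry point is where up and down steps balance. The bounds and step size below are placeholders, not the devices characterized in the paper.

```python
def pulse(g, direction, dg=0.05, g_min=-1.0, g_max=1.0):
    """One programming pulse on a soft-bounds (asymmetric) device model.

    Potentiation steps shrink as g approaches g_max and depression steps
    shrink as g approaches g_min, so up and down pulses of nominally equal
    size are only balanced at the symmetry point (here g = 0).
    """
    if direction > 0:
        return g + dg * (g_max - g) / (g_max - g_min)
    return g - dg * (g - g_min) / (g_max - g_min)

g = 0.8                      # start near the upper bound
for _ in range(200):         # alternating +/- pulses, i.e. a zero-mean update stream
    g = pulse(g, +1)
    g = pulse(g, -1)
print("conductance after alternating pulses:", round(g, 3))  # settles near 0, not at 0.8
```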

  7. Article: RAPA-ConvNets: Modified Convolutional Networks for Accelerated Training on Architectures With Analog Arrays.

    Rasch, Malte J / Gokmen, Tayfun / Rigotti, Mattia / Haensch, Wilfried

    Frontiers in neuroscience

    2019  Volume 13, Page(s) 753

    Abstract Analog arrays are a promising emerging hardware technology with the potential to drastically speed up deep learning. Their main advantage is that they employ analog circuitry to compute matrix-vector products in constant time, irrespective of the size of the matrix. However, ConvNets map very unfavorably onto analog arrays when done in a straight-forward manner, because kernel matrices are typically small and the constant time operation needs to be sequentially iterated a large number of times. Here, we propose to parallelize the training by replicating the kernel matrix of a convolution layer on distinct analog arrays, and randomly divide parts of the compute among them. With this modification, analog arrays execute ConvNets with a large acceleration factor that is proportional to the number of kernel matrices used per layer (here tested 16-1024). Despite having more free parameters, we show analytically and in numerical experiments that this new convolution architecture is self-regularizing and implicitly learns similar filters across arrays. We also report superior performance on a number of datasets and increased robustness to adversarial attacks. Our investigation suggests to revise the notion that emerging hardware architectures that feature analog arrays for fast matrix-vector multiplication are not suitable for ConvNets.
    Language English
    Publishing date 2019-07-30
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2411902-7
    ISSN (online) 1662-453X
    ISSN (print) 1662-4548
    DOI 10.3389/fnins.2019.00753
    Database MEDical Literature Analysis and Retrieval System OnLINE
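
    The replication scheme described in this abstract, several copies of a convolution layer's kernel matrix on separate analog arrays with the image patches randomly divided among them, can be sketched as below. The im2col-style patch layout and the per-array weight noise are assumptions made to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(3)

def conv_with_replicated_arrays(patches, kernel_matrix, n_arrays=4, noise=0.01):
    """Convolution as a patch-times-kernel-matrix product, split across arrays.

    `patches` has shape (n_patches, k) (assumed im2col layout) and
    `kernel_matrix` has shape (k, n_filters). Each replica array holds its own
    noisy copy of the kernel matrix and processes a random subset of patches,
    so the replicas can run in parallel instead of being iterated sequentially.
    """
    n_patches = patches.shape[0]
    assignment = rng.integers(0, n_arrays, size=n_patches)  # random split of patches
    out = np.empty((n_patches, kernel_matrix.shape[1]))
    for a in range(n_arrays):
        replica = kernel_matrix + noise * rng.normal(size=kernel_matrix.shape)
        idx = np.where(assignment == a)[0]
        out[idx] = patches[idx] @ replica                    # each array works independently
    return out

patches = rng.normal(size=(1024, 27))        # e.g. 3x3x3 patches from an image
kernels = rng.normal(size=(27, 16))          # 16 filters
y = conv_with_replicated_arrays(patches, kernels)
print(y.shape)                               # (1024, 16)
```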

  8. Article: Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices.

    Gokmen, Tayfun / Onen, Murat / Haensch, Wilfried

    Frontiers in neuroscience

    2017  Volume 11, Page(s) 538

    Abstract In a previous work we have detailed the requirements for obtaining maximal deep learning performance benefit by implementing fully connected deep neural networks (DNN) in the form of arrays of resistive devices. Here we extend the concept of Resistive Processing Unit (RPU) devices to convolutional neural networks (CNNs). We show how to map the convolutional layers to fully connected RPU arrays such that the parallelism of the hardware can be fully utilized in all three cycles of the backpropagation algorithm. We find that the noise and bound limitations imposed by the analog nature of the computations performed on the arrays significantly affect the training accuracy of the CNNs. Noise and bound management techniques are presented that mitigate these problems without introducing any additional complexity in the analog circuits and that can be addressed by the digital circuits. In addition, we discuss digitally programmable update management and device variability reduction techniques that can be used selectively for some of the layers in a CNN. We show that a combination of all those techniques enables a successful application of the RPU concept for training CNNs. The techniques discussed here are more general and can be applied beyond CNN architectures and therefore enables applicability of the RPU approach to a large class of neural network architectures.
    Language English
    Publishing date 2017-10-10
    Publishing country Switzerland
    Document type Journal Article
    ZDB-ID 2411902-7
    ISSN (online) 1662-453X
    ISSN (print) 1662-4548
    DOI 10.3389/fnins.2017.00538
    Database MEDical Literature Analysis and Retrieval System OnLINE
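
    The noise and bound management techniques mentioned here amount to rescaling done in the digital periphery so the analog computation stays inside its limited signal range. The sketch below shows one plausible version: divide the input vector by its largest element before the analog multiply and multiply the result back afterwards. The noise level and output bound are placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)

def analog_mvm(W, x, out_bound=5.0, noise_sigma=0.05):
    """Crude model of an analog multiply: additive noise and a hard output bound."""
    y = W @ x + noise_sigma * rng.normal(size=W.shape[0])
    return np.clip(y, -out_bound, out_bound)

def managed_mvm(W, x, out_bound=5.0, noise_sigma=0.05):
    """Assumed form of bound/noise management: scale the input so the analog
    result stays within the bounded range, then undo the scaling digitally."""
    alpha = np.abs(x).max() + 1e-12
    y = analog_mvm(W, x / alpha, out_bound, noise_sigma)
    return alpha * y

W = rng.normal(size=(32, 32))
x = 10.0 * rng.normal(size=32)            # large activations would clip the raw analog MVM
ref = W @ x
print("unmanaged error:", np.abs(analog_mvm(W, x) - ref).max())
print("managed error:  ", np.abs(managed_mvm(W, x) - ref).max())
```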

  9. Book ; Online: Training large-scale ANNs on simulated resistive crossbar arrays

    Rasch, Malte J. / Gokmen, Tayfun / Haensch, Wilfried

    2019  

    Abstract Accelerating training of artificial neural networks (ANN) with analog resistive crossbar arrays is a promising idea. While the concept has been verified on very small ANNs and toy data sets (such as MNIST), more realistically sized ANNs and datasets have not yet been tackled. However, it is to be expected that device materials and hardware design constraints, such as noisy computations, finite number of resistive states of the device materials, saturating weight and activation ranges, and limited precision of analog-to-digital converters, will cause significant challenges to the successful training of state-of-the-art ANNs. By using analog hardware aware ANN training simulations, we here explore a number of simple algorithmic compensatory measures to cope with analog noise and limited weight and output ranges and resolutions, that dramatically improve the simulated training performances on RPU arrays on intermediately to large-scale ANNs.
    Keywords Computer Science - Neural and Evolutionary Computing ; Computer Science - Emerging Technologies ; Computer Science - Machine Learning
    Subject code 006
    Publishing date 2019-06-06
    Publishing country United States
    Document type Book ; Online
    Database BASE - Bielefeld Academic Search Engine (life sciences selection)
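
    "Analog hardware aware" training simulations of the kind described here inject the dominant non-idealities, a finite number of conductance states, a saturating weight range, and read noise, into an otherwise ordinary forward pass. The state count, bound, and noise level below are placeholders, and the paper's compensatory measures are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)

N_STATES = 1200      # finite number of conductance states per device (assumed)
W_BOUND = 0.6        # saturating weight range (assumed)
READ_NOISE = 0.02    # relative analog read noise (assumed)

def to_device(W):
    """Map ideal weights onto a bounded, discretized analog array."""
    step = 2 * W_BOUND / N_STATES
    return np.clip(np.round(W / step) * step, -W_BOUND, W_BOUND)

def noisy_forward(W_dev, x):
    """Analog forward pass: matrix-vector product with multiplicative read noise."""
    return (W_dev * (1 + READ_NOISE * rng.normal(size=W_dev.shape))) @ x

W = 0.1 * rng.normal(size=(16, 64))
x = rng.normal(size=64)
print("ideal:       ", (W @ x)[:3])
print("hardware-ish:", noisy_forward(to_device(W), x)[:3])
```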

  10. Article ; Online: ENGINEERING. Solar-powering the Internet of Things.

    Haight, Richard / Haensch, Wilfried / Friedman, Daniel

    Science (New York, N.Y.)

    2016  Volume 353, Issue 6295, Page(s) 124–125

    Language English
    Publishing date 2016-07-08
    Publishing country United States
    Document type Journal Article
    ZDB-ID 128410-1
    ISSN (online) 1095-9203
    ISSN (print) 0036-8075
    DOI 10.1126/science.aag0476
    Database MEDical Literature Analysis and Retrieval System OnLINE
