Exploiting Data Characteristics in the Design of Accelerators for Deep Learning

Author : Patrick H. Judd
Release : 2019

The recent "Cambrian explosion" of Deep Learning (DL) algorithms, in concert with the end of Moore's Law and Dennard scaling, has spurred interest in the design of custom hardware accelerators for DL algorithms. While DL has progressed quickly thanks in part to the abundant, efficient parallel computation provided by general-purpose graphics processing units, newer DL algorithms demand even higher compute density and efficiency. Furthermore, applications of DL in the mobile and embedded domains demand the energy efficiency of special-purpose hardware. DL algorithms are dominated by large matrix-vector product computations, making them ideal targets for wide Single Instruction Multiple Data (SIMD) architectures, and for the most part, efficiently mapping the structure of these computations to hardware is straightforward. Building on such designs, this thesis examines the data characteristics of these computations and proposes hardware modifications that exploit them for performance and energy efficiency. Specifically, the thesis examines the sparsity and precision requirements of deep convolutional neural networks, which comprise multiple layers of matrix-vector product computations. It proposes a profiling method to find per-layer reduced-precision configurations that maintain high classification accuracy, and then three accelerator designs that build on the state-of-the-art DaDianNao accelerator:

1) Proteus exploits the reduced-precision profiles by adding a lightweight memory compression layer, saving energy in memory access and communication and enabling larger networks within a fixed memory budget.
2) Cnvlutin exploits the presence of zero and near-zero values in the inter-layer data by applying sparse compression to the data stream while maintaining efficient utilization of the SIMD accelerator's wide memory and compute structures.
3) Stripes exploits the reduced-precision profiles for performance by processing data bit-serially, compensating for serial latency by exploiting the abundant parallelism in the convolution operation.

All three designs exploit approximation, in the form of reduced precision and computation skipping, to improve energy efficiency and/or performance while maintaining high classification accuracy. By approximating more aggressively, they can also dynamically trade off accuracy for further gains in performance and energy.
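
The per-layer precision profiling the thesis proposes can be sketched in a few lines: quantize one layer at a time to progressively narrower fixed-point widths and keep the narrowest width whose accuracy stays within a tolerance of the full-precision baseline. The sketch below is a minimal illustration of that idea, assuming the caller supplies a layers mapping of layer names to weight arrays and an evaluate function that runs the network and returns classification accuracy; the names, the 2- to 16-bit search range, and the tolerance are illustrative assumptions, not Judd's implementation.

    import numpy as np

    def quantize(x, bits):
        # Uniform symmetric fixed-point quantization to the given bit width.
        scale = (2 ** (bits - 1) - 1) / (np.max(np.abs(x)) + 1e-12)
        return np.round(x * scale) / scale

    def profile_layer_precisions(layers, evaluate, tolerance=0.01):
        # For each layer, find the narrowest bit width whose accuracy loss
        # stays within `tolerance` of the full-precision baseline.
        baseline = evaluate(layers)
        profile = {}
        for name, weights in layers.items():
            for bits in range(2, 17):          # search 2- to 16-bit widths
                trial = dict(layers)           # quantize one layer at a time
                trial[name] = quantize(weights, bits)
                if baseline - evaluate(trial) <= tolerance:
                    profile[name] = bits       # narrowest acceptable width
                    break
            else:
                profile[name] = 16             # nothing narrower works: keep 16 bits
        return profile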

Data Orchestration in Deep Learning Accelerators

Author : Tushar Krishna
Release : 2022-05-31
Genre : Technology & Engineering

This Synthesis Lecture focuses on techniques for efficient data orchestration within DNN accelerators. The end of Moore's Law, coupled with the rapid growth of deep learning and other AI applications, has led to the emergence of custom Deep Neural Network (DNN) accelerators for energy-efficient inference on edge devices. Modern DNNs have millions of parameters and involve billions of computations, necessitating extensive data movement between memory and on-chip processing engines. It is well known that the cost of data movement today surpasses the cost of the actual computation; DNN accelerators therefore require careful orchestration of data across on-chip compute, network, and memory elements to minimize the number of accesses to external DRAM. The book covers DNN dataflows, data reuse, buffer hierarchies, networks-on-chip, and automated design-space exploration. It concludes with the data orchestration challenges posed by compressed and sparse DNNs, and with future trends. The target audience is students, engineers, and researchers interested in designing high-performance and low-energy accelerators for DNN inference.
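
A core idea behind the dataflows and buffer hierarchies the book covers is that loop ordering and tiling keep data resident on-chip and minimize DRAM traffic. The sketch below illustrates an output-stationary tiled matrix multiply in that spirit; the tile size, the NumPy formulation, and the output-stationary choice are illustrative assumptions, not the book's own code.

    import numpy as np

    def tiled_matmul(A, B, tile=32):
        # Output-stationary tiled matrix multiply: each output tile stays
        # resident in the on-chip accumulator while slices of A and B stream
        # past it, so every result element is written back exactly once.
        M, K = A.shape
        K2, N = B.shape
        assert K == K2
        C = np.zeros((M, N))
        for i0 in range(0, M, tile):
            for j0 in range(0, N, tile):
                rows, cols = min(tile, M - i0), min(tile, N - j0)
                acc = np.zeros((rows, cols))           # on-chip accumulator
                for k0 in range(0, K, tile):
                    a = A[i0:i0 + rows, k0:k0 + tile]  # streamed input slice
                    b = B[k0:k0 + tile, j0:j0 + cols]
                    acc += a @ b                       # partial sums reused in place
                C[i0:i0 + rows, j0:j0 + cols] = acc    # single writeback per tile
        return C

Swapping the loop order (weight-stationary, input-stationary) changes which operand enjoys the reuse, which is exactly the dataflow design space the book maps out.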

Efficient Processing of Deep Neural Networks

Author : Vivienne Sze
Release : 2022-05-31
Genre : Technology & Engineering

This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Techniques that enable efficient processing of DNNs to improve key metrics, such as energy efficiency, throughput, and latency, without sacrificing accuracy or increasing hardware cost are therefore critical to the wide deployment of DNNs in AI systems. The book includes background on DNN processing; a description and taxonomy of hardware architectural approaches for designing DNN accelerators; key metrics for evaluating and comparing different designs; features of DNN processing that are amenable to hardware/algorithm co-design for improved energy efficiency and throughput; and opportunities for applying new technologies. Readers will find a structured introduction to the field as well as a formalization and organization of key concepts from contemporary work that may spark new ideas.
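
As a taste of the kind of metric accounting the book formalizes, the sketch below tallies multiply-accumulate (MAC) operations and minimum data-movement volumes for a single convolutional layer. The function name and its simplifications (no padding, unit dilation) are illustrative assumptions, not the book's methodology.

    def conv_layer_costs(H, W, C_in, C_out, K, stride=1):
        # Back-of-the-envelope metrics for one conv layer (no padding):
        # MAC count and the minimum number of values that must be moved.
        H_out = (H - K) // stride + 1
        W_out = (W - K) // stride + 1
        return {
            "macs": H_out * W_out * C_out * C_in * K * K,
            "weights": C_out * C_in * K * K,
            "acts_in": H * W * C_in,
            "acts_out": H_out * W_out * C_out,
        }

    # Example: a 3x3 convolution on a 56x56x64 input with 64 output channels.
    print(conv_layer_costs(56, 56, 64, 64, 3))

Comparing the MAC count against the data volumes hints at why data movement, not arithmetic, dominates energy in such layers.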

Simulating Dataflow Accelerators for Deep Learning Application in Heterogeneous System

Author : Quang Anh Hoang
Release : 2022
Genre : Computer architecture

For the past few decades, deep learning has emerged as an essential discipline that broadens the horizon of human knowledge. At its core, Deep Neural Networks (DNNs) process input data to generate predictions or decisions (the inference step), with their accuracy improved by extensive training (the training step). As problem complexity increases, the number of layers in DNN models tends to rise. Such complex models require more computation and take longer to produce an output, and the large number of calculations consumes a tremendous amount of power, so improving energy efficiency is a primary design consideration. To address this concern, researchers have studied domain-specific architectures to develop highly efficient hardware tailored to a given application, performing a given set of computations at a lower energy cost. An energy-efficient yet high-performance system is created by pairing such an application-specific accelerator with a General-Purpose Processor (GPP): this heterogeneity offloads the heavy computations to the accelerator while handling less computation-intensive tasks on the GPP. This thesis studies the performance of dataflow accelerators integrated into a heterogeneous architecture for executing deep learning workloads. Fundamental to these accelerators is their high level of concurrency in executing computations simultaneously, making them well suited to exploit the data parallelism present in DNN operations. With the limited bandwidth of the interconnect between accelerator and main memory being one of the critical constraints of a heterogeneous system, a tradeoff between memory overhead and computational runtime is worth considering; this tradeoff is the main criterion used in the thesis to evaluate the performance of each architecture and configuration. A model of a dataflow memristive crossbar-array accelerator is first proposed to expand the scope of the heterogeneous simulation framework toward architectures with analog and mixed-signal circuits. At the core of this accelerator, an array of resistive memory cells connected in a crossbar architecture computes matrix multiplications. This design is used to study the effect of memory-performance tradeoffs on systems with analog components, and a comparison between the memristive crossbar-array architecture and its digital counterpart, the systolic array, is presented. While existing studies focus on heterogeneous systems with digital components, this approach is the first to consider a mixed-signal accelerator incorporated with a general-purpose processor for deep learning workloads. Finally, application interface software is designed to configure the system's architecture and map DNN layers to simulated hardware. At the core of this software is a DNN model parser-partitioner, which feeds the subsequent tasks of generating a hardware configuration for the accelerator and assigning the partitioned workload to the simulated accelerator.
The interface provided by this software can be developed further to incorporate scheduling and mapping algorithms. This extension will produce a synthesizer that facilitates the following:

• Hardware configuration: generate the optimal configuration of system hardware, incorporating key hardware characteristics such as the number of accelerators, the dimensions of the processing array, and the memory allocation for each accelerator.
• Schedule of execution: implement a mapping algorithm to decide on an efficient distribution and schedule of the partitioned workloads.

For future development, this synthesizer will unite the first two stages of the system's design flow. In the first, analysis stage, simulators search for optimal design aspects within a short time frame, based on abstract application graphs and the system's specifications. In the architecture stage, simulators refine these findings within the optimal design region from the previous stage by studying further details at the architectural level. Once finished, this inter-stage fusion can bring the high accuracy of architectural-level simulation tools closer to the analysis stage; in the opposite direction, the mapping algorithms implemented in analysis tools can provide architectural exploration with near-optimal scheduling. Together, this software stack can significantly reduce the time spent searching for specifications with optimal efficiency.
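
To make the memristive crossbar idea concrete, the sketch below models an idealized analog matrix-vector product: signed weights are mapped onto a differential pair of cell conductances, the input vector is applied as row voltages, and each column current sums the products by Kirchhoff's current law. The conductance range, the differential encoding, and the noise model are illustrative assumptions, not the thesis's simulator.

    import numpy as np

    def crossbar_matvec(weights, x, g_min=1e-6, g_max=1e-4, noise_std=0.0):
        # Idealized memristive crossbar computing y = weights @ x.
        # Signed weights are encoded as a difference of two conductances,
        # each linearly mapped into the device range [g_min, g_max].
        w_max = np.max(np.abs(weights)) + 1e-12
        g_pos = g_min + (g_max - g_min) * np.clip(weights, 0, None) / w_max
        g_neg = g_min + (g_max - g_min) * np.clip(-weights, 0, None) / w_max
        if noise_std > 0:  # simple model of device conductance variation
            g_pos += np.random.normal(0, noise_std * g_max, g_pos.shape)
            g_neg += np.random.normal(0, noise_std * g_max, g_neg.shape)
        i_out = x @ (g_pos - g_neg).T            # column currents (Kirchhoff)
        return i_out * w_max / (g_max - g_min)   # map currents back to weight scale

Sweeping noise_std exposes how device variation perturbs the analog product, the kind of effect a digital systolic array does not suffer and a natural axis for the comparison the thesis presents.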

Embedded Deep Learning

Author : Bert Moons
Release : 2018-10-23
Genre : Technology & Engineering

This book covers algorithmic and hardware implementation techniques that enable embedded deep learning. The authors describe synergetic design approaches at the application, algorithmic, computer-architecture, and circuit levels that help reduce the computational cost of deep learning algorithms. The impact of these techniques is demonstrated in four silicon prototypes for embedded deep learning. The book:

• Gives a wide overview of effective solutions for energy-efficient neural networks on battery-constrained wearable devices;
• Discusses the optimization of neural networks for embedded deployment at all levels of the design hierarchy (applications, algorithms, hardware architectures, and circuits), supported by real silicon prototypes;
• Elaborates on how to design efficient Convolutional Neural Network processors, exploiting parallelism and data reuse, sparse operations, and low-precision computations;
• Supports the introduced theory and design concepts with four real silicon prototypes, whose implementations and achieved performance are discussed in detail to illustrate and highlight the introduced cross-layer design concepts.
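
Two of the techniques listed above, sparse operations and low-precision computation, combine naturally in a zero-skipping, quantized multiply-accumulate. The sketch below illustrates the idea and reports how many MACs were actually issued; the 8-bit default and the skip-on-zero-activation policy are illustrative assumptions, not one of the book's prototypes.

    import numpy as np

    def sparse_dot(activations, weights, bits=8):
        # Dot product that skips zero activations (sparsity) and uses
        # fixed-point weights (low precision); returns the de-scaled
        # result and the number of MACs actually performed.
        scale = (2 ** (bits - 1) - 1) / (np.max(np.abs(weights)) + 1e-12)
        w_q = np.round(weights * scale)     # low-precision weight copy
        acc, macs = 0.0, 0
        for a, w in zip(activations, w_q):
            if a == 0:                      # zero-skipping: no MAC issued
                continue
            acc += a * w
            macs += 1
        return acc / scale, macs

With ReLU activations, a large fraction of inputs are exactly zero, so the reported MAC count directly reflects the energy and cycles such a datapath saves.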