Hardware-aware Algorithms for Efficient Machine Learning

Author : Tri Dao Phuc Quang
Release : 2023

Machine learning (ML) models will continue to consume more training cycles, their inference will proliferate across more kinds of devices, and their capabilities will be used in more domains. Some goals central to this future are to make ML models efficient so they remain practical to train and deploy, and to unlock new application domains with new capabilities. We describe some recent developments in hardware-aware algorithms to improve the efficiency-quality tradeoff of ML models and equip them with long context. In Chapter 2, we focus on structured sparsity, a natural approach to mitigating the extensive compute and memory cost of large ML models. We describe a line of work on learnable fast transforms that, thanks to their expressiveness and efficiency, yields some of the first sparse training methods to speed up large models in wall-clock time (2x) without compromising their quality. In Chapter 3, we focus on efficient Transformer training and inference for long sequences. We describe FlashAttention, a fast and memory-efficient algorithm to compute attention with no approximation. By carefully accounting for reads and writes between different levels of the memory hierarchy, FlashAttention is 2-4x faster and uses 10-20x less memory than the best existing attention implementations, allowing us to train higher-quality Transformers with 8x longer context. FlashAttention is now widely used in some of the largest research labs and companies. In Chapter 4, we examine state-space models, a promising architecture designed for long-range memory. Seeking to understand why early state-space models did not perform well on language modeling tasks, we propose a simple multiplicative interaction that expands their expressiveness. We also design hardware-friendly algorithms to train them. As a result, we are able to train state-space models to multi-billion parameter scale, demonstrating a new kind of model competitive with the dominant Transformers in language modeling. We conclude with some exciting directions in ML and systems, such as software-hardware co-design, structured sparsity for scientific AI, and long context for new AI workflows and modalities.
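
The memory-hierarchy accounting behind FlashAttention can be pictured with an online-softmax tiling scheme: attention is computed block by block over the keys and values, so the full N x N score matrix is never materialized. The NumPy sketch below is a minimal illustration of that tiling idea under an assumed block size and helper names; it is not the thesis's actual fused-kernel implementation.

# Minimal sketch of FlashAttention-style tiling: stream over key/value
# blocks with an online softmax instead of materializing all scores.
import numpy as np

def tiled_attention(Q, K, V, block=64):
    """softmax(Q K^T / sqrt(d)) V, computed one key/value block at a time."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    row_max = np.full(n, -np.inf)   # running max of scores per query row
    row_sum = np.zeros(n)           # running softmax denominator per row
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = (Q @ Kb.T) * scale                       # scores for this block only
        new_max = np.maximum(row_max, S.max(axis=1))
        correction = np.exp(row_max - new_max)       # rescale old accumulators
        P = np.exp(S - new_max[:, None])
        row_sum = row_sum * correction + P.sum(axis=1)
        out = out * correction[:, None] + P @ Vb
        row_max = new_max
    return out / row_sum[:, None]

# Sanity check against the naive implementation.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
ref = np.exp(S - S.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), ref, atol=1e-6)

Rescaling the running accumulators by exp(old_max - new_max) whenever a new block raises the row maximum is what keeps the streamed result exactly equal to the naive softmax, with no approximation.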

Hardware-Aware Probabilistic Machine Learning Models

Author : Laura Isabel Galindez Olascoaga
Release : 2021-05-19
Genre : Technology & Engineering

This book proposes probabilistic machine learning models that represent the hardware properties of the device hosting them. These models can be used to evaluate the impact that a specific device configuration has on the resource consumption and performance of the machine learning task, with the overarching goal of balancing the two optimally. The book first motivates extreme-edge computing in the context of the Internet of Things (IoT) paradigm. It then briefly reviews the steps involved in executing a machine learning task and identifies the implications of implementing this type of workload on resource-constrained devices. The core of the book focuses on augmenting and exploiting the properties of Bayesian networks and probabilistic circuits in order to endow them with hardware-awareness. The proposed models can encode the properties of various device sub-systems that are typically not considered by other resource-aware strategies, bringing about resource-saving opportunities that traditional approaches fail to uncover. The performance of the proposed models and strategies is empirically evaluated for several use cases. All of the considered examples show the potential of attaining significant resource savings with minimal accuracy losses at application time. Overall, this book constitutes a novel approach to hardware-algorithm co-optimization that further bridges the fields of machine learning and electrical engineering.
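
The accuracy-versus-resource trade-off the book formalizes can be pictured with a toy experiment: a naive Bayes classifier (the simplest Bayesian network) evaluated over feature subsets, where each feature carries a made-up sensing cost standing in for a device sub-system's energy. Everything below (data, costs, model) is an illustrative assumption, not one of the book's hardware-aware models.

# Toy accuracy-vs-cost enumeration with a Gaussian naive Bayes classifier.
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 2 classes, 4 Gaussian features; "costs" mimic cheap
# vs. power-hungry sensors feeding each feature.
n, costs = 2000, np.array([1.0, 1.0, 5.0, 20.0])
y = rng.integers(0, 2, n)
means = np.array([[0.0, 0.0, 0.0, 0.0], [1.0, 0.8, 0.6, 2.0]])
X = rng.standard_normal((n, 4)) + means[y]

def nb_accuracy(features):
    """In-sample accuracy of Gaussian naive Bayes on a feature subset (toy)."""
    logp = np.zeros((n, 2))
    for c in (0, 1):
        mu = X[y == c][:, features].mean(axis=0)
        var = X[y == c][:, features].var(axis=0) + 1e-6
        diff = X[:, features] - mu
        logp[:, c] = -0.5 * (diff**2 / var + np.log(2 * np.pi * var)).sum(axis=1)
    return (logp.argmax(axis=1) == y).mean()

# Each feature subset is one (cost, accuracy) operating point.
for k in range(1, 5):
    for subset in itertools.combinations(range(4), k):
        print(subset, costs[list(subset)].sum(), round(nb_accuracy(list(subset)), 3))

Enumerating subsets yields a set of (cost, accuracy) operating points; the book's contribution is, roughly, to make such trade-offs first-class citizens inside the probabilistic model itself rather than leaving them to an external search.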

Efficient Machine Learning Software Stack from Algorithms to Compilation

Author : Zixuan Jiang
Release : 2023

Machine learning enables the extraction of knowledge from data and decision-making without explicit programming, achieving great success and revolutionizing many fields. These successes can be attributed to continuous advancements in machine learning software and hardware, which have expanded boundaries and facilitated breakthroughs in diverse applications. The machine learning software stack is a comprehensive collection of components used to solve problems with machine learning algorithms. It encompasses problem definitions, data processing, model and method designs, software frameworks, libraries, code optimization, and system management, supporting the entire life cycle of a machine learning project. The software stack allows the community to stand on the shoulders of previous great work and push the limits of machine learning, fostering innovation and enabling broader adoption of machine learning techniques in academia and industry. The software stack is usually divided into algorithm and compilation layers with distinct design principles. Algorithm design prioritizes task-related performance, while compilation focuses on execution time and resource consumption on hardware devices. Maintaining arithmetic equivalence is optional in algorithm design but compulsory in compilation, to ensure consistent results. Compilation sits closer to the hardware than algorithm design does: compilation engineers optimize for hardware specifications, while algorithm developers usually do not prioritize hardware-friendliness. Opportunities to enhance hardware efficiency exist in algorithm and compilation designs, as well as in their interplay. Despite extensive innovation, efficiency in the machine learning software stack remains a continuing challenge. Algorithm design proposes efficient model architectures and learning algorithms, while compilation design optimizes computation graphs and simplifies operations. However, there is still a gap between the demand for efficiency and current solutions, driven by rapidly growing workloads, limited resources in specific machine learning applications, and the need for cross-layer design. Addressing these challenges requires interdisciplinary research and collaboration. Improving efficiency in the machine learning software stack will optimize performance and enhance the accessibility and applicability of machine learning technologies. In this dissertation, we address these efficiency challenges from the perspectives of machine learning algorithms and compilation. We introduce three improvements that enhance the efficiency of mainstream machine learning algorithms. Firstly, effective gradient matching for dataset condensation generates a small but informative dataset, accelerating training and related tasks. Secondly, NormSoftmax appends a normalization layer to achieve fast and stable training in Transformers and classification models. Lastly, mixed-precision hardware-aware neural architecture search combines mixed-precision quantization, neural architecture search, and hardware energy efficiency, yielding significantly more efficient neural networks than any single method. However, algorithmic efficiency alone is insufficient to fully exploit the potential of the machine learning software stack, so we also optimize the compilation process with three techniques. Firstly, we simplify the layer normalization in the influential Transformer architecture, obtaining two equivalent and efficient Transformer variants with alternative normalization types; our variants enable efficient training and inference of popular models such as GPT and ViT. Secondly, we formulate and solve the scheduling problem for reversible neural architectures, finding the optimal training schedule that fully leverages the computation and memory resources of hardware accelerators. Lastly, optimizer fusion accelerates training in the eager execution mode of machine learning frameworks by exploiting better locality on hardware and parallelism in the computation graphs. Throughout the dissertation, we emphasize the integration of efficient algorithms and compilation into a cohesive machine learning software stack, and we take hardware properties into account to provide hardware-friendly software designs. We demonstrate the effectiveness of the proposed methods through extensive experiments: our approaches effectively reduce the time and energy required for both training and inference. Ultimately, our methods have the potential to empower machine learning practitioners and researchers to build more efficient, powerful, robust, scalable, and accessible machine learning solutions.
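
Of these contributions, NormSoftmax is the easiest to picture: the abstract describes appending a normalization layer before the softmax, which makes the resulting distribution insensitive to the scale of the logits. The NumPy sketch below implements one plausible reading of that idea (zero-mean, unit-variance normalization per row, as in LayerNorm without affine parameters); the exact normalization and placement in the dissertation may differ.

# Sketch of a NormSoftmax-style layer: normalize logits, then softmax.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def norm_softmax(logits, eps=1e-6, temperature=1.0):
    """Softmax applied to per-row standardized logits (assumed variant)."""
    mu = logits.mean(axis=-1, keepdims=True)
    sigma = logits.std(axis=-1, keepdims=True)
    return softmax((logits - mu) / (sigma + eps) / temperature)

# A 100x-rescaled copy of the logits saturates plain softmax but leaves
# norm_softmax unchanged.
logits = np.array([[2.0, 1.0, 0.1]])
print(softmax(logits), softmax(100 * logits))            # second is near one-hot
print(norm_softmax(logits), norm_softmax(100 * logits))  # identical rows

The scale invariance is the point: rescaled logits drive plain softmax toward a saturated one-hot output (shrinking its gradients), while norm_softmax returns exactly the same distribution, which is one way to obtain the fast and stable training the abstract claims.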

Efficient Processing of Deep Neural Networks

Author : Vivienne Sze
Release : 2022-05-31
Genre : Technology & Engineering

This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Therefore, techniques that enable efficient processing of deep neural networks to improve key metrics (such as energy efficiency, throughput, and latency) without sacrificing accuracy or increasing hardware costs are critical to the wide deployment of DNNs in AI systems. The book includes background on DNN processing; a description and taxonomy of hardware architectural approaches for designing DNN accelerators; key metrics for evaluating and comparing different designs; features of DNN processing that are amenable to hardware/algorithm co-design to improve energy efficiency and throughput; and opportunities for applying new technologies. Readers will find a structured introduction to the field as well as a formalization and organization of key concepts from contemporary work that provide insights that may spark new ideas.
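
One recurring tool for evaluating and comparing such designs can be sketched in a few lines: the arithmetic intensity of a layer and the roofline bound it implies for a given accelerator. The layer shape and hardware numbers below are illustrative assumptions, not figures from the book.

# Back-of-the-envelope roofline estimate for one convolutional layer.
def conv_roofline(H, W, C_in, C_out, K, peak_flops, mem_bw, bytes_per_elem=2):
    """Returns (arithmetic intensity in FLOP/byte, attainable FLOP/s)."""
    macs = H * W * C_out * C_in * K * K       # multiply-accumulates
    flops = 2 * macs                          # each MAC = multiply + add
    # Minimum off-chip traffic: read inputs and weights, write outputs
    # (assumes no on-chip reuse, so this is a pessimistic bound).
    traffic = bytes_per_elem * (H * W * C_in            # input activations
                                + K * K * C_in * C_out  # weights
                                + H * W * C_out)        # output activations
    intensity = flops / traffic
    attainable = min(peak_flops, intensity * mem_bw)    # classic roofline
    return intensity, attainable

# E.g. a 3x3 conv, 56x56x64 -> 56x56x64, on an assumed 10 TFLOP/s, 100 GB/s device.
intensity, attainable = conv_roofline(56, 56, 64, 64, 3,
                                      peak_flops=10e12, mem_bw=100e9)
print(f"{intensity:.0f} FLOP/byte, {attainable / 1e12:.1f} TFLOP/s attainable")

At roughly 264 FLOP/byte this layer is compute-bound on the assumed device; layers falling left of the ridge point (peak_flops / mem_bw, here 100 FLOP/byte) are bandwidth-bound instead, which is the kind of distinction the book's metrics and accelerator taxonomy are meant to surface.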

Machine Learning Algorithm and System Co-design for Hardware Efficiency

Author : Cheng Fu
Release : 2023

Deep neural networks (DNNs) are increasingly adopted in various fields due to their unprecedented performance, yet the computation overhead of DNN evaluation and training continues to grow exponentially. Enormous system-level advances have recently improved the efficiency of DNN computation, and many efficient DNN algorithms have been proposed to reduce its cost. However, this is far from optimal: most current DNN computation systems do not fully exploit efficient ML algorithms, while ML algorithms fail to consider novel systems for deployment. This thesis focuses on designing efficient DNN algorithms and DNN computation systems that enable fast DNN training and evaluation. Instead of solely creating either new DNN algorithms or new systems, we propose a co-design approach, exploring DNN models that are system-aware and DNN computation systems that are algorithm-aware. By leveraging this co-design approach, we effectively advance the Pareto frontier between task accuracy and efficiency of DNN execution in various application domains. We present a set of works that explore the co-design method. Firstly, we present an algorithm-aware DNN compiler for quantized DNNs. By leveraging the weight repetition feature of this efficient DNN algorithm, we greatly reduce the computation overhead of DNN inference on both CPUs and GPUs; this work illustrates how algorithm-aware system design can help push the Pareto frontier. Secondly, we discuss a hardware-aware DNN algorithm with enhanced model parallelism. We observe that previous works design efficient DNNs for single-device platforms; by customizing the DNN design for a multi-device system, we reduce DNN inference latency by a large margin, whereas previous models can hardly be parallelized across multiple devices. Thirdly, we present a hardware-friendly transfer learning framework for natural language processing tasks. Existing transfer learning frameworks carry substantial computation redundancy when deployed on existing systems; by reusing computation across different transfer learning models, we greatly reduce this overhead. Lastly, we introduce a novel training method that reduces the computation cost of DNN training and the DNN design process. The key idea is to initialize large models from small pretrained weights: the implicit knowledge in the pretrained models facilitates faster convergence of the large models, and because only the initialization phase of training changes, no extra computation overhead is introduced to existing training systems. This new training method can also be applied to accelerate the design process of system-aware DNN models. As Moore's Law slows down, the computational capacity of current DNN systems is plateauing; this thesis sheds light on how to overcome this limitation by designing domain-specific DNN algorithms and computation systems.
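
The weight-repetition idea in the first work is concrete enough to sketch: when a layer is quantized to a handful of unique values, the inputs that share a weight can be summed once and multiplied by that value once, trading most multiplies for adds. The NumPy sketch below shows the arithmetic on a matrix-vector product; the index layout and names are illustrative assumptions, not the thesis's actual CPU/GPU kernels.

# Sketch of exploiting weight repetition in a quantized matvec.
import numpy as np

rng = np.random.default_rng(0)
levels = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])        # 5-level quantized weights
W_idx = rng.integers(0, len(levels), size=(16, 256))  # weights stored as level indices
x = rng.standard_normal(256)

def repetition_dot(W_idx, levels, x):
    out = np.zeros(W_idx.shape[0])
    for o in range(W_idx.shape[0]):
        # Sum the inputs sharing each weight level, then multiply once per
        # level: ~len(levels) multiplies instead of 256 per output neuron.
        sums = np.bincount(W_idx[o], weights=x, minlength=len(levels))
        out[o] = sums @ levels
    return out

dense = levels[W_idx] @ x   # reference dense matrix-vector product
assert np.allclose(repetition_dot(W_idx, levels, x), dense)

Per output neuron this performs roughly len(levels) multiplies instead of one per input, the kind of arithmetic saving an algorithm-aware compiler can turn into real speedups once kernels and data layouts are designed around it.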