Artificial intelligence hardware design : (Record no. 88814)

000 -LEADER
fixed length control field 11393cam a22004937a 4500
003 - CONTROL NUMBER IDENTIFIER
control field CITU
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20240926125758.0
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS--GENERAL INFORMATION
fixed length control field m o d
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION
fixed length control field cr un|---aucuu
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 240926b ||||| |||| 00| 0 eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9781119810452
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9781119810483
Qualifying information (electronic bk. : oBook)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 1119810485
Qualifying information (electronic bk. : oBook)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9781119810469
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 1119810469
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9781119810476
Qualifying information (electronic bk.)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 1119810477
Qualifying information (electronic bk.)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
Cancelled/invalid ISBN 1119810450
024 7# - OTHER STANDARD IDENTIFIER
Standard number or code 10.1002/9781119810483
Source of number or code doi
035 ## - SYSTEM CONTROL NUMBER
System control number (OCoLC)1265465568
Canceled/invalid control number (OCoLC)1265344168
037 ## - SOURCE OF ACQUISITION
Stock number 9536220
Source of stock number/acquisition IEEE
040 ## - CATALOGING SOURCE
Original cataloging agency EBLCP
Language of cataloging eng
Description conventions rda
Transcribing agency EBLCP
Modifying agency YDX
-- DG1
-- OCLCO
-- IEEEE
-- OCLCF
-- UKAHL
041 ## - LANGUAGE CODE
Language code of text/sound track or separate title eng
050 #4 - LIBRARY OF CONGRESS CALL NUMBER
Classification number QA76.87
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 006.3/2
Edition number 23
100 1# - MAIN ENTRY--PERSONAL NAME
Preferred name for the person Liu, Albert (Chun-Chen)
245 10 - TITLE STATEMENT
Title Artificial intelligence hardware design :
Remainder of title challenges and solutions /
Statement of responsibility, etc Albert Chun Chen Liu and Oscar Ming Kin Law.
264 #1 - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc Hoboken :
Name of publisher, distributor, etc IEEE Press :
-- Wiley,
Date of publication, distribution, etc 2021.
300 ## - PHYSICAL DESCRIPTION
Extent 1 online resource (233 pages)
336 ## - CONTENT TYPE
Content type term text
Content type code txt
Source rdacontent.
337 ## - MEDIA TYPE
Media type term computer
Media type code c
Source rdamedia.
338 ## - CARRIER TYPE
Carrier type term online resource
Carrier type code cr
Source rdacarrier.
504 ## - BIBLIOGRAPHY, ETC. NOTE
Bibliography, etc Includes bibliographical references and index.
505 0# - CONTENTS
Formatted contents note
Author Biographies
Preface
Acknowledgments
Table of Figures
1 Introduction
1.1 Development History
1.2 Neural Network Models
1.3 Neural Network Classification
1.3.1 Supervised Learning
1.3.2 Semi-supervised Learning
1.3.3 Unsupervised Learning
1.4 Neural Network Framework
1.5 Neural Network Comparison
Exercise
References
2 Deep Learning
2.1 Neural Network Layer
2.1.1 Convolutional Layer
2.1.2 Activation Layer
2.1.3 Pooling Layer
2.1.4 Normalization Layer
2.1.5 Dropout Layer
2.1.6 Fully Connected Layer
2.2 Deep Learning Challenges
Exercise
References
3 Parallel Architecture
3.1 Intel Central Processing Unit (CPU)
3.1.1 Skylake Mesh Architecture
3.1.2 Intel Ultra Path Interconnect (UPI)
3.1.3 Sub Non-unified Memory Access Clustering (SNC)
3.1.4 Cache Hierarchy Changes
3.1.5 Single/Multiple Socket Parallel Processing
3.1.6 Advanced Vector Software Extension
3.1.7 Math Kernel Library for Deep Neural Network (MKL-DNN)
3.2 NVIDIA Graphics Processing Unit (GPU)
3.2.1 Tensor Core Architecture
3.2.2 Winograd Transform
3.2.3 Simultaneous Multithreading (SMT)
3.2.4 High Bandwidth Memory (HBM2)
3.2.5 NVLink2 Configuration
3.3 NVIDIA Deep Learning Accelerator (NVDLA)
3.3.1 Convolution Operation
3.3.2 Single Data Point Operation
3.3.3 Planar Data Operation
3.3.4 Multiplane Operation
3.3.5 Data Memory and Reshape Operations
3.3.6 System Configuration
3.3.7 External Interface
3.3.8 Software Design
3.4 Google Tensor Processing Unit (TPU)
3.4.1 System Architecture
3.4.2 Multiply–Accumulate (MAC) Systolic Array
3.4.3 New Brain Floating-Point Format
3.4.4 Performance Comparison
3.4.5 Cloud TPU Configuration
3.4.6 Cloud Software Architecture
3.5 Microsoft Catapult Fabric Accelerator
3.5.1 System Configuration
3.5.2 Catapult Fabric Architecture
3.5.3 Matrix-Vector Multiplier
3.5.4 Hierarchical Decode and Dispatch (HDD)
3.5.5 Sparse Matrix-Vector Multiplication
Exercise
References
4 Streaming Graph Theory
4.1 Blaize Graph Streaming Processor
4.1.1 Stream Graph Model
4.1.2 Depth First Scheduling Approach
4.1.3 Graph Streaming Processor Architecture
4.2 Graphcore Intelligence Processing Unit
4.2.1 Intelligence Processor Unit Architecture
4.2.2 Accumulating Matrix Product (AMP) Unit
4.2.3 Memory Architecture
4.2.4 Interconnect Architecture
4.2.5 Bulk Synchronous Parallel Model
Exercise
References
5 Convolution Optimization
5.1 Deep Convolutional Neural Network Accelerator
5.1.1 System Architecture
5.1.2 Filter Decomposition
5.1.3 Streaming Architecture
5.1.3.1 Filter Weights Reuse
5.1.3.2 Input Channel Reuse
5.1.4 Pooling
5.1.4.1 Average Pooling
5.1.4.2 Max Pooling
5.1.5 Convolution Unit (CU) Engine
5.1.6 Accumulation (ACCU) Buffer
5.1.7 Model Compression
5.1.8 System Performance
5.2 Eyeriss Accelerator
5.2.1 Eyeriss System Architecture
5.2.2 2D Convolution to 1D Multiplication
5.2.3 Stationary Dataflow
5.2.3.1 Output Stationary
5.2.3.2 Weight Stationary
5.2.3.3 Input Stationary
5.2.4 Row Stationary (RS) Dataflow
5.2.4.1 Filter Reuse
5.2.4.2 Input Feature Maps Reuse
5.2.4.3 Partial Sums Reuse
5.2.5 Run-Length Compression (RLC)
5.2.6 Global Buffer
5.2.7 Processing Element Architecture
5.2.8 Network-on-Chip (NoC)
5.2.9 Eyeriss v2 System Architecture
5.2.10 Hierarchical Mesh Network
5.2.10.1 Input Activation HM-NoC
5.2.10.2 Filter Weight HM-NoC
5.2.10.3 Partial Sum HM-NoC
5.2.11 Compressed Sparse Column Format
5.2.12 Row Stationary Plus (RS+) Dataflow
5.2.13 System Performance
Exercise
References
6 In-Memory Computation
6.1 Neurocube Architecture
6.1.1 Hybrid Memory Cube (HMC)
6.1.2 Memory Centric Neural Computing (MCNC)
6.1.3 Programmable Neurosequence Generator (PNG)
6.1.4 System Performance
6.2 Tetris Accelerator
6.2.1 Memory Hierarchy
6.2.2 In-Memory Accumulation
6.2.3 Data Scheduling
6.2.4 Neural Network Vaults Partition
6.2.5 System Performance
6.3 NeuroStream Accelerator
6.3.1 System Architecture
6.3.2 NeuroStream Coprocessor
6.3.3 4D Tiling Mechanism
6.3.4 System Performance
Exercise
References
7 Near-Memory Architecture
7.1 DaDianNao Supercomputer
7.1.1 Memory Configuration
7.1.2 Neural Functional Unit (NFU)
7.1.3 System Performance
7.2 Cnvlutin Accelerator
7.2.1 Basic Operation
7.2.2 System Architecture
7.2.3 Processing Order
7.2.4 Zero-Free Neuron Array Format (ZFNAf)
7.2.5 The Dispatcher
7.2.6 Network Pruning
7.2.7 System Performance
7.2.8 Raw or Encoded Format (RoE)
7.2.9 Vector Ineffectual Activation Identifier Format (VIAI)
7.2.10 Ineffectual Activation Skipping
7.2.11 Ineffectual Weight Skipping
Exercise
References
8 Network Sparsity
8.1 Energy Efficient Inference Engine (EIE)
8.1.1 Leading Nonzero Detection (LNZD) Network
8.1.2 Central Control Unit (CCU)
8.1.3 Processing Element (PE)
8.1.4 Deep Compression
8.1.5 Sparse Matrix Computation
8.1.6 System Performance
8.2 Cambricon-X Accelerator
8.2.1 Computation Unit
8.2.2 Buffer Controller
8.2.3 System Performance
8.3 SCNN Accelerator
8.3.1 SCNN PT-IS-CP-Dense Dataflow
8.3.2 SCNN PT-IS-CP-Sparse Dataflow
8.3.3 SCNN Tiled Architecture
8.3.4 Processing Element Architecture
8.3.5 Data Compression
8.3.6 System Performance
8.4 SeerNet Accelerator
8.4.1 Low-Bit Quantization
8.4.2 Efficient Quantization
8.4.3 Quantized Convolution
8.4.4 Inference Acceleration
8.4.5 Sparsity-Mask Encoding
8.4.6 System Performance
Exercise
References
9 3D Neural Processing
9.1 3D Integrated Circuit Architecture
9.2 Power Distribution Network
9.3 3D Network Bridge
9.3.1 3D Network-on-Chip
9.3.2 Multiple-Channel High-Speed Link
9.4 Power-Saving Techniques
9.4.1 Power Gating
9.4.2 Clock Gating
Exercise
References
Appendix A: Neural Network Topology
Index
520 ## - SUMMARY, ETC.
Summary, etc Artificial Intelligence Hardware Design: learn foundational and advanced topics in Neural Processing Unit design with real-world examples from leading voices in the field. In Artificial Intelligence Hardware Design: Challenges and Solutions, distinguished researchers and authors Drs. Albert Chun Chen Liu and Oscar Ming Kin Law deliver a rigorous and practical treatment of the design applications of specific circuits and systems for accelerating neural network processing. Beginning with a discussion and explanation of neural networks and their developmental history, the book goes on to describe parallel architectures, streaming graphs for massive parallel computation, and convolution optimization. The authors offer readers an illustration of in-memory computation through Georgia Tech's Neurocube and Stanford's Tetris accelerator using the Hybrid Memory Cube, as well as near-memory architecture through the embedded DRAM (eDRAM) designs of the Institute of Computing Technology of the Chinese Academy of Sciences and other institutions. Readers will also find a discussion of 3D neural processing techniques to support multiple-layer neural networks, as well as: a thorough introduction to neural networks and neural network development history, including Convolutional Neural Network (CNN) models; explorations of various parallel architectures, including the Intel CPU, Nvidia GPU, Google TPU, and Microsoft NPU, emphasizing hardware and software integration for performance improvement; discussions of streaming graphs for massive parallel computation with the Blaize GSP and Graphcore IPU; and an examination of how to optimize convolution with the filter decomposition of the UCLA Deep Convolutional Neural Network accelerator. Perfect for hardware and software engineers and firmware developers, Artificial Intelligence Hardware Design is an indispensable resource for anyone working with Neural Processing Units in either a hardware or software capacity.
545 0# - BIOGRAPHICAL OR HISTORICAL DATA
Biographical or historical note About the Author
Albert Chun Chen Liu, PhD, is Chief Executive Officer of Kneron. He is Adjunct Associate Professor at National Tsing Hua University, National Chiao Tung University, and National Cheng Kung University. He has published over 15 IEEE papers and is an IEEE Senior Member. He is a recipient of the IBM Problem Solving Award based on the use of the EIP tool suite in 2007 and IEEE TCAS Darlington award in 2021.
Oscar Ming Kin Law, PhD, is the Director of Engineering at Kneron. He works on smart robot development and in-memory architecture for neural networks. He has over twenty years of experience in the semiconductor industry working with CPU, GPU, and mobile design. He has also published over 60 patents in various areas.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Neural networks (Computer science)
Authority record control number http://id.loc.gov/authorities/subjects/sh90001937.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Artificial intelligence.
Authority record control number http://id.loc.gov/authorities/subjects/sh85008180.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Computer engineering.
Authority record control number http://id.loc.gov/authorities/subjects/sh85029495.
655 #4 - INDEX TERM--GENRE/FORM
Genre/form data or focus term Electronic books.
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name Law, Oscar Ming Kin.
856 40 - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier https://onlinelibrary.wiley.com/doi/book/10.1002/9781119810483
Link text Full text is available at Wiley Online Library
942 ## - ADDED ENTRY ELEMENTS
Source of classification or shelving scheme
Item type EBOOK
Holdings
Withdrawn status
Lost status
Source of classification or shelving scheme
Damaged status
Not for loan
Permanent Location COLLEGE LIBRARY
Current Location COLLEGE LIBRARY
Date acquired 2024-09-26
Source of acquisition Megatexts Phil. Inc.
Inventory number 52988
Full call number 006.32 L7401 2021
Barcode CL-52988
Date last seen 2024-09-26
Price effective from 2024-09-26
Item type EBOOK