
Artificial intelligence hardware design : (Record no. 88814)

000 -LEADER
fixed length control field	11393cam a22004937a 4500
003 - CONTROL NUMBER IDENTIFIER
control field	CITU
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20240926125758.0
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS--GENERAL INFORMATION
fixed length control field	m o d
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION
fixed length control field	cr un|---aucuu
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	240926b ||||| |||| 00| 0 eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9781119810452
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9781119810483
Qualifying information	(electronic bk. : oBook)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	1119810485
Qualifying information	(electronic bk. : oBook)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9781119810469
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	1119810469
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9781119810476
Qualifying information	(electronic bk.)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	1119810477
Qualifying information	(electronic bk.)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
Cancelled/invalid ISBN	1119810450
024 7# - OTHER STANDARD IDENTIFIER
Standard number or code	10.1002/9781119810483
Source of number or code	doi
035 ## - SYSTEM CONTROL NUMBER
System control number	(OCoLC)1265465568
Canceled/invalid control number	(OCoLC)1265344168
037 ## - SOURCE OF ACQUISITION
Stock number	9536220
Source of stock number/acquisition	IEEE
040 ## - CATALOGING SOURCE
Original cataloging agency	EBLCP
Language of cataloging	eng
Description conventions	rda
Transcribing agency	EBLCP
Modifying agency	YDX
--	DG1
--	OCLCO
--	IEEEE
--	OCLCF
--	UKAHL
041 ## - LANGUAGE CODE
Language code of text/sound track or separate title	eng
050 #4 - LIBRARY OF CONGRESS CALL NUMBER
Classification number	QA76.87
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number	006.3/2
Edition number	23
100 1# - MAIN ENTRY--PERSONAL NAME
Preferred name for the person	Liu, Albert (Chun-Chen)
245 10 - TITLE STATEMENT
Title	Artificial intelligence hardware design :
Remainder of title	challenges and solutions /
Statement of responsibility, etc	Albert Chun Chen Liu and Oscar Ming Kin Law.
264 #1 - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc	Hoboken :
Name of publisher, distributor, etc	IEEE Press :
--	Wiley,
Date of publication, distribution, etc	2021.
300 ## - PHYSICAL DESCRIPTION
Extent	1 online resource (233 pages)
336 ## - CONTENT TYPE
Content type term	text
Content type code	txt
Source	rdacontent.
337 ## - MEDIA TYPE
Media type term	computer
Media type code	c
Source	rdamedia.
338 ## - CARRIER TYPE
Carrier type term	online resource
Carrier type code	cr
Source	rdacarrier.
504 ## - BIBLIOGRAPHY, ETC. NOTE
Bibliography, etc	Includes bibliographical references and index.
505 0# - CONTENTS
Formatted contents note	Table of Contents
Author Biographies
Preface
Acknowledgments
Table of Figures
1 Introduction
1.1 Development History
1.2 Neural Network Models
1.3 Neural Network Classification
1.3.1 Supervised Learning
1.3.2 Semi-supervised Learning
1.3.3 Unsupervised Learning
1.4 Neural Network Framework
1.5 Neural Network Comparison
Exercise
References
2 Deep Learning
2.1 Neural Network Layer
2.1.1 Convolutional Layer
2.1.2 Activation Layer
2.1.3 Pooling Layer
2.1.4 Normalization Layer
2.1.5 Dropout Layer
2.1.6 Fully Connected Layer
2.2 Deep Learning Challenges
Exercise
References
3 Parallel Architecture
3.1 Intel Central Processing Unit (CPU)
3.1.1 Skylake Mesh Architecture
3.1.2 Intel Ultra Path Interconnect (UPI)
3.1.3 Sub Non-unified Memory Access Clustering (SNC)
3.1.4 Cache Hierarchy Changes
3.1.5 Single/Multiple Socket Parallel Processing
3.1.6 Advanced Vector Software Extension
3.1.7 Math Kernel Library for Deep Neural Network (MKL-DNN)
3.2 NVIDIA Graphics Processing Unit (GPU)
3.2.1 Tensor Core Architecture
3.2.2 Winograd Transform
3.2.3 Simultaneous Multithreading (SMT)
3.2.4 High Bandwidth Memory (HBM2)
3.2.5 NVLink2 Configuration
3.3 NVIDIA Deep Learning Accelerator (NVDLA)
3.3.1 Convolution Operation
3.3.2 Single Data Point Operation
3.3.3 Planar Data Operation
3.3.4 Multiplane Operation
3.3.5 Data Memory and Reshape Operations
3.3.6 System Configuration
3.3.7 External Interface
3.3.8 Software Design
3.4 Google Tensor Processing Unit (TPU)
3.4.1 System Architecture
3.4.2 Multiply–Accumulate (MAC) Systolic Array
3.4.3 New Brain Floating-Point Format
3.4.4 Performance Comparison
3.4.5 Cloud TPU Configuration
3.4.6 Cloud Software Architecture
3.5 Microsoft Catapult Fabric Accelerator
3.5.1 System Configuration
3.5.2 Catapult Fabric Architecture
3.5.3 Matrix-Vector Multiplier
3.5.4 Hierarchical Decode and Dispatch (HDD)
3.5.5 Sparse Matrix-Vector Multiplication
Exercise
References
4 Streaming Graph Theory
4.1 Blaize Graph Streaming Processor
4.1.1 Stream Graph Model
4.1.2 Depth First Scheduling Approach
4.1.3 Graph Streaming Processor Architecture
4.2 Graphcore Intelligence Processing Unit
4.2.1 Intelligence Processor Unit Architecture
4.2.2 Accumulating Matrix Product (AMP) Unit
4.2.3 Memory Architecture
4.2.4 Interconnect Architecture
4.2.5 Bulk Synchronous Parallel Model
Exercise
References
5 Convolution Optimization
5.1 Deep Convolutional Neural Network Accelerator
5.1.1 System Architecture
5.1.2 Filter Decomposition
5.1.3 Streaming Architecture
5.1.3.1 Filter Weights Reuse
5.1.3.2 Input Channel Reuse
5.1.4 Pooling
5.1.4.1 Average Pooling
5.1.4.2 Max Pooling
5.1.5 Convolution Unit (CU) Engine
5.1.6 Accumulation (ACCU) Buffer
5.1.7 Model Compression
5.1.8 System Performance
5.2 Eyeriss Accelerator
5.2.1 Eyeriss System Architecture
5.2.2 2D Convolution to 1D Multiplication
5.2.3 Stationary Dataflow
5.2.3.1 Output Stationary
5.2.3.2 Weight Stationary
5.2.3.3 Input Stationary
5.2.4 Row Stationary (RS) Dataflow
5.2.4.1 Filter Reuse
5.2.4.2 Input Feature Maps Reuse
5.2.4.3 Partial Sums Reuse
5.2.5 Run-Length Compression (RLC)
5.2.6 Global Buffer
5.2.7 Processing Element Architecture
5.2.8 Network-on-Chip (NoC)
5.2.9 Eyeriss v2 System Architecture
5.2.10 Hierarchical Mesh Network
5.2.10.1 Input Activation HM-NoC
5.2.10.2 Filter Weight HM-NoC
5.2.10.3 Partial Sum HM-NoC
5.2.11 Compressed Sparse Column Format
5.2.12 Row Stationary Plus (RS+) Dataflow
5.2.13 System Performance
Exercise
References
6 In-Memory Computation
6.1 Neurocube Architecture
6.1.1 Hybrid Memory Cube (HMC)
6.1.2 Memory Centric Neural Computing (MCNC)
6.1.3 Programmable Neurosequence Generator (PNG)
6.1.4 System Performance
6.2 Tetris Accelerator
6.2.1 Memory Hierarchy
6.2.2 In-Memory Accumulation
6.2.3 Data Scheduling
6.2.4 Neural Network Vaults Partition
6.2.5 System Performance
6.3 NeuroStream Accelerator
6.3.1 System Architecture
6.3.2 NeuroStream Coprocessor
6.3.3 4D Tiling Mechanism
6.3.4 System Performance
Exercise
References
7 Near-Memory Architecture
7.1 DaDianNao Supercomputer
7.1.1 Memory Configuration
7.1.2 Neural Functional Unit (NFU)
7.1.3 System Performance
7.2 Cnvlutin Accelerator
7.2.1 Basic Operation
7.2.2 System Architecture
7.2.3 Processing Order
7.2.4 Zero-Free Neuron Array Format (ZFNAf)
7.2.5 The Dispatcher
7.2.6 Network Pruning
7.2.7 System Performance
7.2.8 Raw or Encoded Format (RoE)
7.2.9 Vector Ineffectual Activation Identifier Format (VIAI)
7.2.10 Ineffectual Activation Skipping
7.2.11 Ineffectual Weight Skipping
Exercise
References
8 Network Sparsity
8.1 Energy Efficient Inference Engine (EIE)
8.1.1 Leading Nonzero Detection (LNZD) Network
8.1.2 Central Control Unit (CCU)
8.1.3 Processing Element (PE)
8.1.4 Deep Compression
8.1.5 Sparse Matrix Computation
8.1.6 System Performance
8.2 Cambricon-X Accelerator
8.2.1 Computation Unit
8.2.2 Buffer Controller
8.2.3 System Performance
8.3 SCNN Accelerator
8.3.1 SCNN PT-IS-CP-Dense Dataflow
8.3.2 SCNN PT-IS-CP-Sparse Dataflow
8.3.3 SCNN Tiled Architecture
8.3.4 Processing Element Architecture
8.3.5 Data Compression
8.3.6 System Performance
8.4 SeerNet Accelerator
8.4.1 Low-Bit Quantization
8.4.2 Efficient Quantization
8.4.3 Quantized Convolution
8.4.4 Inference Acceleration
8.4.5 Sparsity-Mask Encoding
8.4.6 System Performance
Exercise
References
9 3D Neural Processing
9.1 3D Integrated Circuit Architecture
9.2 Power Distribution Network
9.3 3D Network Bridge
9.3.1 3D Network-on-Chip
9.3.2 Multiple-Channel High-Speed Link
9.4 Power-Saving Techniques
9.4.1 Power Gating
9.4.2 Clock Gating
Exercise
References
Appendix A: Neural Network Topology
Index
520 ## - SUMMARY, ETC.
Summary, etc	Learn foundational and advanced topics in Neural Processing Unit design with real-world examples from leading voices in the field. In Artificial Intelligence Hardware Design: Challenges and Solutions, distinguished researchers and authors Drs. Albert Chun Chen Liu and Oscar Ming Kin Law deliver a rigorous and practical treatment of the design of application-specific circuits and systems for accelerating neural network processing. Beginning with a discussion and explanation of neural networks and their developmental history, the book goes on to describe parallel architectures, streaming graphs for massively parallel computation, and convolution optimization. The authors illustrate in-memory computation through Georgia Tech's Neurocube and Stanford's Tetris accelerator using the Hybrid Memory Cube, as well as near-memory architecture through the eDRAM-based designs of the Institute of Computing Technology, Chinese Academy of Sciences, and other institutions. Readers will also find a discussion of 3D neural processing techniques to support multiple-layer neural networks, as well as: a thorough introduction to neural networks and neural network development history, including Convolutional Neural Network (CNN) models; explorations of various parallel architectures, including the Intel CPU, NVIDIA GPU, Google TPU, and Microsoft NPU, emphasizing hardware and software integration for performance improvement; discussions of streaming graphs for massively parallel computation with the Blaize GSP and Graphcore IPU; and an examination of how to optimize convolution with the filter decomposition of the UCLA Deep Convolutional Neural Network accelerator. Perfect for hardware and software engineers and firmware developers, Artificial Intelligence Hardware Design is an indispensable resource for anyone working with Neural Processing Units in either a hardware or software capacity.
545 0# - BIOGRAPHICAL OR HISTORICAL DATA
Biographical or historical note	Albert Chun Chen Liu, PhD, is Chief Executive Officer of Kneron. He is Adjunct Associate Professor at National Tsing Hua University, National Chiao Tung University, and National Cheng Kung University. He has published over 15 IEEE papers and is an IEEE Senior Member. He is a recipient of the IBM Problem Solving Award based on the use of the EIP tool suite in 2007 and the IEEE TCAS Darlington Award in 2021.
Oscar Ming Kin Law, PhD, is the Director of Engineering at Kneron. He works on smart robot development and in-memory architecture for neural networks. He has over twenty years of experience in the semiconductor industry working with CPU, GPU, and mobile design. He has also published over 60 patents in various areas.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Neural networks (Computer science)
Authority record control number	http://id.loc.gov/authorities/subjects/sh90001937
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Artificial intelligence.
Authority record control number	http://id.loc.gov/authorities/subjects/sh85008180
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Computer engineering.
Authority record control number	http://id.loc.gov/authorities/subjects/sh85029495
655 #4 - INDEX TERM--GENRE/FORM
Genre/form data or focus term	Electronic books.
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name	Law, Oscar Ming Kin.
856 40 - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier	https://onlinelibrary.wiley.com/doi/book/10.1002/9781119810483
Link text	Full text is available at Wiley Online Library. Click here to view.
942 ## - ADDED ENTRY ELEMENTS
Source of classification or shelving scheme
Item type	EBOOK

Holdings
Withdrawn status	Lost status	Source of classification or shelving scheme	Damaged status	Not for loan	Permanent Location	Current Location	Date acquired	Source of acquisition	Inventory number	Full call number	Barcode	Date last seen	Price effective from	Item type
					COLLEGE LIBRARY	COLLEGE LIBRARY	2024-09-26	Megatexts Phil. Inc.	52988	006.32 L7401 2021	CL-52988	2024-09-26	2024-09-26	EBOOK