
Model-based reinforcement learning : (Record no. 88731)

000 -LEADER
fixed length control field	21245cam a2200481 i 4500
003 - CONTROL NUMBER IDENTIFIER
control field	CITU
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20240923144046.0
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS--GENERAL INFORMATION
fixed length control field	m o d
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION
fixed length control field	cr |n|||||||||
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	240923b ||||| |||| 00| 0 eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9781119808572
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9781119808589
Qualifying information	(electronic bk.)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	1119808588
Qualifying information	(electronic bk.)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9781119808596
Qualifying information	(electronic bk.)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	1119808596
Qualifying information	(electronic bk.)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	9781119808602
Qualifying information	(electronic bk.)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number	111980860X
Qualifying information	(electronic bk.)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
Cancelled/invalid ISBN	111980857X
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
Cancelled/invalid ISBN	9781119808572
024 7# - OTHER STANDARD IDENTIFIER
Standard number or code	10.1002/9781119808602
Source of number or code	doi
035 ## - SYSTEM CONTROL NUMBER
System control number	(OCoLC)1352957082
037 ## - SOURCE OF ACQUISITION
Stock number	9979000
Source of stock number/acquisition	IEEE
040 ## - CATALOGING SOURCE
Original cataloging agency	YDX
Language of cataloging	eng
Description conventions	rda
--	pn
Transcribing agency	YDX
Modifying agency	IEEEE
041 ## - LANGUAGE CODE
Language code of text/sound track or separate title	eng
050 #4 - LIBRARY OF CONGRESS CALL NUMBER
Classification number	Q325.6
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number	006.3/1
Edition number	23/eng/20221227
100 1# - MAIN ENTRY--PERSONAL NAME
Preferred name for the person	Farsi, Milad,
Authority record control number	https://id.loc.gov/authorities/names/n2022058722
Relator term	author.
245 10 - TITLE STATEMENT
Title	Model-based reinforcement learning :
Remainder of title	from data to actions with Python-based toolbox /
Statement of responsibility, etc	Milad Farsi, Jun Liu.
264 #1 - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc	United States:
Name of publisher, distributor, etc	Wiley-Blackwell,
Date of publication, distribution, etc	2023.
300 ## - PHYSICAL DESCRIPTION
Extent	1 online resource.
336 ## - CONTENT TYPE
Content type term	text
Content type code	txt
Source	rdacontent.
337 ## - MEDIA TYPE
Media type term	computer
Media type code	c
Source	rdamedia.
338 ## - CARRIER TYPE
Carrier type term	online resource
Carrier type code	cr
Source	rdacarrier.
505 0# - CONTENTS
Formatted contents note	Table of Contents<br/>Preface<br/>Acronyms<br/>Introduction<br/>I.1 Background and Motivation<br/>I.2 Literature Review<br/>1 Nonlinear Systems Analysis<br/>1.1 Notation<br/>1.2 Nonlinear Dynamical Systems<br/>1.2.1 Remarks on Existence, Uniqueness, and Continuation of Solutions<br/>1.3 Lyapunov Analysis of Stability<br/>1.4 Stability Analysis of Discrete-Time Dynamical Systems<br/>1.5 Summary<br/>2 Optimal Control<br/>2.1 Problem Formulation<br/>2.2 Dynamic Programming<br/>2.2.1 Principle of Optimality<br/>2.2.2 Hamilton–Jacobi–Bellman Equation<br/>2.2.3 A Sufficient Condition for Optimality<br/>2.2.4 Infinite-Horizon Problems<br/>2.3 Linear Quadratic Regulator<br/>2.3.1 Differential Riccati Equation<br/>2.3.2 Algebraic Riccati Equation<br/>2.3.3 Convergence of Solutions to the Differential Riccati Equation<br/>2.3.4 Forward Propagation of the Differential Riccati Equation for Linear Quadratic Regulator<br/>2.4 Summary<br/>3 Reinforcement Learning<br/>3.1 Control-Affine Systems with Quadratic Costs<br/>3.2 Exact Policy Iteration<br/>3.2.1 Linear Quadratic Regulator<br/>3.3 Policy Iteration with Unknown Dynamics and Function Approximations<br/>3.3.1 Linear Quadratic Regulator with Unknown Dynamics<br/>3.4 Summary<br/>4 Learning of Dynamic Models<br/>4.1 Introduction<br/>4.2 Model Selection<br/>4.2.1 Grey-Box vs. Black-Box<br/>4.2.2 Parametric vs. Non-Parametric<br/>4.3 Parametric Model<br/>4.3.1 Model in Terms of Bases<br/>4.3.2 Data Collection<br/>4.3.3 Learning of Control Systems<br/>4.4 Parametric Learning Algorithms<br/>4.4.1 Least Squares<br/>4.4.2 Recursive Least Squares<br/>4.4.3 Gradient Descent<br/>4.4.4 Sparse Regression<br/>4.5 Persistence of Excitation<br/>4.6 Python Toolbox<br/>4.6.1 Configurations<br/>4.6.2 Model Update<br/>4.6.3 Model Validation<br/>4.7 Comparison Results<br/>4.7.1 Convergence of Parameters<br/>4.7.2 Error Analysis<br/>4.7.3 Runtime Results<br/>4.8 Summary<br/>5 Structured Online Learning-Based Control of Continuous-Time Nonlinear Systems<br/>5.1 Introduction<br/>5.2 A Structured Approximate Optimal Control Framework<br/>5.3 Local Stability and Optimality Analysis<br/>5.3.1 Linear Quadratic Regulator<br/>5.3.2 SOL Control<br/>5.4 SOL Algorithm<br/>5.4.1 ODE Solver and Control Update<br/>5.4.2 Identified Model Update<br/>5.4.3 Database Update<br/>5.4.4 Limitations and Implementation Considerations<br/>5.4.5 Asymptotic Convergence with Approximate Dynamics<br/>5.5 Simulation Results<br/>5.5.1 Systems Identifiable in Terms of a Given Set of Bases<br/>5.5.2 Systems to Be Approximated by a Given Set of Bases<br/>5.5.3 Comparison Results<br/>5.6 Summary<br/>6 A Structured Online Learning Approach to Nonlinear Tracking with Unknown Dynamics<br/>6.1 Introduction<br/>6.2 A Structured Online Learning for Tracking Control<br/>6.2.1 Stability and Optimality in the Linear Case<br/>6.3 Learning-based Tracking Control Using SOL<br/>6.4 Simulation Results<br/>6.4.1 Tracking Control of the Pendulum<br/>6.4.2 Synchronization of Chaotic Lorenz System<br/>6.5 Summary<br/>7 Piecewise Learning and Control with Stability Guarantees<br/>7.1 Introduction<br/>7.2 Problem Formulation<br/>7.3 The Piecewise Learning and Control Framework<br/>7.3.1 System Identification<br/>7.3.2 Database<br/>7.3.3 Feedback Control<br/>7.4 Analysis of Uncertainty Bounds<br/>7.4.1 Quadratic Programs for Bounding Errors<br/>7.5 Stability Verification for Piecewise-Affine Learning and Control<br/>7.5.1 Piecewise Affine Models<br/>7.5.2 MIQP-based Stability Verification of PWA Systems<br/>7.5.3 Convergence of ACCPM<br/>7.6 Numerical Results<br/>7.6.1 Pendulum System<br/>7.6.2 Dynamic Vehicle System with Skidding<br/>7.6.3 Comparison of Runtime Results<br/>7.7 Summary<br/>8 An Application to Solar Photovoltaic Systems<br/>8.1 Introduction<br/>8.2 Problem Statement<br/>8.2.1 PV Array Model<br/>8.2.2 DC-DC Boost Converter<br/>8.3 Optimal Control of PV Array<br/>8.3.1 Maximum Power Point Tracking Control<br/>8.3.2 Reference Voltage Tracking Control<br/>8.3.3 Piecewise Learning Control<br/>8.4 Application Considerations<br/>8.4.1 Partial Derivative Approximation Procedure<br/>8.4.2 Partial Shading Effect<br/>8.5 Simulation Results<br/>8.5.1 Model and Control Verification<br/>8.5.2 Comparative Results<br/>8.5.3 Model-Free Approach Results<br/>8.5.4 Piecewise Learning Results<br/>8.5.5 Partial Shading Results<br/>8.6 Summary<br/>9 An Application to Low-Level Control of Quadrotors<br/>9.1 Introduction<br/>9.2 Quadrotor Model<br/>9.3 Structured Online Learning with RLS Identifier on Quadrotor<br/>9.3.1 Learning Procedure<br/>9.3.2 Asymptotic Convergence with Uncertain Dynamics<br/>9.3.3 Computational Properties<br/>9.4 Numerical Results<br/>9.5 Summary<br/>10 Python Toolbox<br/>10.1 Overview<br/>10.2 User Inputs<br/>10.2.1 Process<br/>10.2.2 Objective<br/>10.3 SOL<br/>10.3.1 Model Update<br/>10.3.2 Database<br/>10.3.3 Library<br/>10.3.4 Control<br/>10.4 Display and Outputs<br/>10.4.1 Graphs and Printouts<br/>10.4.2 3D Simulation<br/>10.5 Summary<br/>11 Appendix<br/>11.1 Supplementary Analysis of Remark 5.4<br/>11.2 Supplementary Analysis of Remark 5.5<br/>Bibliography
520 ## - SUMMARY, ETC.
Summary, etc	Model-Based Reinforcement Learning<br/><br/>Explore a comprehensive and practical approach to reinforcement learning.<br/><br/>Reinforcement learning is an essential paradigm of machine learning, wherein an intelligent agent performs actions that ensure optimal behavior from devices. While this paradigm of machine learning has gained tremendous success and popularity in recent years, previous scholarship has focused either on theory (optimal control and dynamic programming) or on algorithms, most of which are simulation-based.<br/><br/>Model-Based Reinforcement Learning provides a model-based framework to bridge these two aspects, thereby creating a holistic treatment of the topic of model-based online learning control. In doing so, the authors seek to develop a model-based framework for data-driven control that bridges the topics of systems identification from data, model-based reinforcement learning, and optimal control, as well as the applications of each. This new technique for assessing classical results will allow for a more efficient reinforcement learning system. At its heart, this book is focused on providing an end-to-end framework, from design to application, of a more tractable model-based reinforcement learning technique.<br/><br/>Model-Based Reinforcement Learning readers will also find:<br/>A useful textbook to use in graduate courses on data-driven and learning-based control that emphasizes modeling and control of dynamical systems from data<br/>Detailed comparisons of the impact of different techniques, such as basic linear quadratic controller, learning-based model predictive control, model-free reinforcement learning, and structured online learning<br/>Applications and case studies, one on ground vehicles with nonholonomic dynamics and another on quadrotor helicopters<br/>An online, Python-based toolbox that accompanies the contents covered in the book, as well as the necessary code and data<br/><br/>Model-Based Reinforcement Learning is a useful reference for senior undergraduate students, graduate students, research assistants, professors, process control engineers, and roboticists.
545 0# - BIOGRAPHICAL OR HISTORICAL DATA
Biographical or historical note	About the Author<br/><br/>Milad Farsi received the B.S. degree in Electrical Engineering (Electronics) from the University of Tabriz in 2010. He obtained his M.S. degree also in Electrical Engineering (Control Systems) from the Sahand University of Technology in 2013. Moreover, he gained industrial experience as a Control System Engineer between 2012 and 2016. Later, he acquired the Ph.D. degree in Applied Mathematics from the University of Waterloo, Canada, in 2022, and he is currently a Postdoctoral Fellow at the same institution. His research interests include control systems, reinforcement learning, and their applications in robotics and power electronics.<br/><br/>Jun Liu received the Ph.D. degree in Applied Mathematics from the University of Waterloo, Canada, in 2010. He is currently an Associate Professor of Applied Mathematics and a Canada Research Chair in Hybrid Systems and Control at the University of Waterloo, Canada, where he directs the Hybrid Systems Laboratory. From 2012 to 2015, he was a Lecturer in Control and Systems Engineering at the University of Sheffield. During 2011 and 2012, he was a Postdoctoral Scholar in Control and Dynamical Systems at the California Institute of Technology. His main research interests are in the theory and applications of hybrid systems and control, including rigorous computational methods for control design with applications in cyber-physical systems and robotics.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Reinforcement learning.
Authority record control number	https://id.loc.gov/authorities/subjects/sh92000704.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element	Python (Computer program language)
Authority record control number	https://id.loc.gov/authorities/subjects/sh96008834.
655 #4 - INDEX TERM--GENRE/FORM
Genre/form data or focus term	Electronic books.
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name	Liu, Jun
Titles and other words associated with a name	(Professor of applied mathematics),
Authority record control number	https://id.loc.gov/authorities/names/n2022058723
Relator term	author.
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier	https://onlinelibrary.wiley.com/doi/book/10.1002/9781119808602
Link text	Full text is available at Wiley Online Library. Click here to view.
942 ## - ADDED ENTRY ELEMENTS
Source of classification or shelving scheme
Item type	EBOOK

No items available.