2020begin
2121 import Pkg
2222 Pkg.activate(".")
23+ Pkg.instantiate()
2324 # Pkg.status()
2425 using PlutoUI
2526 using Random
@@ -38,11 +39,89 @@ using ForwardDiff
3839# ╔═╡ ec473e69-d5ec-4d6a-b868-b89dadb85705
3940ChooseDisplayMode()
4041
42+ # ╔═╡ 1f774f46-d57d-4668-8204-dc83d50d8c94
43+ md" # Intro - Optimal Control and Learning
44+
45+ In this course, we are interested in problems with the following structure:
46+
47+ ```math
48+ \begin{equation}
49+ \!\!\!\!\!\!\!\!\min_{\substack{(\mathbf y_1,\mathbf x_1)\\\mathrm{s.t.}}}
50+ \!\underset{%
51+ \phantom{\substack{(\mathbf y_1,\mathbf x_1)\\\mathrm{s.t.}}}%
52+ \!\!\!\!\!\!\!\!\!\! (\mathbf y_1,\mathbf x_1)\in\mathcal X_1(\mathbf x_0)%
53+ }{%
54+ \!\!\!\! f(\mathbf x_1,\mathbf y_1)%
55+ }
56+ +\mathbb{E}_1\Bigl[
57+ \quad \cdots
58+
59+ \;+\;\mathbb{E}_t\Bigl[
60+ \min_{\substack{(\mathbf y_t,\mathbf x_t)\\\mathrm{s.t.}}}
61+ \!\underset{%
62+ \phantom{\substack{(\mathbf y_t,\mathbf x_t)\\\mathrm{s.t.}}}%
63+ \!\!\!\! (\mathbf y_t,\mathbf x_t)\in\mathcal X_t(\mathbf x_{t-1},w_t)%
64+ }{%
65+ \!\!\!\!\!\!\!\!\!\! f(\mathbf x_t,\mathbf y_t)%
66+ }
67+ +\mathbb{E}_{t+1}[\cdots]
68+ \Bigr]\Bigr].
69+ \end{equation}
70+ ```
71+ which minimizes a first-stage cost function $f(\mathbf{x}_1,
72+ \mathbf{y}_1)$ plus the expected value of future costs over possible
73+ values of the exogenous stochastic variable $\{w_{t}\}_{t=2}^{T} \in
74+ \Omega$.
75+
76+ Here, $\mathbf{x}_0$ is the initial system state and the
77+ control decisions $\mathbf{y}_t$ are obtained at every period $t$
78+ under a feasible region defined by the incoming state
79+ $\mathbf{x}_{t-1}$ and the realized uncertainty $w_t$. $\mathbb{E}_t$ represents the expected value over future uncertainties $\{w_{\tau}\}_{\tau=t}^{T}$. This
80+ optimization program assumes that the system is entirely defined by
81+ the incoming state, a common modeling choice in many frameworks (e.g.,
82+ MDPs). This is without loss of generality,
83+ since any information can be appended to the state. The system
84+ constraints can be generally posed as:
85+
86+ ```math
87+ \begin{align}
88+ &\mathcal{X}_t(\mathbf{x}_{t-1}, w_t)=
89+ \begin{cases}
90+ \mathcal{T}(\mathbf{x}_{t-1}, w_t, \mathbf{y}_t) = \mathbf{x}_t \\
91+ h(\mathbf{x}_t, \mathbf{y}_t) \geq 0
92+ \end{cases}
93+ \end{align}
94+ ```
95+ "
96+
97+ # ╔═╡ a0f71960-c97c-40d1-8f78-4b1860d2e0a2
98+ md"""
99+ where the outgoing state of the system $\mathbf{x}_t$ is a
100+ transformation of the incoming state, the realized uncertainty,
101+ and the control variables. Markov Decision Processes (MDPs) refer
102+ to $\mathcal{T}$ as the "transition kernel" of the system. State and
103+ control variables are further restricted by the additional
104+ constraints $h(\mathbf{x}_t, \mathbf{y}_t) \geq 0$.
105+ We consider policies that map past information into
106+ decisions. In period $t$, an optimal policy is given by
107+ the solution of the dynamic programming
108+ equations:
109+
110+ ```math
111+ \begin{align}
112+ V_{t}(\mathbf{x}_{t-1}, w_t) = &\min_{\mathbf{x}_t, \mathbf{y}_t} \quad \! \! f(\mathbf{x}_t, \mathbf{y}_t) + \mathbb{E}[V_{t+1}(\mathbf{x}_t, w_{t+1})] \\
113+ & \text{ s.t. } \quad\mathbf{x}_t = \mathcal{T}(\mathbf{x}_{t-1}, w_t, \mathbf{y}_t) \nonumber \\
114+ & \quad \quad \quad \! \! h(\mathbf{x}_t, \mathbf{y}_t) \geq 0. \nonumber
115+ \end{align}
116+ ```
117+ """
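The recursion added in this cell can be illustrated on a tiny finite problem. The following is a minimal sketch, not code from the notebook: it assumes a toy inventory model in which the state is a stock level, the control is an order quantity, and the uncertainty is demand. All names (`STATES`, `CONTROLS`, `DEMANDS`, `PROBS`, `HORIZON`, `transition`, `h`, `stage_cost`, `backward_induction`) and all numbers are hypothetical, and the terminal value $V_{T+1}$ is assumed to be zero.

```julia
# Toy instance; every name and number here is an illustrative assumption.
STATES   = 0:3          # admissible stock levels x_t
CONTROLS = 0:2          # order quantities y_t
DEMANDS  = [0, 1]       # realizations of w_t
PROBS    = [0.5, 0.5]   # probability of each realization
HORIZON  = 3            # number of periods T

transition(x_prev, w, y) = x_prev + y - w              # plays the role of 𝒯
h(x, y) = min(x - first(STATES), last(STATES) - x)     # h ≥ 0 iff x is in STATES
stage_cost(x, y) = 1.0 * x + 2.0 * y                   # plays the role of f

function backward_induction()
    # V[t][(x_prev, w)] stores V_t(x_{t-1}, w_t); V_{T+1} is assumed to be 0.
    V = [Dict{Tuple{Int,Int},Float64}() for _ in 1:HORIZON + 1]
    policy = [Dict{Tuple{Int,Int},Int}() for _ in 1:HORIZON]
    for x in STATES, w in DEMANDS
        V[HORIZON + 1][(x, w)] = 0.0
    end
    for t in HORIZON:-1:1, x_prev in STATES, w in DEMANDS
        best_cost, best_y = Inf, -1
        for y in CONTROLS
            x = transition(x_prev, w, y)
            h(x, y) >= 0 || continue                   # skip infeasible controls
            # Expected cost-to-go E[V_{t+1}(x_t, w_{t+1})]
            expected = sum(p * V[t + 1][(x, w_next)]
                           for (p, w_next) in zip(PROBS, DEMANDS))
            cost = stage_cost(x, y) + expected
            if cost < best_cost
                best_cost, best_y = cost, y
            end
        end
        V[t][(x_prev, w)] = best_cost
        policy[t][(x_prev, w)] = best_y
    end
    return V, policy
end

V, policy = backward_induction()
```

Here `policy[t][(x_prev, w)]` returns the order quantity chosen in period `t` for a given incoming stock and realized demand. Exhaustive enumeration like this only works for small discrete state and control sets; it is meant to mirror the structure of the equations, not to suggest a solution method for the problems in the course.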
118+
41119# ╔═╡ 52005382-177b-4a11-a914-49a5ffc412a3
42- md"# 101 (Continuous-Time) Dynamics
43- #### A Crash Course
120+ section_outline(md"A Crash Course: ", md" (Continuous-Time) Dynamics
121+ ")
44122
45- General form for a smooth system:
123+ # ╔═╡ 8ea866a6-de0f-4812-8f59-2aebec709243
124+ md" General form for a smooth system:
46125
47126```math
48127\dot{x} = f(x,u) \quad \text{First-Order Ordinary Differential Equation (ODE)}
@@ -56,7 +135,6 @@ u \in \mathbb{R}^{m} & \text{Control} \\
56135\dot{x} \in \mathbb{R}^{n} & \text{Time derivative of } x \\
57136\end{cases}
58137```
59-
60138"
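To make the general form $\dot{x} = f(x,u)$ concrete, the sketch below rolls a system forward in time with explicit (forward) Euler steps. It is an illustration under stated assumptions, not code from the notebook: the damped pendulum, its parameter values, and the helper names `pendulum_dynamics`, `euler_step`, and `simulate` are all hypothetical, and forward Euler is chosen only because it is the simplest integrator.

```julia
# Hypothetical example system: a damped pendulum with torque input u.
# State x = [θ, ω]; the parameter values are illustrative assumptions.
function pendulum_dynamics(x, u; g = 9.81, l = 1.0, damping = 0.1)
    θ, ω = x
    return [ω, -(g / l) * sin(θ) - damping * ω + u]
end

# One explicit (forward) Euler step: x_{k+1} = x_k + Δt * f(x_k, u_k).
euler_step(f, x, u, Δt) = x + Δt * f(x, u)

# Roll the dynamics forward under a fixed control sequence `us`.
function simulate(f, x0, us, Δt)
    xs = [x0]
    for u in us
        push!(xs, euler_step(f, xs[end], u, Δt))
    end
    return xs
end

# 100 steps of 0.01 s with zero control, starting near the downward equilibrium.
xs = simulate(pendulum_dynamics, [0.1, 0.0], zeros(100), 0.01)
```

`xs` holds the initial state followed by one state per control input. Forward Euler is only first-order accurate and can drift for oscillatory systems, so a smaller step or a higher-order scheme is usually preferable in practice.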
61139
62140# ╔═╡ 2be161cd-2d4c-4778-adca-d45f8ab05f98
@@ -951,7 +1029,10 @@ end
9511029# ╔═╡ Cell order:
9521030# ╟─13b12c00-6d6e-11f0-3780-a16e73360478
9531031# ╟─ec473e69-d5ec-4d6a-b868-b89dadb85705
1032+ # ╟─1f774f46-d57d-4668-8204-dc83d50d8c94
1033+ # ╟─a0f71960-c97c-40d1-8f78-4b1860d2e0a2
9541034# ╟─52005382-177b-4a11-a914-49a5ffc412a3
1035+ # ╟─8ea866a6-de0f-4812-8f59-2aebec709243
9551036# ╟─2be161cd-2d4c-4778-adca-d45f8ab05f98
9561037# ╟─b452ee52-ee33-44ad-a980-6a6e90954ee1
9571038# ╟─9f62fae9-283c-44c3-8d69-29bfa90faf29