In this note, I give a short review of part of this paper and try to bring in (learn) some of the necessary background on stochastic optimal control.
The main object discussed in what follows is an SDE of the form
\[dX_t = b(X_t, t)dt + dW_t\]Let’s consider a deterministic control problem first:
\[dx_t = F(x_t, u_t)dt\]where $x_t$ is the state variable and $u_t$ is the control variable. The optimal control problem seeks a control that minimizes the total cost, captured by the value function
\[V(x_t, t) = \min_u\left\{\int_t^T C(x_s, u_s)\, ds + D(x_T)\right\}\]where $C$ is the running control cost and $D$ is the cost at the terminal time $T$. The value function satisfies a recursion over a small time increment $dt$ (Bellman’s principle of optimality):
\[V(x_t, t) = \min_u \left\{C(x_t, u_t)\,dt + V(x_{t + dt}, t + dt)\right\}, \quad V(x, T) = D(x)\]This recursion has a long history in dynamic programming and, in the limit $dt \to 0$, leads to the Hamilton-Jacobi-Bellman (HJB) equation. The right-hand side is handled with a Taylor expansion.
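Spelling that step out: since $dx_t = F(x_t, u_t)\,dt$, a first-order expansion gives
\[V(x_{t + dt}, t + dt) \approx V(x_t, t) + \frac{\partial V}{\partial t}\,dt + \frac{\partial V}{\partial x}\,F(x_t, u_t)\,dt\]Substituting this into the recursion, the $V(x_t, t)$ terms cancel on both sides; dividing by $dt$ and pulling $\frac{\partial V}{\partial t}$ out of the minimum (it does not depend on $u$), we obtain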
\[\frac{\partial V}{\partial t} + \min_u\left\{C(x_t, u_t) + \frac{\partial V}{\partial x} F(x_t, u_t)\right\} = 0\]This is a PDE, not an ODE, and it is nontrivial to solve in general because it contains an embedded minimization over the control.
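To make this concrete, here is a minimal numerical sketch (not from the paper) that solves the deterministic HJB equation backward in time on a grid, handling the embedded minimization by brute force over a discretized control set. The choices $C(x, u) = \frac{1}{2}u^2$, $F(x, u) = u$, and $D(x) = \frac{1}{2}x^2$ are hypothetical, picked because the exact solution $V(x, t) = \frac{x^2}{2(1 + T - t)}$ is then available for comparison:

```python
import numpy as np

# Backward-in-time grid solver for V_t + min_u {C(x,u) + V_x F(x,u)} = 0.
# The inner minimization is done by brute force over a discretized control set.

T = 1.0
nx, nt = 201, 2000
xs = np.linspace(-2.0, 2.0, nx)
dx = xs[1] - xs[0]
dt = T / nt
us = np.linspace(-4.0, 4.0, 81)          # discretized control set

V = 0.5 * xs**2                          # terminal condition V(x, T) = D(x)
for _ in range(nt):
    Vx = np.gradient(V, dx)              # dV/dx by finite differences
    # Hamiltonian on the (control, state) grid: C(x, u) + Vx(x) * F(x, u)
    H = 0.5 * us[:, None] ** 2 + us[:, None] * Vx[None, :]
    V = V + dt * H.min(axis=0)           # one explicit step backward in time

exact = 0.5 * xs**2 / (1.0 + T)          # closed-form V(x, 0) for this example
print(np.abs(V - exact)[50:151].max())   # small error away from the boundary
```

Even in this one-dimensional toy the cost of the brute-force minimum scales with the size of the control grid, which hints at why the general problem is hard.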
Now we turn our attention to the stochastic extension of the above, where we consider the controlled SDE
\[dX_t = b(X_t, u_t)dt + \sigma(X_t, u_t)dW_t\]The value function is now
\[V(x, t) = \min_u \mathbb{E}\left[\int_t^T C(X_s, u_s)\, ds + D(X_T) \mid X_t = x\right]\]Similarly, we apply Itô’s rule when expanding $V(X_{t + dt}, t + dt)$; the key difference is that a second-order derivative term now appears.
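Concretely, Itô’s formula (using $(dW_t)^2 = dt$, $\mathbb{E}[dW_t] = 0$, and dropping terms of order higher than $dt$) gives
\[\mathbb{E}\left[V(X_{t + dt}, t + dt) \mid X_t = x\right] \approx V(x, t) + \left(\frac{\partial V}{\partial t} + b(x, u)\frac{\partial V}{\partial x} + \frac{1}{2}\sigma^2(x, u)\frac{\partial^2 V}{\partial x^2}\right) dt\]Substituting this into the Bellman recursion and dividing by $dt$, we can derive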
\[\frac{\partial V}{\partial t} + \min_u\left\{C(x, u) + b(x, u) \frac{\partial V}{\partial x} + \frac{1}{2} \sigma^2(x, u) \frac{\partial^2 V}{\partial x^2}\right\} = 0, \quad V(x, T) = D(x)\]Again, this is difficult to solve in general.
The paper restricts the SDE to a constant (unit) diffusion coefficient:
\[dX_t = b(X_t, t) dt + dW_t, \quad t\in [0, 1], X_0 = x_0\]The corresponding controlled SDE being studied is
\[dX_t^u = (b(X^u_t, t) + u(X^u_t, t)) dt + dW_t, \quad t\in [0, 1], X_0^u = x_0\]and the value function here will be denoted as $J$
\[J(x, t) = \min_u \mathbb{E}\left[\int_t^1 C(X_s^u, u_s) ds + D(X_1^u) \mid X_t^u=x\right]\]Repeating the derivation of the previous section (with $\sigma \equiv 1$), we obtain
\[\frac{\partial J}{\partial t} + \min_u\left\{C(x, u) + (b + u) \frac{\partial J}{\partial x} + \frac{1}{2}\frac{\partial^2 J}{\partial x^2}\right\} = 0, \quad J(x, 1) = D(x)\]Moving terms around, we obtain
\[\frac{\partial J}{\partial t} + \mathcal{L}_tJ = -\min_u \left\{ C(x, u) + u \frac{\partial J}{\partial x}\right\}\]where $\mathcal{L}_t = b(x, t)\frac{\partial}{\partial x} + \frac{1}{2}\frac{\partial^2}{\partial x^2}$ is the generator of the uncontrolled SDE. In the paper, the control cost is quadratic, $C(x, u) = \frac{1}{2}\lVert u \rVert^2$, so the minimization can be solved exactly, with minimizer $u^*(x, t) = -\frac{\partial J}{\partial x}$.
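To see this, complete the square:
\[\min_u \left\{\frac{1}{2}\lVert u\rVert^2 + u \frac{\partial J}{\partial x}\right\} = \min_u \left\{\frac{1}{2}\left\lVert u + \frac{\partial J}{\partial x}\right\rVert^2\right\} - \frac{1}{2}\left\lVert \frac{\partial J}{\partial x}\right\rVert^2 = -\frac{1}{2}\left\lVert \frac{\partial J}{\partial x}\right\rVert^2\]with the minimum attained at $u^* = -\frac{\partial J}{\partial x}$. Substituting the minimum back in gives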
\[\frac{\partial J}{\partial t} + \mathcal{L}_tJ = \frac{1}{2}\left\lVert \frac{\partial J}{\partial x} \right\rVert^2, \quad J(x, 1) = D(x)\]Now let $h(x, t) = \mathbb{E}[e^{-D(X_1)} \mid X_t = x]$ be an expectation under the uncontrolled SDE (the exponential is needed so that the terminal conditions match below). By the backward Kolmogorov equation,
\[\frac{\partial h}{\partial t} + \mathcal{L}_t h = 0, \quad h(x, 1) = e^{-D(x)}\]If we set $J(x, t) = -\log h(x, t)$, a direct computation (the Hopf-Cole transformation, spelled out below) shows that this $J$ satisfies exactly the PDE above, with the matching terminal condition $J(x, 1) = D(x)$.
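Indeed, with $J = -\log h$ the derivatives transform as
\[\frac{\partial J}{\partial t} = -\frac{1}{h}\frac{\partial h}{\partial t}, \qquad \frac{\partial J}{\partial x} = -\frac{1}{h}\frac{\partial h}{\partial x}, \qquad \frac{\partial^2 J}{\partial x^2} = -\frac{1}{h}\frac{\partial^2 h}{\partial x^2} + \left(\frac{1}{h}\frac{\partial h}{\partial x}\right)^2\]so that
\[\frac{\partial J}{\partial t} + \mathcal{L}_t J = -\frac{1}{h}\left(\frac{\partial h}{\partial t} + \mathcal{L}_t h\right) + \frac{1}{2}\left(\frac{1}{h}\frac{\partial h}{\partial x}\right)^2 = \frac{1}{2}\left\lVert\frac{\partial J}{\partial x}\right\rVert^2\]where the first term vanishes by the backward equation for $h$. Hence the two value functions are the same: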
\[\underbrace{-\log \mathbb{E}\left[e^{-D(X_1)} \mid X_t=x\right]}_{ \text{uncontrolled SDE value function}} = \underbrace{\min_u \mathbb{E} \left[\int_t^1 C(X_s^u, u_s) ds + D(X_1^u) \mid X_t^u=x\right] }_{\text{controlled SDE value function}}\]
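As a sanity check of this identity (not from the paper), here is a minimal Monte Carlo sketch for a hypothetical one-dimensional instance: $b = 0$, $C(x, u) = \frac{1}{2}\lVert u\rVert^2$, $D(x) = x^2$, $x_0 = 1$. For this choice a Gaussian integral gives the closed form $h(x, t) = \exp\left(-\frac{x^2}{3 - 2t}\right)/\sqrt{3 - 2t}$, so $J = -\log h$ and the optimal control $u^*(x, t) = -\frac{\partial J}{\partial x} = -\frac{2x}{3 - 2t}$ are explicit:

```python
import numpy as np

rng = np.random.default_rng(0)
x0, n_paths, n_steps = 1.0, 100_000, 500
dt = 1.0 / n_steps

# Left-hand side: -log E[exp(-D(X_1))] under the uncontrolled SDE (b = 0,
# so X_1 = x0 + W_1 is exactly Gaussian and needs no time stepping).
X1 = x0 + rng.normal(size=n_paths)
lhs = -np.log(np.mean(np.exp(-X1**2)))

# Right-hand side: expected cost of the controlled SDE driven by the
# optimal control u*(x, t) = -2x / (3 - 2t), simulated by Euler-Maruyama.
X = np.full(n_paths, x0)
cost = np.zeros(n_paths)
for k in range(n_steps):
    t = k * dt
    u = -2.0 * X / (3.0 - 2.0 * t)       # optimal control u*(X_t, t)
    cost += 0.5 * u**2 * dt              # running cost C(x, u) = 0.5 * u^2
    X += u * dt + np.sqrt(dt) * rng.normal(size=n_paths)
rhs = np.mean(cost + X**2)               # add terminal cost D(X_1) = X_1^2

exact = 0.5 * np.log(3.0) + x0**2 / 3.0  # J(x0, 0), roughly 0.8826
print(lhs, rhs, exact)                   # all three should agree closely
```

Both estimates converge to $J(x_0, 0) = \frac{1}{2}\log 3 + \frac{x_0^2}{3}$, matching the claim that the uncontrolled expectation and the optimally controlled cost coincide.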