Malliavin Calculus

(in progress) Some background of Mallivian Calculus

Introduction

This is a long post distilling some concepts of Malliavin Calculus and based on the lecture note of Martin Hairer.

Motivation Malliavin calculus is a modern tool tackling with differentiating random variable defined on a Gaussian probability space w.r.t. the underlying noise.

At the moment, I feel like White Noise Theory and Malliavian calculus share some similarity. They surely complement each other but I do not comprehend the difference between them, for example, what one can do but other cannot. I also plan to get to know more some background of rough path but this will be in another post.

Stochastic analysis centers around stochastic differential equations, i.e.,

\[dX_t = V_0(X_t)dt + \sum_{i=1}^m V_i(X_t) \circ dW_i(t)\]

where $\circ dW_t$ denotes Straonovich integration. In this equation, Hairer considers multiple noise parts.

White noise and Wiener chaos

This section concerns the definition of white noise under functional representation. Wiener chaos gives the decomposition form of white noise in which we will find somewhat similar to Fourier analysis or Sobolev space.

Definition of white noise

Now, let’s talk about spaces we will work on

$H = L^2(\mathbb{R}_+, \mathbb{R}^m)$: a real and separatable Hilbert space
$L^2(\Omega, \mathbb{P})$: for some probability space $(\Omega, \mathbb{P})$

White noise is linear isometrylinear map preserving distance $W: H \to L^2(\Omega, \mathbb{P})$ such that the ouput $W(h)$ is a real-valued Gaussian variable or

\[\mathbb{E}[W(h)] = 0, \qquad \mathbb{E}[W(h)W(g)] = \langle h, g \rangle_H.\]

The above is just the definition. How to establish such map will be shown next.

Orthonormal basis Here, we define

a sequence of i.i.d. normal random variable ${\xi_n}_{n\geq 0}$
an orthonormal basis ${e_n}_{n \geq 0}$ of $H$.

When representing $h = \sum_n h_n e_n \in H$, we construct $W(h)=\sum_n h_n \xi_n$. The normal random variable $\xi_n$ is now can rewrite in the functional form form $\xi_n = W(e_n)$.

In the lecture, $m$-dimensional Wiener process is defined by using funtion $\mathbf{1}_{[0,t)}^{(i)}$

\[(\mathbf{1}_{[0,t)}^{(i)})_j(s) = \begin{cases} 1 &\text{if } s \in [0, t) \text{ and } j=i \\ 0 &\text{otherwise } \end{cases}\]

Note that 1-dimensional case is easy to show, but for now, still follow the setup in the lecture.

The $i$-th dimension of Wiener process is defined as $W_i(t) = W(\mathbf{1}_{[0,t)}^{(i)})$ where we can check the covariance

\[\mathbb{E}[W_i(t)W_j(s)] = \langle \mathbf{1}_{[0,t)}^{(i)}, \mathbf{1}_{[0,s)}^{(j)} \rangle = \delta_{ij}(t \wedge s).\]

For arbitrary $h$, $W(h)$ can represent as (Wiener-Ito integral)

\[W(h) = \sum_{i=1}^m \int_0^\infty h_i(s) dW_i(s)\]

Note that we may need to clearly differentiate between $W(h)$ and $W_i(s)$.

Representation using Hermite polynomials

There are several formulation of Hermite polynomial

Recursive $H^\prime_n(x) = n H_{n-1}(x)$
A different recursive $H_{n + 1}(x) = xH_n(x) - H^\prime_n(x)$,
Explicit representation $H_n(x) = \exp\left(-\frac{D^2}{2}\right) x^n$, where $D$ is the differentiation w.r.t. to $x$, $\exp(\cdot)$ is defined in the Taylor expansion sense.

Some properties of Hermite polynomials:

$\mathbb{E}[H_n(X)] = 0$: $H_n(X)$ has zero-mean when $X \sim \mathcal{N}(0,1)$
Identity

\[\begin{aligned} \int H_n(x) H_m(x) e^{-x^2/2} dx = & \frac{1}{n + 1} \int H'_{n + 1}(x) H_m(x) e^{-x^2/2} dx \\ = & \frac{1}{n + 1} \int H_{n + 1}(x) (xH_m(x) - H'm(x)) e^{-x^2/2} dx \\ = & \frac{1}{n + 1} \int H_{n + 1}(x) H_{m + 1}(x) e^{-x^2/2} dx \end{aligned}\]

This recursive leads to $\mathbb{E}[H_n(X)H_m(X)] = n! \delta_{n.m}$

Let’s define linear subspaces of $L^2(\Omega, \mathbb{P})$ as

\[\mathcal{H} = \{H_n(W(h)), h \in H, \lvert\lvert h \rvert\rvert_H = 1\}\]

We have the following decomposition

Theorem Let $\mathcal{F}$ be the $\sigma$-algebra generated by $W$. Then,

\[L^2(\Omega, \mathcal{F}, \mathbb{P}) = \bigoplus_{n=0}^\infty \mathcal{H}_n\]

Proof Let $X \in L^2(\Omega, \mathcal{F}, \mathbb{P})$ be orthogonal to $\mathcal{H}_n$ for all $n$.

\[\mathbb{E}[XH_n(W(h))] = 0, \forall n \quad \Rightarrow \quad \mathbb{E}[X\exp(W(h))] = 0\]

We need to show that $X=0$. Splitting $X = X^+ - X^-$, and define the following measures

\[\nu^{+,-} = \mathbb{E}[X^{+,-} \mathbf{1}_B(W(h_1), \dots, W(h_m))], \quad B \in \mathcal{B}(\mathbb{R}^m)\]

Applying Laplace transform for $\nu$, we deduce

\[\varphi_{\nu^{+,-}}(\lambda) = \int \exp(\lambda \cdot x) \nu^{+,-}(dx) = \mathbb{E}[X^{+,-}\exp(\sum_i \lambda_i W(h_i))] = 0\]

As the Laplace is zero, then the measure is zero. Thus, $\mathbb{E}[X\mathbb{1}_F] = 0, \forall F \in \mathcal{F}$. Therefore, $X=0$ and we can conclude the proof.

Representation using multiple stochastic integrals

This section defines multiple Wiener-Ito integral w.r.t. Brownian motion. With this definition we can lead to a similar decomposition like the representation of Hermite polynomials presented above.

Consider

measure space $(T, \mathcal{B}, \mu)$
one-dimensional Brownian motion $B(t), t \in T = [a, b]$
Functional space $H=L^2([a, b], \mathbb{R})$
Functional representation $W(h) = \int_a^b h(s) dB(s)$

Like traditional stochastic calculus, this tries to set up a corner stone with elementary process

\[\mathcal{E} = \{u(t) = \sum_i F_i \mathbf{1}_{(t_i, t_{i+1}]}(t), t1 < \dots < t_{n+1}, t_i \in T, F_i \in \mathcal{F}_{t_i} \text{square integrable} \}\]

The Ito integral w.r.t. Brownian motion is

\[\int_T u(t) dB(t) = \sum_i F_i(B(t_{i+1}) - B(t_i))\]

Definition of multiple Wiener-Ito integral This is a multiple dimensional integral

\[\int_{T^n} f(t_1, t_2, \dots, t_n) dB(t_1)dB(t_2)\dots dB(t_n)\] \[I_n(f) = \sum_{i_1, \dots, i_n} a_{i_1 \dots i_n} \xi_{i_1} \dots \xi_{i_n}\]

Looking at this equation, one may think that order of $i_1,\dots, i_n$ may affect to $a$ but not for the product of $\xi$. So, the symmetrized version is defined (because $I_n$ is linear as well)

\[\tilde{f}(t_1, \dots, t_n) = \frac{1}{n!} \sum_{\sigma \in \mathcal{S}_n} f(t_{\sigma(1)}, \dots, t_{\sigma(n)})\]

with $\mathcal{S}_n$ is the set of all permutations. Because $dt_1…dt_n$ is symmetry, we have

\[\int_{T^n} |f(t_1, \dots, t_n)|^2dt_1...dt_n = \int_{T^n}f(t_{\sigma(1)}, \dots, t_{\sigma(n)}) dt_1...dt_n\]

Using the triangle inequality, we have

\[\lvert\lvert \tilde{f}\rvert\rvert_{L^2(T^n)} \leq \frac{1}{n!} \sum_{\sigma \in \mathcal{S}_n} \lvert\lvert {f}\rvert\rvert_{L^2(T^n)} = \lvert\lvert {f}\rvert\rvert_{L^2(T^n)}\]

We can say the Wiener-Ito integral of $f$ and $\tilde{f}$ are the same

Lemma If $f \in \mathcal{E}_n$, elementary process, then $I_n(f) = I_n(\tilde{f})$

This is quite easy to see if considering the symmetry of $\prod_i (B(t_i^{(2)} - B(t_i^{(1)}))$. The permutation version of this will have the same result.

Next, the following is the orthogonal property.

Lemma If $f, g \in \mathcal{E}_n$, elementary process, then

$\mathbb{E}[I_n(f)] = 0, \quad \quad \mathbb{E}[I_n(f)I_m(g)] = \begin{cases} 0, \quad& n \neq m \\ n! \langle \tilde{f}, \tilde{g} \rangle_{L^2(T^n)}, &n=m \end{cases}$

The first expectation is straightforward because using definition of elementary process and expect of Brownian motion

The second expectation needs to be treated more carefully. By the definition, this product will be the product of two summations, only the case that $\mathbb{E}[\xi^2] = \Delta t$ (basic Brownian motion property) remains, explaining when $n\neq m$, the expectation vanishes.

Continuing with defining the Wiener-Ito integral on $L^2{T^n}$ instead of elemetary process space, the general steps are based on a sequence of ${f_k} \in \mathcal{E}_n$ converging to $f \in L^2{T^n}$. This leads to the convergence in probability of expecation of $I_n(f)$.

The Malliavin derivative

Definition and properties

Goal: Rigorously define differentation w.r.t. white noise.

In Wiener process, we usually encounter that its derivative is a Gaussian noise, $\xi_i(t) = \frac{dW_i}{dt}$

The new operator $D_t^{(i)}$ takes derivative of a random variable w.r.t to $\xi_i(t)$. We may expect this operator works as

\[D_t^{(i)} W(h) = h_i(t)\]

It is because

\[W(h) = \sum_{i=1}^m \int_0^\infty h_i(t)\xi_i(t) dt.\]

We also expect the chain rules

\[D_t^{(i)} F(X1, \dots, X_n) = \sum_{k=1}^n \partial_k F(X_1, \dots, X_n)D_t^{(i)}X_k\]

In fact, the definition of $\mathscr{D}F$ can be interpreted as a directional derivative

\[\langle DF, h \rangle = \lim_{\epsilon \to 0}\frac{1}{\epsilon} (F(W(h_1) + \epsilon \langle h_1, h\rangle, \dots, W(h_n) + \epsilon \langle h_n, h\rangle) - F)\]

Proposition (Integration by parts) For every $X$, $h$, one has the identity $\mathbb{E}[\langle {D}X, h\rangle_H] = \mathbb{E}[XW(h)]$

Proof It is okay to consider only the case $\lvert\lvert h \rvert\rvert_H=1$.Suppose orthonormal basis ${e_1, \dots, e_n}$ of $H$ such that $h=e_1, F = f(W(e_1), \dots, W(e_n))$

Given $\phi(x)$ denoting standard normal distribtution, we have

$\mathbb{E}[\langle DF, h \rangle_H] = \int \partial_1 f(x)\phi(x)dx = \int f(x)\phi(x)x_1 dx = \mathbb{E}[FW(e_1)]= \mathbb{E}[FW(h)]$

The second equation used integration by part.

The following result uses $D(GF) = (DG)F + G(DF)$ (something like chain rule).

Lemma Let $F, G \in \mathcal{S}$ and $h \in H$

$\mathbb{E}[G\langle {D}F, h\rangle_H] = -\mathbb{E}[F\langle DG, h \rangle_H] + \mathbb{E}[FGW(h)]$

Proposition [Chain rule] Let $g: \mathbb{R}^d \to \mathbb{R}$ be a function in $\mathcal{C}^1$ with bounded partial deriviatives. Let $p\geq 1$ and $F = (F^1, \dots, F^d), F^i \in \mathbb{D}^{1,d}$. Then $g(F) \in \mathbb{D}^{1, p}$ and

\[D(g(F)) = \sum_{i=1}^d \partial_i g(F)DF^i\]

The derivative operator in the white noise case

Consider the case of one-dimensional Brownian motion $B(t), t \in T = [a, b], H = L^2(T)$. The functional $W(h) = \int_a^b h(s) dB(s)$

Proposition $F = \sum_{n=0}^\infty I_n(f_n(\cdot, t))$ $D_tF = \sum_{n=1}^\infty n I_{n-1}(f_n(\cdot, t)).$

Proof We also start with elementary process where $f_n \in \mathcal{E}_n$ symmetric. Consider a really simple case $F = I_n(f_n)$

Proposition Let $g: \mathbb{R}^d \to R$ be a Lipschitz function ($\lvert g(x) - g(y)\rvert \leq K \lvert\lvert x- y\rvert\rvert$). Suppose $F = (F^1, \dots, F^d)$ is a random vector such that $F^i \in \mathbb{D}^{1,2}$. Then $g(F) \in \mathbb{D}^{1,2}$ and there exists a random vector $G=(G_1, \dots, G_d)$ such that

\[D(g(F)) = \sum_{i=1}^d G_i DF^i.\]

Divergence operator

In short, divergence operator is defined as the dual (adjoint) of the derivative operator defined in the previous section.

Definition of divergence operator

The divergence operator is denoted as $\delta$ which is unbounded, $\delta: L^2(\Omega; H) \to L^2(\Omega)$, satisfying

The domain of $\delta$, $\text{Dom} \delta$, contains $u \in L^2(\Omega; H)$ such that $\lvert \mathbb{E}[\langle DF , u \rangle_H] \rvert \leq c_u \lvert \lvert F \rvert\rvert_{L^2(\Omega)}$
Duality relation: $\mathbb{E}[F\delta(u)] = \mathbb{E}[\langle DF, u \rangle_H]$

Proposition[Properties of divergence]

If $u \in \text{Dom}(\delta)$, then $\mathbb{E}[\delta(u)] = 0$
Divergence operator is linear and closed under $\text{Dom} \delta$
$\delta(u) = \sum_{j=1}^n F_j W(h_j) - \sum_{j=1}^n \langle DF_j, h_j \rangle_H$
$\langle D(\delta(u)), h\rangle_H = \langle u, h \rangle_H + \delta\left( \sum_j \langle DF_j, h\rangle_H h_j \right)$ </i><aside>To prove the third point, we may use the integral by parts of the derivative operator</aside>

The Skorohod integral

This part will consider the restricted case which is Brownian motion. This makes the divergence $\delta(u)$ now is the Skorohod integral.

Consider the Wiener chaos expansion

\[u(t) = \sum_n I_n(f_n(\cdot, t))\]

The Skorohod integral will be represented as

\[\delta(u) = \sum_{n=0}^\infty I_{n+1}(\tilde{f}_n)\]

converging in $L^2(\Omega)$ where

$$ \tilde{f}_n(t_1, \dots, t_n, t) = \frac{1}{n+1} (f_n(t_1, \dots, t_n, t) + \sum_{i=1}^n f_n(t_1, \dots, t_{i-1}, t, t_{i+1}, \dots, t_n, t_i)) $$

Proposition[Skorohod integral is Ito integral] $\delta(u)$ coincides with the Ito integral w.r.t. Brownian mtion, that is

\[\delta(u) = \int_{a}^b u(s)dB(s)\]

Proof Consider an elementary adapted process

\[u_t = \sum_j F_j \mathbf{1}_{(t_j, t_{j+1})}(t)\]

Now looking at each of component in the sum

\[\delta(F_j \mathbf{1}_{(t_j, t_{j+1})}(\cdot)) = F_j \delta(\mathbf{1}_{(t_j, t_{j+1})}(\cdot)) - \int_t D_t F_j \mathbf{1}_{(t_j, t_{j+1})}(t) dt = F_j(B(t_{j+1}) - B(t_j))\]

The Clark-Ocone formula

Given $F$, exist $u$ such that $F = \mathbb{E}[F] + \int_0^\infty u(t)dB(t)$

This result says that a stochastic process can be represented by its mean which is a deterministic part and a randomness part.

Proof First, consider zero-mean integrable random variable $G$ that is orthogonal to all stochastic integrals $\int_{\mathbb{R}_+} u(t) dB(t)$.

Let $M_u(t) = \exp(\int_0^t u(s) dB(s) - \frac{1}{2}\int_0^t u^2(s)ds)$. By Ito’s formula

\[M_u(t) = M_u(0) + \int_0^t M_u(s) u(s) dB(s)\]

Hence, such random variable $G$ is orthogonal to

\[\mathcal{E}(h) = \exp\left(\int_0^\infty h(s) dB(s) - \frac{1}{2} \int_0^\infty h^2(s) ds\right)\]

And ${ e^{W(h)}, h \in L^2(\mathbb{R}_+) }$ form a total subset of $L^2(\Omega)$, this leads to the desired conclusion.

Integration by parts and regularity

This section provide the foundation of the integration by parts in Malliavin calculus. This will help

The integration by parts formula

Proposition Let $F, G$ be two random variables such that $F \in \mathbb{D}^{1,2}$ domain of first-order derivative operator $D$ in $L^2$. Let $u$ be an $H$-valued random variable such that $\langle DF, u \rangle_H$ a.s. and $Gu(\langle DF, u\rangle_H)^{-1} \in \text{Dom } \delta$. Then for any function $f\in \mathcal{C}^1$ with bounded derivatives, we have that

\[\mathbb{E}[f'(F)G] = \mathbb{E}[f(F) H(F, G)],\]

where $H(F, G) = \delta(Gu(\langle DF, u\rangle_H)^{-1})$ .

Existence and smoothness of densities

Hormander’s therem

The main focus of this theorem is to show there is a unique solutions for SDEs under some conditions.

Consider the following setup:

Brownian motion has $d$ dimensions: $B(t) = (B^1(t), \dots, B^d(t)), t\in [0, T]$
Linear growth assumption

Let $X(t)$ be the solution of $d$-dimensional systems of SDEs

\[dX_i(t) = \sum_{j=1}^d \sigma_{ij}(X(t))dB^j(t) + b_i(X(t))dt, \quad X_i(0) = x_0^i, \quad i = 1,\dots, d\]

Theorem There exists a unique continuous soltion and the following expectation is bounded

\[\mathbb{E}\left[\sup_{0\leq t \leq T} \lvert X(t)\rvert^p\right] \leq C\]

for any $p \geq 2$, where $C = C(p, T, K)> 0$

Theorem The derivative $D^j_rX_i(s))$

\[D^j_rX_i(t)) = \sigma_{ij}(X(r)) + \sum_{k,l=1}^d \int_r^t \partial_k \sigma_{il}(X(s))D^j_s(X_k(s))dB^l(s) + \sum_{k=1}^d\int_r^t \partial_k b_i(X(s))ds\]

Some notations:

Vector fields: $\sigma_j = \sum_{i=1}^d\sigma_{ij}(x)\frac{\partial}{\partial x_i}$, $b = \sum_{i=1}^d b_i(x)\frac{\partial}{\partial x_i}$
Covariant derivative: $\sigma_j \nabla\sigma_k = \sum_{i,l=1}^d \sigma_{lj} \partial_l \sigma_{ik}\frac{\partial }{\partial x_i}$
Lie bracket: $[\sigma_j,\sigma_k] = \sigma_j \nabla \sigma_k - \sigma_k \nabla \sigma_j$
Define $\sigma_0 = b - \frac{1}{2}\sum_{i=1}^d\sigma_i \nabla \sigma_i$

With these notation, the above SDE can be defined with a Stratonovich integral

\[X(t) = X_0 + \sum_{j = 1}^d \int_0^t \sigma(X(s)) \circ dB^j(s) + \int_0^t \sigma_0(X(s))ds\]

Holder condition This is a vector space spanned by the vector filed

\[\mathbf{(H)} = \text{span} \{\sigma_1, \dots, \sigma_d, [\sigma_i, \sigma_j], [\sigma_i, [\sigma_j, \sigma_k]]\}\]

Theorem Assume that Hormander’s condition $\mathbf{(H)}$ holds and the coefficients of SDE are finitely differentiable. Then for any $t > 0$, $X(t)$ has an infintely differentiable density.

The proof is based on the quadratic varation is large, then the semimartingale is small with an exponentially small probability.

Applications

This part will focus on how to use Malliavin calculus in mathematical finance. Again, the main concern when I read this section is that the benefit of using Malliavin calculus over Ito calculus. The first three subsections contains some introductory background. The remaining subsections discussed the actual use of Malliavin calculus.

Pricing and hedging financial options

This will give a brief introduction of options. An option is a contract, or right to buy (put) or sell (call) an amount of assets. Some terminologies related to this concept are

Strike price or exercise price $K$
Maturity or exersize time $T$
When performing exchange or executing what is written in contract, there is a fee $x$ for doing so. It is called option premium.

There are two ways of exercising options

Europian options: exercising at maturity $T$
American options: exercising at any time before maturity.

The value of put or call are decided by

\[C_T = \max(S_T - K, 0), \quad P_T = \max(K - S_T, 0)\]

To further work with these values, we take into account their neural risk which involves analyzing their statistical estimation (mostly expectation) w.r.t. market randomness.

Two main questions that might be interesting are

pricing option: evaluate the price of an option at time $t=0$
hedging option: evaluate the value of an option at maturity.

The Black-Scholes model

The Black-Scholes model is very well-known in quantitative finance field, helping us to understand of the market dynamics. The downside, however, is that the model has restricted assumptions, therefore, usually is used for pedagodical purposes.

\[dS_t = S_t \mu dt + S_t\sigma dB_t\]

Ito calculus allows us to have a close-form solution for this SDE

\[S_t = S_0 \exp(\mu t - \frac{\sigma^2}{2}t + \sigma B_t)\]

Therefore,

\[\mathbb{E}[S_t] = S_0\exp(\mu t), \quad \mathbb{E}[S_t^2] = S_0^2 \exp((2\mu + \sigma^2)t)\]

Pricing and hedging options in the Black-Scholes model

There is an equivalence between the solution of Black-Scholes models and a partial diferential equation (PDE).

Theorem Let $h$ be a continuous function of at most linear growth. Assume that $v(t, y)$ is a regular solution of PDE

\[\begin{cases} \frac{1}{2}\sigma^2 y^2 \frac{\partial^2 v}{\partial y^2} + ry \frac{\partial v}{\partial y} + \frac{\partial v}{\partial t} - r v = 0 \\ v(T, y) = h(y) \end{cases}\]

There exists a portfolio with value $v(t, S_t)$ at time $t$ replicated flow $h(S_T)$. And the value of this hedging portfoliio is given by $\beta(t, S_t) = \frac{\partial v}{\partial y}(t, S_t)$.

Let’s take a moment to think how to interpret this theorem. Looking at the boundary condition, this means that we may expect that at maturity $T$, the solution $v$ should agree with the function $h$. And the solution $v(t, y)$ on $(0, T)$ describes the dynamics of $v$ along the interval. This is the backward solution because we start at $T$ and go back to $0$.

Proof

Using Ito's formula to $v(t, S_t)$

\[dv(t, S_t) = \frac{\partial v}{\partial t}(t, S_t) dt + \frac{\partial v}{\partial y}(t, S_t) dS_t + \frac{1}{2}\sigma^2S_t^2\frac{\partial^2 v}{\partial y^2}(t, S_t) dt\]

The above is purely a mathematical derivation. On the hand, managing portfolio requires to

\[dv(s, S_t) = v(t, S_t)rdt + \beta(t, S_t)(dS_t - rS_tdt)\]

Picking $\beta(t, S_t) = \frac{\partial v}{\partial y}(t, S_t)$, the part with $dS_t$ vanishes, the remain will be reduced to

\[\frac{\partial v}{\partial t}(t, S_t) + \frac{1}{2}\sigma^2S_t^2\frac{\partial^2 v}{\partial y^2}(t, S_t) = v(t, S_t)rdt - rS_t\frac{\partial v}{\partial y}(t, S_t)\]

And we obtain the expect PDE.

Sensibility with respect to the parameters: the greeks

Consider the price of an option $V_0$ with strike $K$ and maturity $t$.

The most crucial parameters are $(x, r, \sigma, T, K)$

the premimum $x$
the interest rate $r$
the volatility $\sigma$

People working in finance are interested in obtaining some quanities named after some characters in Greek alphabet:

Delta: $\Delta = \frac{\partial V_0}{\partial x}$
Gamma: $\Gamma = \frac{\partial^2 V_0}{\partial^2 x}$
Vega: $\vartheta = \frac{\partial V_0}{\partial \sigma}$

These Greeks will be computed using integration by parts of Mallivian calculus.