Flow matching (FM) establishes the relationship between a flow $\phi_t$, a vector field $u_t$, and a probability density path $p_t$ (running from a simple Gaussian $p_0$ to the target distribution), defines the objective of the technique (use a neural network to approximate $u_t$), and gives a practical way to construct such a $u_t$.


Flow, field and probability density path

Think of a small ball floating in 3D space. It starts at a point we call $x_0$. The ball's movement follows a velocity field $u_t(x)$, where $t$ is time and $x$ is the ball's position. If we know where the ball starts, we can find its position $x(t)$ at any time by solving an ODE:

$$ \frac{d\boldsymbol{x}}{dt}=\boldsymbol{u}_t(\boldsymbol{x}),\quad\boldsymbol{\phi}_0(\boldsymbol{x})=\boldsymbol{x} $$

$\boldsymbol{\phi}_t(\boldsymbol{x})$ is the solution to this ODE, representing the position at time $t$ when starting from $x$ at time $0$.
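As a minimal sketch of this idea (using a hand-picked rotation field, not a field from any particular model), we can approximate $\phi_t(x)$ by integrating the ODE with forward Euler:

```python
import numpy as np

# A hypothetical velocity field for illustration: rotate points around the
# origin of the 2D plane at unit angular speed.
def u(t, x):
    return np.array([-x[1], x[0]])

def integrate_flow(x0, u, n_steps=1000, t1=1.0):
    """Approximate phi_{t1}(x0) with forward Euler: x_{k+1} = x_k + dt * u(t_k, x_k)."""
    x = np.array(x0, dtype=float)
    dt = t1 / n_steps
    for k in range(n_steps):
        x = x + dt * u(k * dt, x)
    return x

# Starting at (1, 0) and flowing for t = 1 rotates the point by ~1 radian.
x1 = integrate_flow([1.0, 0.0], u)
```

Forward Euler is the crudest choice; in practice a higher-order solver (e.g. Runge-Kutta) is used, but the structure, repeatedly stepping along $u_t$, is the same.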

Let's expand our view from one ball to many balls. Imagine many balls spread out in space, each with a starting position drawn from a distribution $p_0(x)$. As these balls move according to our rules, their distribution changes over time; we call the distribution at time $t$ $p_t(x)$. While we don't need to know exactly what this distribution looks like, we do know it satisfies the continuity equation:

$$ \frac{\partial p_t(\boldsymbol{x})}{\partial t}=-\nabla\cdot\big(\boldsymbol{u}_t(\boldsymbol{x})\,p_t(\boldsymbol{x})\big) $$


If we can find such a flow, vector field, and probability density path, then we can naturally define a generative process.
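The continuity equation can be checked numerically on a toy example. Below (an illustrative setup, not from any reference) a 1D Gaussian translates at constant speed $c$, so $p_t = \mathcal{N}(tc, s^2)$ is generated by the constant field $u_t(x) = c$; finite differences confirm $\partial_t p_t = -\partial_x(u_t\,p_t)$:

```python
import numpy as np

c, s = 2.0, 0.5  # constant drift speed and fixed std (illustrative choices)

def p(t, x):
    """Density of N(t*c, s^2): a Gaussian whose mean moves at speed c."""
    return np.exp(-(x - t * c) ** 2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def u(t, x):
    return c * np.ones_like(x)  # every particle moves at the same speed c

t = 0.3
x = np.linspace(-3.0, 5.0, 2001)
h = 1e-5

lhs = (p(t + h, x) - p(t - h, x)) / (2 * h)   # dp/dt by central difference
rhs = -np.gradient(u(t, x) * p(t, x), x)      # -d(u * p)/dx
err = np.max(np.abs(lhs - rhs))               # should be close to zero
```

Both sides agree up to discretization error, which is the continuity equation at work: probability mass is transported by the field, never created or destroyed.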

Let's extend this idea beyond 3D space to any number of dimensions: the density of the balls is a probability distribution. As time passes, the balls move around, and their distribution changes from one shape into another. This is the basic idea of flow matching and similar modeling methods: we want to transform a distribution $A$ into another distribution $B$.

To make this more concrete: we start with an initial distribution $p_0$ and want to reach a final distribution $p_1$. At any time in between, the distribution is $p_t(x)$, with a corresponding vector field $u_t(x)\colon[0,1]\times\mathbb{R}^n\to\mathbb{R}^n$.

Once we know $u_t$ and have initial samples $x \sim p_0$, we can obtain samples from $p_1$ by solving the ODE above. Since $u_t$ is generally unknown, FM proposes training a network $v_\theta$ to approximate it by optimizing the following:

$$ \mathcal{L}_{\mathrm{FM}}(\theta)=\mathbb{E}_{t,\,\boldsymbol{x}\sim p_t(\boldsymbol{x})}\left[\|\boldsymbol{v}_\theta(\boldsymbol{x},t)-\boldsymbol{u}_t(\boldsymbol{x})\|^2\right] $$
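This objective is a plain regression on the vector field. As a sketch (with a toy target field $u_t(x)=x$ and a one-parameter "network" $v_\theta(x,t)=\theta x$, both invented here purely for illustration), a Monte Carlo estimate of the loss looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

def u(t, x):
    """Toy ground-truth field (known here only for illustration)."""
    return x

def v(theta, t, x):
    """A one-parameter stand-in for the neural network v_theta(x, t)."""
    return theta * x

def fm_loss(theta, n=10_000):
    """Monte Carlo estimate of E_{t, x ~ p_t}[ ||v_theta(x, t) - u_t(x)||^2 ]."""
    t = rng.uniform(size=n)
    x = rng.normal(size=n)  # stand-in samples from p_t
    return np.mean((v(theta, t, x) - u(t, x)) ** 2)

# The loss vanishes exactly when v_theta matches u_t, i.e. at theta = 1.
losses = {th: fm_loss(th) for th in (0.0, 0.5, 1.0)}
```

The catch, addressed next, is that this loss is not directly trainable: we cannot evaluate $u_t(x)$ or sample from $p_t(x)$ for the marginal path we actually care about.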


$p_t(x)$, $u_t(x)$ and conditional flow matching

It is hard to find $p_t(x)$ and $u_t(x)$ directly in closed form. However, we can use conditional versions $p_t(x|x_1)$ and $u_t(x|x_1)$ (where $u_t(x|x_1)$ is a vector field that generates the path $p_t(x|x_1)$) to recover $p_t(x)$ and $u_t(x)$ while still satisfying the continuity equation, provided we define the following (proof):

  1. $p_0(x)=p_0(x|x_1)=p(x)$
  2. $p_1(x|x_1)=\mathcal{N}(x;\,x_1,\,\sigma^2 I)$, a narrow Gaussian around the data point $x_1$
  3. $p_1(x)=\int p_1(x|x_1)\,q(x_1)\,dx_1=\int f(x-x_1)\,q(x_1)\,dx_1=(q*f)(x)\approx q(x)$, where $f$ is the density of $\mathcal{N}(0,\sigma^2 I)$
  4. $u_t(x)=\int u_t(x|x_1)\,\frac{p_t(x|x_1)\,q(x_1)}{p_t(x)}\,dx_1$
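The payoff of these definitions is that regressing on the conditional target $u_t(x|x_1)$ gives the same gradients as regressing on the intractable marginal $u_t(x)$. Below is a 1D sketch of conditional flow matching training, assuming one common choice of conditional path, the linear interpolation $x_t=(1-t)x_0+t\,x_1+\sigma\varepsilon$ with conditional target $x_1-x_0$, and a deliberately tiny linear model in place of a neural network (all concrete numbers here, such as the data mean, are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1  # small conditional std around the interpolation path (illustrative)

def sample_batch(n=512):
    """Draw (t, x_t, conditional target) triples for CFM in 1D.

    Conditional path (one common choice among several):
        x_t | (x_0, x_1) = (1 - t) x_0 + t x_1 + sigma * eps
        u_t(x | x_1)     = x_1 - x_0
    """
    x0 = rng.normal(size=n)           # x_0 ~ p_0, a standard Gaussian
    x1 = rng.normal(loc=4.0, size=n)  # x_1 ~ q, a toy "data" distribution
    t = rng.uniform(size=n)
    xt = (1 - t) * x0 + t * x1 + sigma * rng.normal(size=n)
    return t, xt, x1 - x0

def predict(theta, t, xt):
    """A deliberately tiny 'network': v_theta(x, t) = a*x + b*t + c."""
    return theta[0] * xt + theta[1] * t + theta[2]

def cfm_loss(theta, batch):
    t, xt, target = batch
    return np.mean((predict(theta, t, xt) - target) ** 2)

theta = np.zeros(3)
loss_before = cfm_loss(theta, sample_batch())
for _ in range(3000):  # plain SGD on the conditional FM objective
    t, xt, target = sample_batch()
    residual = 2 * (predict(theta, t, xt) - target) / len(t)
    grad = np.array([(residual * xt).sum(), (residual * t).sum(), residual.sum()])
    theta -= 0.05 * grad
loss_after = cfm_loss(theta, sample_batch())
```

Every quantity in the loop is sampled, $x_1$ from the data, $t$ uniformly, $x_t$ from the conditional path, so no marginal $p_t(x)$ or $u_t(x)$ ever needs to be evaluated; integrating the trained field from $x_0 \sim p_0$ then transports samples toward $q$.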