第三章 线性高斯估计
Chapter 3 Linear-Gaussian Estimation
本章概览 / Chapter Overview
本章研究最简洁的情形:运动模型和观测模型都是线性的,所有噪声都是高斯的。在这种情况下,估计问题存在精确的闭合解。我们将从批量(Batch)方法出发,逐步推导出递归滤波器(Kalman filter)和平滑器(RTS smoother),最后讨论连续时间下的高斯过程回归方法。
This chapter studies the simplest case: linear motion and observation models, Gaussian noise. Exact closed-form solutions exist. We begin with the batch approach, derive the Kalman filter and RTS smoother, and conclude with continuous-time Gaussian process (GP) regression.
3.1 批量离散时间估计 / Batch Discrete-Time Estimation
3.1.1 问题设置 / Problem Setup
中文
考虑一个机器人随时间演化的系统。我们用两类方程描述它:
运动模型(Motion model):描述状态如何随时间变化
\mathbf{x}_k = \mathbf{A}_{k-1}\,\mathbf{x}_{k-1} + \mathbf{v}_k + \mathbf{w}_k, \quad \mathbf{w}_k \sim \mathcal{N}(\mathbf{0},\,\mathbf{Q}_k) \tag{3.1}
观测模型(Observation model):描述传感器如何感知状态
\mathbf{y}_k = \mathbf{C}_k\,\mathbf{x}_k + \mathbf{n}_k, \quad \mathbf{n}_k \sim \mathcal{N}(\mathbf{0},\,\mathbf{R}_k) \tag{3.2}
其中:
- $\mathbf{x}_k$:$k$ 时刻的状态(位置、速度等)
- $\mathbf{A}_{k-1}$:状态转移矩阵(State transition matrix)
- $\mathbf{v}_k$:已知的输入/控制量(known input/control)
- $\mathbf{w}_k$:过程噪声(Process noise),协方差为 $\mathbf{Q}_k$
- $\mathbf{y}_k$:传感器测量
- $\mathbf{C}_k$:观测矩阵(Observation matrix)
- $\mathbf{n}_k$:测量噪声(Measurement noise),协方差为 $\mathbf{R}_k$

另外,初始状态先验为 $\mathbf{x}_0 \sim \mathcal{N}(\check{\mathbf{x}}_0, \check{\mathbf{P}}_0)$,符号 $\check{(\cdot)}$ 表示"先验",$\hat{(\cdot)}$ 表示"后验"。
直觉:为什么要”批量”?
批量方法把 个时刻的所有状态 打包成一个大向量,一次性求解。这就像考试交卷前可以回头修改答案——因为后面的测量数据也能”反向”修正早期的估计。相比之下,滤波器只能向前看,无法回头。
English
We model the robot's evolution with two sets of equations.

Motion model: how the state changes over time:

$$\mathbf{x}_k = \mathbf{A}_{k-1}\,\mathbf{x}_{k-1} + \mathbf{v}_k + \mathbf{w}_k, \quad \mathbf{w}_k \sim \mathcal{N}(\mathbf{0},\,\mathbf{Q}_k)$$

Observation model: how the sensor perceives the state:

$$\mathbf{y}_k = \mathbf{C}_k\,\mathbf{x}_k + \mathbf{n}_k, \quad \mathbf{n}_k \sim \mathcal{N}(\mathbf{0},\,\mathbf{R}_k)$$

- $k = 0, 1, \ldots, K$ are discrete time steps.
- The prior on the initial state is $\mathbf{x}_0 \sim \mathcal{N}(\check{\mathbf{x}}_0, \check{\mathbf{P}}_0)$.
- Check notation ($\check{\cdot}$) = prior; hat notation ($\hat{\cdot}$) = posterior.
The batch approach stacks all unknowns into one large vector and solves for all of them simultaneously, exploiting all measurements — past and future — at once.
3.1.2 堆叠成矩阵形式 / Stacking into Matrix Form
中文
为了将所有方程写成一个统一的矩阵形式,我们定义以下符号。

定义状态向量(stacked state)和测量向量(stacked measurement):

$$\mathbf{x} = \begin{bmatrix}\mathbf{x}_0 \\ \vdots \\ \mathbf{x}_K\end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix}\mathbf{y}_0 \\ \vdots \\ \mathbf{y}_K\end{bmatrix}$$

把运动方程改写成误差形式(满足方程时误差为零):

$$\mathbf{x}_k - \mathbf{A}_{k-1}\mathbf{x}_{k-1} - \mathbf{v}_k = \mathbf{w}_k$$

把观测方程改写成误差形式:

$$\mathbf{y}_k - \mathbf{C}_k\mathbf{x}_k = \mathbf{n}_k$$

将所有误差项堆叠,可以写成统一的线性形式(含初始先验):

$$\mathbf{z} = \mathbf{H}\mathbf{x} + \text{噪声}, \quad \mathbf{W} = \mathrm{blkdiag}(\check{\mathbf{P}}_0, \mathbf{Q}_1, \ldots, \mathbf{Q}_K, \mathbf{R}_0, \ldots, \mathbf{R}_K)$$

其中 $\mathbf{z}$ 是堆叠的"伪测量"(包含先验 $\check{\mathbf{x}}_0$、输入 $\mathbf{v}_k$ 和实际测量 $\mathbf{y}_k$),$\mathbf{H}$ 是系统矩阵,$\mathbf{W}$ 是块对角噪声协方差矩阵。

举例(K=2):对于两步系统,堆叠向量为 $\mathbf{x} = [\mathbf{x}_0^T\ \mathbf{x}_1^T\ \mathbf{x}_2^T]^T$,$\mathbf{z} = [\check{\mathbf{x}}_0^T\ \mathbf{v}_1^T\ \mathbf{v}_2^T\ \mathbf{y}_0^T\ \mathbf{y}_1^T\ \mathbf{y}_2^T]^T$。

注意 $\mathbf{H}$ 是稀疏矩阵(大部分元素为零),这对高效求解至关重要。
English
All equations — prior, motion, and observation — can be compactly stacked as

$$\mathbf{z} = \mathbf{H}\mathbf{x} + \text{noise}$$

where $\mathbf{z}$ is a stacked vector of pseudo-measurements (including the prior on $\mathbf{x}_0$), $\mathbf{H}$ is a tall, sparse system matrix encoding the motion and observation models, and $\mathbf{W}$ is the block-diagonal noise covariance.
3.1.3 MAP 估计——加权最小二乘 / MAP Estimation = Weighted Least Squares
中文
给定上述线性模型,MAP(最大后验)估计等价于加权最小二乘(Weighted Least Squares):
\hat{\mathbf{x}} = \arg\min_{\mathbf{x}}\; J(\mathbf{x}), \quad J(\mathbf{x}) = \frac{1}{2}(\mathbf{z} - \mathbf{H}\mathbf{x})^T \mathbf{W}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{x}) \tag{3.3}
直觉:为什么叫”加权”?
不同测量的精度不同。协方差 中方差大的测量,乘以 后权重变小;方差小(精度高)的测量,权重变大。这就像考试卷:必答题(精度高的传感器)权重更大。
对目标函数 $J(\mathbf{x})$ 求导并令其为零:

$$\frac{\partial J}{\partial \mathbf{x}^T} = -\mathbf{H}^T\mathbf{W}^{-1}(\mathbf{z} - \mathbf{H}\mathbf{x}) = \mathbf{0}$$
整理得到法方程(Normal Equations):
\boxed{(\mathbf{H}^T \mathbf{W}^{-1} \mathbf{H})\,\hat{\mathbf{x}} = \mathbf{H}^T \mathbf{W}^{-1} \mathbf{z}} \tag{3.4}
这是线性方程组,可以直接求解。矩阵 $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ 称为信息矩阵(information matrix)。
English
The MAP estimate minimizes the weighted least-squares cost:
Setting $\partial J/\partial \mathbf{x}^T = \mathbf{0}$ yields the normal equations:
The matrix $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ is called the information matrix. Its inverse is the posterior covariance:
\hat{\mathbf{P}} = (\mathbf{H}^T \mathbf{W}^{-1} \mathbf{H})^{-1} \tag{3.5}
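As a minimal sketch of the batch solve (my own $K=2$ scalar random walk with made-up measurements, not an example from the text), the stacked $\mathbf{H}$, $\mathbf{W}$, $\mathbf{z}$ can be built and the normal equations (3.4) solved directly:

```python
import numpy as np

# Toy batch problem: scalar state, K = 2, unit noises, prior on x_0.
# Rows of H: prior on x0; motion x1 - x0 = v1 (= 0); motion x2 - x1 = v2 (= 0);
# then one measurement row y_k = x_k per time step.
x0_check, P0 = 0.0, 1.0
Q, R = 1.0, 1.0
y = np.array([0.5, 1.0, 1.5])  # made-up measurements at k = 0, 1, 2

H = np.array([
    [1.0, 0.0, 0.0],    # prior
    [-1.0, 1.0, 0.0],   # motion 0 -> 1
    [0.0, -1.0, 1.0],   # motion 1 -> 2
    [1.0, 0.0, 0.0],    # y_0
    [0.0, 1.0, 0.0],    # y_1
    [0.0, 0.0, 1.0],    # y_2
])
z = np.array([x0_check, 0.0, 0.0, y[0], y[1], y[2]])
W = np.diag([P0, Q, Q, R, R, R])

info = H.T @ np.linalg.inv(W) @ H                     # information matrix
x_hat = np.linalg.solve(info, H.T @ np.linalg.inv(W) @ z)
P_hat = np.linalg.inv(info)                           # posterior covariance (3.5)

# chain structure => the information matrix is tridiagonal
assert abs(info[0, 2]) < 1e-12 and abs(info[2, 0]) < 1e-12
```

The zero in the `(0, 2)` entry of `info` is exactly the sparsity that §3.2 exploits.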
3.1.4 贝叶斯推断的视角 / Bayesian Inference Perspective
中文
上面的 MAP 方法直接优化目标函数,没有明确写出概率。贝叶斯推断给出了更完整的解释:
先验 × 似然 ∝ 后验:

$$p(\mathbf{x} \mid \mathbf{z}) \propto p(\mathbf{z} \mid \mathbf{x})\, p(\mathbf{x})$$

由于所有分布都是高斯的,对数后验是二次型,其最大值(MAP)恰好就是后验均值。更重要的是:

线性高斯系统的黄金定理:后验分布也是精确的高斯分布

$$p(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}\big(\hat{\mathbf{x}},\, \hat{\mathbf{P}}\big)$$

其中 $\hat{\mathbf{x}}$ 由法方程 (3.4) 给出,$\hat{\mathbf{P}} = (\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H})^{-1}$。

这意味着 MAP 解、最小均方误差解(MMSE)、后验均值三者完全相同——这是线性高斯情形独有的优良性质。
English
The Bayesian interpretation is powerful. For linear-Gaussian models, the full posterior is exactly Gaussian:

$$p(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}\big(\hat{\mathbf{x}},\, (\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H})^{-1}\big)$$
This means the MAP estimate (mode of the posterior) equals the posterior mean (MMSE estimate). This coincidence is special to linear-Gaussian systems and disappears the moment nonlinearity enters.
3.1.5 可观测性 / Observability and Existence of Solution
中文
法方程有唯一解当且仅当 $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ 可逆,即 $\mathbf{H}$ 列满秩。

定义可观测性矩阵(Observability matrix,对时不变系统):

$$\mathbf{O} = \begin{bmatrix}\mathbf{C} \\ \mathbf{C}\mathbf{A} \\ \vdots \\ \mathbf{C}\mathbf{A}^{N-1}\end{bmatrix}$$

定理(可观测性):系统可观测(即所有状态可以被唯一估计)当且仅当 $\operatorname{rank}(\mathbf{O}) = N$(等于状态维度)。
直觉:如果某个维度的状态对所有测量都没有影响(即测量对它”视而不见”),那么无论拿到多少数据,我们也无法估计它。这就好比问一个盲人”今天天空是什么颜色?“——他根本感知不到颜色,所以无法回答。
English
A unique solution exists iff $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ is full rank (equivalently, $\mathbf{H}$ has full column rank).

Theorem (Observability): The batch system is observable — all state components can be uniquely recovered — if and only if $\operatorname{rank}(\mathbf{O}) = N$.

If a state dimension is invisible to all measurements (i.e., it never appears in any $\mathbf{y}_k$, nor can it be inferred through the dynamics), no amount of data will allow it to be estimated.
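The rank test above is easy to check numerically. The sketch below (illustrative values of my own, not from the text) shows a constant-velocity system that is observable from position measurements, but loses observability of position when only velocity is measured:

```python
import numpy as np

# State [position, velocity]; discrete constant-velocity dynamics.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^{N-1}."""
    N = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, n) for n in range(N)])

# Sensor sees position only: velocity shows up in the CA row, so rank is full.
C_pos = np.array([[1.0, 0.0]])
O = observability_matrix(A, C_pos)
assert np.linalg.matrix_rank(O) == 2

# Sensor sees velocity only: position never influences any measurement.
C_vel = np.array([[0.0, 1.0]])
O_vel = observability_matrix(A, C_vel)
assert np.linalg.matrix_rank(O_vel) == 1
```

The velocity-only case is the "blind person" analogy in matrix form: one state dimension leaves no trace in the data.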
3.2 利用稀疏结构:Cholesky 平滑器 / Exploiting Sparsity: The Cholesky Smoother
3.2.1 信息矩阵的块三对角结构 / Block-Tridiagonal Information Matrix
中文
批量问题中,信息矩阵 $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ 具有特殊的**块三对角(block-tridiagonal)**结构:

$$\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H} = \begin{bmatrix}\mathbf{\Lambda}_{00} & \mathbf{\Lambda}_{01} & & \\ \mathbf{\Lambda}_{10} & \mathbf{\Lambda}_{11} & \mathbf{\Lambda}_{12} & \\ & \mathbf{\Lambda}_{21} & \ddots & \mathbf{\Lambda}_{K-1,K} \\ & & \mathbf{\Lambda}_{K,K-1} & \mathbf{\Lambda}_{KK}\end{bmatrix}$$

这是因为运动方程只把相邻时刻 $k-1$ 和 $k$ 联系起来,观测方程只涉及 $k$ 时刻的状态——所以在矩阵中,每个状态只与它的直接邻居"耦合"。

> **稀疏的力量**:
> - 密集矩阵($N(K+1) \times N(K+1)$)的 Cholesky 分解需要 $O(N^3 K^3)$ 操作。
> - 利用块三对角结构,只需 $O(N^3 K)$ 操作——当 $K$ 很大(数千个时间步)时,这是**质的**提升。

---

**English**

The information matrix $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ is **block-tridiagonal** because the motion model only couples adjacent timesteps and the observation model only involves $\mathbf{x}_k$. Exploiting this structure via sparse Cholesky decomposition reduces computational cost from $O(N^3 K^3)$ (dense) to $O(N^3 K)$ (sparse), a dramatic speedup for long trajectories.

---

### 3.2.2 Cholesky 平滑器推导 / Cholesky Smoother Derivation

**中文**

对块三对角信息矩阵进行 Cholesky 分解 $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H} = \mathbf{L}\mathbf{L}^T$,$\mathbf{L}$ 是下三角块双对角矩阵:

$$\mathbf{L} = \begin{bmatrix}\mathbf{L}_{00} & & & \\ \mathbf{L}_{10} & \mathbf{L}_{11} & & \\ & \mathbf{L}_{21} & \ddots & \\ & & & \mathbf{L}_{KK}\end{bmatrix}$$

求解过程分两步:

1. **前向传递(Forward pass)**:$\mathbf{L}\,\mathbf{d} = \mathbf{H}^T\mathbf{W}^{-1}\mathbf{z}$,从 $k=0$ 到 $k=K$
2. **后向传递(Backward pass)**:$\mathbf{L}^T \hat{\mathbf{x}} = \mathbf{d}$,从 $k=K$ 到 $k=0$

前向传递的递推公式(以协方差形式书写,$\boldsymbol{\Sigma}_{k|k}$ 为 $k$ 时刻的滤波协方差):

$$\boldsymbol{\Sigma}_{k|k} = \left(\mathbf{C}_k^T\mathbf{R}_k^{-1}\mathbf{C}_k + \left(\mathbf{A}_{k-1}\boldsymbol{\Sigma}_{k-1|k-1}\mathbf{A}_{k-1}^T + \mathbf{Q}_k\right)^{-1}\right)^{-1}$$

等价地,我们定义每个时刻的**局部信息(local information)**:

$$\mathbf{\Lambda}_k = \mathbf{C}_k^T \mathbf{R}_k^{-1} \mathbf{C}_k, \quad \mathbf{b}_k = \mathbf{C}_k^T \mathbf{R}_k^{-1} \mathbf{y}_k$$

然后前向(预测+更新)递推:

$$\check{\mathbf{\Lambda}}_k = (\mathbf{A}_{k-1} \hat{\mathbf{\Lambda}}_{k-1}^{-1} \mathbf{A}_{k-1}^T + \mathbf{Q}_k)^{-1}, \quad \check{\mathbf{q}}_k = \check{\mathbf{\Lambda}}_k(\mathbf{A}_{k-1}\hat{\mathbf{\Lambda}}_{k-1}^{-1}\hat{\mathbf{q}}_{k-1} + \mathbf{v}_k) \tag{3.6}$$

$$\hat{\mathbf{\Lambda}}_k = \check{\mathbf{\Lambda}}_k + \mathbf{\Lambda}_k, \quad \hat{\mathbf{q}}_k = \check{\mathbf{q}}_k + \mathbf{b}_k \tag{3.7}$$

初始条件:$\hat{\mathbf{\Lambda}}_0 = \check{\mathbf{P}}_0^{-1} + \mathbf{\Lambda}_0$,$\hat{\mathbf{q}}_0 = \check{\mathbf{P}}_0^{-1}\check{\mathbf{x}}_0 + \mathbf{b}_0$。

完成前向传递后,令 $\hat{\mathbf{x}}_K = \hat{\mathbf{\Lambda}}_K^{-1}\hat{\mathbf{q}}_K$,然后后向传递(与 RTS 平滑等价):

$$\hat{\mathbf{x}}_k \leftarrow \hat{\mathbf{x}}_k + \hat{\mathbf{\Lambda}}_k^{-1}\mathbf{A}_k^T \check{\mathbf{\Lambda}}_{k+1}\left(\hat{\mathbf{x}}_{k+1} - \mathbf{A}_k\hat{\mathbf{x}}_k - \mathbf{v}_{k+1}\right), \quad k = K-1, \ldots, 0 \tag{3.8}$$

---

**English**

The Cholesky smoother leverages the block-bidiagonal Cholesky factor of the block-tridiagonal information matrix. The algorithm has two passes:

**Forward pass** (information propagation, $k = 0 \to K$):
- Accumulate process noise and observation information at each step.
- Compute the local information matrix $\hat{\mathbf{\Lambda}}_k$ and information vector $\hat{\mathbf{q}}_k$.

**Backward pass** (state recovery, $k = K \to 0$):
- Use $\hat{\mathbf{x}}_K = \hat{\mathbf{\Lambda}}_K^{-1}\hat{\mathbf{q}}_K$ as the boundary condition.
- Back-substitute to recover all $\hat{\mathbf{x}}_k$.
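As a sanity check (a scalar sketch with made-up numbers, not from the text), the information-form forward recursion (3.6)-(3.7) can be run alongside the standard covariance-form Kalman filter of §3.4; the two are algebraically equivalent and should agree to machine precision:

```python
import numpy as np

A, C, Q, R = 1.0, 1.0, 1.0, 1.0
P0_check, x0_check = 1.0, 0.0
ys = [0.7, -0.2, 1.1]     # made-up measurements
vs = [0.0, 0.0, 0.0]      # known inputs, zero here

# information form: Lambda = P^{-1}, q = P^{-1} x; init per the text
Lam_hat = 1.0 / P0_check + C * C / R
q_hat = x0_check / P0_check + C * ys[0] / R

# covariance-form Kalman filter for comparison
P_hat = 1.0 / (1.0 / P0_check + C * C / R)
x_hat = P_hat * (x0_check / P0_check + C * ys[0] / R)

for k in range(1, len(ys)):
    # (3.6) predict in information form
    Lam_check = 1.0 / (A * (1.0 / Lam_hat) * A + Q)
    q_check = Lam_check * (A * (1.0 / Lam_hat) * q_hat + vs[k])
    # (3.7) update with local information
    Lam_hat = Lam_check + C * C / R
    q_hat = q_check + C * ys[k] / R

    # standard Kalman filter predict + correct
    x_chk = A * x_hat + vs[k]
    P_chk = A * P_hat * A + Q
    K = P_chk * C / (C * P_chk * C + R)
    x_hat = x_chk + K * (ys[k] - C * x_chk)
    P_hat = (1.0 - K * C) * P_chk

assert abs(1.0 / Lam_hat - P_hat) < 1e-12   # Lambda^{-1} == P
assert abs(q_hat / Lam_hat - x_hat) < 1e-12  # Lambda^{-1} q == x
```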
Total cost: $O(N^3 K)$.

---

### 3.2.3 后验协方差 / Posterior Covariance

**中文**

后验协方差矩阵 $\hat{\mathbf{P}} = (\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H})^{-1}$ 的关键部分可以高效计算:对角块 $\hat{\mathbf{P}}_{kk}$ 给出时刻 $k$ 的不确定性;相邻时刻的交叉协方差块 $\hat{\mathbf{P}}_{k,k+1}$ 也可在 $O(N^3 K)$ 时间内计算。

**注意**:虽然 $\hat{\mathbf{P}}$ 在概念上是完整的 $N(K+1)\times N(K+1)$ 矩阵,但通常我们只需要对角块和近对角块——完整矩阵仅显式存储就需要 $O(N^2 K^2)$ 的代价,通常应避免。

---

**English**

The posterior covariance $\hat{\mathbf{P}} = (\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H})^{-1}$ is a dense matrix in general, but only the diagonal blocks $\hat{\mathbf{P}}_{kk}$ and off-diagonal blocks $\hat{\mathbf{P}}_{k,k\pm 1}$ are needed for the smoother. These can be recovered in $O(N^3 K)$ via an additional backward pass on the Cholesky factors.

---

## 3.3 RTS 平滑器 / Rauch-Tung-Striebel (RTS) Smoother

**中文**

Cholesky 平滑器虽然高效,但其信息形式($\mathbf{\Lambda}$, $\mathbf{q}$)不够直观。赫伯特·雷奇(Herbert Rauch)、弗兰克·唐(Frank Tung)和夏洛特·斯特里贝尔(Charlotte Striebel)在 1965 年推导了一个代数等价的**协方差形式**版本,更易于实现和理解。

**RTS 平滑器 = 前向 Kalman 滤波 + 后向平滑**

**前向(预测-校正)**(见 §3.4 Kalman 滤波器):运行标准 Kalman 滤波器,存储每个时刻的先验估计 $(\check{\mathbf{x}}_k, \check{\mathbf{P}}_k)$ 和后验估计 $(\hat{\mathbf{x}}_k, \hat{\mathbf{P}}_k)$。

**后向平滑传递**(从 $k = K-1$ 到 $k = 0$):

$$\mathbf{G}_k = \hat{\mathbf{P}}_k \mathbf{A}_k^T \check{\mathbf{P}}_{k+1}^{-1} \tag{3.9a}$$

$$\hat{\mathbf{x}}_k^s = \hat{\mathbf{x}}_k + \mathbf{G}_k(\hat{\mathbf{x}}_{k+1}^s - \check{\mathbf{x}}_{k+1}) \tag{3.9b}$$

$$\hat{\mathbf{P}}_k^s = \hat{\mathbf{P}}_k + \mathbf{G}_k(\hat{\mathbf{P}}_{k+1}^s - \check{\mathbf{P}}_{k+1})\mathbf{G}_k^T \tag{3.9c}$$

其中上标 $s$ 表示平滑后的估计,$\mathbf{G}_k$ 称为**平滑增益(Smoother gain)**。

> **直觉:前向滤波 + 后向平滑**
>
> 设想你在开车,只看前方(Kalman 滤波器)。但现在假设你可以看行车记录仪回放:在知道了整段旅途之后,你对早期位置的估计会更准。这就是平滑器的作用——用未来的数据修正过去的估计。
>
> 平滑增益 $\mathbf{G}_k$ 的物理意义:它衡量了时刻 $k$ 的后验与时刻 $k+1$ 的先验之间的"相关性"。若 $k+1$ 时刻的先验不确定性 $\check{\mathbf{P}}_{k+1}$ 很大,说明传播噪声大,修正量 $\mathbf{G}_k$ 就小(未来不太可靠,少参考它)。

---

**English**

The RTS smoother is algebraically equivalent to the Cholesky smoother but expressed in covariance form — more intuitive and numerically stable.

**Algorithm:**

1. **Forward pass:** Run the Kalman filter forward in time ($k = 0 \to K$), storing all priors $(\check{\mathbf{x}}_k, \check{\mathbf{P}}_k)$ and posteriors $(\hat{\mathbf{x}}_k, \hat{\mathbf{P}}_k)$.
2. **Backward smoothing pass** ($k = K-1 \to 0$):

$$\mathbf{G}_k = \hat{\mathbf{P}}_k \mathbf{A}_k^T \check{\mathbf{P}}_{k+1}^{-1} \quad \text{(smoother gain)}$$

$$\hat{\mathbf{x}}_k^s = \hat{\mathbf{x}}_k + \mathbf{G}_k(\hat{\mathbf{x}}_{k+1}^s - \check{\mathbf{x}}_{k+1})$$

$$\hat{\mathbf{P}}_k^s = \hat{\mathbf{P}}_k + \mathbf{G}_k(\hat{\mathbf{P}}_{k+1}^s - \check{\mathbf{P}}_{k+1})\mathbf{G}_k^T$$

The smoother uses future measurements (via $\hat{\mathbf{x}}_{k+1}^s$) to refine past estimates.

---

## 3.4 卡尔曼滤波器 / The Kalman Filter

### 3.4.1 什么是卡尔曼滤波器? / What Is the Kalman Filter?

**中文**

卡尔曼滤波器(Kalman Filter)是人类历史上最成功的算法之一。它回答的问题是:**在只能向前看的情况下(即不使用未来测量),如何从噪声数据中最优地估计当前状态?**

卡尔曼滤波器是 RTS 平滑器的前向部分,也是 Cholesky 平滑器前向传递的协方差形式。它于 1960 年由鲁道夫·卡尔曼(Rudolf Kálmán)发表。

NASA 在阿波罗登月任务中采用了卡尔曼滤波器——阿波罗 11 号登月舱上的导航计算机融合惯性传感器和雷达数据,估计飞船在月面上方的精确位置。

---

**English**

The Kalman filter is arguably one of the most important algorithms ever devised. It answers: *how do we optimally estimate the current state in real time (causally) from noisy measurements?*

It is the forward pass of the RTS smoother, and one of the most consequential results in estimation theory. Rudolf Kálmán published it in 1960; NASA adopted it for Apollo almost immediately.
---

### 3.4.2 五个方程 / The Five Equations

**中文**

卡尔曼滤波器由**五个核心方程**组成,分两个阶段交替进行。

**阶段一:预测(Prediction)**——用运动模型从 $k-1$ 预测到 $k$

$$\check{\mathbf{x}}_k = \mathbf{A}_{k-1}\hat{\mathbf{x}}_{k-1} + \mathbf{v}_k \tag{3.10a}$$

$$\check{\mathbf{P}}_k = \mathbf{A}_{k-1}\hat{\mathbf{P}}_{k-1}\mathbf{A}_{k-1}^T + \mathbf{Q}_k \tag{3.10b}$$

**阶段二:校正(Correction/Update)**——用新测量 $\mathbf{y}_k$ 更新估计

$$\mathbf{K}_k = \check{\mathbf{P}}_k \mathbf{C}_k^T(\mathbf{C}_k\check{\mathbf{P}}_k\mathbf{C}_k^T + \mathbf{R}_k)^{-1} \tag{3.10c}$$

$$\hat{\mathbf{x}}_k = \check{\mathbf{x}}_k + \mathbf{K}_k(\mathbf{y}_k - \mathbf{C}_k\check{\mathbf{x}}_k) \tag{3.10d}$$

$$\hat{\mathbf{P}}_k = (\mathbf{1} - \mathbf{K}_k\mathbf{C}_k)\check{\mathbf{P}}_k \tag{3.10e}$$

其中 $\mathbf{K}_k$ 称为**卡尔曼增益(Kalman gain)**,$\mathbf{y}_k - \mathbf{C}_k\check{\mathbf{x}}_k$ 称为**新息(innovation)**——测量值与预测值的差。

> **每个方程的物理含义**:
>
> | 方程 | 含义 |
> |------|------|
> | (3.10a) 预测均值 | 用运动方程把上一步后验往前推一步 |
> | (3.10b) 预测协方差 | 运动带来不确定性增加($\mathbf{Q}_k$ 项) |
> | (3.10c) 卡尔曼增益 | 决定"信任测量还是信任预测" |
> | (3.10d) 校正均值 | 在预测基础上,按增益修正新息 |
> | (3.10e) 校正协方差 | 更新减少了不确定性 |

---

**English**

The Kalman filter alternates between two steps:

**Prediction** (propagate state forward):

$$\check{\mathbf{x}}_k = \mathbf{A}_{k-1}\hat{\mathbf{x}}_{k-1} + \mathbf{v}_k$$

$$\check{\mathbf{P}}_k = \mathbf{A}_{k-1}\hat{\mathbf{P}}_{k-1}\mathbf{A}_{k-1}^T + \mathbf{Q}_k$$

**Correction** (incorporate new measurement $\mathbf{y}_k$):

$$\mathbf{K}_k = \check{\mathbf{P}}_k \mathbf{C}_k^T(\mathbf{C}_k\check{\mathbf{P}}_k\mathbf{C}_k^T + \mathbf{R}_k)^{-1} \quad \text{(Kalman gain)}$$

$$\hat{\mathbf{x}}_k = \check{\mathbf{x}}_k + \mathbf{K}_k\underbrace{(\mathbf{y}_k - \mathbf{C}_k\check{\mathbf{x}}_k)}_{\text{innovation}}$$

$$\hat{\mathbf{P}}_k = (\mathbf{1} - \mathbf{K}_k\mathbf{C}_k)\check{\mathbf{P}}_k$$

---

### 3.4.3 卡尔曼增益的三种理解 / Three Ways to Understand Kalman Gain

**中文**

卡尔曼增益是整个算法的核心,可以从三个角度理解:

**角度 1:MAP 推导**

将校正步骤视为两个高斯分布(先验 + 似然)的乘积:

$$p(\mathbf{x}_k \mid \mathbf{y}_k) \propto \mathcal{N}(\check{\mathbf{x}}_k, \check{\mathbf{P}}_k) \cdot \mathcal{N}(\mathbf{y}_k;\, \mathbf{C}_k\mathbf{x}_k, \mathbf{R}_k)$$

由第二章的高斯条件分布公式,MAP 解即为上述五个方程。

**角度 2:贝叶斯推断(条件高斯)**

定义联合分布

$$\begin{bmatrix}\mathbf{x}_k \\ \mathbf{y}_k\end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix}\check{\mathbf{x}}_k \\ \mathbf{C}_k\check{\mathbf{x}}_k\end{bmatrix},\; \begin{bmatrix}\check{\mathbf{P}}_k & \check{\mathbf{P}}_k\mathbf{C}_k^T \\ \mathbf{C}_k\check{\mathbf{P}}_k & \mathbf{C}_k\check{\mathbf{P}}_k\mathbf{C}_k^T + \mathbf{R}_k\end{bmatrix}\right)$$

利用条件高斯公式(第二章),$p(\mathbf{x}_k \mid \mathbf{y}_k)$ 的均值和协方差恰好就是上述五个方程。

**角度 3:最优增益(MMSE 最小化)**

将校正均值写成 $\hat{\mathbf{x}}_k = \check{\mathbf{x}}_k + \mathbf{K}_k(\mathbf{y}_k - \mathbf{C}_k\check{\mathbf{x}}_k)$,对任意增益 $\mathbf{K}_k$ 计算均方误差

$$\text{MSE} = E[\|\hat{\mathbf{x}}_k - \mathbf{x}_k\|^2] = \text{tr}(\hat{\mathbf{P}}_k)$$

对 $\mathbf{K}_k$ 求导令其为零,得到上述卡尔曼增益公式。因此卡尔曼滤波器也是**最佳线性无偏估计器(BLUE,Best Linear Unbiased Estimator)**。

---

**English**

**Derivation 1 (MAP):** The correction step is the product of two Gaussians: the prior $\mathcal{N}(\check{\mathbf{x}}_k, \check{\mathbf{P}}_k)$ and the likelihood $\mathcal{N}(\mathbf{y}_k;\, \mathbf{C}_k\mathbf{x}_k, \mathbf{R}_k)$. MAP finds the peak of the product, leading directly to the five equations.

**Derivation 2 (Bayesian):** Writing the joint distribution of $(\mathbf{x}_k, \mathbf{y}_k)$ as Gaussian and applying the conditional Gaussian formula yields the same result.

**Derivation 3 (MMSE/BLUE):** The Kalman gain $\mathbf{K}_k$ minimizes the trace of the posterior covariance $\hat{\mathbf{P}}_k = (\mathbf{1} - \mathbf{K}_k\mathbf{C}_k)\check{\mathbf{P}}_k$ over all linear estimators. This proves the Kalman filter is the **Best Linear Unbiased Estimator (BLUE)** — no other linear estimator can have lower MSE.
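The five equations can be exercised on a tiny scalar system (a sketch of mine, with stand-in random measurements; the steady-state value checked at the end follows from the Riccati recursion with $Q = R = 1$, see also Exercise 3.3):

```python
import numpy as np

# Scalar Kalman filter for x_k = x_{k-1} + w_k, y_k = x_k + n_k, Q = R = 1.
A, C, Q, R = 1.0, 1.0, 1.0, 1.0
x_hat, P_hat = 0.0, 10.0          # initial posterior, large uncertainty

rng = np.random.default_rng(0)
for k in range(50):
    y = rng.normal()              # stand-in measurement
    # prediction (3.10a)-(3.10b)
    x_check = A * x_hat
    P_check = A * P_hat * A + Q
    # correction (3.10c)-(3.10e)
    K_gain = P_check * C / (C * P_check * C + R)
    x_hat = x_check + K_gain * (y - C * x_check)
    P_hat = (1.0 - K_gain * C) * P_check

# the covariance converges to the positive root of P^2 + Q P - Q R = 0
assert abs(P_hat - (np.sqrt(5.0) - 1.0) / 2.0) < 1e-9
```

Note that the covariance recursion never depends on the data, which is why the steady-state gain can be pre-computed offline (§3.4.6).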
---

### 3.4.4 卡尔曼增益的极端情形 / Extreme Cases of the Kalman Gain

**中文**

理解卡尔曼增益最简单的方法是看两种极端情形:

**情形 1:测量非常精确($\mathbf{R}_k \to \mathbf{0}$)**

$$\mathbf{K}_k \to \check{\mathbf{P}}_k\mathbf{C}_k^T(\mathbf{C}_k\check{\mathbf{P}}_k\mathbf{C}_k^T)^{-1} = \mathbf{C}_k^{-1}$$

(当 $\mathbf{C}_k$ 方阵可逆时)此时 $\hat{\mathbf{x}}_k = \mathbf{C}_k^{-1}\mathbf{y}_k$——完全相信测量,忽略预测。

**情形 2:测量非常噪声($\mathbf{R}_k \to \infty$)**

$$\mathbf{K}_k \to \mathbf{0}$$

此时 $\hat{\mathbf{x}}_k = \check{\mathbf{x}}_k$——完全相信预测,忽略测量。

> **增益是"信任度"的平衡**:$\mathbf{K}_k$ 在"相信传感器"和"相信运动模型"之间寻找最优平衡点。

---

**English**

- **Perfect sensor** ($\mathbf{R}_k \to \mathbf{0}$): $\mathbf{K}_k \to \mathbf{C}_k^{-1}$ (for square, invertible $\mathbf{C}_k$), i.e., we trust the measurement completely.
- **Very noisy sensor** ($\mathbf{R}_k \to \infty$): $\mathbf{K}_k \to \mathbf{0}$, i.e., we trust the prediction completely.

The Kalman gain is thus an automatic, data-driven weighting between the motion model and the sensor measurement.

---

### 3.4.5 误差动力学与一致性 / Error Dynamics and Consistency

**中文**

定义状态估计误差 $\tilde{\mathbf{x}}_k = \hat{\mathbf{x}}_k - \mathbf{x}_k$。

**无偏性(Unbiasedness)**:可以证明,只要初始估计无偏($E[\tilde{\mathbf{x}}_0] = \mathbf{0}$),则所有后续估计均无偏:

$$E[\tilde{\mathbf{x}}_k] = \mathbf{0}, \quad \forall k$$

**一致性(Consistency)**:估计器是一致的(consistent),当且仅当真实误差协方差等于计算所得的协方差:

$$E[\tilde{\mathbf{x}}_k\tilde{\mathbf{x}}_k^T] = \hat{\mathbf{P}}_k$$

一致性意味着滤波器"知道自己有多不确定"。如果 $E[\tilde{\mathbf{x}}_k\tilde{\mathbf{x}}_k^T] > \hat{\mathbf{P}}_k$(真实误差比滤波器以为的大),则滤波器**过于乐观(overconfident)**,这在实际应用中是常见问题。

**克拉美-罗下界(CRLB)**:卡尔曼滤波器达到 Cramér-Rao 下界(CRLB),即它是在所有无偏估计器中方差最小的——这是"最优"的最强意义。

---

**English**

Define the estimation error $\tilde{\mathbf{x}}_k = \hat{\mathbf{x}}_k - \mathbf{x}_k$.

**Unbiasedness:** If initialized with an unbiased estimate, all subsequent Kalman filter estimates are unbiased: $E[\tilde{\mathbf{x}}_k] = \mathbf{0}$.
**Consistency:** The filter is consistent if the reported covariance matches the true error covariance: $E[\tilde{\mathbf{x}}_k\tilde{\mathbf{x}}_k^T] = \hat{\mathbf{P}}_k$. An inconsistent filter that *underestimates* its uncertainty is called **overconfident** — a common practical failure mode.

**Optimality (CRLB):** The Kalman filter achieves the Cramér-Rao Lower Bound — it is efficient (minimum-variance) among all unbiased estimators for linear-Gaussian systems.

---

### 3.4.6 可观测性与稳定性 / Observability and Stability

**中文**

卡尔曼滤波器的长期行为与系统可观测性密切相关:

- 如果系统**可观测**($\text{rank}(\mathbf{O}) = N$),则不论初始协方差 $\check{\mathbf{P}}_0$ 多大,滤波器的协方差 $\hat{\mathbf{P}}_k$ 都会**收敛到唯一的稳态值**(不依赖初始条件)。
- 如果系统**不可观测**,某些状态的不确定性将永远无法减小。

稳态卡尔曼增益和稳态协方差可以通过**离散代数 Riccati 方程(DARE)**预先计算,避免在线计算——这是实际系统(如飞行控制)中常用的工程手段。

---

**English**

**Stability:** If the system is observable and the noise matrices are positive definite, the Kalman filter covariance converges to a steady-state value independent of $\check{\mathbf{P}}_0$. This steady-state covariance satisfies the **Discrete Algebraic Riccati Equation (DARE)**. In practice, the steady-state gain is pre-computed offline, giving a time-invariant filter with guaranteed stability.

---

## 3.5 连续时间估计:高斯过程回归 / Continuous-Time Estimation via Gaussian Process Regression

### 3.5.1 问题动机 / Motivation

**中文**

到目前为止,我们处理的都是离散时间系统:状态只在 $t_0, t_1, \ldots, t_K$ 这些固定时刻有定义。但现实中,机器人的轨迹是**连续时间**的——传感器可能以不等间隔频率采样,或者我们需要在测量时刻之间**内插**(interpolate)状态。

> **类比**:想象你正在记录一段音乐,但你只有几个采样点。你如何估计两个采样点之间的音符?这就是内插问题。在机器人学中,你只有几个时刻的 GPS/IMU 读数,但需要估计任意时刻的位置。

**高斯过程(Gaussian Process,GP)**提供了一个优雅的框架:将整条轨迹 $\mathbf{x}(t)$ 视为一个随机过程,用"先验运动模型"来约束它,然后用测量数据来"后验"更新整条轨迹。

---

**English**

Discrete-time methods fix the state at discrete time steps. But robot trajectories are fundamentally continuous-time, and we may need to query the state at arbitrary times between measurements.
**Gaussian Process (GP) Regression** treats the entire trajectory $\mathbf{x}(t)$ as a stochastic process, defines a prior over it via a continuous-time motion model (an SDE), and updates the trajectory using discrete measurements.

---

### 3.5.2 连续时间运动模型 / Continuous-Time Motion Model

**中文**

连续时间运动模型为线性时变随机微分方程(LTV SDE):

$$\dot{\mathbf{x}}(t) = \mathbf{A}(t)\mathbf{x}(t) + \mathbf{B}(t)\mathbf{u}(t) + \mathbf{L}(t)\mathbf{w}(t) \tag{3.11}$$

其中 $\mathbf{w}(t) \sim \mathcal{GP}(\mathbf{0}, \mathbf{Q}\delta(t-t'))$ 是白噪声过程,$\delta(\cdot)$ 是狄拉克 delta 函数,$\mathbf{Q}$ 是功率谱密度(power spectral density)矩阵。

**状态转移矩阵(Transition matrix)**$\boldsymbol{\Phi}(t,s)$ 满足:

$$\frac{d}{dt}\boldsymbol{\Phi}(t,s) = \mathbf{A}(t)\boldsymbol{\Phi}(t,s), \quad \boldsymbol{\Phi}(s,s) = \mathbf{1}$$

**均值函数(Mean function)**:

$$\check{\mathbf{x}}(t) = \boldsymbol{\Phi}(t,t_0)\check{\mathbf{x}}_0 + \int_{t_0}^t \boldsymbol{\Phi}(t,s)\mathbf{B}(s)\mathbf{u}(s)\,ds \tag{3.12}$$

**协方差函数(Covariance function)**:对 $t_j \le t \le t'$,

$$\check{\mathbf{P}}(t, t') = \boldsymbol{\Phi}(t, t_j)\left(\sum_{n=0}^j \boldsymbol{\Phi}(t_j, t_n)\mathbf{Q}_n\boldsymbol{\Phi}(t_j, t_n)^T\right)\boldsymbol{\Phi}(t', t_j)^T \tag{3.13}$$

其中 $\mathbf{Q}_n = \int_{t_{n-1}}^{t_n} \boldsymbol{\Phi}(t_n, s)\mathbf{L}(s)\mathbf{Q}\mathbf{L}(s)^T\boldsymbol{\Phi}(t_n, s)^T\,ds$ 是离散化过程噪声协方差(约定 $\mathbf{Q}_0 = \check{\mathbf{P}}_0$)。

---

**English**

The continuous-time motion model is a linear time-varying (LTV) stochastic differential equation (SDE):

$$\dot{\mathbf{x}}(t) = \mathbf{A}(t)\mathbf{x}(t) + \mathbf{B}(t)\mathbf{u}(t) + \mathbf{L}(t)\mathbf{w}(t)$$

with $\mathbf{w}(t) \sim \mathcal{GP}(\mathbf{0}, \mathbf{Q}\delta(t-t'))$. The solution defines a Gaussian process prior over the trajectory $\mathbf{x}(t)$, fully characterized by its mean function $\check{\mathbf{x}}(t)$ and covariance function $\check{\mathbf{P}}(t,t')$.
---

### 3.5.3 稀疏 GP 先验与块三对角结构 / Sparse GP Prior and Block-Tridiagonal Structure

**中文**

如果将 GP 离散化到测量时刻 $t_0, t_1, \ldots, t_K$,协方差矩阵为:

$$\check{\mathbf{P}} = \mathbf{A}\mathbf{Q}\mathbf{A}^T \tag{3.14}$$

其中 $\mathbf{A}$ 是由转移矩阵 $\boldsymbol{\Phi}(t_i, t_j)$ 组成的下三角提升矩阵(lifted transition matrix),$\mathbf{Q} = \text{blkdiag}(\check{\mathbf{P}}_0, \mathbf{Q}_1, \ldots, \mathbf{Q}_K)$。

> **关键结论**:$\check{\mathbf{P}}^{-1}$ 是**块三对角矩阵**!
>
> $$\check{\mathbf{P}}^{-1} = \mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{A}^{-1}$$
>
> 这是因为 $\mathbf{A}^{-1}$ 是块双对角的(下三角,仅有主对角和次对角非零),$\mathbf{Q}^{-1}$ 是块对角的,所以它们的乘积是块三对角的。

这个稀疏性使得 GP 批量估计与离散时间批量估计完全一致,都可以用 $O(K)$ 复杂度的稀疏求解器高效求解。

---

**English**

Discretizing the GP prior at measurement times gives $\check{\mathbf{P}} = \mathbf{A}\mathbf{Q}\mathbf{A}^T$. Crucially, the **precision matrix** (inverse covariance):

$$\check{\mathbf{P}}^{-1} = \mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{A}^{-1}$$

is **block-tridiagonal** — the same sparsity pattern as the discrete-time information matrix. This is the key connection: the continuous-time GP regression and discrete-time batch estimation are algebraically identical at the measurement times.
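The block-tridiagonal claim is easy to verify numerically. In this sketch (a scalar random-walk toy of my own: with $\boldsymbol{\Phi} = 1$ the lifted $\mathbf{A}$ is a lower-triangular matrix of ones, and $\mathbf{Q} = \mathbf{1}$), every precision-matrix entry more than one step off the diagonal vanishes:

```python
import numpy as np

K = 5
A = np.tril(np.ones((K, K)))   # lifted transition matrix for a random walk
Q = np.eye(K)                  # blkdiag(P0_check, Q_1, ..., Q_{K-1}), all ones here
P_check = A @ Q @ A.T          # GP prior covariance at the discrete times
Lam = np.linalg.inv(P_check)   # precision matrix

# entries more than one (block) off the diagonal are zero
for i in range(K):
    for j in range(K):
        if abs(i - j) > 1:
            assert abs(Lam[i, j]) < 1e-9
```

Here `P_check[i, j] = min(i, j) + 1`, the familiar Brownian-motion covariance, whose inverse is tridiagonal because the process is Markov.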
---

### 3.5.4 GP 查询:在任意时刻内插状态 / GP Querying: Interpolation at Arbitrary Times

**中文**

GP 最强大的功能是:在求解完 $\hat{\mathbf{x}}_{0:K}$ 之后,我们可以以 $O(1)$ 的代价查询任意时刻 $\tau \in (t_k, t_{k+1})$ 的状态估计。

后验均值和协方差为:

$$\hat{\mathbf{x}}(\tau) = \check{\mathbf{x}}(\tau) + \boldsymbol{\Lambda}(\tau)(\hat{\mathbf{x}}_k - \check{\mathbf{x}}_k) + \boldsymbol{\Psi}(\tau)(\hat{\mathbf{x}}_{k+1} - \check{\mathbf{x}}_{k+1}) \tag{3.15a}$$

$$\hat{\mathbf{P}}(\tau,\tau) = \check{\mathbf{P}}(\tau,\tau) + \begin{bmatrix}\boldsymbol{\Lambda}(\tau) & \boldsymbol{\Psi}(\tau)\end{bmatrix} \left(\begin{bmatrix}\hat{\mathbf{P}}_{k,k} & \hat{\mathbf{P}}_{k,k+1} \\ \hat{\mathbf{P}}_{k+1,k} & \hat{\mathbf{P}}_{k+1,k+1}\end{bmatrix} - \begin{bmatrix}\check{\mathbf{P}}(t_k,t_k) & \check{\mathbf{P}}(t_k,t_{k+1}) \\ \check{\mathbf{P}}(t_{k+1},t_k) & \check{\mathbf{P}}(t_{k+1},t_{k+1})\end{bmatrix}\right)\begin{bmatrix}\boldsymbol{\Lambda}(\tau)^T \\ \boldsymbol{\Psi}(\tau)^T\end{bmatrix} \tag{3.15b}$$

其中插值权重为:

$$\boldsymbol{\Lambda}(\tau) = \boldsymbol{\Phi}(\tau, t_k) - \mathbf{Q}_\tau\boldsymbol{\Phi}(t_{k+1}, \tau)^T\mathbf{Q}_{k+1}^{-1}\boldsymbol{\Phi}(t_{k+1}, t_k)$$

$$\boldsymbol{\Psi}(\tau) = \mathbf{Q}_\tau\boldsymbol{\Phi}(t_{k+1}, \tau)^T\mathbf{Q}_{k+1}^{-1}$$

$$\mathbf{Q}_\tau = \int_{t_k}^{\tau} \boldsymbol{\Phi}(\tau,s)\mathbf{L}(s)\mathbf{Q}\mathbf{L}(s)^T\boldsymbol{\Phi}(\tau,s)^T\,ds$$

注意:查询只涉及邻近的两个节点 $t_k$ 和 $t_{k+1}$,所以是 $O(1)$。

---

**English**

After solving for the posterior at all measurement times, the GP framework allows querying at any $\tau \in (t_k, t_{k+1})$ at $O(1)$ cost:

$$\hat{\mathbf{x}}(\tau) = \check{\mathbf{x}}(\tau) + \boldsymbol{\Lambda}(\tau)(\hat{\mathbf{x}}_k - \check{\mathbf{x}}_k) + \boldsymbol{\Psi}(\tau)(\hat{\mathbf{x}}_{k+1} - \check{\mathbf{x}}_{k+1})$$

The interpolation is a linear combination of the corrections at just the two neighboring nodes $t_k$ and $t_{k+1}$.
---

### 3.5.5 线性时不变情形与三次 Hermite 插值 / LTI Case and Cubic Hermite Interpolation

**中文**

当运动模型为线性时不变(LTI)时,转移矩阵为 $\boldsymbol{\Phi}(t,s) = \exp(\mathbf{A}(t-s))$,计算大大简化。

**例子(常速度模型)**:设加速度为白噪声 $\ddot{\mathbf{p}}(t) = \mathbf{w}(t)$,状态为 $\mathbf{x}(t) = [\mathbf{p}(t)^T, \dot{\mathbf{p}}(t)^T]^T$,则

$$\mathbf{A} = \begin{bmatrix}\mathbf{0} & \mathbf{1} \\ \mathbf{0} & \mathbf{0}\end{bmatrix}, \quad \boldsymbol{\Phi}(\Delta t) = \begin{bmatrix}\mathbf{1} & \Delta t\,\mathbf{1} \\ \mathbf{0} & \mathbf{1}\end{bmatrix}$$

代入 GP 查询公式,位置分量的插值恰好是**三次 Hermite 多项式插值(Cubic Hermite Interpolation)**:

$$\hat{\mathbf{p}}_\tau - \check{\mathbf{p}}_\tau = h_{00}(\alpha)(\hat{\mathbf{p}}_k - \check{\mathbf{p}}_k) + h_{10}(\alpha)T(\hat{\dot{\mathbf{p}}}_k - \check{\dot{\mathbf{p}}}_k) + h_{01}(\alpha)(\hat{\mathbf{p}}_{k+1} - \check{\mathbf{p}}_{k+1}) + h_{11}(\alpha)T(\hat{\dot{\mathbf{p}}}_{k+1} - \check{\dot{\mathbf{p}}}_{k+1})$$

其中 $\alpha = (\tau - t_k)/T \in [0,1]$,$T = t_{k+1} - t_k$,Hermite 基函数为:

$$h_{00}(\alpha) = 1 - 3\alpha^2 + 2\alpha^3, \quad h_{10}(\alpha) = \alpha - 2\alpha^2 + \alpha^3$$

$$h_{01}(\alpha) = 3\alpha^2 - 2\alpha^3, \quad h_{11}(\alpha) = -\alpha^2 + \alpha^3$$

这个结果非常优美:**三次样条插值自动地从 GP 先验中涌现出来**,无需人为指定插值方案。

---

**English**

For the LTI constant-velocity model ($\ddot{\mathbf{p}} = \mathbf{w}$), the GP interpolation formula for position reduces to exactly the **cubic Hermite polynomial interpolation**:

$$\hat{\mathbf{p}}_\tau - \check{\mathbf{p}}_\tau = h_{00}(\alpha)\Delta\hat{\mathbf{p}}_k + h_{10}(\alpha)T\Delta\hat{\dot{\mathbf{p}}}_k + h_{01}(\alpha)\Delta\hat{\mathbf{p}}_{k+1} + h_{11}(\alpha)T\Delta\hat{\dot{\mathbf{p}}}_{k+1}$$

with $\alpha \in [0,1]$ the fractional position between $t_k$ and $t_{k+1}$. This is remarkable: cubic Hermite splines emerge automatically from the physics-based GP prior — no ad-hoc interpolation scheme needs to be chosen.
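The defining properties of the Hermite basis can be checked directly (a small sketch; the finite-difference step used for the slope conditions is arbitrary):

```python
import numpy as np

# Cubic Hermite basis functions from the LTI constant-velocity GP prior.
def h00(a): return 1 - 3 * a**2 + 2 * a**3
def h10(a): return a - 2 * a**2 + a**3
def h01(a): return 3 * a**2 - 2 * a**3
def h11(a): return -a**2 + a**3

a = np.linspace(0.0, 1.0, 11)
# position bases sum to one: the interpolant reproduces constant trajectories
assert np.allclose(h00(a) + h01(a), 1.0)
# endpoint values: the interpolant passes through the two nodes
assert h00(0.0) == 1.0 and h00(1.0) == 0.0
assert h01(0.0) == 0.0 and h01(1.0) == 1.0
# slope conditions (finite-difference check): h10'(0) = 1, h11'(1) = 1
eps = 1e-6
assert abs((h10(eps) - h10(0.0)) / eps - 1.0) < 1e-5
assert abs((h11(1.0) - h11(1.0 - eps)) / eps - 1.0) < 1e-5
```

These are exactly the conditions that make the interpolant match position and (scaled) velocity at both endpoints.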
---

### 3.5.6 与批量离散估计的等价性 / Equivalence to Batch Discrete-Time Estimation

**中文**

将 GP 先验代入批量优化问题:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}}\;\frac{1}{2}(\check{\mathbf{x}} - \mathbf{x})^T\check{\mathbf{P}}^{-1}(\check{\mathbf{x}} - \mathbf{x}) + \frac{1}{2}(\mathbf{y} - \mathbf{C}\mathbf{x})^T\mathbf{R}^{-1}(\mathbf{y} - \mathbf{C}\mathbf{x})$$

由于 $\check{\mathbf{P}}^{-1} = \mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{A}^{-1}$ 是块三对角的,法方程为:

$$\underbrace{(\mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{A}^{-1} + \mathbf{C}^T\mathbf{R}^{-1}\mathbf{C})}_{\text{block-tridiagonal}}\hat{\mathbf{x}} = \mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{v} + \mathbf{C}^T\mathbf{R}^{-1}\mathbf{y}$$

这与离散时间批量问题的法方程**完全相同**!

**总结**:连续时间 GP 估计在测量时刻与离散时间批量估计完全等价;两者都可以用 $O(K)$ 的 Cholesky 平滑器/RTS 平滑器高效求解;而 Kalman 滤波器是其前向传递部分。

图 3.7(原书)总结了各种线性高斯估计范式之间的关系:

```
连续时间
┌─────────────────────────────┐
│ 批量(GP回归)              │
│   ↕ 等价                    │
│ 递推(Kalman-Bucy 滤波器)  │
└─────────────────────────────┘
        ↕ 离散化等价
┌─────────────────────────────┐
│ 离散时间                    │
│ 批量(加权最小二乘)        │
│   ↕ 稀疏 Cholesky           │
│ 递推平滑(RTS smoother)    │
│   ↕ 只取前向部分            │
│ 递推滤波(Kalman filter)   │
└─────────────────────────────┘
```

---

**English**

Substituting the GP prior into the batch optimization yields a system with the **same block-tridiagonal structure** as discrete-time batch estimation:

$$\underbrace{(\mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{A}^{-1} + \mathbf{C}^T\mathbf{R}^{-1}\mathbf{C})}_{\text{block-tridiagonal}}\hat{\mathbf{x}} = \mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{v} + \mathbf{C}^T\mathbf{R}^{-1}\mathbf{y}$$

The continuous-time GP approach is **exactly equivalent** to discrete-time batch estimation at the measurement times. Both can be solved in $O(N^3 K)$ time using the Cholesky or RTS smoother, whose forward pass is the Kalman filter.
---

## 3.6 连续时间递推滤波:Kalman-Bucy 滤波器 / Kalman-Bucy Filter

**中文**

前面讨论的方法都在测量时刻有限个离散点上工作。历史上还有一个优雅的结果:**Kalman-Bucy 滤波器**(1961 年),它处理连续时间测量流。

运动模型仍为 LTV SDE(3.11),但观测模型也是连续的:

$$\mathbf{y}(t) = \mathbf{C}(t)\mathbf{x}(t) + \mathbf{n}(t), \quad \mathbf{n}(t) \sim \mathcal{GP}(\mathbf{0}, \mathbf{R}(t))$$

Kalman-Bucy 滤波器由两个微分方程组成:

**均值方程**:

$$\dot{\hat{\mathbf{x}}}(t) = \mathbf{A}(t)\hat{\mathbf{x}}(t) + \mathbf{B}(t)\mathbf{u}(t) + \mathbf{K}(t)(\mathbf{y}(t) - \mathbf{C}(t)\hat{\mathbf{x}}(t)) \tag{3.16a}$$

**协方差方程(Riccati 微分方程)**:

$$\dot{\hat{\mathbf{P}}}(t) = \mathbf{A}(t)\hat{\mathbf{P}}(t) + \hat{\mathbf{P}}(t)\mathbf{A}(t)^T + \mathbf{L}(t)\mathbf{Q}(t)\mathbf{L}(t)^T - \mathbf{K}(t)\mathbf{R}(t)\mathbf{K}(t)^T \tag{3.16b}$$

其中连续时间卡尔曼增益:

$$\mathbf{K}(t) = \hat{\mathbf{P}}(t)\mathbf{C}(t)^T\mathbf{R}(t)^{-1} \tag{3.16c}$$

> **直觉**:
> - 均值方程 (3.16a) 是运动模型 + 连续修正项(类比离散 KF 的 predict+correct)
> - 协方差方程 (3.16b) 中的 $+\mathbf{L}\mathbf{Q}\mathbf{L}^T$ 项是过程噪声带来的增量,$-\mathbf{K}\mathbf{R}\mathbf{K}^T$ 是测量带来的减量
> - 这两项之间的平衡决定了稳态协方差

虽然在实际应用中传感器是离散采样的(不是真正的连续流),Kalman-Bucy 滤波器仍是估计理论的重要里程碑。

---

**English**

The **Kalman-Bucy filter** (Kalman and Bucy, 1961) is the continuous-time version of the Kalman filter, handling a continuous stream of measurements.

**Mean ODE:**

$$\dot{\hat{\mathbf{x}}}(t) = \mathbf{A}(t)\hat{\mathbf{x}}(t) + \mathbf{B}(t)\mathbf{u}(t) + \mathbf{K}(t)(\mathbf{y}(t) - \mathbf{C}(t)\hat{\mathbf{x}}(t))$$

**Covariance ODE (Riccati equation):**

$$\dot{\hat{\mathbf{P}}}(t) = \mathbf{A}(t)\hat{\mathbf{P}}(t) + \hat{\mathbf{P}}(t)\mathbf{A}(t)^T + \mathbf{L}(t)\mathbf{Q}(t)\mathbf{L}(t)^T - \mathbf{K}(t)\mathbf{R}(t)\mathbf{K}(t)^T$$

where $\mathbf{K}(t) = \hat{\mathbf{P}}(t)\mathbf{C}(t)^T\mathbf{R}(t)^{-1}$. The process-noise term $+\mathbf{L}\mathbf{Q}\mathbf{L}^T$ inflates covariance, and the measurement term $-\mathbf{K}\mathbf{R}\mathbf{K}^T$ deflates it; their balance determines steady-state uncertainty.
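The covariance ODE (3.16b) can be integrated numerically. The sketch below uses forward Euler on a scalar system with parameters of my own choosing ($A = 0$, $L = C = 1$, $Q = R = 1$), for which the Riccati equation reduces to $\dot{P} = Q - P^2/R$ with steady state $P = \sqrt{QR} = 1$:

```python
# Scalar Kalman-Bucy covariance ODE, forward-Euler integration.
A, L, C, Q, R = 0.0, 1.0, 1.0, 1.0, 1.0
P, dt = 5.0, 1e-3              # start well above the steady state

for _ in range(20000):
    K = P * C / R                               # continuous-time gain (3.16c)
    P += dt * (2 * A * P + L * Q * L - K * R * K)  # Riccati ODE (3.16b)

assert abs(P - 1.0) < 1e-3     # converged to sqrt(QR)
```

The inflation term ($LQL$) and deflation term ($KRK$) balance exactly at the steady state, mirroring the discrete-time DARE of §3.4.6.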
---

## 3.7 本章小结 / Chapter Summary

**中文**

| 方法 | 最优性 | 复杂度 | 适用场景 |
|------|--------|--------|----------|
| 批量加权最小二乘 | 精确(线性高斯下全局最优) | $O(N^3 K)$(稀疏) | 离线、全轨迹 |
| Cholesky 平滑器 | 精确(等价批量) | $O(N^3 K)$ | 离线、数值稳定 |
| RTS 平滑器 | 精确(等价批量) | $O(N^3 K)$ | 离线、协方差形式 |
| Kalman 滤波器 | 因果最优(BLUE/MMSE) | $O(N^3)$/步 | 在线、实时 |
| GP 回归(批量) | 精确 | $O(N^3 K)$ | 连续时间轨迹 |
| Kalman-Bucy 滤波器 | 因果最优(连续时间) | 积分 ODE | 理论参考 |

**三条核心结论**:

1. **线性高斯后验是精确高斯的**:MAP = 后验均值 = MMSE,三者合一。
2. **块三对角结构是高效求解的关键**:无论是离散还是连续时间,都能用 $O(K)$ 复杂度求解。
3. **连续与离散等价**:GP 批量估计在测量时刻与离散批量估计精确等价;RTS 平滑器的前向传递就是 Kalman 滤波器。

---

**English**

**Three core takeaways:**

1. **The linear-Gaussian posterior is exactly Gaussian.** The MAP estimate equals the posterior mean equals the MMSE estimate — a happy coincidence that disappears with nonlinearity.
2. **Block-tridiagonal sparsity is the key to efficiency.** Both discrete-time and continuous-time linear-Gaussian problems share this structure, enabling $O(N^3 K)$ solution.
3. **Continuous-time and discrete-time are equivalent at measurement times.** GP batch estimation with SDE priors is identical to discrete-time batch estimation; the RTS smoother forward pass is the Kalman filter.

---

## 习题 / Exercises

**3.1** 考虑一维离散时间系统:

$$x_k = x_{k-1} + v_k + w_k, \quad w_k \sim \mathcal{N}(0, Q)$$

$$y_k = x_k + n_k, \quad n_k \sim \mathcal{N}(0, R)$$

设 $K=5$,初始状态先验未知($\check{P}_0 \to \infty$)。写出批量最小二乘的矩阵 $\mathbf{H}$, $\mathbf{W}$, $\mathbf{z}$。问:解是否唯一?

*Consider a 1D discrete-time system with $K=5$ steps. Derive $\mathbf{H}$, $\mathbf{W}$, $\mathbf{z}$ for the batch least-squares formulation. Is the solution unique?*

**3.2** 沿用 3.1 的系统,设 $Q=R=1$,验证信息矩阵为三对角矩阵,写出其 Cholesky 因子的稀疏模式。

*With $Q=R=1$, verify the information matrix is tridiagonal and state the sparsity pattern of its Cholesky factor.*

**3.3** 沿用 3.1 的系统,推导卡尔曼滤波方程的具体形式,并证明稳态先验协方差 $\check{P}$ 和后验协方差 $\hat{P}$ 满足:

$$\check{P}^2 - Q\check{P} - QR = 0, \quad \hat{P}^2 + Q\hat{P} - QR = 0$$

解释为什么每个方程只有一个根有物理意义。

*Derive the Kalman filter for this system and show the steady-state covariances satisfy the above quadratics. Explain which root is physically meaningful.*

**3.4** 利用 MAP 方法,推导一个**时间反向**运行的 Kalman 滤波器(backward Kalman filter)。

*Using the MAP approach, derive a Kalman filter that runs backward in time.*

**3.5** 考虑位置 $p$ 和地标 $m$ 的联合估计(SLAM):

$$\begin{bmatrix}p_k \\ m_k\end{bmatrix} = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}\begin{bmatrix}p_{k-1} \\ m_{k-1}\end{bmatrix} + \begin{bmatrix}1 \\ 0\end{bmatrix}(d_k + w_k)$$

$$y_k = \begin{bmatrix}-1 & 1\end{bmatrix}\begin{bmatrix}p_k \\ m_k\end{bmatrix} + n_k$$

用卡尔曼滤波器估计 $K$ 步后的位置和地标。分析初始化和可观测性。

*Apply the Kalman filter to simultaneously estimate robot position and landmark location (SLAM). Analyze initialization and observability.*

---

*下一章将研究非线性和非高斯情形,这是实际机器人系统中的常见情况。/ The next chapter tackles nonlinear and non-Gaussian estimation — the common case in real robotics.*