第三章 线性高斯估计
Chapter 3 Linear-Gaussian Estimation
本章概览 / Chapter Overview
本章研究最简洁的情形:运动模型和观测模型都是线性的,所有噪声都是高斯的。在这种情况下,估计问题存在精确的闭合解。我们将从批量(Batch)方法出发,逐步推导出递归滤波器(Kalman filter)和平滑器(RTS smoother),最后讨论连续时间下的高斯过程回归方法。
This chapter studies the simplest case: linear motion and observation models, Gaussian noise. Exact closed-form solutions exist. We begin with the batch approach, derive the Kalman filter and RTS smoother, and conclude with continuous-time Gaussian process (GP) regression.
3.1 批量离散时间估计 / Batch Discrete-Time Estimation
3.1.1 问题设置 / Problem Setup
中文
考虑一个机器人随时间演化的系统。我们用两类方程描述它:
运动模型(Motion model):描述状态如何随时间变化
\mathbf{x}_k = \mathbf{A}_{k-1}\,\mathbf{x}_{k-1} + \mathbf{v}_k + \mathbf{w}_k, \quad \mathbf{w}_k \sim \mathcal{N}(\mathbf{0},\,\mathbf{Q}_k) \tag{3.1}
观测模型(Observation model):描述传感器如何感知状态
\mathbf{y}_k = \mathbf{C}_k\,\mathbf{x}_k + \mathbf{n}_k, \quad \mathbf{n}_k \sim \mathcal{N}(\mathbf{0},\,\mathbf{R}_k) \tag{3.2}
其中:
- $\mathbf{x}_k$:$k$ 时刻的状态(位置、速度等)
- $\mathbf{A}_{k-1}$:状态转移矩阵(State transition matrix)
- $\mathbf{v}_k$:已知的输入/控制量(known input/control)
- $\mathbf{w}_k$:过程噪声(Process noise),协方差为 $\mathbf{Q}_k$
- $\mathbf{y}_k$:传感器测量
- $\mathbf{C}_k$:观测矩阵(Observation matrix)
- $\mathbf{n}_k$:测量噪声(Measurement noise),协方差为 $\mathbf{R}_k$

另外,初始状态先验为 $\mathbf{x}_0 \sim \mathcal{N}(\check{\mathbf{x}}_0, \check{\mathbf{P}}_0)$,符号 $\check{(\cdot)}$ 表示"先验",$\hat{(\cdot)}$ 表示"后验"。
直觉:为什么要”批量”?
批量方法把 个时刻的所有状态 打包成一个大向量,一次性求解。这就像考试交卷前可以回头修改答案——因为后面的测量数据也能”反向”修正早期的估计。相比之下,滤波器只能向前看,无法回头。
English
We model the robot's evolution with two sets of equations.

Motion model: how the state changes over time:

$$\mathbf{x}_k = \mathbf{A}_{k-1}\,\mathbf{x}_{k-1} + \mathbf{v}_k + \mathbf{w}_k, \quad \mathbf{w}_k \sim \mathcal{N}(\mathbf{0},\,\mathbf{Q}_k)$$

Observation model: how the sensor perceives the state:

$$\mathbf{y}_k = \mathbf{C}_k\,\mathbf{x}_k + \mathbf{n}_k, \quad \mathbf{n}_k \sim \mathcal{N}(\mathbf{0},\,\mathbf{R}_k)$$

- $k = 0, 1, \ldots, K$ are discrete time steps.
- The prior on the initial state is $\mathbf{x}_0 \sim \mathcal{N}(\check{\mathbf{x}}_0, \check{\mathbf{P}}_0)$.
- Check notation ($\check{\cdot}$) = prior; hat notation ($\hat{\cdot}$) = posterior.
The batch approach stacks all unknowns into one large vector and solves for all of them simultaneously, exploiting all measurements — past and future — at once.
3.1.2 堆叠成矩阵形式 / Stacking into Matrix Form
中文
为了将所有方程写成一个统一的矩阵形式,我们定义以下符号。

定义状态向量(stacked state)和测量向量(stacked measurement):

$$\mathbf{x} = \begin{bmatrix}\mathbf{x}_0 \\ \vdots \\ \mathbf{x}_K\end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix}\mathbf{y}_0 \\ \vdots \\ \mathbf{y}_K\end{bmatrix}$$

把运动方程改写成误差形式(满足方程时误差为零):

$$\mathbf{x}_k - \mathbf{A}_{k-1}\mathbf{x}_{k-1} - \mathbf{v}_k = \mathbf{w}_k$$

把观测方程改写成误差形式:

$$\mathbf{y}_k - \mathbf{C}_k\mathbf{x}_k = \mathbf{n}_k$$

将所有误差项堆叠,可以写成统一的线性形式(含初始先验):

$$\mathbf{z} = \mathbf{H}\mathbf{x} + \text{噪声}, \quad \mathbf{W} = \mathrm{blkdiag}(\check{\mathbf{P}}_0, \mathbf{Q}_1, \ldots, \mathbf{Q}_K, \mathbf{R}_0, \ldots, \mathbf{R}_K)$$

其中 $\mathbf{z}$ 是堆叠的"伪测量"(包含先验 $\check{\mathbf{x}}_0$、输入 $\mathbf{v}_k$ 和实际测量 $\mathbf{y}_k$),$\mathbf{H}$ 是系统矩阵,$\mathbf{W}$ 是块对角噪声协方差矩阵。

举例(K=2):对于两步系统,堆叠向量为 $\mathbf{x} = [\mathbf{x}_0^T\ \mathbf{x}_1^T\ \mathbf{x}_2^T]^T$,$\mathbf{z} = [\check{\mathbf{x}}_0^T\ \mathbf{v}_1^T\ \mathbf{v}_2^T\ \mathbf{y}_0^T\ \mathbf{y}_1^T\ \mathbf{y}_2^T]^T$。

注意 $\mathbf{H}$ 是稀疏矩阵(大部分元素为零),这对高效求解至关重要。
English
All equations — prior, motion, and observation — can be compactly stacked as

$$\mathbf{z} = \mathbf{H}\mathbf{x} + \text{noise}$$

where $\mathbf{z}$ is a stacked vector of pseudo-measurements (including the prior on $\mathbf{x}_0$), $\mathbf{H}$ is a tall, sparse system matrix encoding the motion and observation models, and $\mathbf{W}$ is the block-diagonal noise covariance.
3.1.3 MAP 估计——加权最小二乘 / MAP Estimation = Weighted Least Squares
中文
给定上述线性模型,MAP(最大后验)估计等价于加权最小二乘(Weighted Least Squares):
\hat{\mathbf{x}} = \arg\min_{\mathbf{x}}\; J(\mathbf{x}), \quad J(\mathbf{x}) = \frac{1}{2}(\mathbf{z} - \mathbf{H}\mathbf{x})^T \mathbf{W}^{-1} (\mathbf{z} - \mathbf{H}\mathbf{x}) \tag{3.3}
直觉:为什么叫”加权”?
不同测量的精度不同。协方差 中方差大的测量,乘以 后权重变小;方差小(精度高)的测量,权重变大。这就像考试卷:必答题(精度高的传感器)权重更大。
对目标函数 $J(\mathbf{x})$ 求导并令其为零:

$$\frac{\partial J}{\partial \mathbf{x}^T} = -\mathbf{H}^T\mathbf{W}^{-1}(\mathbf{z} - \mathbf{H}\mathbf{x}) = \mathbf{0}$$
整理得到法方程(Normal Equations):
\boxed{(\mathbf{H}^T \mathbf{W}^{-1} \mathbf{H})\,\hat{\mathbf{x}} = \mathbf{H}^T \mathbf{W}^{-1} \mathbf{z}} \tag{3.4}
这是线性方程组,可以直接求解。矩阵 $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ 称为信息矩阵(information matrix)。
English
The MAP estimate minimizes the weighted least-squares cost:
Setting $\partial J/\partial \mathbf{x}^T = \mathbf{0}$ yields the normal equations:
The matrix $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ is called the information matrix. Its inverse is the posterior covariance:
\hat{\mathbf{P}} = (\mathbf{H}^T \mathbf{W}^{-1} \mathbf{H})^{-1} \tag{3.5}
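As a minimal sketch of the batch solve (my own $K=2$ scalar random walk with made-up measurements, not an example from the text), the stacked $\mathbf{H}$, $\mathbf{W}$, $\mathbf{z}$ can be built and the normal equations (3.4) solved directly:

```python
import numpy as np

# Toy batch problem: scalar state, K = 2, unit noises, prior on x_0.
# Rows of H: prior on x0; motion x1 - x0 = v1 (= 0); motion x2 - x1 = v2 (= 0);
# then one measurement row y_k = x_k per time step.
x0_check, P0 = 0.0, 1.0
Q, R = 1.0, 1.0
y = np.array([0.5, 1.0, 1.5])  # made-up measurements at k = 0, 1, 2

H = np.array([
    [1.0, 0.0, 0.0],    # prior
    [-1.0, 1.0, 0.0],   # motion 0 -> 1
    [0.0, -1.0, 1.0],   # motion 1 -> 2
    [1.0, 0.0, 0.0],    # y_0
    [0.0, 1.0, 0.0],    # y_1
    [0.0, 0.0, 1.0],    # y_2
])
z = np.array([x0_check, 0.0, 0.0, y[0], y[1], y[2]])
W = np.diag([P0, Q, Q, R, R, R])

info = H.T @ np.linalg.inv(W) @ H                     # information matrix
x_hat = np.linalg.solve(info, H.T @ np.linalg.inv(W) @ z)
P_hat = np.linalg.inv(info)                           # posterior covariance (3.5)

# chain structure => the information matrix is tridiagonal
assert abs(info[0, 2]) < 1e-12 and abs(info[2, 0]) < 1e-12
```

The zero in the `(0, 2)` entry of `info` is exactly the sparsity that §3.2 exploits.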
3.1.4 贝叶斯推断的视角 / Bayesian Inference Perspective
中文
上面的 MAP 方法直接优化目标函数,没有明确写出概率。贝叶斯推断给出了更完整的解释:
先验 × 似然 ∝ 后验:

$$p(\mathbf{x} \mid \mathbf{z}) \propto p(\mathbf{z} \mid \mathbf{x})\, p(\mathbf{x})$$

由于所有分布都是高斯的,对数后验是二次型,其最大值(MAP)恰好就是后验均值。更重要的是:

线性高斯系统的黄金定理:后验分布也是精确的高斯分布

$$p(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}\big(\hat{\mathbf{x}},\, \hat{\mathbf{P}}\big)$$

其中 $\hat{\mathbf{x}}$ 由法方程 (3.4) 给出,$\hat{\mathbf{P}} = (\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H})^{-1}$。

这意味着 MAP 解、最小均方误差解(MMSE)、后验均值三者完全相同——这是线性高斯情形独有的优良性质。
English
The Bayesian interpretation is powerful. For linear-Gaussian models, the full posterior is exactly Gaussian:

$$p(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}\big(\hat{\mathbf{x}},\, (\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H})^{-1}\big)$$
This means the MAP estimate (mode of the posterior) equals the posterior mean (MMSE estimate). This coincidence is special to linear-Gaussian systems and disappears the moment nonlinearity enters.
3.1.5 可观测性 / Observability and Existence of Solution
中文
法方程有唯一解当且仅当 $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ 可逆,即 $\mathbf{H}$ 列满秩。

定义可观测性矩阵(Observability matrix,对时不变系统):

$$\mathbf{O} = \begin{bmatrix}\mathbf{C} \\ \mathbf{C}\mathbf{A} \\ \vdots \\ \mathbf{C}\mathbf{A}^{N-1}\end{bmatrix}$$

定理(可观测性):系统可观测(即所有状态可以被唯一估计)当且仅当 $\operatorname{rank}(\mathbf{O}) = N$(等于状态维度)。
直觉:如果某个维度的状态对所有测量都没有影响(即测量对它”视而不见”),那么无论拿到多少数据,我们也无法估计它。这就好比问一个盲人”今天天空是什么颜色?“——他根本感知不到颜色,所以无法回答。
English
A unique solution exists iff $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ is full rank (equivalently, $\mathbf{H}$ has full column rank).

Theorem (Observability): The batch system is observable — all state components can be uniquely recovered — if and only if $\operatorname{rank}(\mathbf{O}) = N$.

If a state dimension is invisible to all measurements (i.e., it never appears in any $\mathbf{y}_k$, nor can it be inferred through the dynamics), no amount of data will allow it to be estimated.
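The rank test above is easy to check numerically. The sketch below (illustrative values of my own, not from the text) shows a constant-velocity system that is observable from position measurements, but loses observability of position when only velocity is measured:

```python
import numpy as np

# State [position, velocity]; discrete constant-velocity dynamics.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^{N-1}."""
    N = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, n) for n in range(N)])

# Sensor sees position only: velocity shows up in the CA row, so rank is full.
C_pos = np.array([[1.0, 0.0]])
O = observability_matrix(A, C_pos)
assert np.linalg.matrix_rank(O) == 2

# Sensor sees velocity only: position never influences any measurement.
C_vel = np.array([[0.0, 1.0]])
O_vel = observability_matrix(A, C_vel)
assert np.linalg.matrix_rank(O_vel) == 1
```

The velocity-only case is the "blind person" analogy in matrix form: one state dimension leaves no trace in the data.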
3.2 利用稀疏结构:Cholesky 平滑器 / Exploiting Sparsity: The Cholesky Smoother
3.2.1 信息矩阵的块三对角结构 / Block-Tridiagonal Information Matrix
中文
批量问题中,信息矩阵 $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ 具有特殊的**块三对角(block-tridiagonal)**结构:

$$\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H} = \begin{bmatrix}\mathbf{\Lambda}_{00} & \mathbf{\Lambda}_{01} & & \\ \mathbf{\Lambda}_{10} & \mathbf{\Lambda}_{11} & \mathbf{\Lambda}_{12} & \\ & \mathbf{\Lambda}_{21} & \ddots & \mathbf{\Lambda}_{K-1,K} \\ & & \mathbf{\Lambda}_{K,K-1} & \mathbf{\Lambda}_{KK}\end{bmatrix}$$

这是因为运动方程只把相邻时刻 $k-1$ 和 $k$ 联系起来,观测方程只涉及 $k$ 时刻的状态——所以在矩阵中,每个状态只与它的直接邻居"耦合"。

> **稀疏的力量**:
> - 密集矩阵($N(K+1) \times N(K+1)$)的 Cholesky 分解需要 $O(N^3 K^3)$ 操作。
> - 利用块三对角结构,只需 $O(N^3 K)$ 操作——当 $K$ 很大(数千个时间步)时,这是**质的**提升。

---

**English**

The information matrix $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ is **block-tridiagonal** because the motion model only couples adjacent timesteps and the observation model only involves $\mathbf{x}_k$. Exploiting this structure via sparse Cholesky decomposition reduces computational cost from $O(N^3 K^3)$ (dense) to $O(N^3 K)$ (sparse), a dramatic speedup for long trajectories.

---

### 3.2.2 Cholesky 平滑器推导 / Cholesky Smoother Derivation

**中文**

对块三对角信息矩阵进行 Cholesky 分解 $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H} = \mathbf{L}\mathbf{L}^T$,$\mathbf{L}$ 是下三角块双对角矩阵:

$$\mathbf{L} = \begin{bmatrix}\mathbf{L}_{00} & & & \\ \mathbf{L}_{10} & \mathbf{L}_{11} & & \\ & \mathbf{L}_{21} & \ddots & \\ & & & \mathbf{L}_{KK}\end{bmatrix}$$

求解过程分两步:

1. **前向传递(Forward pass)**:$\mathbf{L}\,\mathbf{d} = \mathbf{H}^T\mathbf{W}^{-1}\mathbf{z}$,从 $k=0$ 到 $k=K$
2. **后向传递(Backward pass)**:$\mathbf{L}^T \hat{\mathbf{x}} = \mathbf{d}$,从 $k=K$ 到 $k=0$

前向传递的递推公式(以协方差形式书写,$\boldsymbol{\Sigma}_{k|k}$ 为 $k$ 时刻的滤波协方差):

$$\boldsymbol{\Sigma}_{k|k} = \left(\mathbf{C}_k^T\mathbf{R}_k^{-1}\mathbf{C}_k + \left(\mathbf{A}_{k-1}\boldsymbol{\Sigma}_{k-1|k-1}\mathbf{A}_{k-1}^T + \mathbf{Q}_k\right)^{-1}\right)^{-1}$$

等价地,我们定义每个时刻的**局部信息(local information)**:

$$\mathbf{\Lambda}_k = \mathbf{C}_k^T \mathbf{R}_k^{-1} \mathbf{C}_k, \quad \mathbf{b}_k = \mathbf{C}_k^T \mathbf{R}_k^{-1} \mathbf{y}_k$$

然后前向(预测+更新)递推:

$$\check{\mathbf{\Lambda}}_k = (\mathbf{A}_{k-1} \hat{\mathbf{\Lambda}}_{k-1}^{-1} \mathbf{A}_{k-1}^T + \mathbf{Q}_k)^{-1}, \quad \check{\mathbf{q}}_k = \check{\mathbf{\Lambda}}_k(\mathbf{A}_{k-1}\hat{\mathbf{\Lambda}}_{k-1}^{-1}\hat{\mathbf{q}}_{k-1} + \mathbf{v}_k) \tag{3.6}$$

$$\hat{\mathbf{\Lambda}}_k = \check{\mathbf{\Lambda}}_k + \mathbf{\Lambda}_k, \quad \hat{\mathbf{q}}_k = \check{\mathbf{q}}_k + \mathbf{b}_k \tag{3.7}$$

初始条件:$\hat{\mathbf{\Lambda}}_0 = \check{\mathbf{P}}_0^{-1} + \mathbf{\Lambda}_0$,$\hat{\mathbf{q}}_0 = \check{\mathbf{P}}_0^{-1}\check{\mathbf{x}}_0 + \mathbf{b}_0$。

完成前向传递后,令 $\hat{\mathbf{x}}_K = \hat{\mathbf{\Lambda}}_K^{-1}\hat{\mathbf{q}}_K$,然后后向传递(与 RTS 平滑等价):

$$\hat{\mathbf{x}}_k \leftarrow \hat{\mathbf{x}}_k + \hat{\mathbf{\Lambda}}_k^{-1}\mathbf{A}_k^T \check{\mathbf{\Lambda}}_{k+1}\left(\hat{\mathbf{x}}_{k+1} - \mathbf{A}_k\hat{\mathbf{x}}_k - \mathbf{v}_{k+1}\right), \quad k = K-1, \ldots, 0 \tag{3.8}$$

---

**English**

The Cholesky smoother leverages the block-bidiagonal Cholesky factor of the block-tridiagonal information matrix. The algorithm has two passes:

**Forward pass** (information propagation, $k = 0 \to K$):
- Accumulate process noise and observation information at each step.
- Compute the local information matrix $\hat{\mathbf{\Lambda}}_k$ and information vector $\hat{\mathbf{q}}_k$.

**Backward pass** (state recovery, $k = K \to 0$):
- Use $\hat{\mathbf{x}}_K = \hat{\mathbf{\Lambda}}_K^{-1}\hat{\mathbf{q}}_K$ as the boundary condition.
- Back-substitute to recover all $\hat{\mathbf{x}}_k$.
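As a sanity check (a scalar sketch with made-up numbers, not from the text), the information-form forward recursion (3.6)-(3.7) can be run alongside the standard covariance-form Kalman filter of §3.4; the two are algebraically equivalent and should agree to machine precision:

```python
import numpy as np

A, C, Q, R = 1.0, 1.0, 1.0, 1.0
P0_check, x0_check = 1.0, 0.0
ys = [0.7, -0.2, 1.1]     # made-up measurements
vs = [0.0, 0.0, 0.0]      # known inputs, zero here

# information form: Lambda = P^{-1}, q = P^{-1} x; init per the text
Lam_hat = 1.0 / P0_check + C * C / R
q_hat = x0_check / P0_check + C * ys[0] / R

# covariance-form Kalman filter for comparison
P_hat = 1.0 / (1.0 / P0_check + C * C / R)
x_hat = P_hat * (x0_check / P0_check + C * ys[0] / R)

for k in range(1, len(ys)):
    # (3.6) predict in information form
    Lam_check = 1.0 / (A * (1.0 / Lam_hat) * A + Q)
    q_check = Lam_check * (A * (1.0 / Lam_hat) * q_hat + vs[k])
    # (3.7) update with local information
    Lam_hat = Lam_check + C * C / R
    q_hat = q_check + C * ys[k] / R

    # standard Kalman filter predict + correct
    x_chk = A * x_hat + vs[k]
    P_chk = A * P_hat * A + Q
    K = P_chk * C / (C * P_chk * C + R)
    x_hat = x_chk + K * (ys[k] - C * x_chk)
    P_hat = (1.0 - K * C) * P_chk

assert abs(1.0 / Lam_hat - P_hat) < 1e-12   # Lambda^{-1} == P
assert abs(q_hat / Lam_hat - x_hat) < 1e-12  # Lambda^{-1} q == x
```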
Total cost: $O(N^3 K)$.

---

### 3.2.3 后验协方差 / Posterior Covariance

**中文**

后验协方差矩阵 $\hat{\mathbf{P}} = (\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H})^{-1}$ 的关键部分可以高效计算:对角块 $\hat{\mathbf{P}}_{kk}$ 给出时刻 $k$ 的不确定性;相邻时刻的交叉协方差块 $\hat{\mathbf{P}}_{k,k+1}$ 也可在 $O(N^3 K)$ 时间内计算。

**注意**:虽然 $\hat{\mathbf{P}}$ 在概念上是完整的 $N(K+1)\times N(K+1)$ 矩阵,但通常我们只需要对角块和近对角块——完整矩阵仅显式存储就需要 $O(N^2 K^2)$ 的代价,通常应避免。

---

**English**

The posterior covariance $\hat{\mathbf{P}} = (\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H})^{-1}$ is a dense matrix in general, but only the diagonal blocks $\hat{\mathbf{P}}_{kk}$ and off-diagonal blocks $\hat{\mathbf{P}}_{k,k\pm 1}$ are needed for the smoother. These can be recovered in $O(N^3 K)$ via an additional backward pass on the Cholesky factors.

---

## 3.3 RTS 平滑器 / Rauch-Tung-Striebel (RTS) Smoother

**中文**

Cholesky 平滑器虽然高效,但其信息形式($\mathbf{\Lambda}$, $\mathbf{q}$)不够直观。赫伯特·雷奇(Herbert Rauch)、弗兰克·唐(Frank Tung)和夏洛特·斯特里贝尔(Charlotte Striebel)在 1965 年推导了一个代数等价的**协方差形式**版本,更易于实现和理解。

**RTS 平滑器 = 前向 Kalman 滤波 + 后向平滑**

**前向(预测-校正)**(见 §3.4 Kalman 滤波器):运行标准 Kalman 滤波器,存储每个时刻的先验估计 $(\check{\mathbf{x}}_k, \check{\mathbf{P}}_k)$ 和后验估计 $(\hat{\mathbf{x}}_k, \hat{\mathbf{P}}_k)$。

**后向平滑传递**(从 $k = K-1$ 到 $k = 0$):

$$\mathbf{G}_k = \hat{\mathbf{P}}_k \mathbf{A}_k^T \check{\mathbf{P}}_{k+1}^{-1} \tag{3.9a}$$

$$\hat{\mathbf{x}}_k^s = \hat{\mathbf{x}}_k + \mathbf{G}_k(\hat{\mathbf{x}}_{k+1}^s - \check{\mathbf{x}}_{k+1}) \tag{3.9b}$$

$$\hat{\mathbf{P}}_k^s = \hat{\mathbf{P}}_k + \mathbf{G}_k(\hat{\mathbf{P}}_{k+1}^s - \check{\mathbf{P}}_{k+1})\mathbf{G}_k^T \tag{3.9c}$$

其中上标 $s$ 表示平滑后的估计,$\mathbf{G}_k$ 称为**平滑增益(Smoother gain)**。

> **直觉:前向滤波 + 后向平滑**
>
> 设想你在开车,只看前方(Kalman 滤波器)。但现在假设你可以看行车记录仪回放:在知道了整段旅途之后,你对早期位置的估计会更准。这就是平滑器的作用——用未来的数据修正过去的估计。
>
> 平滑增益 $\mathbf{G}_k$ 的物理意义:它衡量了时刻 $k$ 的后验与时刻 $k+1$ 的先验之间的"相关性"。若 $k+1$ 时刻的先验不确定性 $\check{\mathbf{P}}_{k+1}$ 很大,说明传播噪声大,修正量 $\mathbf{G}_k$ 就小(未来不太可靠,少参考它)。

---

**English**

The RTS smoother is algebraically equivalent to the Cholesky smoother but expressed in covariance form — more intuitive and numerically stable.

**Algorithm:**

1. **Forward pass:** Run the Kalman filter forward in time ($k = 0 \to K$), storing all priors $(\check{\mathbf{x}}_k, \check{\mathbf{P}}_k)$ and posteriors $(\hat{\mathbf{x}}_k, \hat{\mathbf{P}}_k)$.
2. **Backward smoothing pass** ($k = K-1 \to 0$):

$$\mathbf{G}_k = \hat{\mathbf{P}}_k \mathbf{A}_k^T \check{\mathbf{P}}_{k+1}^{-1} \quad \text{(smoother gain)}$$

$$\hat{\mathbf{x}}_k^s = \hat{\mathbf{x}}_k + \mathbf{G}_k(\hat{\mathbf{x}}_{k+1}^s - \check{\mathbf{x}}_{k+1})$$

$$\hat{\mathbf{P}}_k^s = \hat{\mathbf{P}}_k + \mathbf{G}_k(\hat{\mathbf{P}}_{k+1}^s - \check{\mathbf{P}}_{k+1})\mathbf{G}_k^T$$

The smoother uses future measurements (via $\hat{\mathbf{x}}_{k+1}^s$) to refine past estimates.

---

## 3.4 卡尔曼滤波器 / The Kalman Filter

### 3.4.1 什么是卡尔曼滤波器? / What Is the Kalman Filter?

**中文**

卡尔曼滤波器(Kalman Filter)是人类历史上最成功的算法之一。它回答的问题是:**在只能向前看的情况下(即不使用未来测量),如何从噪声数据中最优地估计当前状态?**

卡尔曼滤波器是 RTS 平滑器的前向部分,也是 Cholesky 平滑器前向传递的协方差形式。它于 1960 年由鲁道夫·卡尔曼(Rudolf Kálmán)发表。

NASA 在阿波罗登月任务中采用了卡尔曼滤波器——阿波罗 11 号登月舱上的导航计算机融合惯性传感器和雷达数据,估计飞船在月面上方的精确位置。

---

**English**

The Kalman filter is arguably one of the most important algorithms ever devised. It answers: *how do we optimally estimate the current state in real time (causally) from noisy measurements?*

It is the forward pass of the RTS smoother, and one of the most consequential results in estimation theory. Rudolf Kálmán published it in 1960; NASA adopted it for Apollo almost immediately.
---

### 3.4.2 五个方程 / The Five Equations

**中文**

卡尔曼滤波器由**五个核心方程**组成,分两个阶段交替进行。

**阶段一:预测(Prediction)**——用运动模型从 $k-1$ 预测到 $k$

$$\check{\mathbf{x}}_k = \mathbf{A}_{k-1}\hat{\mathbf{x}}_{k-1} + \mathbf{v}_k \tag{3.10a}$$

$$\check{\mathbf{P}}_k = \mathbf{A}_{k-1}\hat{\mathbf{P}}_{k-1}\mathbf{A}_{k-1}^T + \mathbf{Q}_k \tag{3.10b}$$

**阶段二:校正(Correction/Update)**——用新测量 $\mathbf{y}_k$ 更新估计

$$\mathbf{K}_k = \check{\mathbf{P}}_k \mathbf{C}_k^T(\mathbf{C}_k\check{\mathbf{P}}_k\mathbf{C}_k^T + \mathbf{R}_k)^{-1} \tag{3.10c}$$

$$\hat{\mathbf{x}}_k = \check{\mathbf{x}}_k + \mathbf{K}_k(\mathbf{y}_k - \mathbf{C}_k\check{\mathbf{x}}_k) \tag{3.10d}$$

$$\hat{\mathbf{P}}_k = (\mathbf{1} - \mathbf{K}_k\mathbf{C}_k)\check{\mathbf{P}}_k \tag{3.10e}$$

其中 $\mathbf{K}_k$ 称为**卡尔曼增益(Kalman gain)**,$\mathbf{y}_k - \mathbf{C}_k\check{\mathbf{x}}_k$ 称为**新息(innovation)**——测量值与预测值的差。

> **每个方程的物理含义**:
>
> | 方程 | 含义 |
> |------|------|
> | (3.10a) 预测均值 | 用运动方程把上一步后验往前推一步 |
> | (3.10b) 预测协方差 | 运动带来不确定性增加($\mathbf{Q}_k$ 项) |
> | (3.10c) 卡尔曼增益 | 决定"信任测量还是信任预测" |
> | (3.10d) 校正均值 | 在预测基础上,按增益修正新息 |
> | (3.10e) 校正协方差 | 更新减少了不确定性 |

---

**English**

The Kalman filter alternates between two steps:

**Prediction** (propagate state forward):

$$\check{\mathbf{x}}_k = \mathbf{A}_{k-1}\hat{\mathbf{x}}_{k-1} + \mathbf{v}_k$$

$$\check{\mathbf{P}}_k = \mathbf{A}_{k-1}\hat{\mathbf{P}}_{k-1}\mathbf{A}_{k-1}^T + \mathbf{Q}_k$$

**Correction** (incorporate new measurement $\mathbf{y}_k$):

$$\mathbf{K}_k = \check{\mathbf{P}}_k \mathbf{C}_k^T(\mathbf{C}_k\check{\mathbf{P}}_k\mathbf{C}_k^T + \mathbf{R}_k)^{-1} \quad \text{(Kalman gain)}$$

$$\hat{\mathbf{x}}_k = \check{\mathbf{x}}_k + \mathbf{K}_k\underbrace{(\mathbf{y}_k - \mathbf{C}_k\check{\mathbf{x}}_k)}_{\text{innovation}}$$

$$\hat{\mathbf{P}}_k = (\mathbf{1} - \mathbf{K}_k\mathbf{C}_k)\check{\mathbf{P}}_k$$

---

### 3.4.3 卡尔曼增益的三种理解 / Three Ways to Understand Kalman Gain

**中文**

卡尔曼增益是整个算法的核心,可以从三个角度理解:

**角度 1:MAP 推导**

将校正步骤视为两个高斯分布(先验 + 似然)的乘积:

$$p(\mathbf{x}_k \mid \mathbf{y}_k) \propto \mathcal{N}(\check{\mathbf{x}}_k, \check{\mathbf{P}}_k) \cdot \mathcal{N}(\mathbf{y}_k;\, \mathbf{C}_k\mathbf{x}_k, \mathbf{R}_k)$$

由第二章的高斯条件分布公式,MAP 解即为上述五个方程。

**角度 2:贝叶斯推断(条件高斯)**

定义联合分布

$$\begin{bmatrix}\mathbf{x}_k \\ \mathbf{y}_k\end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix}\check{\mathbf{x}}_k \\ \mathbf{C}_k\check{\mathbf{x}}_k\end{bmatrix},\; \begin{bmatrix}\check{\mathbf{P}}_k & \check{\mathbf{P}}_k\mathbf{C}_k^T \\ \mathbf{C}_k\check{\mathbf{P}}_k & \mathbf{C}_k\check{\mathbf{P}}_k\mathbf{C}_k^T + \mathbf{R}_k\end{bmatrix}\right)$$

利用条件高斯公式(第二章),$p(\mathbf{x}_k \mid \mathbf{y}_k)$ 的均值和协方差恰好就是上述五个方程。

**角度 3:最优增益(MMSE 最小化)**

将校正均值写成 $\hat{\mathbf{x}}_k = \check{\mathbf{x}}_k + \mathbf{K}_k(\mathbf{y}_k - \mathbf{C}_k\check{\mathbf{x}}_k)$,对任意增益 $\mathbf{K}_k$ 计算均方误差

$$\text{MSE} = E[\|\hat{\mathbf{x}}_k - \mathbf{x}_k\|^2] = \text{tr}(\hat{\mathbf{P}}_k)$$

对 $\mathbf{K}_k$ 求导令其为零,得到上述卡尔曼增益公式。因此卡尔曼滤波器也是**最佳线性无偏估计器(BLUE,Best Linear Unbiased Estimator)**。

---

**English**

**Derivation 1 (MAP):** The correction step is the product of two Gaussians: the prior $\mathcal{N}(\check{\mathbf{x}}_k, \check{\mathbf{P}}_k)$ and the likelihood $\mathcal{N}(\mathbf{y}_k;\, \mathbf{C}_k\mathbf{x}_k, \mathbf{R}_k)$. MAP finds the peak of the product, leading directly to the five equations.

**Derivation 2 (Bayesian):** Writing the joint distribution of $(\mathbf{x}_k, \mathbf{y}_k)$ as Gaussian and applying the conditional Gaussian formula yields the same result.

**Derivation 3 (MMSE/BLUE):** The Kalman gain $\mathbf{K}_k$ minimizes the trace of the posterior covariance $\hat{\mathbf{P}}_k = (\mathbf{1} - \mathbf{K}_k\mathbf{C}_k)\check{\mathbf{P}}_k$ over all linear estimators. This proves the Kalman filter is the **Best Linear Unbiased Estimator (BLUE)** — no other linear estimator can have lower MSE.
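The five equations can be exercised on a tiny scalar system (a sketch of mine, with stand-in random measurements; the steady-state value checked at the end follows from the Riccati recursion with $Q = R = 1$, see also Exercise 3.3):

```python
import numpy as np

# Scalar Kalman filter for x_k = x_{k-1} + w_k, y_k = x_k + n_k, Q = R = 1.
A, C, Q, R = 1.0, 1.0, 1.0, 1.0
x_hat, P_hat = 0.0, 10.0          # initial posterior, large uncertainty

rng = np.random.default_rng(0)
for k in range(50):
    y = rng.normal()              # stand-in measurement
    # prediction (3.10a)-(3.10b)
    x_check = A * x_hat
    P_check = A * P_hat * A + Q
    # correction (3.10c)-(3.10e)
    K_gain = P_check * C / (C * P_check * C + R)
    x_hat = x_check + K_gain * (y - C * x_check)
    P_hat = (1.0 - K_gain * C) * P_check

# the covariance converges to the positive root of P^2 + Q P - Q R = 0
assert abs(P_hat - (np.sqrt(5.0) - 1.0) / 2.0) < 1e-9
```

Note that the covariance recursion never depends on the data, which is why the steady-state gain can be pre-computed offline (§3.4.6).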
---

### 3.4.4 卡尔曼增益的极端情形 / Extreme Cases of the Kalman Gain

**中文**

理解卡尔曼增益最简单的方法是看两种极端情形:

**情形 1:测量非常精确($\mathbf{R}_k \to \mathbf{0}$)**

$$\mathbf{K}_k \to \check{\mathbf{P}}_k\mathbf{C}_k^T(\mathbf{C}_k\check{\mathbf{P}}_k\mathbf{C}_k^T)^{-1} = \mathbf{C}_k^{-1}$$

(当 $\mathbf{C}_k$ 方阵可逆时)此时 $\hat{\mathbf{x}}_k = \mathbf{C}_k^{-1}\mathbf{y}_k$——完全相信测量,忽略预测。

**情形 2:测量非常噪声($\mathbf{R}_k \to \infty$)**

$$\mathbf{K}_k \to \mathbf{0}$$

此时 $\hat{\mathbf{x}}_k = \check{\mathbf{x}}_k$——完全相信预测,忽略测量。

> **增益是"信任度"的平衡**:$\mathbf{K}_k$ 在"相信传感器"和"相信运动模型"之间寻找最优平衡点。

---

**English**

- **Perfect sensor** ($\mathbf{R}_k \to \mathbf{0}$): $\mathbf{K}_k \to \mathbf{C}_k^{-1}$ (for square, invertible $\mathbf{C}_k$), i.e., we trust the measurement completely.
- **Very noisy sensor** ($\mathbf{R}_k \to \infty$): $\mathbf{K}_k \to \mathbf{0}$, i.e., we trust the prediction completely.

The Kalman gain is thus an automatic, data-driven weighting between the motion model and the sensor measurement.

---

### 3.4.5 误差动力学与一致性 / Error Dynamics and Consistency

**中文**

定义状态估计误差 $\tilde{\mathbf{x}}_k = \hat{\mathbf{x}}_k - \mathbf{x}_k$。

**无偏性(Unbiasedness)**:可以证明,只要初始估计无偏($E[\tilde{\mathbf{x}}_0] = \mathbf{0}$),则所有后续估计均无偏:

$$E[\tilde{\mathbf{x}}_k] = \mathbf{0}, \quad \forall k$$

**一致性(Consistency)**:估计器是一致的(consistent),当且仅当真实误差协方差等于计算所得的协方差:

$$E[\tilde{\mathbf{x}}_k\tilde{\mathbf{x}}_k^T] = \hat{\mathbf{P}}_k$$

一致性意味着滤波器"知道自己有多不确定"。如果 $E[\tilde{\mathbf{x}}_k\tilde{\mathbf{x}}_k^T] > \hat{\mathbf{P}}_k$(真实误差比滤波器以为的大),则滤波器**过于乐观(overconfident)**,这在实际应用中是常见问题。

**克拉美-罗下界(CRLB)**:卡尔曼滤波器达到 Cramér-Rao 下界(CRLB),即它是在所有无偏估计器中方差最小的——这是"最优"的最强意义。

---

**English**

Define the estimation error $\tilde{\mathbf{x}}_k = \hat{\mathbf{x}}_k - \mathbf{x}_k$.

**Unbiasedness:** If initialized with an unbiased estimate, all subsequent Kalman filter estimates are unbiased: $E[\tilde{\mathbf{x}}_k] = \mathbf{0}$.
**Consistency:** The filter is consistent if the reported covariance matches the true error covariance: $E[\tilde{\mathbf{x}}_k\tilde{\mathbf{x}}_k^T] = \hat{\mathbf{P}}_k$. An inconsistent filter that *underestimates* its uncertainty is called **overconfident** — a common practical failure mode.

**Optimality (CRLB):** The Kalman filter achieves the Cramér-Rao Lower Bound — it is efficient (minimum-variance) among all unbiased estimators for linear-Gaussian systems.

---

### 3.4.6 可观测性与稳定性 / Observability and Stability

**中文**

卡尔曼滤波器的长期行为与系统可观测性密切相关:

- 如果系统**可观测**($\text{rank}(\mathbf{O}) = N$),则不论初始协方差 $\check{\mathbf{P}}_0$ 多大,滤波器的协方差 $\hat{\mathbf{P}}_k$ 都会**收敛到唯一的稳态值**(不依赖初始条件)。
- 如果系统**不可观测**,某些状态的不确定性将永远无法减小。

稳态卡尔曼增益和稳态协方差可以通过**离散代数 Riccati 方程(DARE)**预先计算,避免在线计算——这是实际系统(如飞行控制)中常用的工程手段。

---

**English**

**Stability:** If the system is observable and the noise matrices are positive definite, the Kalman filter covariance converges to a steady-state value independent of $\check{\mathbf{P}}_0$. This steady-state covariance satisfies the **Discrete Algebraic Riccati Equation (DARE)**. In practice, the steady-state gain is pre-computed offline, giving a time-invariant filter with guaranteed stability.

---

## 3.5 连续时间估计:高斯过程回归 / Continuous-Time Estimation via Gaussian Process Regression

### 3.5.1 问题动机 / Motivation

**中文**

到目前为止,我们处理的都是离散时间系统:状态只在 $t_0, t_1, \ldots, t_K$ 这些固定时刻有定义。但现实中,机器人的轨迹是**连续时间**的——传感器可能以不等间隔频率采样,或者我们需要在测量时刻之间**内插**(interpolate)状态。

> **类比**:想象你正在记录一段音乐,但你只有几个采样点。你如何估计两个采样点之间的音符?这就是内插问题。在机器人学中,你只有几个时刻的 GPS/IMU 读数,但需要估计任意时刻的位置。

**高斯过程(Gaussian Process,GP)**提供了一个优雅的框架:将整条轨迹 $\mathbf{x}(t)$ 视为一个随机过程,用"先验运动模型"来约束它,然后用测量数据来"后验"更新整条轨迹。

---

**English**

Discrete-time methods fix the state at discrete time steps. But robot trajectories are fundamentally continuous-time, and we may need to query the state at arbitrary times between measurements.
**Gaussian Process (GP) Regression** treats the entire trajectory $\mathbf{x}(t)$ as a stochastic process, defines a prior over it via a continuous-time motion model (an SDE), and updates the trajectory using discrete measurements.

---

### 3.5.2 连续时间运动模型 / Continuous-Time Motion Model

**中文**

连续时间运动模型为线性时变随机微分方程(LTV SDE):

$$\dot{\mathbf{x}}(t) = \mathbf{A}(t)\mathbf{x}(t) + \mathbf{B}(t)\mathbf{u}(t) + \mathbf{L}(t)\mathbf{w}(t) \tag{3.11}$$

其中 $\mathbf{w}(t) \sim \mathcal{GP}(\mathbf{0}, \mathbf{Q}\delta(t-t'))$ 是白噪声过程,$\delta(\cdot)$ 是狄拉克 delta 函数,$\mathbf{Q}$ 是功率谱密度(power spectral density)矩阵。

**状态转移矩阵(Transition matrix)**$\boldsymbol{\Phi}(t,s)$ 满足:

$$\frac{d}{dt}\boldsymbol{\Phi}(t,s) = \mathbf{A}(t)\boldsymbol{\Phi}(t,s), \quad \boldsymbol{\Phi}(s,s) = \mathbf{1}$$

**均值函数(Mean function)**:

$$\check{\mathbf{x}}(t) = \boldsymbol{\Phi}(t,t_0)\check{\mathbf{x}}_0 + \int_{t_0}^t \boldsymbol{\Phi}(t,s)\mathbf{B}(s)\mathbf{u}(s)\,ds \tag{3.12}$$

**协方差函数(Covariance function)**:对 $t_j \le t \le t'$,

$$\check{\mathbf{P}}(t, t') = \boldsymbol{\Phi}(t, t_j)\left(\sum_{n=0}^j \boldsymbol{\Phi}(t_j, t_n)\mathbf{Q}_n\boldsymbol{\Phi}(t_j, t_n)^T\right)\boldsymbol{\Phi}(t', t_j)^T \tag{3.13}$$

其中 $\mathbf{Q}_n = \int_{t_{n-1}}^{t_n} \boldsymbol{\Phi}(t_n, s)\mathbf{L}(s)\mathbf{Q}\mathbf{L}(s)^T\boldsymbol{\Phi}(t_n, s)^T\,ds$ 是离散化过程噪声协方差(约定 $\mathbf{Q}_0 = \check{\mathbf{P}}_0$)。

---

**English**

The continuous-time motion model is a linear time-varying (LTV) stochastic differential equation (SDE):

$$\dot{\mathbf{x}}(t) = \mathbf{A}(t)\mathbf{x}(t) + \mathbf{B}(t)\mathbf{u}(t) + \mathbf{L}(t)\mathbf{w}(t)$$

with $\mathbf{w}(t) \sim \mathcal{GP}(\mathbf{0}, \mathbf{Q}\delta(t-t'))$. The solution defines a Gaussian process prior over the trajectory $\mathbf{x}(t)$, fully characterized by its mean function $\check{\mathbf{x}}(t)$ and covariance function $\check{\mathbf{P}}(t,t')$.
---

### 3.5.3 稀疏 GP 先验与块三对角结构 / Sparse GP Prior and Block-Tridiagonal Structure

**中文**

如果将 GP 离散化到测量时刻 $t_0, t_1, \ldots, t_K$,协方差矩阵为:

$$\check{\mathbf{P}} = \mathbf{A}\mathbf{Q}\mathbf{A}^T \tag{3.14}$$

其中 $\mathbf{A}$ 是由转移矩阵 $\boldsymbol{\Phi}(t_i, t_j)$ 组成的下三角提升矩阵(lifted transition matrix),$\mathbf{Q} = \text{blkdiag}(\check{\mathbf{P}}_0, \mathbf{Q}_1, \ldots, \mathbf{Q}_K)$。

> **关键结论**:$\check{\mathbf{P}}^{-1}$ 是**块三对角矩阵**!
>
> $$\check{\mathbf{P}}^{-1} = \mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{A}^{-1}$$
>
> 这是因为 $\mathbf{A}^{-1}$ 是块双对角的(下三角,仅有主对角和次对角非零),$\mathbf{Q}^{-1}$ 是块对角的,所以它们的乘积是块三对角的。

这个稀疏性使得 GP 批量估计与离散时间批量估计完全一致,都可以用 $O(K)$ 复杂度的稀疏求解器高效求解。

---

**English**

Discretizing the GP prior at measurement times gives $\check{\mathbf{P}} = \mathbf{A}\mathbf{Q}\mathbf{A}^T$. Crucially, the **precision matrix** (inverse covariance):

$$\check{\mathbf{P}}^{-1} = \mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{A}^{-1}$$

is **block-tridiagonal** — the same sparsity pattern as the discrete-time information matrix. This is the key connection: the continuous-time GP regression and discrete-time batch estimation are algebraically identical at the measurement times.
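The block-tridiagonal claim is easy to verify numerically. In this sketch (a scalar random-walk toy of my own: with $\boldsymbol{\Phi} = 1$ the lifted $\mathbf{A}$ is a lower-triangular matrix of ones, and $\mathbf{Q} = \mathbf{1}$), every precision-matrix entry more than one step off the diagonal vanishes:

```python
import numpy as np

K = 5
A = np.tril(np.ones((K, K)))   # lifted transition matrix for a random walk
Q = np.eye(K)                  # blkdiag(P0_check, Q_1, ..., Q_{K-1}), all ones here
P_check = A @ Q @ A.T          # GP prior covariance at the discrete times
Lam = np.linalg.inv(P_check)   # precision matrix

# entries more than one (block) off the diagonal are zero
for i in range(K):
    for j in range(K):
        if abs(i - j) > 1:
            assert abs(Lam[i, j]) < 1e-9
```

Here `P_check[i, j] = min(i, j) + 1`, the familiar Brownian-motion covariance, whose inverse is tridiagonal because the process is Markov.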
---

### 3.5.4 GP 查询:在任意时刻内插状态 / GP Querying: Interpolation at Arbitrary Times

**中文**

GP 最强大的功能是:在求解完 $\hat{\mathbf{x}}_{0:K}$ 之后,我们可以以 $O(1)$ 的代价查询任意时刻 $\tau \in (t_k, t_{k+1})$ 的状态估计。

后验均值和协方差为:

$$\hat{\mathbf{x}}(\tau) = \check{\mathbf{x}}(\tau) + \boldsymbol{\Lambda}(\tau)(\hat{\mathbf{x}}_k - \check{\mathbf{x}}_k) + \boldsymbol{\Psi}(\tau)(\hat{\mathbf{x}}_{k+1} - \check{\mathbf{x}}_{k+1}) \tag{3.15a}$$

$$\hat{\mathbf{P}}(\tau,\tau) = \check{\mathbf{P}}(\tau,\tau) + \begin{bmatrix}\boldsymbol{\Lambda}(\tau) & \boldsymbol{\Psi}(\tau)\end{bmatrix} \left(\begin{bmatrix}\hat{\mathbf{P}}_{k,k} & \hat{\mathbf{P}}_{k,k+1} \\ \hat{\mathbf{P}}_{k+1,k} & \hat{\mathbf{P}}_{k+1,k+1}\end{bmatrix} - \begin{bmatrix}\check{\mathbf{P}}(t_k,t_k) & \check{\mathbf{P}}(t_k,t_{k+1}) \\ \check{\mathbf{P}}(t_{k+1},t_k) & \check{\mathbf{P}}(t_{k+1},t_{k+1})\end{bmatrix}\right)\begin{bmatrix}\boldsymbol{\Lambda}(\tau)^T \\ \boldsymbol{\Psi}(\tau)^T\end{bmatrix} \tag{3.15b}$$

其中插值权重为:

$$\boldsymbol{\Lambda}(\tau) = \boldsymbol{\Phi}(\tau, t_k) - \mathbf{Q}_\tau\boldsymbol{\Phi}(t_{k+1}, \tau)^T\mathbf{Q}_{k+1}^{-1}\boldsymbol{\Phi}(t_{k+1}, t_k)$$

$$\boldsymbol{\Psi}(\tau) = \mathbf{Q}_\tau\boldsymbol{\Phi}(t_{k+1}, \tau)^T\mathbf{Q}_{k+1}^{-1}$$

$$\mathbf{Q}_\tau = \int_{t_k}^{\tau} \boldsymbol{\Phi}(\tau,s)\mathbf{L}(s)\mathbf{Q}\mathbf{L}(s)^T\boldsymbol{\Phi}(\tau,s)^T\,ds$$

注意:查询只涉及邻近的两个节点 $t_k$ 和 $t_{k+1}$,所以是 $O(1)$。

---

**English**

After solving for the posterior at all measurement times, the GP framework allows querying at any $\tau \in (t_k, t_{k+1})$ at $O(1)$ cost:

$$\hat{\mathbf{x}}(\tau) = \check{\mathbf{x}}(\tau) + \boldsymbol{\Lambda}(\tau)(\hat{\mathbf{x}}_k - \check{\mathbf{x}}_k) + \boldsymbol{\Psi}(\tau)(\hat{\mathbf{x}}_{k+1} - \check{\mathbf{x}}_{k+1})$$

The interpolation is a linear combination of the corrections at just the two neighboring nodes $t_k$ and $t_{k+1}$.
---

### 3.5.5 线性时不变情形与三次 Hermite 插值 / LTI Case and Cubic Hermite Interpolation

**中文**

当运动模型为线性时不变(LTI)时,转移矩阵为 $\boldsymbol{\Phi}(t,s) = \exp(\mathbf{A}(t-s))$,计算大大简化。

**例子(常速度模型)**:设加速度为白噪声 $\ddot{\mathbf{p}}(t) = \mathbf{w}(t)$,状态为 $\mathbf{x}(t) = [\mathbf{p}(t)^T, \dot{\mathbf{p}}(t)^T]^T$,则

$$\mathbf{A} = \begin{bmatrix}\mathbf{0} & \mathbf{1} \\ \mathbf{0} & \mathbf{0}\end{bmatrix}, \quad \boldsymbol{\Phi}(\Delta t) = \begin{bmatrix}\mathbf{1} & \Delta t\,\mathbf{1} \\ \mathbf{0} & \mathbf{1}\end{bmatrix}$$

代入 GP 查询公式,位置分量的插值恰好是**三次 Hermite 多项式插值(Cubic Hermite Interpolation)**:

$$\hat{\mathbf{p}}_\tau - \check{\mathbf{p}}_\tau = h_{00}(\alpha)(\hat{\mathbf{p}}_k - \check{\mathbf{p}}_k) + h_{10}(\alpha)T(\hat{\dot{\mathbf{p}}}_k - \check{\dot{\mathbf{p}}}_k) + h_{01}(\alpha)(\hat{\mathbf{p}}_{k+1} - \check{\mathbf{p}}_{k+1}) + h_{11}(\alpha)T(\hat{\dot{\mathbf{p}}}_{k+1} - \check{\dot{\mathbf{p}}}_{k+1})$$

其中 $\alpha = (\tau - t_k)/T \in [0,1]$,$T = t_{k+1} - t_k$,Hermite 基函数为:

$$h_{00}(\alpha) = 1 - 3\alpha^2 + 2\alpha^3, \quad h_{10}(\alpha) = \alpha - 2\alpha^2 + \alpha^3$$

$$h_{01}(\alpha) = 3\alpha^2 - 2\alpha^3, \quad h_{11}(\alpha) = -\alpha^2 + \alpha^3$$

这个结果非常优美:**三次样条插值自动地从 GP 先验中涌现出来**,无需人为指定插值方案。

---

**English**

For the LTI constant-velocity model ($\ddot{\mathbf{p}} = \mathbf{w}$), the GP interpolation formula for position reduces to exactly the **cubic Hermite polynomial interpolation**:

$$\hat{\mathbf{p}}_\tau - \check{\mathbf{p}}_\tau = h_{00}(\alpha)\Delta\hat{\mathbf{p}}_k + h_{10}(\alpha)T\Delta\hat{\dot{\mathbf{p}}}_k + h_{01}(\alpha)\Delta\hat{\mathbf{p}}_{k+1} + h_{11}(\alpha)T\Delta\hat{\dot{\mathbf{p}}}_{k+1}$$

with $\alpha \in [0,1]$ the fractional position between $t_k$ and $t_{k+1}$. This is remarkable: cubic Hermite splines emerge automatically from the physics-based GP prior — no ad-hoc interpolation scheme needs to be chosen.
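The defining properties of the Hermite basis can be checked directly (a small sketch; the finite-difference step used for the slope conditions is arbitrary):

```python
import numpy as np

# Cubic Hermite basis functions from the LTI constant-velocity GP prior.
def h00(a): return 1 - 3 * a**2 + 2 * a**3
def h10(a): return a - 2 * a**2 + a**3
def h01(a): return 3 * a**2 - 2 * a**3
def h11(a): return -a**2 + a**3

a = np.linspace(0.0, 1.0, 11)
# position bases sum to one: the interpolant reproduces constant trajectories
assert np.allclose(h00(a) + h01(a), 1.0)
# endpoint values: the interpolant passes through the two nodes
assert h00(0.0) == 1.0 and h00(1.0) == 0.0
assert h01(0.0) == 0.0 and h01(1.0) == 1.0
# slope conditions (finite-difference check): h10'(0) = 1, h11'(1) = 1
eps = 1e-6
assert abs((h10(eps) - h10(0.0)) / eps - 1.0) < 1e-5
assert abs((h11(1.0) - h11(1.0 - eps)) / eps - 1.0) < 1e-5
```

These are exactly the conditions that make the interpolant match position and (scaled) velocity at both endpoints.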
---

### 3.5.6 与批量离散估计的等价性 / Equivalence to Batch Discrete-Time Estimation

**中文**

将 GP 先验代入批量优化问题:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}}\;\frac{1}{2}(\check{\mathbf{x}} - \mathbf{x})^T\check{\mathbf{P}}^{-1}(\check{\mathbf{x}} - \mathbf{x}) + \frac{1}{2}(\mathbf{y} - \mathbf{C}\mathbf{x})^T\mathbf{R}^{-1}(\mathbf{y} - \mathbf{C}\mathbf{x})$$

由于 $\check{\mathbf{P}}^{-1} = \mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{A}^{-1}$ 是块三对角的,法方程为:

$$\underbrace{(\mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{A}^{-1} + \mathbf{C}^T\mathbf{R}^{-1}\mathbf{C})}_{\text{block-tridiagonal}}\hat{\mathbf{x}} = \mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{v} + \mathbf{C}^T\mathbf{R}^{-1}\mathbf{y}$$

这与离散时间批量问题的法方程**完全相同**!

**总结**:连续时间 GP 估计在测量时刻与离散时间批量估计完全等价;两者都可以用 $O(K)$ 的 Cholesky 平滑器/RTS 平滑器高效求解;而 Kalman 滤波器是其前向传递部分。

图 3.7(原书)总结了各种线性高斯估计范式之间的关系:

```
连续时间
┌─────────────────────────────┐
│ 批量(GP回归)              │
│   ↕ 等价                    │
│ 递推(Kalman-Bucy 滤波器)  │
└─────────────────────────────┘
        ↕ 离散化等价
┌─────────────────────────────┐
│ 离散时间                    │
│ 批量(加权最小二乘)        │
│   ↕ 稀疏 Cholesky           │
│ 递推平滑(RTS smoother)    │
│   ↕ 只取前向部分            │
│ 递推滤波(Kalman filter)   │
└─────────────────────────────┘
```

---

**English**

Substituting the GP prior into the batch optimization yields a system with the **same block-tridiagonal structure** as discrete-time batch estimation:

$$\underbrace{(\mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{A}^{-1} + \mathbf{C}^T\mathbf{R}^{-1}\mathbf{C})}_{\text{block-tridiagonal}}\hat{\mathbf{x}} = \mathbf{A}^{-T}\mathbf{Q}^{-1}\mathbf{v} + \mathbf{C}^T\mathbf{R}^{-1}\mathbf{y}$$

The continuous-time GP approach is **exactly equivalent** to discrete-time batch estimation at the measurement times. Both can be solved in $O(N^3 K)$ time using the Cholesky or RTS smoother, whose forward pass is the Kalman filter.
---

## 3.6 连续时间递推滤波:Kalman-Bucy 滤波器 / Kalman-Bucy Filter

**中文**

前面讨论的方法都在测量时刻有限个离散点上工作。历史上还有一个优雅的结果:**Kalman-Bucy 滤波器**(1961 年),它处理连续时间测量流。

运动模型仍为 LTV SDE(3.11),但观测模型也是连续的:

$$\mathbf{y}(t) = \mathbf{C}(t)\mathbf{x}(t) + \mathbf{n}(t), \quad \mathbf{n}(t) \sim \mathcal{GP}(\mathbf{0}, \mathbf{R}(t))$$

Kalman-Bucy 滤波器由两个微分方程组成:

**均值方程**:

$$\dot{\hat{\mathbf{x}}}(t) = \mathbf{A}(t)\hat{\mathbf{x}}(t) + \mathbf{B}(t)\mathbf{u}(t) + \mathbf{K}(t)(\mathbf{y}(t) - \mathbf{C}(t)\hat{\mathbf{x}}(t)) \tag{3.16a}$$

**协方差方程(Riccati 微分方程)**:

$$\dot{\hat{\mathbf{P}}}(t) = \mathbf{A}(t)\hat{\mathbf{P}}(t) + \hat{\mathbf{P}}(t)\mathbf{A}(t)^T + \mathbf{L}(t)\mathbf{Q}(t)\mathbf{L}(t)^T - \mathbf{K}(t)\mathbf{R}(t)\mathbf{K}(t)^T \tag{3.16b}$$

其中连续时间卡尔曼增益:

$$\mathbf{K}(t) = \hat{\mathbf{P}}(t)\mathbf{C}(t)^T\mathbf{R}(t)^{-1} \tag{3.16c}$$

> **直觉**:
> - 均值方程 (3.16a) 是运动模型 + 连续修正项(类比离散 KF 的 predict+correct)
> - 协方差方程 (3.16b) 中的 $+\mathbf{L}\mathbf{Q}\mathbf{L}^T$ 项是过程噪声带来的增量,$-\mathbf{K}\mathbf{R}\mathbf{K}^T$ 是测量带来的减量
> - 这两项之间的平衡决定了稳态协方差

虽然在实际应用中传感器是离散采样的(不是真正的连续流),Kalman-Bucy 滤波器仍是估计理论的重要里程碑。

---

**English**

The **Kalman-Bucy filter** (Kalman and Bucy, 1961) is the continuous-time version of the Kalman filter, handling a continuous stream of measurements.

**Mean ODE:**

$$\dot{\hat{\mathbf{x}}}(t) = \mathbf{A}(t)\hat{\mathbf{x}}(t) + \mathbf{B}(t)\mathbf{u}(t) + \mathbf{K}(t)(\mathbf{y}(t) - \mathbf{C}(t)\hat{\mathbf{x}}(t))$$

**Covariance ODE (Riccati equation):**

$$\dot{\hat{\mathbf{P}}}(t) = \mathbf{A}(t)\hat{\mathbf{P}}(t) + \hat{\mathbf{P}}(t)\mathbf{A}(t)^T + \mathbf{L}(t)\mathbf{Q}(t)\mathbf{L}(t)^T - \mathbf{K}(t)\mathbf{R}(t)\mathbf{K}(t)^T$$

where $\mathbf{K}(t) = \hat{\mathbf{P}}(t)\mathbf{C}(t)^T\mathbf{R}(t)^{-1}$. The process-noise term $+\mathbf{L}\mathbf{Q}\mathbf{L}^T$ inflates covariance, and the measurement term $-\mathbf{K}\mathbf{R}\mathbf{K}^T$ deflates it; their balance determines steady-state uncertainty.
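The covariance ODE (3.16b) can be integrated numerically. The sketch below uses forward Euler on a scalar system with parameters of my own choosing ($A = 0$, $L = C = 1$, $Q = R = 1$), for which the Riccati equation reduces to $\dot{P} = Q - P^2/R$ with steady state $P = \sqrt{QR} = 1$:

```python
# Scalar Kalman-Bucy covariance ODE, forward-Euler integration.
A, L, C, Q, R = 0.0, 1.0, 1.0, 1.0, 1.0
P, dt = 5.0, 1e-3              # start well above the steady state

for _ in range(20000):
    K = P * C / R                               # continuous-time gain (3.16c)
    P += dt * (2 * A * P + L * Q * L - K * R * K)  # Riccati ODE (3.16b)

assert abs(P - 1.0) < 1e-3     # converged to sqrt(QR)
```

The inflation term ($LQL$) and deflation term ($KRK$) balance exactly at the steady state, mirroring the discrete-time DARE of §3.4.6.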
---

## 3.7 本章小结 / Chapter Summary

**中文**

| 方法 | 最优性 | 复杂度 | 适用场景 |
|------|--------|--------|----------|
| 批量加权最小二乘 | 精确(线性高斯下全局最优) | $O(N^3 K)$(稀疏) | 离线、全轨迹 |
| Cholesky 平滑器 | 精确(等价批量) | $O(N^3 K)$ | 离线、数值稳定 |
| RTS 平滑器 | 精确(等价批量) | $O(N^3 K)$ | 离线、协方差形式 |
| Kalman 滤波器 | 因果最优(BLUE/MMSE) | $O(N^3)$/步 | 在线、实时 |
| GP 回归(批量) | 精确 | $O(N^3 K)$ | 连续时间轨迹 |
| Kalman-Bucy 滤波器 | 因果最优(连续时间) | 积分 ODE | 理论参考 |

**三条核心结论**:

1. **线性高斯后验是精确高斯的**:MAP = 后验均值 = MMSE,三者合一。
2. **块三对角结构是高效求解的关键**:无论是离散还是连续时间,都能用 $O(K)$ 复杂度求解。
3. **连续与离散等价**:GP 批量估计在测量时刻与离散批量估计精确等价;RTS 平滑器的前向传递就是 Kalman 滤波器。

---

**English**

**Three core takeaways:**

1. **The linear-Gaussian posterior is exactly Gaussian.** The MAP estimate equals the posterior mean equals the MMSE estimate — a happy coincidence that disappears with nonlinearity.
2. **Block-tridiagonal sparsity is the key to efficiency.** Both discrete-time and continuous-time linear-Gaussian problems share this structure, enabling $O(N^3 K)$ solution.
3. **Continuous-time and discrete-time are equivalent at measurement times.** GP batch estimation with SDE priors is identical to discrete-time batch estimation; the RTS smoother forward pass is the Kalman filter.

---

## 习题 / Exercises

**3.1** 考虑一维离散时间系统:

$$x_k = x_{k-1} + v_k + w_k, \quad w_k \sim \mathcal{N}(0, Q)$$

$$y_k = x_k + n_k, \quad n_k \sim \mathcal{N}(0, R)$$

设 $K=5$,初始状态先验未知($\check{P}_0 \to \infty$)。写出批量最小二乘的矩阵 $\mathbf{H}$, $\mathbf{W}$, $\mathbf{z}$。问:解是否唯一?

*Consider a 1D discrete-time system with $K=5$ steps. Derive $\mathbf{H}$, $\mathbf{W}$, $\mathbf{z}$ for the batch least-squares formulation. Is the solution unique?*

**3.2** 沿用 3.1 的系统,设 $Q=R=1$,验证信息矩阵为三对角矩阵,写出其 Cholesky 因子的稀疏模式。

*With $Q=R=1$, verify the information matrix is tridiagonal and state the sparsity pattern of its Cholesky factor.*

**3.3** 沿用 3.1 的系统,推导卡尔曼滤波方程的具体形式,并证明稳态先验协方差 $\check{P}$ 和后验协方差 $\hat{P}$ 满足:

$$\check{P}^2 - Q\check{P} - QR = 0, \quad \hat{P}^2 + Q\hat{P} - QR = 0$$

解释为什么每个方程只有一个根有物理意义。

*Derive the Kalman filter for this system and show the steady-state covariances satisfy the above quadratics. Explain which root is physically meaningful.*

**3.4** 利用 MAP 方法,推导一个**时间反向**运行的 Kalman 滤波器(backward Kalman filter)。

*Using the MAP approach, derive a Kalman filter that runs backward in time.*

**3.5** 考虑位置 $p$ 和地标 $m$ 的联合估计(SLAM):

$$\begin{bmatrix}p_k \\ m_k\end{bmatrix} = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}\begin{bmatrix}p_{k-1} \\ m_{k-1}\end{bmatrix} + \begin{bmatrix}1 \\ 0\end{bmatrix}(d_k + w_k)$$

$$y_k = \begin{bmatrix}-1 & 1\end{bmatrix}\begin{bmatrix}p_k \\ m_k\end{bmatrix} + n_k$$

用卡尔曼滤波器估计 $K$ 步后的位置和地标。分析初始化和可观测性。

*Apply the Kalman filter to simultaneously estimate robot position and landmark location (SLAM). Analyze initialization and observability.*

---

*下一章将研究非线性和非高斯情形,这是实际机器人系统中的常见情况。/ The next chapter tackles nonlinear and non-Gaussian estimation — the common case in real robotics.*