第四章　非线性非高斯估计

Chapter 4 Nonlinear Non-Gaussian Estimation

本章概览 / Chapter Overview

本章是全书最重要的章节之一。现实世界中不存在线性高斯系统——传感器模型是非线性的，噪声不服从完美的高斯分布。本章系统介绍非线性估计的主要方法，从全贝叶斯视角出发，推导 Bayes 滤波器、扩展卡尔曼滤波器（EKF）、Sigma 点卡尔曼滤波器（SPKF）、粒子滤波器，以及批量非线性 MAP 估计（Gauss-Newton 方法）。

This chapter covers nonlinear, non-Gaussian (NLNG) estimation. Real-world motion and observation models are nonlinear. We derive the Bayes filter, EKF, IEKF, SPKF, particle filter, batch MAP via Gauss–Newton, sliding-window filters, and continuous-time nonlinear estimation.

4.1 引言：非线性带来的根本挑战 / Introduction: The Fundamental Challenge of Nonlinearity

4.1.1 一个直觉性例子：立体相机 / Motivating Example: Stereo Camera

中文

在第三章，线性高斯假设给我们带来了极大的便利：后验仍是高斯分布，MAP = 后验均值 = MMSE 估计。但现实传感器几乎无一例外是非线性的。

考虑一个立体相机（stereo camera）：它测量地标的视差（disparity） $y$ ，与地标深度 $x$ 的关系为：

$y = \frac{fb}{x} + n, \quad n \sim \mathcal{N}(0, R) \tag{4.1}$

其中 $f$ 是焦距（像素）， $b$ 是基线长（米）， $n$ 是测量噪声。

为什么这个模型是非线性的？

注意 $y$ 是 $1/ x$ 的函数，而不是 $x$ 的线性函数。当 $x$ 较大（地标较远）时，同样的 $Δ x$ 对应的 $Δ y$ 很小（相机对远处地标不敏感）；当 $x$ 较小（地标较近）时，同样的 $Δ x$ 对应的 $Δ y$ 很大。这种非对称性会导致后验分布不对称——即使先验和似然都是高斯的！

给定先验 $p(x) = \mathcal{N}(\check{x}, \check{P})$ 和测量 $y$ ，贝叶斯后验为：

$p (x ∣ y) = \frac{p ( y ∣ x ) p ( x )}{\int p ( y ∣ x ) p ( x ) d x}$

由于分母的积分没有解析解，只能数值计算。数值结果表明：后验分布是非对称的（skewed）——这是非线性模型的典型特征。

English

Consider the stereo camera model $y = f b / x + n$ . Because the measurement model is nonlinear in the state $x$ , the posterior $p (x ∣ y)$ — even with Gaussian prior and Gaussian noise — is not Gaussian. It is asymmetric (skewed). The denominator integral $\int p (y ∣ x) p (x) d x$ has no closed form.

This is the fundamental challenge: once the models are nonlinear, we lose all the nice properties from Chapter 3. All estimation methods in this chapter are approximations.
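The skew can be checked numerically by evaluating the posterior on a grid. The sketch below is illustrative only: the focal length, baseline, noise variance, prior, measurement value, and grid bounds are all assumed values that do not come from the text above.

```python
import math

# Assumed illustrative values (not given in the text above).
F, B = 400.0, 0.1              # focal length [px], baseline [m]
R = 0.09                       # measurement noise variance [px^2]
X_PRIOR, P_PRIOR = 20.0, 9.0   # prior mean [m] and variance [m^2]

def grid_posterior(y, x_min=1.0, x_max=60.0, n=5901):
    """Evaluate p(x | y) on a uniform grid and normalize it numerically."""
    dx = (x_max - x_min) / (n - 1)
    xs = [x_min + i * dx for i in range(n)]
    unnorm = [
        math.exp(-0.5 * (y - F * B / x) ** 2 / R
                 - 0.5 * (x - X_PRIOR) ** 2 / P_PRIOR)
        for x in xs
    ]
    z = sum(unnorm) * dx       # Riemann-sum approximation of the denominator integral
    return xs, [u / z for u in unnorm], dx

# y = f*b/20 is the noise-free disparity of a landmark at 20 m.
xs, pdf, dx = grid_posterior(y=2.0)
mode = xs[pdf.index(max(pdf))]
mean = sum(x * p for x, p in zip(xs, pdf)) * dx
print(f"mode = {mode:.2f} m, mean = {mean:.2f} m")
```

With these assumed numbers the mode sits at the prior mean while the mean lies farther out, which is the asymmetry described above.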

4.1.2 MAP 估计的偏差 / Bias of MAP Estimation

中文

MAP 估计寻找使后验最大的状态：

$\hat{x}_{\text{map}} = \arg\max_x p(x \mid y) = \arg\min_x\left[\frac{1}{2R}\left(y - \frac{fb}{x}\right)^2 + \frac{1}{2\check{P}}(\check{x} - x)^2\right] \tag{4.2}$

这可以用数值优化方法求解。

然而，MAP 估计是有偏的（biased）： $E[\hat{x}_{\text{map}} - x_{\text{true}}] \neq 0$ 。这是因为非线性模型导致后验不对称，后验的众数（mode）不等于后验的均值（mean）。

偏差的直觉：想象后验 PDF 像一个歪斜的山峰——左坡陡、右坡缓。MAP 找到山顶（众数），而真正的”平均位置”在山顶偏右。MAP 系统性地偏向一侧。

在立体相机例子中，用 $10^{6}$ 次蒙特卡洛实验可以测得 MAP 偏差约为 $-33$ cm（地标在 20 m 处时）。

两个性能指标：

均值误差（Mean error）： $\hat{e}_{\text{mean}} = E[\hat{x} - x_{\text{true}}]$ ，衡量偏差
均方误差（MSE）： $\hat{e}_{\text{sq}} = E[(\hat{x} - x_{\text{true}})^{2}]$ ，等于偏差平方与方差之和

English

The MAP estimator is biased for nonlinear models because the mode and mean of the posterior differ. In the stereo camera example with a 20 m landmark, MAP has a bias of ~−33 cm.

Two performance metrics:

Mean error $\hat{e}_{\text{mean}} = E[\hat{x} - x_{\text{true}}]$ : captures systematic bias
Mean squared error $\hat{e}_{\text{sq}} = E[(\hat{x} - x_{\text{true}})^{2}]$ : captures squared bias plus variance

A good estimator keeps both small. Zero mean error ( $\hat{e}_{\text{mean}} = 0$ ) alone is not sufficient: the trivial estimator $\hat{x} = \check{x}$ , which ignores the measurement, achieves it but has a large MSE.
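The two metrics can be estimated by simulation. The following is a minimal sketch, assuming illustrative camera and prior values that are not stated in the text, drawing the ground truth from the prior, and using a far smaller trial count than $10^{6}$ to keep it quick. The MAP solver is a scalar Gauss-Newton iteration with step halving, which is one possible choice rather than the method prescribed here.

```python
import random

# Assumed illustrative values (not stated in the text above).
FB = 400.0 * 0.1                 # focal length [px] times baseline [m]
R = 0.09                         # measurement noise variance [px^2]
X_PRIOR, P_PRIOR = 20.0, 9.0     # prior mean [m] and variance [m^2]

def cost(x, y):
    """Negative log-posterior of the stereo MAP problem, up to a constant."""
    return 0.5 * (y - FB / x) ** 2 / R + 0.5 * (X_PRIOR - x) ** 2 / P_PRIOR

def map_estimate(y, iters=50, tol=1e-9):
    """Scalar Gauss-Newton with step halving, started at the prior mean."""
    x = X_PRIOR
    for _ in range(iters):
        G = -FB / x ** 2                                  # d(fb/x)/dx at the operating point
        step = (G * (y - FB / x) / R + (X_PRIOR - x) / P_PRIOR) / (G * G / R + 1.0 / P_PRIOR)
        alpha = 1.0
        # Halve the step until x stays positive and the cost does not increase.
        while alpha > 1e-8 and (x + alpha * step <= 0.0
                                or cost(x + alpha * step, y) > cost(x, y)):
            alpha *= 0.5
        x_new = x + alpha * step
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def monte_carlo(trials, rng):
    """Average error and average squared error of the MAP estimate."""
    err_sum = sq_sum = 0.0
    for _ in range(trials):
        x_true = rng.gauss(X_PRIOR, P_PRIOR ** 0.5)       # ground truth drawn from the prior
        y = FB / x_true + rng.gauss(0.0, R ** 0.5)        # noisy disparity measurement
        err = map_estimate(y) - x_true
        err_sum += err
        sq_sum += err * err
    return err_sum / trials, sq_sum / trials

e_mean, e_sq = monte_carlo(20000, random.Random(0))
print(f"mean error = {e_mean:.3f} m, mean squared error = {e_sq:.3f} m^2")
```

A negative mean error here means the MAP estimate tends to fall short of the true depth; the exact figure depends on the assumed parameters and the number of trials.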

4.2 递推离散时间估计 / Recursive Discrete-Time Estimation

4.2.1 非线性模型设置 / Nonlinear Problem Setup

中文

非线性系统的运动模型和观测模型为：

$\mathbf{x}_k = \mathbf{f}(\mathbf{x}_{k-1}, \mathbf{v}_k, \mathbf{w}_k), \quad \mathbf{w}_k \sim p(\mathbf{w}_k) \tag{4.3a}$

$\mathbf{y}_k = \mathbf{g}(\mathbf{x}_k, \mathbf{n}_k), \quad \mathbf{n}_k \sim p(\mathbf{n}_k) \tag{4.3b}$

其中 $f (\cdot)$ 和 $g (\cdot)$ 是非线性函数，噪声不一定是高斯的。

马尔可夫性质（Markov property）：一旦知道 $x_{k - 1}$ ， $x_{k}$ 的分布就与 $x_{k - 2}, x_{k - 3}, \dots$ 无关。即”历史只通过当前状态影响未来”。这个性质对于设计递推滤波器至关重要。

English

The nonlinear motion and observation models are: $x_{k} = f (x_{k - 1}, v_{k}, w_{k}), y_{k} = g (x_{k}, n_{k})$

The system satisfies the Markov property: $x_{k}$ depends only on $x_{k - 1}$ , not on earlier history. This enables recursive (online) estimation.

4.2.2 贝叶斯滤波器 / The Bayes Filter

中文

贝叶斯滤波器是所有递推滤波器的理论基础。它试图维护一个完整的概率分布（信念函数，belief）来描述当前状态：

$\underbrace{p(\mathbf{x}_k \mid \check{\mathbf{x}}_0, \mathbf{v}_{1:k}, \mathbf{y}_{0:k})}_{\text{后验信念}} = \eta\underbrace{p(\mathbf{y}_k \mid \mathbf{x}_k)}_{\text{观测似然}} \int \underbrace{p(\mathbf{x}_k \mid \mathbf{x}_{k-1}, \mathbf{v}_k)}_{\text{运动模型}} \underbrace{p(\mathbf{x}_{k-1} \mid \check{\mathbf{x}}_0, \mathbf{v}_{1:k-1}, \mathbf{y}_{0:k-1})}_{\text{先验信念}}\,d\mathbf{x}_{k-1} \tag{4.4}$

这是一个预测-校正的两步结构：

预测步（Prediction）：用运动模型把先验分布往前推一步（积分中的项）
校正步（Correction）：用新测量 $y_{k}$ 更新分布（外面的 $η p (y_{k} ∣ x_{k})$ 项）

为什么贝叶斯滤波器无法直接实现？

无限维表示：概率密度函数需要无穷多参数（所有可能状态的概率值）才能完整表示

积分不可解析： $\int p (x_{k} ∣ x_{k - 1}, v_{k}) p (x_{k - 1} ∣ \dots) d x_{k - 1}$ 对一般非线性 $f (\cdot)$ 没有闭合解

因此，所有实际滤波器都是贝叶斯滤波器的近似。

English

The Bayes filter is the theoretical foundation for all recursive filters:

$\text{posterior belief} = \eta \cdot \text{observation likelihood} \times \int \text{motion model} \times \text{prior belief} \; d\mathbf{x}_{k-1}$

It is exact but intractable in practice for two reasons:

A PDF requires infinitely many parameters to represent exactly.
The integral has no closed form for general nonlinear motion models.

All practical filters are approximations of the Bayes filter. They differ in how they represent the belief (Gaussian vs. particles) and how they handle the integral (linearization vs. Monte Carlo vs. sigma-points).
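For a one-dimensional state, the Bayes filter can be approximated directly by discretizing the belief on a grid, which makes the prediction integral and the correction product concrete. This is only a sketch: the additive motion model, the Gaussian noise, the stereo-style observation function, and every numeric value are assumptions chosen for illustration.

```python
import math

def gauss(z, var):
    """Zero-mean Gaussian density with variance var, evaluated at z."""
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)

def bayes_filter_step(xs, belief, v, y, q_var, g, r_var):
    """One predict/correct cycle of a grid-based Bayes filter (scalar state).

    Assumes x_k = x_{k-1} + v + w with w ~ N(0, q_var),
    and y_k = g(x_k) + n with n ~ N(0, r_var).
    """
    dx = xs[1] - xs[0]
    # Prediction: integrate motion model times prior belief over x_{k-1}.
    predicted = [
        sum(gauss(x - (x_prev + v), q_var) * b for x_prev, b in zip(xs, belief)) * dx
        for x in xs
    ]
    # Correction: multiply by the observation likelihood, then normalize (eta).
    unnorm = [gauss(y - g(x), r_var) * p for x, p in zip(xs, predicted)]
    eta = 1.0 / (sum(unnorm) * dx)
    return [eta * u for u in unnorm]

xs = [5.0 + 0.1 * i for i in range(351)]            # grid from 5 m to 40 m
prior = [gauss(x - 20.0, 9.0) for x in xs]          # assumed prior belief N(20, 9)
posterior = bayes_filter_step(xs, prior, v=1.0, y=40.0 / 21.0, q_var=0.25,
                              g=lambda x: 40.0 / x, r_var=0.09)
dx = xs[1] - xs[0]
mean = sum(x * p for x, p in zip(xs, posterior)) * dx
var = sum((x - mean) ** 2 * p for x, p in zip(xs, posterior)) * dx
print(f"posterior mean = {mean:.2f} m, variance = {var:.2f} m^2")
```

The cost of this approach grows quickly with state dimension, which is why the remaining filters in this section restrict the belief representation instead.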

4.2.3 扩展卡尔曼滤波器（EKF）/ Extended Kalman Filter

中文

**扩展卡尔曼滤波器（EKF）**是目前实用中最广泛使用的非线性滤波器。其核心思路是：

约束信念为高斯：假设 $p (x_{k} ∣ \dots) = N (\hat{x}_{k}, \hat{P}_{k})$
线性化非线性模型：在当前估计均值附近做一阶 Taylor 展开

线性化（在 $\hat{\mathbf{x}}_{k-1}$ 附近展开运动模型，在 $\check{\mathbf{x}}_{k}$ 附近展开观测模型）：

$\mathbf{f}(\mathbf{x}_{k-1}, \mathbf{v}_k, \mathbf{w}_k) \approx \check{\mathbf{x}}_k + \mathbf{F}_{k-1}(\mathbf{x}_{k-1} - \hat{\mathbf{x}}_{k-1}) + \mathbf{w}_k' \tag{4.5a}$

$\mathbf{g}(\mathbf{x}_k, \mathbf{n}_k) \approx \check{\mathbf{y}}_k + \mathbf{G}_k(\mathbf{x}_k - \check{\mathbf{x}}_k) + \mathbf{n}_k' \tag{4.5b}$

其中 Jacobian 矩阵为：

$\mathbf{F}_{k-1} = \left.\frac{\partial \mathbf{f}}{\partial \mathbf{x}_{k-1}}\right|_{\hat{\mathbf{x}}_{k-1}, \mathbf{v}_k, \mathbf{0}}, \quad \mathbf{G}_k = \left.\frac{\partial \mathbf{g}}{\partial \mathbf{x}_k}\right|_{\check{\mathbf{x}}_k, \mathbf{0}} \tag{4.5c}$

EKF 的五个方程（与 Kalman 滤波器完全类似，区别仅在于用 Jacobian 代替了线性矩阵，用非线性函数传播均值）：

步骤	方程
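预测均值	$\check{\mathbf{x}}_{k} = \mathbf{f}(\hat{\mathbf{x}}_{k-1}, \mathbf{v}_{k}, \mathbf{0})$
预测协方差	$\check{\mathbf{P}}_{k} = \mathbf{F}_{k-1} \hat{\mathbf{P}}_{k-1} \mathbf{F}_{k-1}^{T} + \mathbf{Q}_{k}'$
卡尔曼增益	$\mathbf{K}_{k} = \check{\mathbf{P}}_{k} \mathbf{G}_{k}^{T} (\mathbf{G}_{k} \check{\mathbf{P}}_{k} \mathbf{G}_{k}^{T} + \mathbf{R}_{k}')^{-1}$
校正协方差	$\hat{\mathbf{P}}_{k} = (\mathbf{1} - \mathbf{K}_{k} \mathbf{G}_{k}) \check{\mathbf{P}}_{k}$
校正均值	$\hat{\mathbf{x}}_{k} = \check{\mathbf{x}}_{k} + \mathbf{K}_{k} (\mathbf{y}_{k} - \mathbf{g}(\check{\mathbf{x}}_{k}, \mathbf{0}))$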

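其中 $\mathbf{Q}_{k}' = \left.\frac{\partial \mathbf{f}}{\partial \mathbf{w}_{k}}\right|_{\dots} \mathbf{Q}_{k} \left.\frac{\partial \mathbf{f}}{\partial \mathbf{w}_{k}}\right|_{\dots}^{T}$ ， $\mathbf{R}_{k}' = \left.\frac{\partial \mathbf{g}}{\partial \mathbf{n}_{k}}\right|_{\dots} \mathbf{R}_{k} \left.\frac{\partial \mathbf{g}}{\partial \mathbf{n}_{k}}\right|_{\dots}^{T}$ 是经过 Jacobian 变换的噪声协方差（求值点与式 (4.5c) 相同）。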

EKF 与标准 KF 的区别：

均值传播：EKF 用非线性函数 $\mathbf{f}(\cdot)$ 传播均值（而不是线性矩阵 $\mathbf{A}$ ）： $\check{\mathbf{x}}_{k} = \mathbf{f}(\hat{\mathbf{x}}_{k-1}, \mathbf{v}_{k}, \mathbf{0})$

协方差传播：EKF 用 Jacobian $\mathbf{F}_{k-1}$ 代替线性转移矩阵 $\mathbf{A}_{k-1}$

新息（innovation）： $\mathbf{y}_{k} - \mathbf{g}(\check{\mathbf{x}}_{k}, \mathbf{0})$ ，用非线性函数计算预测测量

EKF 的主要风险：

线性化操作点是估计均值，不是真实状态——若估计与真实相差太大，线性化误差很大

对高度非线性系统，EKF 可能发散（diverge）：估计漂离真实轨迹且协方差矩阵失去意义

估计通常有偏（biased）且不一致（inconsistent）

English

The EKF approximates the Bayes filter by:

Constraining the belief to be Gaussian.
Linearizing the nonlinear models around the current mean estimate.

The five EKF equations mirror the Kalman filter exactly, replacing linear matrices with Jacobians and propagating the mean through the full nonlinear function.

Key differences from KF:

Predict mean: $\check{\mathbf{x}}_{k} = \mathbf{f}(\hat{\mathbf{x}}_{k-1}, \mathbf{v}_{k}, \mathbf{0})$ (uses the full nonlinear function)
Predict covariance: uses Jacobian $\mathbf{F}_{k-1}$ instead of $\mathbf{A}_{k-1}$
Innovation: $\mathbf{y}_{k} - \mathbf{g}(\check{\mathbf{x}}_{k}, \mathbf{0})$ (uses the full nonlinear observation model)

EKF limitations: The linearization point is the estimated mean, not the true state. If these differ significantly (as in a highly nonlinear system), the EKF can diverge catastrophically.
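The five equations can be sketched for a scalar state as follows. To keep the example short it assumes additive noise (so $Q' = Q$ and $R' = R$); the random-walk motion model, the stereo-style observation model, and all numbers in the usage lines are assumptions, not values from the text.

```python
def ekf_step(x_hat, P_hat, v, y, f, F_jac, g, G_jac, Q, R):
    """One scalar EKF cycle, assuming additive process and measurement noise."""
    # Prediction: the mean goes through the full nonlinear f, the covariance through its Jacobian.
    x_check = f(x_hat, v)
    F = F_jac(x_hat, v)
    P_check = F * P_hat * F + Q
    # Correction: linearize g about the predicted mean.
    G = G_jac(x_check)
    K = P_check * G / (G * P_check * G + R)        # Kalman gain
    P_new = (1.0 - K * G) * P_check                # corrected covariance
    x_new = x_check + K * (y - g(x_check))         # innovation uses the nonlinear g
    return x_new, P_new

FB = 40.0                                          # assumed f*b [px*m]
x_hat, P_hat = ekf_step(
    x_hat=20.0, P_hat=9.0, v=1.0, y=2.0,
    f=lambda x, v: x + v, F_jac=lambda x, v: 1.0,              # assumed motion model
    g=lambda x: FB / x, G_jac=lambda x: -FB / x ** 2,          # stereo-style observation
    Q=0.25, R=0.09,
)
print(f"x_hat = {x_hat:.3f} m, P_hat = {P_hat:.3f} m^2")
```

A matrix-valued version has the same structure, with products and the division in the gain replaced by matrix products and a linear solve.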

4.2.4 迭代扩展卡尔曼滤波器（IEKF）/ Iterated Extended Kalman Filter

中文

EKF 在每个时刻只线性化一次，精度受限。**迭代 EKF（IEKF）**通过在校正步骤中迭代重新线性化来提升精度：

迭代过程：

1. 初始化线性化点： $\mathbf{x}_{\text{op},k} = \check{\mathbf{x}}_{k}$
2. 在 $\mathbf{x}_{\text{op},k}$ 处计算 Jacobian，执行校正步，得到新的均值估计 $\hat{\mathbf{x}}_{k}$
3. 更新线性化点： $\mathbf{x}_{\text{op},k} \leftarrow \hat{\mathbf{x}}_{k}$
4. 重复步骤 2-3，直到收敛

$\hat{\mathbf{x}}_k = \check{\mathbf{x}}_k + \mathbf{K}_k\left(\mathbf{y}_k - \mathbf{g}(\mathbf{x}_{\text{op},k}, \mathbf{0}) - \mathbf{G}_k(\check{\mathbf{x}}_k - \mathbf{x}_{\text{op},k})\right) \tag{4.6}$

IEKF 与 MAP 的等价性：可以证明，IEKF 校正步收敛到后验 PDF 的众数（mode），即 MAP 解。这意味着 IEKF 的”均值”实际上是后验的最大值，而不是后验的真实均值。

English

The IEKF improves the EKF’s correction step by iteratively re-linearizing around the best current estimate. At each iteration:

$x_{op, k} \leftarrow \hat{x}_{k}$

until convergence. It can be shown that the IEKF converges to the MAP solution (mode of the posterior), not the mean. This is important: IEKF gives the peak of the posterior, while the true posterior mean (MMSE) is different.
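A scalar sketch of the iterated correction makes the MAP connection easy to check: at the fixed point, the gradient of the MAP cost $\frac{1}{2R}(y - g(x))^2 + \frac{1}{2\check{P}}(\check{x} - x)^2$ vanishes. The observation model and the numbers below are assumed for illustration, and additive measurement noise is assumed.

```python
def iekf_correct(x_check, P_check, y, g, G_jac, R, iters=50, tol=1e-12):
    """Iterated EKF correction for a scalar state with additive noise."""
    x_op = x_check                                   # start at the predicted mean
    for _ in range(iters):
        G = G_jac(x_op)                              # re-linearize at the operating point
        K = P_check * G / (G * P_check * G + R)
        x_new = x_check + K * (y - g(x_op) - G * (x_check - x_op))
        converged = abs(x_new - x_op) < tol
        x_op = x_new
        if converged:
            break
    G = G_jac(x_op)
    K = P_check * G / (G * P_check * G + R)
    return x_op, (1.0 - K * G) * P_check             # covariance at the final operating point

FB, R = 40.0, 0.09                                   # assumed f*b and noise variance
g = lambda x: FB / x
G_jac = lambda x: -FB / x ** 2
x_map, P_map = iekf_correct(x_check=20.0, P_check=9.0, y=1.6, g=g, G_jac=G_jac, R=R)

# Gradient of the MAP cost (up to sign) at the result; it should be close to zero.
grad = G_jac(x_map) * (1.6 - g(x_map)) / R + (20.0 - x_map) / 9.0
print(f"x_map = {x_map:.4f} m, P = {P_map:.4f} m^2, gradient = {grad:.2e}")
```

The first pass through the loop is exactly the plain EKF correction, so the difference between the first iterate and the converged value shows what re-linearization buys.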

4.2.5 将 PDF 通过非线性传播的三种方法 / Three Ways to Pass a PDF Through a Nonlinearity

中文

EKF 面临的核心技术挑战是：如何将高斯 PDF 通过非线性函数传播，得到另一端 PDF 的近似？有三种主要方法：

方法 1：蒙特卡洛（Monte Carlo）——暴力法

从输入 PDF 抽取大量样本，每个样本精确通过非线性函数，然后从输出样本重建 PDF（计算均值和协方差）。

优点：最精确、无需解析表达式、适用于任意非线性和任意 PDF
缺点：计算代价高，尤其在高维时

精确结果（以 $y = x^{2}$ 为例）：若输入为 $N (μ_{x}, σ_{x}^{2})$ ，精确输出为： $μ_{y} = μ_{x}^{2} + σ_{x}^{2}, σ_{y}^{2} = 4 μ_{x}^{2} σ_{x}^{2} + 2 σ_{x}^{4}$

方法 2：线性化（Linearization）——EKF 的方法

在均值处做 Taylor 展开： $y \approx f (μ_{x}) + f^{'} (μ_{x}) (x - μ_{x})$ ，得到：

$\mu_{y} \approx f(\mu_{x}) = \mu_{x}^{2}$ （遗漏了 $\sigma_{x}^{2}$ 项）

$\sigma_{y}^{2} \approx (2\mu_{x})^{2} \sigma_{x}^{2} = 4\mu_{x}^{2}\sigma_{x}^{2}$ （遗漏了 $2\sigma_{x}^{4}$ 项）

在本例中，线性化低估了均值（产生偏差），也低估了方差（过于乐观）；非线性越强，误差越明显。

方法 3：Sigma 点变换（Sigmapoint Transformation, SPT）——无迹变换

不近似非线性函数，而是用少量精心选择的”sigma 点”来近似输入 PDF，然后每个 sigma 点精确通过非线性函数。

对于 $L$ 维输入 $N (μ_{x}, Σ_{xx})$ ，生成 $2 L + 1$ 个 sigma 点：

$\mathbf{L}\mathbf{L}^{T} = \boldsymbol{\Sigma}_{xx}$ （Cholesky 分解）

$\mathbf{x}_0 = \boldsymbol{\mu}_x, \quad \mathbf{x}_i = \boldsymbol{\mu}_x + \sqrt{L+\kappa}\,\text{col}_i\mathbf{L}, \quad \mathbf{x}_{i+L} = \boldsymbol{\mu}_x - \sqrt{L+\kappa}\,\text{col}_i\mathbf{L}, \quad i = 1 \dots L \tag{4.7}$

权重为： $α_{0} = κ / (L + κ)$ ， $α_{i} = 1/ (2 (L + κ))$ （ $i > 0$ ）

每个 sigma 点精确通过非线性函数： $y_{i} = f (x_{i})$

输出均值和协方差： $\boldsymbol{\mu}_y = \sum_{i=0}^{2L}\alpha_i y_i, \quad \boldsymbol{\Sigma}_{yy} = \sum_{i=0}^{2L}\alpha_i(y_i - \boldsymbol{\mu}_y)(y_i - \boldsymbol{\mu}_y)^T \tag{4.8}$

对于 $y = x^{2}$ 的例子，选 $\kappa = 2$ 时： $\mu_{y} = \mu_{x}^{2} + \sigma_{x}^{2}$ ， $\sigma_{y}^{2} = 4\mu_{x}^{2}\sigma_{x}^{2} + 2\sigma_{x}^{4}$ ，两者都与精确结果一致。

三种方法的比较：

方法	均值精度	协方差精度	计算代价	需要 Jacobian？
蒙特卡洛	最高	最高	最高	否
线性化（EKF）	低（有偏）	低（过于乐观）	低	是
Sigma 点变换	高（精确到3阶）	高（精确到3阶）	类似线性化	否

English

Three methods to propagate a Gaussian through a nonlinearity:

1. Monte Carlo (brute force): Draw many samples, pass each through exactly, reconstruct PDF. Most accurate, highest cost.

2. Linearization (EKF approach): First-order Taylor expansion. Fast, needs Jacobians, but:

Biased mean: $μ_{y} \approx f (μ_{x})$ , missing higher-order terms
Underestimates variance (overconfident)

3. Sigmapoint/Unscented Transform: Choose $2L + 1$ deterministic sigma points to represent the Gaussian. Pass each through the nonlinearity exactly. Recombine. For the $y = x^{2}$ example with $\kappa = 2$ , this reproduces the exact mean and variance.

Key insight: The sigmapoint transform approximates the distribution, not the nonlinearity. For a computational cost comparable to linearization, it is typically more accurate.
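The three approaches can be compared side by side on $y = x^{2}$ . This is a small sketch; the input mean and variance, the sample count, and the random seed are arbitrary choices made for illustration.

```python
import math
import random

def f(x):
    return x * x

def exact_moments(mu, var):
    """Closed-form mean and variance of x^2 for Gaussian x."""
    return mu ** 2 + var, 4.0 * mu ** 2 * var + 2.0 * var ** 2

def monte_carlo(mu, var, n, rng):
    """Push n random samples through f and take sample moments."""
    ys = [f(rng.gauss(mu, math.sqrt(var))) for _ in range(n)]
    m = sum(ys) / n
    return m, sum((y - m) ** 2 for y in ys) / (n - 1)

def linearized(mu, var):
    """First-order Taylor expansion about the mean (the EKF approach)."""
    slope = 2.0 * mu
    return f(mu), slope * slope * var

def sigma_point(mu, var, kappa=2.0):
    """Sigma-point transform for a scalar input (L = 1, so 3 points)."""
    L = 1
    spread = math.sqrt((L + kappa) * var)
    points = [mu, mu + spread, mu - spread]
    alphas = [kappa / (L + kappa), 0.5 / (L + kappa), 0.5 / (L + kappa)]
    ys = [f(p) for p in points]                      # each point passes through f exactly
    m = sum(a * y for a, y in zip(alphas, ys))
    return m, sum(a * (y - m) ** 2 for a, y in zip(alphas, ys))

mu, var = 5.0, 4.0                                   # arbitrary example input
for name, (m, v) in [
    ("exact", exact_moments(mu, var)),
    ("monte carlo", monte_carlo(mu, var, 100000, random.Random(0))),
    ("linearized", linearized(mu, var)),
    ("sigma point", sigma_point(mu, var)),
]:
    print(f"{name:12s} mean = {m:8.3f}  variance = {v:8.3f}")
```

The exact agreement of the sigma-point result is specific to this quadratic function and this choice of $\kappa$ ; for other nonlinearities it is an approximation.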

4.2.6 粒子滤波器 / Particle Filter

中文

粒子滤波器（Particle Filter）是贝叶斯滤波器的蒙特卡洛近似。它用有限个粒子（particles） $\{\hat{\mathbf{x}}_{k,m}\}_{m=1}^{M}$ 来近似后验 PDF，每个粒子代表一个假设的状态。

算法步骤：

采样（Sampling）：从先验 PDF 和过程噪声 $p(\mathbf{w}_{k})$ 中分别抽取 $M$ 个样本
预测（Prediction）：每个粒子通过非线性运动模型传播： $\check{\mathbf{x}}_{k,m} = \mathbf{f}(\hat{\mathbf{x}}_{k-1,m}, \mathbf{v}_{k}, \mathbf{w}_{k,m})$
权重更新（Weighting）：根据新测量 $\mathbf{y}_{k}$ 为每个预测粒子赋权重： $w_{k,m} = \eta\, p(\mathbf{y}_{k} \mid \check{\mathbf{x}}_{k,m})$ ，其中 $\eta$ 是归一化常数（标量权重 $w_{k,m}$ 不同于粗体的噪声样本 $\mathbf{w}_{k,m}$ ）
重采样（Resampling）：根据权重重新采样 $M$ 个粒子（权重大的粒子更可能被多次采样）

直觉：粒子滤波器像什么？

想象你在黑暗中追踪一只发光的萤火虫。你在它可能出现的位置放了 $M$ 盏灯（粒子）。每次萤火虫移动时，所有灯也移动（预测步）。然后你观察一下，看哪些灯与萤火虫的位置最一致，给那些灯更多”分数”（权重），去掉得分低的灯，复制得分高的灯（重采样）。随着时间推移，灯会越来越集中在萤火虫真实位置附近。

粒子滤波器的优缺点：

优点：可处理任意非高斯噪声、任意非线性模型、不需要 Jacobian
缺点：维数诅咒（curse of dimensionality）——高维状态需要指数级数量的粒子；计算代价高

English

The particle filter approximates the Bayes filter using $M$ weighted random samples (“particles”). Each particle is a hypothesis about the state.

Algorithm (bootstrap/condensation):

Draw $M$ samples from the prior belief and process noise
Propagate each particle through the nonlinear motion model exactly
Assign weight $w_{k,m} \propto p(\mathbf{y}_{k} \mid \check{\mathbf{x}}_{k,m})$ based on the new measurement
Resample $M$ new particles proportional to weights

Pros: Handles any non-Gaussian noise and any nonlinearity; no Jacobians needed. Cons: Dimensionality curse — number of particles needed grows exponentially with state dimension.

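Systematic resampling (Madow): Normalize the weights so that they sum to 1, lay them end to end as bins on $[0, 1)$ with widths proportional to the weights, draw one uniform random number $\rho \in [0, 1/M)$ , then step through $\rho, \rho + 1/M, \dots$ and select the particle whose bin contains each of the $M$ points.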
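The four steps and the systematic resampling rule can be sketched for a scalar state as below. The Gaussian measurement likelihood, the additive motion model, the particle count, and all numeric values are assumptions made for this example.

```python
import math
import random

def systematic_resample(weights, rng):
    """Return M particle indices chosen by systematic resampling (weights sum to 1)."""
    M = len(weights)
    rho = rng.random() / M                 # single uniform draw in [0, 1/M)
    indices, cumulative, i = [], weights[0], 0
    for m in range(M):
        target = rho + m / M               # evenly spaced pointers
        while cumulative < target and i < M - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices

def particle_filter_step(particles, v, y, f, g, q_std, r_var, rng):
    """Predict, weight, and resample once (assumes Gaussian measurement noise)."""
    # Prediction: each particle gets its own process-noise sample.
    predicted = [f(x, v, rng.gauss(0.0, q_std)) for x in particles]
    # Weighting: likelihood of the measurement under each predicted particle.
    likelihoods = [math.exp(-0.5 * (y - g(x)) ** 2 / r_var) for x in predicted]
    total = sum(likelihoods)
    weights = [lik / total for lik in likelihoods]
    # Resampling: heavy particles are duplicated, light ones tend to disappear.
    return [predicted[i] for i in systematic_resample(weights, rng)]

rng = random.Random(0)
M = 2000
particles = [rng.gauss(20.0, 3.0) for _ in range(M)]          # samples of an assumed prior
particles = particle_filter_step(
    particles, v=1.0, y=40.0 / 21.0,
    f=lambda x, v, w: x + v + w,                              # assumed motion model
    g=lambda x: 40.0 / x,                                     # stereo-style observation
    q_std=0.5, r_var=0.09, rng=rng,
)
estimate = sum(particles) / M
print(f"posterior mean estimate = {estimate:.2f} m")
```

Because only one random number is drawn per resampling pass, systematic resampling adds less sampling noise than drawing $M$ independent indices.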

4.2.7 Sigma 点卡尔曼滤波器（SPKF）/ Sigma-Point Kalman Filter

中文

Sigma 点卡尔曼滤波器（SPKF），也称为无迹卡尔曼滤波器（UKF），将 sigma 点变换嵌入卡尔曼框架中，替代 EKF 中的线性化步骤。

预测步：将状态和过程噪声堆叠，生成 sigma 点，通过非线性运动模型传播，重组得到 $(\check{\mathbf{x}}_{k}, \check{\mathbf{P}}_{k})$

校正步：将预测状态和观测噪声堆叠，生成 sigma 点，通过非线性观测模型传播，重组得到所需矩：

$\boldsymbol{\mu}_{y,k} = \sum_{i=0}^{2L}\alpha_i\check{\mathbf{y}}_{k,i}, \quad \boldsymbol{\Sigma}_{yy,k} = \sum_{i=0}^{2L}\alpha_i(\check{\mathbf{y}}_{k,i} - \boldsymbol{\mu}_{y,k})(\check{\mathbf{y}}_{k,i} - \boldsymbol{\mu}_{y,k})^T \tag{4.9a}$

$\boldsymbol{\Sigma}_{xy,k} = \sum_{i=0}^{2L}\alpha_i(\check{\mathbf{x}}_{k,i} - \check{\mathbf{x}}_k)(\check{\mathbf{y}}_{k,i} - \boldsymbol{\mu}_{y,k})^T \tag{4.9b}$

然后用广义卡尔曼增益（Generalized Kalman gain）完成校正：

$\mathbf{K}_k = \boldsymbol{\Sigma}_{xy,k}\boldsymbol{\Sigma}_{yy,k}^{-1}, \quad \hat{\mathbf{P}}_k = \check{\mathbf{P}}_k - \mathbf{K}_k\boldsymbol{\Sigma}_{xy,k}^T, \quad \hat{\mathbf{x}}_k = \check{\mathbf{x}}_k + \mathbf{K}_k(\mathbf{y}_k - \boldsymbol{\mu}_{y,k}) \tag{4.10}$

SPKF 的优势：

不需要计算 Jacobian（对于不光滑的非线性函数很有用）
对非线性的近似精度更高（精确到三阶矩）
非线性函数可以是”黑盒”软件函数

ISPKF（迭代 SPKF）：类似于 IEKF 对 EKF 的迭代改进，ISPKF 通过迭代更新操作点（operating point），逼近后验的均值（而不是 IEKF 的众数/MAP）。

English

The SPKF (UKF) replaces EKF’s linearization with sigma-point transforms:

Prediction: Stack state + process noise; generate sigma points; pass through nonlinear $f (\cdot)$ exactly; recombine.

Correction: Stack predicted state + observation noise; generate sigma points; pass through nonlinear $g (\cdot)$ exactly; compute $μ_{y, k}$ , $Σ_{yy, k}$ , $Σ_{x y, k}$ ; use generalized Kalman equations.

If the nonlinear models are replaced by their first-order linearizations, the SPKF equations reduce to those of the EKF. Without that linearization, the SPKF is typically more accurate.

ISPKF (iterated SPKF) iteratively updates the operating point and converges toward the posterior mean (unlike IEKF which converges to the MAP/mode).
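The correction step can be sketched for a scalar state by stacking the state with the scalar measurement noise, so $L = 2$ and there are five sigma points. Because the stacked covariance is diagonal, its Cholesky factor is just the square roots of the diagonal. The choice $\kappa = 1$ and the numbers in the usage lines are assumptions for illustration.

```python
import math

def spkf_correct(x_check, P_check, y, g, R, kappa=1.0):
    """Sigma-point correction for a scalar state; g(x, n) includes the noise input."""
    L = 2                                            # stacked dimension: state + noise
    scale = math.sqrt(L + kappa)
    sx, sn = scale * math.sqrt(P_check), scale * math.sqrt(R)
    # Sigma points of the stacked vector [x, n] with covariance diag(P_check, R).
    points = [(x_check, 0.0),
              (x_check + sx, 0.0), (x_check, sn),
              (x_check - sx, 0.0), (x_check, -sn)]
    alphas = [kappa / (L + kappa)] + [0.5 / (L + kappa)] * (2 * L)
    ys = [g(x, n) for x, n in points]                # exact pass through the nonlinearity
    mu_y = sum(a * yi for a, yi in zip(alphas, ys))
    S_yy = sum(a * (yi - mu_y) ** 2 for a, yi in zip(alphas, ys))
    S_xy = sum(a * (x - x_check) * (yi - mu_y)
               for a, (x, _), yi in zip(alphas, points, ys))
    K = S_xy / S_yy                                  # generalized Kalman gain
    return x_check + K * (y - mu_y), P_check - K * S_xy

x_hat, P_hat = spkf_correct(
    x_check=20.0, P_check=9.0, y=1.6,
    g=lambda x, n: 40.0 / x + n,                     # stereo-style observation, assumed f*b = 40
    R=0.09,
)
print(f"x_hat = {x_hat:.3f} m, P_hat = {P_hat:.3f} m^2")
```

No Jacobian appears anywhere in this function; `g` could just as well be a black-box routine.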

4.2.8 各滤波器的分类图谱 / Taxonomy of Filters

中文

滤波器	信念表示	传播方法	对应后验什么？
精确贝叶斯	完整 PDF	精确积分	完整后验
粒子滤波器 (PF)	加权粒子集合	Monte Carlo	后验（近似）
EKF	高斯	线性化（一次）	后验近似均值（实际不对应任何特定量）
IEKF	高斯	迭代线性化	后验众数（MAP）
SPKF (UKF)	高斯	Sigma 点变换（一次）	后验近似均值（实际不对应任何特定量）
ISPKF	高斯	迭代 Sigma 点	后验均值（近似）

核心洞察：迭代（iteration）将估计对应到后验的某个有意义的量——不迭代则难以说清估计对应后验的哪个部分。

English

Filter	Belief	Propagation	Approximates
Exact Bayes	Full PDF	Exact	Full posterior
Particle Filter	Weighted samples	Monte Carlo	Posterior (approx.)
EKF	Gaussian	Linearization (once)	Unclear
IEKF	Gaussian	Iterated linearization	Posterior mode (MAP)
SPKF	Gaussian	Sigma points (once)	Unclear
ISPKF	Gaussian	Iterated sigma points	Posterior mean (approx.)

Key lesson: Iteration ties the estimate to a meaningful quantity of the full posterior.

4.3 批量离散时间估计 / Batch Discrete-Time Estimation

4.3.1 MAP 估计与 Gauss-Newton 方法 / MAP Estimation and Gauss-Newton

中文

批量方法估计整条轨迹 $x = [x_{0}^{T}, \dots, x_{K}^{T}]^{T}$ ，而不是一步一步向前。

目标函数（与线性高斯情形相同，只是误差现在是非线性函数）：

$J(\mathbf{x}) = \frac{1}{2}\mathbf{e}(\mathbf{x})^T\mathbf{W}^{-1}\mathbf{e}(\mathbf{x}) \tag{4.11}$

其中误差项为：

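$\mathbf{e}_{v,0}(\mathbf{x}) = \check{\mathbf{x}}_{0} - \mathbf{x}_{0}$ （先验误差）

$\mathbf{e}_{v,k}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_{k-1}, \mathbf{v}_{k}, \mathbf{0}) - \mathbf{x}_{k}$ （运动误差， $k = 1 \dots K$ ）

$\mathbf{e}_{y,k}(\mathbf{x}) = \mathbf{y}_{k} - \mathbf{g}(\mathbf{x}_{k}, \mathbf{0})$ （观测误差， $k = 0 \dots K$ ）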

如何最小化 $J (x)$ ？

由于 $J$ 是非线性的，不能直接求导令其为零得到解析解。需要迭代优化方法。

Newton 方法：用目标函数的二阶 Taylor 展开近似，然后跳到该二次函数的极小值点：

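$J(\mathbf{x}_{\text{op}} + \delta\mathbf{x}) \approx J(\mathbf{x}_{\text{op}}) + \underbrace{\left.\frac{\partial J}{\partial \mathbf{x}}\right|_{\mathbf{x}_{\text{op}}}}_{\text{Jacobian}} \delta\mathbf{x} + \frac{1}{2}\delta\mathbf{x}^{T} \underbrace{\left.\frac{\partial^{2} J}{\partial \mathbf{x}\,\partial \mathbf{x}^{T}}\right|_{\mathbf{x}_{\text{op}}}}_{\text{Hessian}} \delta\mathbf{x}$

设对 $\delta\mathbf{x}$ 的导数为零： $\text{Hessian} \cdot \delta\mathbf{x}^{*} = -\text{Jacobian}^{T}$ ，然后更新 $\mathbf{x}_{\text{op}} \leftarrow \mathbf{x}_{\text{op}} + \delta\mathbf{x}^{*}$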

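Gauss-Newton 方法：将目标函数写成 $J = \frac{1}{2}\mathbf{u}^{T}\mathbf{u}$ 的形式，其中 $\mathbf{u}(\mathbf{x}) = \mathbf{L}\,\mathbf{e}(\mathbf{x})$ 是白化后的误差， $\mathbf{L}^{T}\mathbf{L} = \mathbf{W}^{-1}$ 。此时 Hessian 的近似为：

$\text{Hessian} \approx \left(\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\right)^{T}\left(\frac{\partial \mathbf{u}}{\partial \mathbf{x}}\right) = \mathbf{H}^{T}\mathbf{W}^{-1}\mathbf{H}$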

这个近似丢弃了涉及 $u_{i} (x)$ 的二阶导数项（在接近最优时这些项很小）。Gauss-Newton 更新方程为：

$\boxed{(\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H})\,\delta\mathbf{x}^* = \mathbf{H}^T\mathbf{W}^{-1}\mathbf{e}(\mathbf{x}_{\text{op}})} \tag{4.12}$

其中 $H = - \partial e / \partial x ∣_{x_{op}}$ 是误差关于状态的负 Jacobian。

Gauss-Newton 的美妙之处：

方程 (4.12) 与线性高斯的法方程形式完全相同！区别在于现在 $H$ 是在当前操作点处计算的 Jacobian，而且需要迭代到收敛。这就像是把线性最小二乘问题反复求解，每次在更好的操作点处更新 $H$ 和 $e$ 。

收敛性：Gauss-Newton 只保证局部收敛（初始猜测必须足够接近最优解），不保证全局最优。实际补丁：

Line search：每步按 $α δ x^{*}$ （ $α \in (0, 1]$ ）步进，减缓更新速度

Levenberg-Marquardt：在信息矩阵中加阻尼项 $λ D$ ，当 $λ$ 大时等同于梯度下降，当 $λ = 0$ 时等同于 Gauss-Newton

English

The Gauss-Newton (batch MAP) method minimizes: $J (x) = \frac{1}{2} e (x)^{T} W^{- 1} e (x)$

by iteratively solving: $(H^{T} W^{- 1} H) δ x^{*} = H^{T} W^{- 1} e (x_{op})$

This is the same structure as the linear normal equations, with $H$ being the Jacobian evaluated at the operating point. Iterate until $δ x^{*} \to 0$ .

The information matrix $H^{T} W^{- 1} H$ remains block-tridiagonal — the nonlinear structure preserves the sparsity of the linear case.

Practical improvements:

Line search: step $\alpha\, \delta\mathbf{x}^{*}$ with $\alpha \in (0, 1]$ for robustness
Levenberg-Marquardt: add damping $λ D$ to the normal equations for better conditioning
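The iteration can be sketched for a trajectory of scalar states, where the information matrix $\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H}$ is plain tridiagonal and can be solved in linear time. The example assumes a linear motion model $x_k = x_{k-1} + v_k$ and a stereo-style observation model; these models, the noise values, and the synthetic noise-free data are all illustrative assumptions, and no line search or damping is included.

```python
def solve_tridiagonal(diag, off, rhs):
    """Solve a symmetric tridiagonal system by forward elimination and back substitution."""
    n = len(diag)
    d, r = diag[:], rhs[:]
    for i in range(1, n):
        m = off[i - 1] / d[i - 1]
        d[i] -= m * off[i - 1]
        r[i] -= m * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - off[i] * x[i + 1]) / d[i]
    return x

def cost(x, x0_prior, P0, vs, ys, Q, R, g):
    """Objective J(x): prior, motion, and observation terms."""
    J = 0.5 * (x0_prior - x[0]) ** 2 / P0
    J += sum(0.5 * (x[k - 1] + vs[k - 1] - x[k]) ** 2 / Q for k in range(1, len(x)))
    J += sum(0.5 * (ys[k] - g(x[k])) ** 2 / R for k in range(len(x)))
    return J

def gauss_newton(x_init, x0_prior, P0, vs, ys, Q, R, g, G_jac, iters=20, tol=1e-10):
    """Batch MAP for states x_0..x_K with motion x_k = x_{k-1} + v_k (assumed)."""
    x = list(x_init)
    K = len(x) - 1
    for _ in range(iters):
        diag, off, rhs = [0.0] * (K + 1), [0.0] * K, [0.0] * (K + 1)
        diag[0] += 1.0 / P0                           # prior term
        rhs[0] += (x0_prior - x[0]) / P0
        for k in range(1, K + 1):                     # motion terms couple k-1 and k
            e = x[k - 1] + vs[k - 1] - x[k]
            diag[k - 1] += 1.0 / Q
            diag[k] += 1.0 / Q
            off[k - 1] -= 1.0 / Q
            rhs[k - 1] -= e / Q
            rhs[k] += e / Q
        for k in range(K + 1):                        # observation terms touch only state k
            G = G_jac(x[k])
            diag[k] += G * G / R
            rhs[k] += G * (ys[k] - g(x[k])) / R
        dx = solve_tridiagonal(diag, off, rhs)        # normal equations for the update
        x = [xi + di for xi, di in zip(x, dx)]
        if max(abs(d) for d in dx) < tol:
            break
    return x

g = lambda x: 40.0 / x
G_jac = lambda x: -40.0 / x ** 2
x_true = [20.0 + k for k in range(6)]                 # synthetic ground truth
vs = [1.0] * 5
ys = [g(x) for x in x_true]                           # noise-free measurements
x_init = [19.0 + k for k in range(6)]                 # dead reckoning from a prior mean of 19
x_map = gauss_newton(x_init, 19.0, 9.0, vs, ys, 0.25, 0.09, g, G_jac)
print([round(x, 3) for x in x_map])
```

With vector-valued states the same structure holds block-wise, and the scalar solver would be replaced by a block-tridiagonal one.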

4.3.2 Laplace 近似：从 MAP 获得不确定性 / Laplace Approximation

中文

MAP 只给出点估计。如何获得不确定性？

Laplace 近似：在 MAP 解 $\hat{x}$ 处用高斯 $N (\hat{x}, \hat{P})$ 近似完整后验，其中后验协方差为目标函数 Hessian 的逆：

$\hat{\mathbf{P}} = (\mathbf{H}^T\mathbf{W}^{-1}\mathbf{H})^{-1} \tag{4.13}$

这与线性高斯情形完全对应——那时后验协方差也是 $(H^{T} W^{- 1} H)^{- 1}$ ，且后验恰好是高斯的。非线性情形中，Laplace 近似只是近似，因为后验并非真正的高斯。

English

The Laplace approximation approximates the full posterior by a Gaussian centered at the MAP solution:

$p (x ∣ y) \approx N (\hat{x}, (H^{T} W^{- 1} H)^{- 1})$

This matches the linear-Gaussian result exactly and is a reasonable approximation for mildly nonlinear systems.
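As a worked instance (a sketch under assumed values, not figures from the text), take the scalar stereo camera with a single measurement. Stacking the prior row and the measurement row gives $\mathbf{H} = [1 \;\; G]^{T}$ with $G = \partial g / \partial x = -fb/\hat{x}^{2}$ , and $\mathbf{W} = \text{diag}(\check{P}, R)$ , so the covariance formula above reduces to

$\hat{P} = \left(\frac{1}{\check{P}} + \frac{G^{2}}{R}\right)^{-1}$

Assuming $fb = 40$ pixel·m, $R = 0.09$ pixel², $\check{P} = 9$ m², and a MAP solution of $\hat{x} = 20$ m, we get $G = -40/20^{2} = -0.1$ pixel/m and

$\hat{P} = \left(\frac{1}{9} + \frac{0.01}{0.09}\right)^{-1} = \left(0.111 + 0.111\right)^{-1} = 4.5 \text{ m}^{2}$

so in this case the measurement and the prior contribute equal information, and the Gaussian approximation has half the prior variance.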

4.3.3 最大似然估计与偏差 / Maximum Likelihood and Bias

中文

最大似然（ML）估计：丢弃先验，只最大化测量的似然：

$\hat{\mathbf{x}}_{\text{ml}} = \arg\max_{\mathbf{x}} p(\mathbf{y} \mid \mathbf{x}) = \arg\min_{\mathbf{x}}\sum_k\frac{1}{2}(\mathbf{y}_k - \mathbf{g}_k(\mathbf{x}))^T\mathbf{R}_k^{-1}(\mathbf{y}_k - \mathbf{g}_k(\mathbf{x})) \tag{4.14}$

ML 估计也是有偏的（当观测模型非线性时）。Box (1971) 给出了 ML 偏差的近似解析表达式：

$E[\hat{\mathbf{x}} - \mathbf{x}] \approx -\frac{1}{2}\mathbf{W}(\mathbf{x})^{-1}\sum_k\mathbf{G}_k(\mathbf{x})^T\mathbf{R}_k^{-1}\sum_j\mathbf{1}_j\,\text{tr}(\mathcal{G}_{jk}(\mathbf{x})\mathbf{W}(\mathbf{x})^{-1}) \tag{4.15}$

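其中 $\mathcal{G}_{jk} = \partial^{2} g_{jk} / \partial \mathbf{x}\, \partial \mathbf{x}^{T}$ 是观测模型第 $k$ 个测量第 $j$ 个分量的 Hessian， $\mathbf{1}_{j}$ 是单位矩阵的第 $j$ 列， $\mathbf{W}(\mathbf{x}) = \sum_{k} \mathbf{G}_{k}^{T} \mathbf{R}_{k}^{-1} \mathbf{G}_{k}$ 。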

知道偏差后，可以从估计中减去偏差进行校正： $\hat{x} \leftarrow \hat{x} - E [\hat{x} - x]$ 。

English

Maximum likelihood (ML): Discard the prior; minimize only the measurement residuals. Also biased for nonlinear models. Box (1971) gives an approximate expression for the ML bias involving the Hessian of the observation model. After estimating the bias, one can subtract it from the ML estimate.
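For a scalar state and a single scalar measurement, the approximate bias expression collapses to $-\frac{1}{2} W^{-1} G R^{-1} \mathcal{G} W^{-1}$ with $W = G^{2}/R$ , which can be compared against simulation. The sketch below uses the stereo-style model $g(x) = fb/x$ , for which the ML estimate is simply $fb/y$ ; the parameter values, the true depth, and the trial count are assumptions for illustration.

```python
import random

FB, R = 40.0, 0.09            # assumed f*b [px*m] and noise variance [px^2]

def box_bias(x):
    """Scalar, single-measurement form of the approximate ML bias expression."""
    G = -FB / x ** 2          # first derivative of g(x) = fb/x
    hess = 2.0 * FB / x ** 3  # second derivative of g (scalar Hessian)
    W = G * G / R
    return -0.5 * (1.0 / W) * G * (1.0 / R) * hess * (1.0 / W)

def ml_bias_monte_carlo(x_true, trials, rng):
    """Average error of the ML estimate x = fb / y over simulated measurements."""
    total = 0.0
    for _ in range(trials):
        y = FB / x_true + rng.gauss(0.0, R ** 0.5)
        total += FB / y - x_true          # ML inverts the measurement model
    return total / trials

predicted = box_bias(20.0)
simulated = ml_bias_monte_carlo(20.0, 100000, random.Random(0))
print(f"predicted bias = {predicted:.3f} m, simulated bias = {simulated:.3f} m")
```

The two numbers should be of similar size but not identical, since the expression is a low-order approximation and the simulation carries sampling noise.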

4.3.4 滑动窗口滤波器 / Sliding-Window Filters

中文

EKF 只线性化一次，IEKF 也只在单个时刻迭代，二者都无法在整条轨迹上迭代至收敛。批量 Gauss-Newton 的问题是必须离线运行，无法在线使用。

**滑动窗口滤波器（Sliding-Window Filter, SWF）**是折中方案：在一段固定长度的时间窗口内进行批量优化，然后窗口向前滑动。

算法步骤（窗口大小 $W$ ）：

对初始窗口 $[0, W - 1]$ 建立批量问题并迭代至收敛
窗口右扩一步（加入时刻 $W$ ）
对最左侧状态 $\mathbf{x}_{0}$ 进行边缘化（marginalization）： $\bar{\mathbf{A}}_{1,1} = \mathbf{A}_{1,1} - \mathbf{A}_{1,0} \mathbf{A}_{0,0}^{-1} \mathbf{A}_{1,0}^{T}$ （Schur complement）。这将 $\mathbf{x}_{0}$ 的信息“压缩”进新的先验项；在当前线性化下不丢失任何信息
窗口左缩一步（ $x_{0}$ 退出窗口），输出其估计
迭代新窗口至收敛，重复

三种方法的对比：

方法	迭代范围	在线/离线	时间复杂度/步
EKF/IEKF	单个时刻	在线	$O(N^{3})$
全批量 GN	全轨迹	离线	$O(N^{3} K^{3})$
滑动窗口	$W$ 个时刻	在线	$O(N^{3} W^{3})$

表中批量与滑动窗口的复杂度对应稠密求解；若利用信息矩阵的块三对角稀疏结构，可分别降至 $O(N^{3} K)$ 和 $O(N^{3} W)$ 。

SWF 的窗口越大，越接近批量解；窗口大小为 1 时（只迭代校正步）类似 IEKF。

English

The sliding-window filter (SWF) bridges the gap between EKF and full batch:

Maintains a fixed-size window of $W$ timesteps
Iterates the batch problem within the window to convergence
Slides forward by marginalizing out the oldest state (via Schur complement), preserving information
Online and constant-time per step

Window size $W = 1$ ≈ IEKF; $W = K$ = full batch. Larger windows → more accurate, more expensive.

The key operation is marginalization (Schur complement), which compresses information from the oldest state into an updated prior for the next state, without loss.
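The effect of the Schur complement can be checked on a tiny linear system $\mathbf{A}\mathbf{x} = \mathbf{b}$ with scalar blocks. In the sketch below the matrix and right-hand side are made-up numbers, and the matching update of the right-hand side ( $\bar{b}_i = b_i - A_{i,0} A_{0,0}^{-1} b_0$ ) is the standard companion to the matrix update rather than something spelled out in the text.

```python
def solve(A, b):
    """Dense Gaussian elimination without pivoting (adequate for this small SPD example)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for i in range(n):
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= factor * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def marginalize_oldest(A, b):
    """Eliminate variable 0 via the Schur complement, returning the reduced system."""
    n = len(b)
    a00 = A[0][0]
    A_bar = [[A[i][j] - A[i][0] * A[0][j] / a00 for j in range(1, n)]
             for i in range(1, n)]
    b_bar = [b[i] - A[i][0] * b[0] / a00 for i in range(1, n)]
    return A_bar, b_bar

# Made-up tridiagonal information matrix and right-hand side for three states.
A = [[2.0, -1.0, 0.0],
     [-1.0, 3.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [1.0, 2.0, 3.0]

full = solve(A, b)                                   # solution for x0, x1, x2
A_bar, b_bar = marginalize_oldest(A, b)
reduced = solve(A_bar, b_bar)                        # x1, x2 after marginalizing x0
naive = solve([row[1:] for row in A[1:]], b[1:])     # simply deleting row and column 0

print("full    :", full[1:])
print("reduced :", reduced)
print("naive   :", naive)
```

Only the block adjacent to the removed state changes, so the reduced matrix keeps its tridiagonal shape, while naive deletion silently discards what the oldest state contributed.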

4.4 批量连续时间估计 / Batch Continuous-Time Estimation

中文

第三章中，连续时间估计基于线性 SDE 先验（GP 回归）。本节将其推广到非线性运动模型：

$\dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t), \mathbf{v}(t), \mathbf{w}(t), t) \tag{4.16}$

关键思路：迭代线性化（Iterated Linearization）

在当前轨迹估计 $x_{op} (t)$ 附近线性化：

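$\dot{\mathbf{x}}(t) \approx \underbrace{\mathbf{f}(\mathbf{x}_{\text{op}}(t), \mathbf{v}(t), \mathbf{0}, t) - \mathbf{F}(t)\,\mathbf{x}_{\text{op}}(t)}_{\boldsymbol{\nu}(t)} + \underbrace{\left.\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right|_{\mathbf{x}_{\text{op}}}}_{\mathbf{F}(t)} \mathbf{x}(t) + \underbrace{\left.\frac{\partial \mathbf{f}}{\partial \mathbf{w}}\right|_{\mathbf{x}_{\text{op}}}}_{\mathbf{L}(t)} \mathbf{w}(t)$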

线性化后变成 LTV SDE，可以用第三章的 GP 框架建立先验。离散化到测量时刻后，法方程变成：

$\underbrace{(\mathbf{F}^{-T}\mathbf{Q}'^{-1}\mathbf{F}^{-1} + \mathbf{G}^T\mathbf{R}'^{-1}\mathbf{G})}_{\text{块三对角}}\delta\mathbf{x}^* = \mathbf{F}^{-T}\mathbf{Q}'^{-1}(\boldsymbol{\nu} - \mathbf{F}^{-1}\mathbf{x}_{\text{op}}) + \mathbf{G}^T\mathbf{R}'^{-1}(\mathbf{y} - \mathbf{y}_{\text{op}}) \tag{4.17}$

这与非线性离散时间批量估计形式完全相同！

算法流程：

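1. 初始化操作轨迹 $\mathbf{x}_{\text{op}}(t)$
2. 计算线性化量 $\boldsymbol{\nu}$ , $\mathbf{F}^{-1}$ , $\mathbf{Q}'^{-1}$ （用当前轨迹和 GP 插值获取连续时间值）
3. 计算 $\mathbf{y}_{\text{op}}$ , $\mathbf{G}$ , $\mathbf{R}'^{-1}$
4. 求解块三对角线性系统，得到 $\delta\mathbf{x}^{*}$
5. 更新： $\mathbf{x}_{\text{op}} \leftarrow \mathbf{x}_{\text{op}} + \delta\mathbf{x}^{*}$ ；检查收敛，否则回到步骤 2
6. 用 GP 插值公式查询任意时刻的后验状态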

English

For nonlinear continuous-time estimation, linearize the motion model around the current trajectory estimate $x_{op} (t)$ to obtain an approximately LTV SDE. Build a GP prior, discretize at measurement times, and the resulting normal equations have the same block-tridiagonal form as the discrete nonlinear batch case. Iterate to convergence, using GP interpolation to query the operating trajectory at arbitrary times needed for the linearization integrals.

4.5 本章小结 / Chapter Summary

中文

方法	类型	收敛到	在线？	迭代？
Bayes 滤波器	递推	完整后验（理论）	是	—
EKF	递推（高斯）	近似（不对应特定量）	是	否
IEKF	递推（高斯）	后验众数（MAP）	是	是（校正步）
SPKF/UKF	递推（高斯）	近似	是	否
ISPKF	递推（高斯）	后验均值（近似）	是	是（校正步）
粒子滤波器	递推（粒子）	后验（精确，样本数→∞）	是	—
批量 GN MAP	批量	后验众数（MAP）	否	是（全轨迹）
滑动窗口	批量（在线）	接近 MAP	是	是（窗口）

四条核心结论：

非线性后验不是高斯的：MAP（众数）≠ 后验均值，这使得不同方法之间的比较需要注意它们各自逼近的是哪个量。
近似是不可避免的：所有实际方法都对贝叶斯后验做了某种近似——关键是要了解自己的方法做了什么假设、可能在什么情况下失效。
迭代将估计对应到有意义的量：不迭代的 EKF/SPKF 难以说清楚在逼近什么；迭代的 IEKF 收敛到 MAP，迭代的批量方法也收敛到 MAP。
批量方法优于递推方法（但在线受限）：批量 Gauss-Newton 在整条轨迹上迭代，因此比 EKF 更准确；滑动窗口是实用折中。

English

Four core takeaways:

Non-Gaussian posteriors: MAP ≠ posterior mean for nonlinear models. Methods built on iterated linearization (IEKF, batch Gauss-Newton) converge to the MAP (mode); methods that target the mean need different tools (ISPKF, etc.).
Approximation is unavoidable: Every practical filter approximates the Bayes filter. The choice of approximation determines the failure modes.
Iteration matters: Un-iterated EKF/SPKF have no clear relationship to the full posterior. Iterated versions converge to meaningful quantities (MAP or posterior mean).
Batch > recursive, but offline: Full batch Gauss-Newton is more accurate than recursive filters because it iterates over the whole trajectory. Sliding-window filters are the online compromise.

习题 / Exercises

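4.1 考虑一个移动机器人（状态 $[x_{k}, y_{k}, \theta_{k}]^{T}$ ）的非线性运动模型： $\begin{bmatrix} x_{k} \\ y_{k} \\ \theta_{k} \end{bmatrix} = \begin{bmatrix} x_{k-1} \\ y_{k-1} \\ \theta_{k-1} \end{bmatrix} + T \begin{bmatrix} \cos\theta_{k-1} & 0 \\ \sin\theta_{k-1} & 0 \\ 0 & 1 \end{bmatrix} \left( \begin{bmatrix} v_{k} \\ \omega_{k} \end{bmatrix} + \mathbf{w}_{k} \right)$ 测量模型为到原点的距离和方位角。推导 EKF 方程，特别是计算 Jacobian $\mathbf{F}_{k-1}$ 和 $\mathbf{G}_{k}$ 。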

For the given nonlinear robot pose model, derive the EKF equations including Jacobians $F_{k - 1}$ and $G_{k}$ .

4.2 对先验 $N (μ_{x}, σ_{x}^{2})$ 通过非线性 $f (x) = x^{3}$ 变换，用蒙特卡洛、线性化、Sigma 点方法分别计算输出均值和方差，并比较结果。

Transform $N (μ_{x}, σ_{x}^{2})$ through $f (x) = x^{3}$ using Monte Carlo, linearization, and sigma-point methods. Compare results.

4.3 考虑以下一维系统： $x_{k} = x_{k-1} + v_{k} + w_{k}, \quad y_{k} = \sqrt{x_{k}^{2} + h^{2}} + n_{k}$ （小车测量到旗杆顶部的距离）。取 $h = 1, Q = 1, R = 1/2$ ，手动执行 EKF 的前三步（ $k = 0, 1, 2$ ），并分析 $\hat{P}_{k}$ 的趋势。

Execute EKF for the given 1D system for $k = 0, 1, 2$ . Comment on the trend of $\hat{P}_{k}$ .

4.4 证明：在 Gauss-Newton 方法中，若误差项 $e (x)$ 在操作点 $x_{op}$ 处为零，则 Gauss-Newton 的一步等同于线性化问题的精确解。

Show that if the residual $e (x_{op}) = 0$ , one Gauss-Newton step gives the exact solution to the linearized problem.

4.5 解释为什么滑动窗口滤波器在边缘化最旧的状态时需要使用 Schur 补（Schur complement），而不是直接删除对应的行和列。这样做的好处是什么？

Explain why marginalization via Schur complement is preferred over simply deleting rows/columns when sliding the window.

下一章将讨论估计器偏差、测量离群值和协方差估计等实际问题。/ The next chapter addresses practical issues: estimator bias, outlier measurements, and covariance estimation.
