20.1 性能优化与调试

CGAL作为计算几何算法库，在处理大规模数据时性能至关重要。本章将深入探讨CGAL程序的性能优化策略、调试技巧以及常见性能陷阱的避免方法。

20.1.1 性能剖析基础

性能优化首先需要准确识别瓶颈所在。CGAL提供了多种内置工具，同时也支持标准的外部剖析工具。

使用CGAL内置计时器

CGAL在<CGAL/Real_timer.h>和<CGAL/Timer.h>中提供了高精度计时器，用于测量代码段的执行时间。

#include <CGAL/Real_timer.h>
#include <iostream>
 
void benchmark_example()
{
    CGAL::Real_timer timer;
    
    // 开始计时
    timer.start();
    
    // 执行需要测量的代码
    for (int i = 0; i < 1000000; ++i) {
        // 某些计算
    }
    
    // 停止计时
    timer.stop();
    
    std::cout << "Elapsed time: " << timer.time() << " seconds" << std::endl;
    std::cout << "Precision: " << timer.precision() << " seconds" << std::endl;
    
    // 重置计时器用于下一轮测试
    timer.reset();
}

Real_timer与Timer的区别在于：

Real_timer：测量墙上时钟时间（wall-clock time），包含所有系统活动
Timer：测量进程CPU时间，仅包含当前进程消耗的时间

对于多线程程序，通常使用Real_timer更能够反映实际等待时间。

性能剖析宏

当定义了CGAL_PROFILE宏时，CGAL提供了一系列剖析工具：

#define CGAL_PROFILE  // 必须在包含头文件之前定义
 
#include <CGAL/Profile_timer.h>
#include <CGAL/Profile_counter.h>
 
// 函数级时间剖析
void expensive_function()
{
    CGAL_TIME_PROFILER("expensive_function");
    // 函数体
    for (int i = 0; i < 1000; ++i) {
        // 某些计算
    }
}
 
// 计数器剖析
void count_invocations()
{
    CGAL_PROFILER("count_invocations called");
    // 函数体
}
 
// 分支预测剖析
void branch_analysis(bool condition)
{
    CGAL_BRANCH_PROFILER("branch hit/miss", branch_stats);
    if (condition) {
        CGAL_BRANCH_PROFILER_BRANCH(branch_stats);
        // 分支代码
    }
}
 
// 直方图计数器
void histogram_example(int value)
{
    CGAL_HISTOGRAM_PROFILER("value distribution", value);
}
 
int main()
{
    for (int i = 0; i < 100000; ++i) {
        expensive_function();
        count_invocations();
        branch_analysis(i % 2 == 0);
        histogram_example(i % 10);
    }
    return 0;
}

编译并运行后，程序会在退出时输出剖析结果：

[CGAL::Profile_timer]      0.523 seconds spent in this function (total time)
[CGAL::Profile_counter]  100.000 count_invocations called
[CGAL::Profile_branch_counter]     50.000 /    100.000 branch hit/miss
[CGAL::Profile_histogram_counter] value distribution [          0 :     10.000 ]
[CGAL::Profile_histogram_counter] value distribution [          1 :     10.000 ]
...

内核谓词剖析

对于几何计算密集型应用，可以使用Kernel_profiler来统计各个几何谓词的调用次数：

#include <CGAL/Kernel_profiler.h>
#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
 
typedef CGAL::Exact_predicates_inexact_constructions_kernel Base_kernel;
typedef CGAL::Kernel_profiler<Base_kernel> Profiled_kernel;
 
typedef Profiled_kernel::Point_2 Point_2;
typedef Profiled_kernel::Triangle_2 Triangle_2;
 
void profile_predicates()
{
    Point_2 p1(0, 0), p2(1, 0), p3(0, 1);
    Triangle_2 tri(p1, p2, p3);
    
    // 每次谓词调用都会被记录
    for (int i = 0; i < 1000; ++i) {
        Point_2 q(i * 0.001, i * 0.001);
        tri.bounded_side(q);  // 记录bounded_side谓词调用
    }
}

外部剖析工具集成

对于更详细的性能分析，可以结合外部工具：

使用gprof：

# 编译时添加-pg选项
g++ -pg -O2 program.cpp -lCGAL -o program
./program
gprof ./program gmon.out > analysis.txt

使用perf（Linux）：

# 记录性能数据
perf record ./program
 
# 生成报告
perf report
 
# 火焰图生成
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg

使用Valgrind/Callgrind：

valgrind --tool=callgrind ./program
kcachegrind callgrind.out.*

使用Intel VTune：

vtune -collect hotspots ./program
vtune -report summary

内存使用监控

CGAL提供了Memory_sizer类来监控内存使用情况：

#include <CGAL/Memory_sizer.h>
#include <iostream>
 
void monitor_memory()
{
    CGAL::Memory_sizer mem;
    
    std::cout << "Virtual memory: " << mem.virtual_size() << " bytes" << std::endl;
    std::cout << "Resident memory: " << mem.resident_size() << " bytes" << std::endl;
    
    // 分配大量内存
    std::vector<double> large_vector(1000000);
    
    std::cout << "After allocation - Virtual: " << mem.virtual_size() << " bytes" << std::endl;
    std::cout << "After allocation - Resident: " << mem.resident_size() << " bytes" << std::endl;
}

20.1.2 内核选择的性能影响

内核（Kernel）是CGAL中最重要的性能决定因素之一。选择合适的内核可以在保证正确性的同时显著提升性能。

EPICK vs EPECK性能对比

特性	EPICK (Exact Predicates, Inexact Constructions)	EPECK (Exact Predicates, Exact Constructions)
谓词精度	精确	精确
构造精度	近似（double）	精确（任意精度）
典型速度	快（1x）	慢（10-100x）
内存占用	低	高
适用场景	大多数算法	需要精确坐标的算法

#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Exact_predicates_exact_constructions_kernel.h>
#include <CGAL/Delaunay_triangulation_3.h>
#include <CGAL/Real_timer.h>
#include <CGAL/point_generators_3.h>
 
typedef CGAL::Exact_predicates_inexact_constructions_kernel EPICK;
typedef CGAL::Exact_predicates_exact_constructions_kernel EPECK;
 
template <typename Kernel>
void benchmark_triangulation(int num_points)
{
    typedef CGAL::Delaunay_triangulation_3<Kernel> Triangulation;
    typedef typename Kernel::Point_3 Point;
    
    CGAL::Random_points_in_cube_3<Point> rnd(1.0);
    std::vector<Point> points;
    points.reserve(num_points);
    
    for (int i = 0; i < num_points; ++i) {
        points.push_back(*rnd++);
    }
    
    CGAL::Real_timer timer;
    timer.start();
    
    Triangulation dt(points.begin(), points.end());
    
    timer.stop();
    std::cout << "Time: " << timer.time() << " seconds" << std::endl;
}
 
int main()
{
    const int NUM_POINTS = 10000;
    
    std::cout << "EPICK benchmark:" << std::endl;
    benchmark_triangulation<EPICK>(NUM_POINTS);
    
    std::cout << "\nEPECK benchmark:" << std::endl;
    benchmark_triangulation<EPECK>(NUM_POINTS);
    
    return 0;
}

精确构造的代价

精确构造使用任意精度算术（如GMP），其代价主要体现在：

算术运算成本：大整数运算比浮点运算慢10-100倍
内存分配：精确数需要动态内存分配
缓存效率：大数无法放入寄存器，缓存命中率降低

#include <CGAL/Exact_predicates_exact_constructions_kernel.h>
#include <CGAL/Lazy_exact_nt.h>
#include <iostream>
 
typedef CGAL::Exact_predicates_exact_constructions_kernel EPECK;
 
void demonstrate_exact_cost()
{
    EPECK::Point_3 p1(0, 0, 0);
    EPECK::Point_3 p2(1, 0, 0);
    EPECK::Point_3 p3(0, 1, 0);
    
    // 这个构造会产生精确的有理数坐标
    EPECK::Point_3 centroid = CGAL::centroid(p1, p2, p3);
    
    // 坐标是Lazy_exact_nt类型，仅在需要时才计算精确值
    std::cout << "Centroid: " << centroid << std::endl;
    
    // 强制转换为double（触发精确计算）
    double x = CGAL::to_double(centroid.x());
    std::cout << "As double: " << x << std::endl;
}

混合精度方法

在许多应用中，可以结合使用不同精度的内核：

#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Exact_predicates_exact_constructions_kernel.h>
#include <CGAL/Cartesian_converter.h>
 
typedef CGAL::Exact_predicates_inexact_constructions_kernel EPICK;
typedef CGAL::Exact_predicates_exact_constructions_kernel EPECK;
 
// 内核转换器
CGAL::Cartesian_converter<EPICK, EPECK> to_exact;
CGAL::Cartesian_converter<EPECK, EPICK> to_inexact;
 
void hybrid_precision_example()
{
    // 使用EPICK进行大部分计算
    std::vector<EPICK::Point_3> points;
    // ... 填充points ...
    
    // 只在需要时转换为精确内核
    std::vector<EPECK::Point_3> exact_points;
    for (const auto& p : points) {
        exact_points.push_back(to_exact(p));
    }
    
    // 执行需要精确坐标的操作
    // ...
    
    // 转换回近似内核
    EPICK::Point_3 result = to_inexact(exact_points[0]);
}

内核选择决策树

算法是否需要精确构造？
├── 是（如：需要精确坐标输出）
│   ├── 输出精度要求有限 → EPECK + to_double转换
│   └── 需要任意精度 → 纯EPECK
└── 否（大多数情况）
    ├── 需要鲁棒谓词 → EPICK（推荐）
    └── 简单计算 → Simple_cartesian<double>

静态过滤器的影响

CGAL的过滤内核默认启用静态过滤器，可以在编译时通过宏控制：

// 禁用静态过滤器（通常不推荐）
#define CGAL_NO_STATIC_FILTERS
#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
 
// 或者通过编译选项
// g++ -DCGAL_NO_STATIC_FILTERS program.cpp

静态过滤器通过快速区间算术过滤大部分谓词调用，只有当区间包含零时才使用精确计算。禁用后性能通常会显著下降。

20.1.3 空间数据结构优化

空间数据结构（如AABB树、kd树）的性能对许多几何算法至关重要。

AABB树的构建策略

AABB树的构建质量直接影响查询性能。CGAL提供了多种分割策略：

#include <CGAL/AABB_tree.h>
#include <CGAL/AABB_traits_3.h>
#include <CGAL/AABB_triangle_primitive_3.h>
#include <CGAL/Simple_cartesian.h>
#include <vector>
 
typedef CGAL::Simple_cartesian<double> K;
typedef K::FT FT;
typedef K::Ray_3 Ray;
typedef K::Point_3 Point;
typedef K::Triangle_3 Triangle;
 
typedef std::vector<Triangle>::const_iterator Iterator;
typedef CGAL::AABB_triangle_primitive_3<K, Iterator> Primitive;
typedef CGAL::AABB_traits_3<K, Primitive> AABB_triangle_traits;
typedef CGAL::AABB_tree<AABB_triangle_traits> Tree;
 
void build_optimized_aabb_tree(const std::vector<Triangle>& triangles)
{
    // 默认构建使用中间分割策略
    Tree tree(triangles.begin(), triangles.end());
    
    // 可以自定义分割策略（通过traits）
    // 注意：CGAL 5.x中分割策略已内置优化
    
    // 预计算加速结构
    tree.accelerate_distance_queries();
    
    // 构建后统计
    std::cout << "Tree depth: " << tree.depth() << std::endl;
}

批量查询优化

对于大量查询，批量处理可以更好地利用缓存和并行性：

#include <CGAL/AABB_tree.h>
#include <vector>
#include <future>
 
// 批量最近点查询
void batch_nearest_queries(Tree& tree, const std::vector<Point>& queries, 
                           std::vector<Point>& nearest_points)
{
    nearest_points.resize(queries.size());
    
    // 顺序版本
    for (size_t i = 0; i < queries.size(); ++i) {
        nearest_points[i] = tree.closest_point(queries[i]);
    }
}
 
// 并行批量查询（使用OpenMP）
void parallel_batch_queries(Tree& tree, const std::vector<Point>& queries,
                            std::vector<Point>& nearest_points)
{
    nearest_points.resize(queries.size());
    
    #pragma omp parallel for
    for (size_t i = 0; i < queries.size(); ++i) {
        nearest_points[i] = tree.closest_point(queries[i]);
    }
}

缓存友好布局

当原始几何数据布局不连续时，可以创建连续的副本：

// 原始数据：可能分散在内存中
std::list<Triangle> scattered_triangles;
 
// 优化：复制到连续数组
std::vector<Triangle> contiguous_triangles(scattered_triangles.begin(), 
                                           scattered_triangles.end());
 
// 使用连续数据构建树
Tree tree(contiguous_triangles.begin(), contiguous_triangles.end());

查询加速技术

// 距离查询加速
void optimize_distance_queries(Tree& tree)
{
    // 预计算内部加速结构
    tree.accelerate_distance_queries();
    
    // 使用提示进行查询（当已知上界时）
    Point query_point(0.5, 0.5, 0.5);
    double max_distance = 1.0;
    
    auto result = tree.closest_point(query_point, max_distance);
}
 
// 相交查询加速
void optimize_intersection_queries(Tree& tree)
{
    // 使用包围盒预过滤
    K::Iso_cuboid_3 bbox(Point(0, 0, 0), Point(1, 1, 1));
    
    std::list<Tree::Primitive_id> intersections;
    tree.all_intersected_primitives(bbox, std::back_inserter(intersections));
}

20.1.4 内存管理最佳实践

高效的内存管理对CGAL程序性能至关重要，特别是在处理大规模网格时。

避免不必要的拷贝

#include <CGAL/Surface_mesh.h>
#include <CGAL/boost/graph/copy_face_graph.h>
 
typedef CGAL::Simple_cartesian<double> K;
typedef CGAL::Surface_mesh<K::Point_3> Mesh;
 
// 不好的做法：不必要的拷贝
Mesh process_mesh_bad(Mesh input)  // 按值传递导致拷贝
{
    // 处理...
    return input;  // 再次拷贝
}
 
// 好的做法：使用引用
void process_mesh_good(Mesh& input)
{
    // 直接修改，无拷贝
}
 
// 如果需要保持原始数据，使用const引用和输出参数
void process_mesh_better(const Mesh& input, Mesh& output)
{
    // 只拷贝需要的数据
    CGAL::copy_face_graph(input, output);
    // 处理output...
}

使用适当的属性映射

Surface_mesh的属性系统提供了高效的属性存储：

#include <CGAL/Surface_mesh.h>
 
typedef CGAL::Surface_mesh<K::Point_3> Mesh;
typedef boost::graph_traits<Mesh>::vertex_descriptor vertex_descriptor;
 
void efficient_property_usage(Mesh& mesh)
{
    // 预分配属性存储
    mesh.reserve(num_vertices, num_edges, num_faces);
    
    // 添加自定义属性
    auto vertex_color = mesh.add_property_map<vertex_descriptor, CGAL::Color>("v:color").first;
    auto vertex_normal = mesh.add_property_map<vertex_descriptor, K::Vector_3>("v:normal").first;
    
    // 批量访问属性（缓存友好）
    for (vertex_descriptor v : mesh.vertices()) {
        vertex_color[v] = CGAL::Color(255, 0, 0);
        vertex_normal[v] = K::Vector_3(0, 0, 1);
    }
    
    // 删除不再需要的属性以释放内存
    mesh.remove_property_map(vertex_color);
}

预分配内存

void preallocate_memory()
{
    Mesh mesh;
    
    // 如果已知大致规模，预先分配
    const size_t expected_vertices = 1000000;
    const size_t expected_faces = 2000000;
    
    mesh.reserve(expected_vertices, expected_faces * 3 / 2, expected_faces);
    
    // 对于属性数组，也可以预分配
    auto custom_prop = mesh.add_property_map<vertex_descriptor, double>("v:custom").first;
    custom_prop.reserve(expected_vertices);
    
    // 现在添加元素不会触发重新分配
    for (size_t i = 0; i < expected_vertices; ++i) {
        mesh.add_vertex(K::Point_3(i, 0, 0));
    }
}

Compact Container的使用

CGAL的Compact_container提供了内存高效的元素存储：

#include <CGAL/Compact_container.h>
#include <CGAL/Real_timer.h>
 
struct Element {
    int data;
    void* p;  // 用于Compact_container的指针
    
    void* for_compact_container() const { return p; }
    void for_compact_container(void* ptr) { p = ptr; }
};
 
void compare_containers()
{
    const int N = 1000000;
    
    // 标准vector
    CGAL::Real_timer t;
    t.start();
    std::vector<Element> vec;
    vec.reserve(N);
    for (int i = 0; i < N; ++i) {
        vec.push_back(Element{i, nullptr});
    }
    t.stop();
    std::cout << "Vector time: " << t.time() << std::endl;
    
    // Compact_container
    t.reset();
    t.start();
    CGAL::Compact_container<Element> cc;
    for (int i = 0; i < N; ++i) {
        cc.emplace(i);
    }
    t.stop();
    std::cout << "Compact_container time: " << t.time() << std::endl;
    
    // Compact_container的优势：
    // 1. 稳定的句柄（即使添加/删除元素，现有句柄仍然有效）
    // 2. 内存局部性更好
    // 3. 支持高效遍历已使用元素
}

内存池和自定义分配器

对于频繁的小内存分配，可以使用内存池：

#include <CGAL/Concurrent_compact_container.h>
#include <tbb/cache_aligned_allocator.h>
 
// 使用TBB的缓存对齐分配器
void use_aligned_allocator()
{
    typedef CGAL::Concurrent_compact_container<
        Element,
        CGAL::Default,
        CGAL::Default,
        tbb::cache_aligned_allocator<Element>
    > AlignedContainer;
    
    AlignedContainer container;
    // 元素将按缓存行对齐，减少伪共享
}

20.1.5 并行计算

CGAL从版本4.3开始逐步引入并行算法支持，可以显著加速大规模计算。

CGAL的并发支持

CGAL的并行支持主要基于Intel TBB（Threading Building Blocks）库：

#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Delaunay_triangulation_3.h>
#include <CGAL/Delaunay_triangulation_cell_base_3.h>
#include <CGAL/Triangulation_vertex_base_3.h>
#include <CGAL/Real_timer.h>
 
typedef CGAL::Exact_predicates_inexact_constructions_kernel K;
typedef K::Point_3 Point;
 
// 并行TDS配置
typedef CGAL::Triangulation_data_structure_3<
    CGAL::Triangulation_vertex_base_3<K>,
    CGAL::Delaunay_triangulation_cell_base_3<K>,
    CGAL::Parallel_tag>  // 关键：使用Parallel_tag
    Parallel_TDS;
 
typedef CGAL::Delaunay_triangulation_3<K, Parallel_TDS> Parallel_DT;
 
// 串行TDS配置
typedef CGAL::Delaunay_triangulation_3<K> Sequential_DT;
 
void compare_parallel_sequential(const std::vector<Point>& points)
{
    CGAL::Real_timer t;
    
    // 串行版本
    t.start();
    Sequential_DT seq_dt(points.begin(), points.end());
    t.stop();
    std::cout << "Sequential: " << t.time() << " seconds" << std::endl;
    
#ifdef CGAL_LINKED_WITH_TBB
    // 并行版本
    Parallel_DT::Lock_data_structure locking_ds(
        CGAL::Bbox_3(-1., -1., -1., 1., 1., 1.), 50);
    
    t.reset();
    t.start();
    Parallel_DT par_dt(points.begin(), points.end(), &locking_ds);
    t.stop();
    std::cout << "Parallel: " << t.time() << " seconds" << std::endl;
    std::cout << "Speedup: " << t.time() / t.time() << "x" << std::endl;
#endif
}

线程安全问题

在使用CGAL并行功能时需要注意：

// 线程安全的操作
void thread_safe_operations()
{
#ifdef CGAL_LINKED_WITH_TBB
    Parallel_DT dt;
    Parallel_DT::Lock_data_structure locking_ds(
        CGAL::Bbox_3(-1, -1, -1, 1, 1, 1), 50);
    
    // 并行插入是线程安全的
    std::vector<Point> points = generate_points();
    dt.insert(points.begin(), points.end(), &locking_ds);
    
    // 但遍历和修改不能同时进行
    // 以下是不安全的：
    // #pragma omp parallel for
    // for (auto v : dt.finite_vertex_handles()) { dt.remove(v); }
#endif
}
 
// 使用Spatial_lock_grid_3进行细粒度锁定
void fine_grained_locking()
{
#ifdef CGAL_LINKED_WITH_TBB
    // 创建空间锁网格
    CGAL::Spatial_lock_grid_3<CGAL::Tag_priority_blocking> lock_grid(
        CGAL::Bbox_3(-1, -1, -1, 1, 1, 1),  // 包围盒
        50                                   // 每轴网格数
    );
    
    // 在并行算法中使用锁网格
    // 每个线程在访问特定空间区域前获取锁
#endif
}

OpenMP与TBB集成

CGAL可以与OpenMP结合使用：

#include <omp.h>
#include <CGAL/AABB_tree.h>
 
void hybrid_parallelism(const std::vector<Point>& queries, Tree& tree)
{
    std::vector<Point> results(queries.size());
    
    // 使用OpenMP并行化查询
    #pragma omp parallel for schedule(dynamic)
    for (size_t i = 0; i < queries.size(); ++i) {
        results[i] = tree.closest_point(queries[i]);
    }
}
 
// 注意：混合使用TBB（CGAL内部）和OpenMP时需谨慎
// 建议设置线程数以避免过度订阅
void set_thread_counts()
{
    int num_threads = std::thread::hardware_concurrency();
    
    // 设置TBB线程数
    tbb::task_scheduler_init init(num_threads / 2);
    
    // 设置OpenMP线程数
    omp_set_num_threads(num_threads / 2);
}

并行算法最佳实践

// 1. 数据并行：确保工作负载均衡
void balanced_workload(const std::vector<Mesh>& meshes)
{
    #pragma omp parallel for schedule(dynamic, 1)
    for (size_t i = 0; i < meshes.size(); ++i) {
        // 处理不同大小的网格
        process_mesh(meshes[i]);
    }
}
 
// 2. 任务并行：使用TBB任务组
void task_parallelism()
{
    tbb::task_group group;
    
    group.run([&] { operation_a(); });
    group.run([&] { operation_b(); });
    group.run([&] { operation_c(); });
    
    group.wait();
}
 
// 3. 流水线并行
void pipeline_parallelism()
{
    // 使用TBB的parallel_pipeline
    tbb::parallel_pipeline(
        4,  // 最大并发token数
        tbb::make_filter<void, Data>(
            tbb::filter_mode::serial_in_order,
            [&](tbb::flow_control& fc) -> Data {
                return read_data(fc);
            }
        ) &
        tbb::make_filter<Data, Data>(
            tbb::filter_mode::parallel,
            [&](Data d) -> Data {
                return process_data(d);
            }
        ) &
        tbb::make_filter<Data, void>(
            tbb::filter_mode::serial_in_order,
            [&](Data d) {
                write_data(d);
            }
        )
    );
}

20.1.6 常见性能陷阱

陷阱1：内核不匹配

// 问题：混合使用不同内核导致类型转换开销
void kernel_mismatch()
{
    EPICK::Point_3 p1(0, 0, 0);
    EPECK::Point_2 p2(1, 1);  // 维度也不同！
    
    // 编译错误或隐式转换开销
}
 
// 解决：统一内核类型
void kernel_consistency()
{
    typedef CGAL::Exact_predicates_inexact_constructions_kernel K;
    K::Point_3 p1(0, 0, 0);
    K::Point_3 p2(1, 1, 1);
}

陷阱2：过度使用精确构造

// 问题：使用EPECK进行简单可视化计算
void overuse_exact()
{
    typedef CGAL::Exact_predicates_exact_constructions_kernel K;
    K::Point_3 p(0.1, 0.2, 0.3);
    
    // 简单平移不需要精确构造
    K::Point_3 translated(p.x() + 1, p.y() + 2, p.z() + 3);
    // 每次算术运算都涉及Lazy_exact_nt的开销
}
 
// 解决：使用EPICK
void use_inexact()
{
    typedef CGAL::Exact_predicates_inexact_constructions_kernel K;
    K::Point_3 p(0.1, 0.2, 0.3);
    K::Point_3 translated(p.x() + 1, p.y() + 2, p.z() + 3);  // 快速浮点运算
}

陷阱3：频繁的动态内存分配

// 问题：循环内频繁分配
void frequent_allocation_bad()
{
    for (int i = 0; i < 1000000; ++i) {
        std::vector<Point> temp;  // 每次迭代都分配
        temp.push_back(points[i]);
        process(temp);
    }
}
 
// 解决：预分配和重用
void reuse_allocation_good()
{
    std::vector<Point> temp;
    temp.reserve(100);  // 预分配
    
    for (int i = 0; i < 1000000; ++i) {
        temp.clear();  // 重用内存
        temp.push_back(points[i]);
        process(temp);
    }
}

陷阱4：缓存不友好的访问模式

// 问题：跳跃式访问
void cache_unfriendly()
{
    // 假设mesh.vertices()返回的索引不连续
    for (size_t i = 0; i < num_vertices; ++i) {
        // 访问分散在内存中的属性
        auto v = mesh.vertex(i);
        normal[v] = compute_normal(v);  // 缓存未命中
    }
}
 
// 解决：顺序访问
void cache_friendly()
{
    // 使用迭代器进行顺序访问
    for (auto v : mesh.vertices()) {
        normal[v] = compute_normal(v);  // 更好的缓存局部性
    }
}

陷阱5：不必要的几何谓词调用

// 问题：重复计算相同的谓词
void redundant_predicates_bad(const std::vector<Point>& points)
{
    for (size_t i = 0; i < points.size(); ++i) {
        for (size_t j = i + 1; j < points.size(); ++j) {
            for (size_t k = j + 1; k < points.size(); ++k) {
                // 每次循环都重新计算方向
                if (CGAL::orientation(points[i], points[j], points[k]) 
                    == CGAL::LEFT_TURN) {
                    // ...
                }
            }
        }
    }
}
 
// 解决：缓存谓词结果
void cache_predicates_good(const std::vector<Point>& points)
{
    // 预计算并缓存方向测试结果
    std::vector<std::vector<int>> orientation_cache(n, 
        std::vector<int>(n));
    
    for (size_t i = 0; i < points.size(); ++i) {
        for (size_t j = i + 1; j < points.size(); ++j) {
            for (size_t k = j + 1; k < points.size(); ++k) {
                // 使用缓存的结果
                if (orientation_cache[i][j] == CGAL::LEFT_TURN) {
                    // ...
                }
            }
        }
    }
}

陷阱6：忽略算法复杂度

// 问题：使用O(n^2)算法处理大数据
void quadratic_algorithm(const Mesh& mesh)
{
    for (auto f1 : mesh.faces()) {
        for (auto f2 : mesh.faces()) {  // O(n^2)
            if (intersect(mesh, f1, f2)) {
                // ...
            }
        }
    }
}
 
// 解决：使用空间数据结构
void linearithmic_algorithm(const Mesh& mesh)
{
    // 构建AABB树，查询降为O(log n)
    Tree tree(mesh.faces_begin(), mesh.faces_end());
    
    for (auto f : mesh.faces()) {
        // 使用树进行高效查询
        tree.all_intersections(f);
    }
}

20.1.7 调试技巧

启用CGAL断言

CGAL提供了多级断言和调试支持：

// 在包含CGAL头文件之前定义以下宏之一
 
// 启用所有断言（最严格）
#define CGAL_DEBUG
 
// 或者只启用特定级别
#define CGAL_NDEBUG  // 禁用所有断言（发布模式）
#define CGAL_CHECK_EXACTNESS  // 检查精确性
#define CGAL_CHECK_EXPENSIVE  // 启用昂贵的检查
 
#include <CGAL/Surface_mesh.h>

断言行为自定义

#include <CGAL/assertions_behaviour.h>
#include <CGAL/exceptions.h>
 
void custom_assertion_handler()
{
    // 设置断言失败时的行为
    CGAL::set_error_handler([](const char* type, const char* expr, 
                               const char* file, int line, 
                               const char* msg) {
        std::cerr << "CGAL Error [" << type << "]: " << expr << std::endl;
        std::cerr << "  at " << file << ":" << line << std::endl;
        if (msg) std::cerr << "  message: " << msg << std::endl;
        
        // 可以抛出异常而不是终止
        throw CGAL::Assertion_exception(type, expr, file, line, msg);
    });
    
    // 设置警告处理
    CGAL::set_warning_handler([](const char* type, const char* expr,
                                 const char* file, int line,
                                 const char* msg) {
        std::cerr << "CGAL Warning [" << type << "]: " << expr << std::endl;
        // 继续执行
    });
}

几何验证

#include <CGAL/Polygon_mesh_processing/self_intersections.h>
#include <CGAL/Polygon_mesh_processing/orientation.h>
 
namespace PMP = CGAL::Polygon_mesh_processing;
 
void validate_mesh(const Mesh& mesh)
{
    // 检查自相交
    bool intersecting = PMP::does_self_intersect(mesh);
    if (intersecting) {
        std::cerr << "Warning: Mesh has self-intersections" << std::endl;
    }
    
    // 检查方向一致性
    bool oriented = PMP::is_outward_oriented(mesh);
    std::cout << "Outward oriented: " << oriented << std::endl;
    
    // 检查是否闭合
    bool closed = CGAL::is_closed(mesh);
    std::cout << "Closed mesh: " << closed << std::endl;
    
    // 检查有效性
    bool valid = mesh.is_valid();
    std::cout << "Valid mesh: " << valid << std::endl;
}

可视化调试

// 使用CGAL::draw进行可视化调试
#include <CGAL/draw_surface_mesh.h>
#include <CGAL/draw_triangulation_3.h>
 
void visualize_for_debug(const Mesh& mesh)
{
    // 打开交互式可视化窗口
    CGAL::draw(mesh);
    
    // 可以添加暂停以检查状态
    std::cout << "Press Enter to continue..." << std::endl;
    std::cin.get();
}
 
// 导出中间结果
void export_intermediate(const Mesh& mesh, int step)
{
    std::string filename = "debug_step_" + std::to_string(step) + ".off";
    CGAL::IO::write_OFF(filename, mesh);
}

条件断点与日志

// 使用条件日志进行调试
#ifdef CGAL_DEBUG_MESH_PROCESSING
    #define MESH_DEBUG_LOG(msg) std::cerr << "[DEBUG] " << msg << std::endl
#else
    #define MESH_DEBUG_LOG(msg)
#endif
 
void process_with_logging(Mesh& mesh)
{
    MESH_DEBUG_LOG("Starting mesh processing");
    MESH_DEBUG_LOG("Input vertices: " << mesh.number_of_vertices());
    
    // 处理步骤...
    
    MESH_DEBUG_LOG("After step 1: " << mesh.number_of_vertices() << " vertices");
}

20.1.8 性能优化检查清单

在发布CGAL应用程序之前，请检查以下项目：

编译配置

使用Release模式（-O3 -DNDEBUG）
启用链接时优化（-flto）
使用适合目标CPU的架构标志（-march=native）
确保CGAL_ASSERTIONS被禁用

内核选择

评估是否真的需要精确构造
考虑使用EPICK替代EPECK
验证静态过滤器已启用
检查内核类型在整个程序中一致

内存管理

预分配已知大小的容器
避免循环内频繁分配
删除不再需要的属性映射
考虑使用shrink_to_fit()释放未使用内存

算法优化

使用适当的空间数据结构（AABB树、kd树）
批量处理查询而非逐个处理
评估算法复杂度是否适合数据规模
考虑并行化独立计算

调试与验证

在调试版本中启用所有断言
验证几何输入的有效性
检查数值稳定性边界情况
使用剖析工具确认优化效果

20.1.9 本章小结

本章深入探讨了CGAL程序的性能优化与调试技术：

性能剖析：使用CGAL内置计时器和剖析宏，结合外部工具如perf、gprof进行详细分析。
内核选择：EPICK在大多数场景下提供了最佳性能，而EPECK仅在需要精确坐标时使用。混合精度方法可以平衡性能和精度。
空间数据结构：合理使用AABB树等结构可以将查询复杂度从O(n)降低到O(log n)。
内存管理：预分配、避免拷贝、使用属性映射等技巧可以显著提升大规模数据处理性能。
并行计算：CGAL基于TBB的并行支持可以充分利用多核CPU，但需要注意线程安全和数据竞争问题。
常见陷阱：避免内核不匹配、过度使用精确构造、缓存不友好的访问模式等问题。
调试技巧：合理使用CGAL的断言系统、几何验证和可视化工具可以快速定位问题。

性能优化是一个迭代过程，应当遵循”先剖析、后优化”的原则，避免过早优化带来的复杂性。

20.1.10 练习

练习1：内核性能对比

编写一个程序比较EPICK和EPECK在以下操作上的性能差异：

构建10万个点的Delaunay三角剖分
计算10万个三角形的重心
执行100万次方向测试

分析并解释性能差异的原因。

练习2：AABB树优化

给定一个包含100万个三角形的网格，实现以下优化：

批量最近点查询（批量大小1000）
使用OpenMP并行化查询
比较优化前后的性能

练习3：内存优化

优化以下代码片段，减少内存分配次数：

Mesh simplify_mesh(const Mesh& input)
{
    Mesh output;
    for (auto f : input.faces()) {
        std::vector<Point> face_points;
        for (auto v : vertices_around_face(f, input)) {
            face_points.push_back(input.point(v));
        }
        if (face_points.size() == 3) {
            output.add_face(face_points);
        }
    }
    return output;
}

练习4：并行三角剖分

使用CGAL的并行Delaunay三角剖分功能，测试不同线程数下的加速比。绘制加速比随线程数变化的曲线，并分析可扩展性限制。

练习5：性能剖析实战

对一个实际的CGAL程序进行完整的性能剖析：

使用CGAL_PROFILE宏识别热点
使用perf生成火焰图
识别并修复至少两个性能瓶颈
验证优化效果

本章参考代码位置：part7-practice/chapter20-performance/code/

Chunibyo

Explorer

20.1 性能优化与调试

20.1 性能优化与调试 §

相关章节 §

20.1.1 性能剖析基础 §

使用CGAL内置计时器 §

性能剖析宏 §

内核谓词剖析 §

外部剖析工具集成 §

内存使用监控 §

20.1.2 内核选择的性能影响 §

EPICK vs EPECK性能对比 §

精确构造的代价 §

混合精度方法 §

内核选择决策树 §

静态过滤器的影响 §

20.1.3 空间数据结构优化 §

AABB树的构建策略 §

批量查询优化 §

缓存友好布局 §

查询加速技术 §

20.1.4 内存管理最佳实践 §

避免不必要的拷贝 §

使用适当的属性映射 §

预分配内存 §

Compact Container的使用 §

内存池和自定义分配器 §

20.1.5 并行计算 §

CGAL的并发支持 §

线程安全问题 §

OpenMP与TBB集成 §

并行算法最佳实践 §

20.1.6 常见性能陷阱 §

陷阱1：内核不匹配 §

陷阱2：过度使用精确构造 §

陷阱3：频繁的动态内存分配 §

陷阱4：缓存不友好的访问模式 §

陷阱5：不必要的几何谓词调用 §

陷阱6：忽略算法复杂度 §

20.1.7 调试技巧 §

启用CGAL断言 §

断言行为自定义 §

几何验证 §

可视化调试 §

条件断点与日志 §

20.1.8 性能优化检查清单 §

编译配置 §

内核选择 §

内存管理 §

算法优化 §

调试与验证 §

20.1.9 本章小结 §

20.1.10 练习 §

练习1：内核性能对比 §

练习2：AABB树优化 §

练习3：内存优化 §

练习4：并行三角剖分 §

练习5：性能剖析实战 §

Graph View

Table of Contents

Backlinks

20.1 性能优化与调试

相关章节

20.1.1 性能剖析基础

使用CGAL内置计时器

性能剖析宏

内核谓词剖析

外部剖析工具集成

内存使用监控

20.1.2 内核选择的性能影响

EPICK vs EPECK性能对比

精确构造的代价

混合精度方法

内核选择决策树

静态过滤器的影响

20.1.3 空间数据结构优化

AABB树的构建策略

批量查询优化

缓存友好布局

查询加速技术

20.1.4 内存管理最佳实践

避免不必要的拷贝

使用适当的属性映射

预分配内存

Compact Container的使用

内存池和自定义分配器

20.1.5 并行计算

CGAL的并发支持

线程安全问题

OpenMP与TBB集成

并行算法最佳实践

20.1.6 常见性能陷阱

陷阱1：内核不匹配

陷阱2：过度使用精确构造

陷阱3：频繁的动态内存分配

陷阱4：缓存不友好的访问模式

陷阱5：不必要的几何谓词调用

陷阱6：忽略算法复杂度

20.1.7 调试技巧

启用CGAL断言

断言行为自定义

几何验证

可视化调试

条件断点与日志

20.1.8 性能优化检查清单

编译配置

内核选择

内存管理

算法优化

调试与验证

20.1.9 本章小结

20.1.10 练习

练习1：内核性能对比

练习2：AABB树优化

练习3：内存优化

练习4：并行三角剖分

练习5：性能剖析实战