YOLOv8 C++ Deployment: Implementing V5/V7/V8 with OpenCV DNN
In real-time systems such as industrial vision, intelligent surveillance, and autonomous driving, a detector's inference speed and deployment flexibility are critical. Deep learning frameworks like PyTorch offer powerful training capabilities, but production environments often demand native-code integration with lower latency and higher stability, which is exactly where C++ deployment shines.
Since Joseph Redmon introduced YOLO in 2015, the series has evolved to YOLOv8, with a simpler structure, higher accuracy, and better generalization. Ultralytics' YOLOv5 and YOLOv8, together with the closely related YOLOv7, all support ONNX export with broadly consistent input/output conventions, which greatly simplifies cross-platform deployment.
This article focuses on deploying YOLOv5, YOLOv7, and YOLOv8 uniformly in C++ using OpenCV's DNN module. No PyTorch or Python runtime is required: OpenCV alone covers the whole pipeline from model loading to result visualization, with per-version adaptations of the output decoding logic.
✅ Requires OpenCV >= 4.7.0
⚠️ Building OpenCV with DNN and CUDA support is out of scope here; please follow the official documentation or community tutorials.
Unified Architecture: Base-Class Encapsulation + Polymorphism
To handle the subtle but important differences between YOLO versions (anchor usage, output dimensions, decoding), we take an object-oriented approach and build an extensible inference framework:
- Define an abstract base class `Yolo` that encapsulates the shared steps (preprocessing, NMS, drawing)
- Subclasses `Yolov5`/`Yolov7`/`Yolov8` each override the `Detect` method to handle their version-specific post-processing
- All models share the same input size (640×640) and class set (the 80 COCO classes)
- Results are standardized into a `Detection` struct for easy downstream business logic
This design not only improves code reuse but also makes adding support for new models straightforward.
Core Data Structure
```cpp
struct Detection {
    int class_id{0};        // class ID
    float confidence{0.0f}; // confidence score
    cv::Rect box{};         // bounding box
};
```

This struct standardizes detection results and hides the per-version differences; it is the basic unit of communication between modules.
Header File: yoloV8.h
```cpp
#pragma once
#include <iostream>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;
using namespace cv::dnn;

// Detection result struct
struct Detection {
    int class_id{0};        // class ID
    float confidence{0.0f}; // confidence score
    cv::Rect box{};         // bounding box
};

// Base class Yolo
class Yolo {
public:
    virtual vector<Detection> Detect(Mat& srcImg, Net& net) = 0;
    bool readModel(Net& net, const string& modelPath, bool isCuda = false);
    void drawPred(Mat& img, const vector<Detection>& results, const vector<Scalar>& colors);

    // Sigmoid function
    float sigmoid(float x) { return 1.0f / (1.0f + exp(-x)); }

    // Pad the image to a square while keeping the original content
    Mat formatToSquare(const Mat& src) {
        int col = src.cols;
        int row = src.rows;
        int maxEdge = std::max(col, row);
        Mat square = Mat::zeros(maxEdge, maxEdge, CV_8UC3);
        src.copyTo(square(Rect(0, 0, col, row)));
        return square;
    }

    // Fixed input size: 640x640
    const int netWidth = 640;
    const int netHeight = 640;

    // Model thresholds (may be overridden in subclasses)
    float modelConfidenceThreshold{0.25f};
    float modelScoreThreshold{0.25f};
    float modelNMSThreshold{0.45f};

    // The 80 COCO class names
    std::vector<std::string> classes = {
        "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat",
        "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
        "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack",
        "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball",
        "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket",
        "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
        "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
        "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse",
        "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
        "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier",
        "toothbrush"
    };
};

// YOLOv5 implementation
class Yolov5 : public Yolo {
public:
    vector<Detection> Detect(Mat& srcImg, Net& net) override;
private:
    float confidenceThreshold{0.25f};
    float nmsIoUThreshold{0.45f};
};

// YOLOv7 implementation
class Yolov7 : public Yolo {
public:
    vector<Detection> Detect(Mat& srcImg, Net& net) override;
private:
    float confidenceThreshold{0.25f};
    float nmsIoUThreshold{0.45f};
    const int strideSize = 3;
    const float strides[3] = {8.0f, 16.0f, 32.0f};
    const float anchors[3][6] = {
        {12, 16, 19, 36, 40, 28},
        {36, 75, 76, 55, 72, 146},
        {142, 110, 192, 243, 459, 401}
    };
};

// YOLOv8 implementation
class Yolov8 : public Yolo {
public:
    vector<Detection> Detect(Mat& srcImg, Net& net) override;
private:
    float confidenceThreshold{0.25f};
    float nmsIoUThreshold{0.70f};
};
```

Note that formatToSquare pads the image to a square while preserving the original aspect ratio, avoiding distortion from stretching; the sigmoid helper is used by YOLOv7's activation computations.
Source File: yoloV8.cpp
```cpp
#include "yoloV8.h"

bool Yolo::readModel(Net& net, const string& modelPath, bool isCuda) {
    try {
        net = readNetFromONNX(modelPath);
    } catch (const std::exception& e) {
        std::cerr << "Error loading ONNX model: " << e.what() << std::endl;
        return false;
    }
    if (isCuda) {
        net.setPreferableBackend(DNN_BACKEND_CUDA);
        net.setPreferableTarget(DNN_TARGET_CUDA);
    } else {
        net.setPreferableBackend(DNN_BACKEND_DEFAULT);
        net.setPreferableTarget(DNN_TARGET_CPU);
    }
    return true;
}

void Yolo::drawPred(Mat& img, const vector<Detection>& results, const vector<Scalar>& colors) {
    for (const auto& det : results) {
        Rect box = det.box;
        Scalar color = colors[det.class_id];
        // Draw the bounding box
        rectangle(img, box, color, 2);
        // Draw the label on a filled background
        string label = classes[det.class_id] + " " + to_string(det.confidence).substr(0, 4);
        Size txtSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.6, 1, nullptr);
        Rect textBox(box.x, box.y - 30, txtSize.width + 10, txtSize.height + 10);
        rectangle(img, textBox, color, FILLED);
        putText(img, label, Point(box.x + 5, box.y - 10), FONT_HERSHEY_SIMPLEX, 0.6, Scalar(0, 0, 0), 1);
    }
}
```

readModel wraps model loading in a try/catch so an invalid ONNX file fails gracefully instead of crashing the program; drawPred renders each label on a filled background for better readability.
YOLOv5 Inference Logic
YOLOv5's output tensor has shape (1, 25200, 85), where each row contains [x, y, w, h, conf, class_scores...]. Decoding works as follows:
```cpp
vector<Detection> Yolov5::Detect(Mat& srcImg, Net& net) {
    Mat input = formatToSquare(srcImg);
    Mat blob;
    blobFromImage(input, blob, 1.0 / 255.0, Size(netWidth, netHeight), Scalar(), true, false);
    net.setInput(blob);
    vector<Mat> outputs;
    net.forward(outputs, net.getUnconnectedOutLayersNames());

    float* data = (float*)outputs[0].data;
    int rows = outputs[0].size[1];       // 25200
    int dimensions = outputs[0].size[2]; // 85

    vector<int> classIds;
    vector<float> confidences;
    vector<Rect> boxes;
    float x_factor = (float)input.cols / netWidth;
    float y_factor = (float)input.rows / netHeight;

    for (int i = 0; i < rows; ++i) {
        float* ptr = data + i * dimensions;
        float confidence = ptr[4];
        if (confidence < modelConfidenceThreshold) continue;
        Mat scores = Mat(1, (int)classes.size(), CV_32FC1, ptr + 5);
        Point maxLoc;
        double maxScore;
        minMaxLoc(scores, nullptr, &maxScore, nullptr, &maxLoc);
        if (maxScore > modelScoreThreshold) {
            classIds.push_back(maxLoc.x);
            confidences.push_back(static_cast<float>(maxScore));
            float x = ptr[0], y = ptr[1], w = ptr[2], h = ptr[3];
            int left = static_cast<int>((x - w * 0.5f) * x_factor);
            int top = static_cast<int>((y - h * 0.5f) * y_factor);
            int width = static_cast<int>(w * x_factor);
            int height = static_cast<int>(h * y_factor);
            boxes.emplace_back(left, top, width, height);
        }
    }

    vector<int> nmsIndices;
    NMSBoxes(boxes, confidences, confidenceThreshold, nmsIoUThreshold, nmsIndices);
    vector<Detection> detections;
    for (int idx : nmsIndices) {
        Detection obj;
        obj.class_id = classIds[idx];
        obj.confidence = confidences[idx];
        obj.box = boxes[idx];
        detections.push_back(obj);
    }
    return detections;
}
```

The key step is the coordinate restoration: the network outputs center coordinates and sizes in the 640×640 input space, and they must be multiplied by the scale factors to map back to the padded original image.
YOLOv7: Anchor-based Decoding
YOLOv7 keeps the multi-scale anchor design: it produces three feature maps (80×80, 40×40, 20×20), and each prediction is an offset adjustment of a preset anchor.
```cpp
vector<Detection> Yolov7::Detect(Mat& srcImg, Net& net) {
    Mat input = formatToSquare(srcImg);
    Mat blob;
    blobFromImage(input, blob, 1.0 / 255.0, Size(netWidth, netHeight), Scalar(), true, false);
    net.setInput(blob);
    vector<Mat> outputs;
    net.forward(outputs, net.getUnconnectedOutLayersNames());

#if CV_VERSION_MAJOR == 4 && CV_VERSION_MINOR > 6
    sort(outputs.begin(), outputs.end(), [](const Mat& a, const Mat& b) {
        return a.size[2] > b.size[2]; // keep the largest feature map first
    });
#endif

    vector<int> classIds;
    vector<float> confidences;
    vector<Rect> boxes;
    float ratio_x = (float)input.cols / netWidth;
    float ratio_y = (float)input.rows / netHeight;
    int headDim = (int)classes.size() + 5;

    for (int s = 0; s < strideSize; ++s) {
        float stride = strides[s];
        float* head = (float*)outputs[s].data;
        int grid_w = (int)(netWidth / stride);
        int grid_h = (int)(netHeight / stride);
        const float* ancs = anchors[s];
        for (int a = 0; a < 3; ++a) {
            float anchor_w = ancs[a * 2];
            float anchor_h = ancs[a * 2 + 1];
            for (int i = 0; i < grid_h; ++i) {
                for (int j = 0; j < grid_w; ++j) {
                    // Assumes the head is laid out as (1, 3, grid_h, grid_w, headDim),
                    // so the anchor index must be part of the offset.
                    float* ptr = head + ((a * grid_h + i) * grid_w + j) * headDim;
                    float box_conf = sigmoid(ptr[4]);
                    Mat scores_mat(1, (int)classes.size(), CV_32FC1, ptr + 5);
                    Point cls_id;
                    double max_score;
                    minMaxLoc(scores_mat, nullptr, &max_score, nullptr, &cls_id);
                    float objectness = box_conf * sigmoid((float)max_score);
                    if (objectness < confidenceThreshold) continue;
                    // YOLOv5/v7-style decode: centers offset from the cell,
                    // sizes as (2*sigmoid)^2 times the preset anchor.
                    float dx = sigmoid(ptr[0]);
                    float dy = sigmoid(ptr[1]);
                    float dw = powf(sigmoid(ptr[2]) * 2.0f, 2.0f) * anchor_w;
                    float dh = powf(sigmoid(ptr[3]) * 2.0f, 2.0f) * anchor_h;
                    float x = (dx * 2.0f - 0.5f + j) * stride;
                    float y = (dy * 2.0f - 0.5f + i) * stride;
                    int left = (int)((x - dw * 0.5f) * ratio_x);
                    int top = (int)((y - dh * 0.5f) * ratio_y);
                    int width = (int)(dw * ratio_x);
                    int height = (int)(dh * ratio_y);
                    classIds.push_back(cls_id.x);
                    confidences.push_back(objectness);
                    boxes.emplace_back(left, top, width, height);
                }
            }
        }
    }

    vector<int> nmsIndices;
    NMSBoxes(boxes, confidences, confidenceThreshold, nmsIoUThreshold, nmsIndices);
    vector<Detection> detections;
    for (int idx : nmsIndices) {
        Detection obj;
        obj.class_id = classIds[idx];
        obj.confidence = confidences[idx];
        obj.box = boxes[idx];
        detections.push_back(obj);
    }
    return detections;
}
```

Note: OpenCV 4.7+ may return the output layers in a different order, so the outputs are sorted manually to keep the [80, 40, 20] ordering consistent.
YOLOv8: Anchor-free Direct Regression
YOLOv8 drops anchors entirely. Its output has shape (1, 84, 8400): 8400 candidate boxes, each an 84-dimensional vector (4 coordinates + 80 class scores). Decoding is therefore more direct:
```cpp
vector<Detection> Yolov8::Detect(Mat& srcImg, Net& net) {
    Mat input = formatToSquare(srcImg);
    Mat blob;
    blobFromImage(input, blob, 1.0 / 255.0, Size(netWidth, netHeight), Scalar(), true, false);
    net.setInput(blob);
    vector<Mat> outputs;
    net.forward(outputs, net.getUnconnectedOutLayersNames());

    // Output shape (1, 84, 8400) -> transpose to (8400, 84)
    Mat output = outputs[0].reshape(1, 84);
    transpose(output, output);
    float* data = (float*)output.data;
    int rows = output.rows; // 8400
    int cols = output.cols; // 84

    vector<int> classIds;
    vector<float> confidences;
    vector<Rect> boxes;
    float x_factor = (float)input.cols / netWidth;
    float y_factor = (float)input.rows / netHeight;

    for (int i = 0; i < rows; ++i) {
        float* ptr = data + i * cols;
        float x = ptr[0];
        float y = ptr[1];
        float w = ptr[2];
        float h = ptr[3];
        Mat scores = Mat(1, (int)classes.size(), CV_32FC1, ptr + 4);
        Point maxClassId;
        double maxScore;
        minMaxLoc(scores, nullptr, &maxScore, nullptr, &maxClassId);
        if (maxScore > modelConfidenceThreshold) {
            int left = (int)((x - w * 0.5f) * x_factor);
            int top = (int)((y - h * 0.5f) * y_factor);
            int width = (int)(w * x_factor);
            int height = (int)(h * y_factor);
            classIds.push_back(maxClassId.x);
            confidences.push_back(static_cast<float>(maxScore));
            boxes.emplace_back(left, top, width, height);
        }
    }

    vector<int> nmsIndices;
    NMSBoxes(boxes, confidences, confidenceThreshold, nmsIoUThreshold, nmsIndices);
    vector<Detection> detections;
    for (int idx : nmsIndices) {
        Detection obj;
        obj.class_id = classIds[idx];
        obj.confidence = confidences[idx];
        obj.box = boxes[idx];
        detections.push_back(obj);
    }
    return detections;
}
```

With no anchor regression step, YOLOv8's box predictions are more stable, and a higher NMS IoU threshold (e.g. 0.7) is recommended so that overlapping true positives are not suppressed too aggressively.
Main Program: main.cpp
```cpp
#include "yoloV8.h"
#include <iostream>
#include <time.h>

#define USE_CUDA false // toggle CUDA usage

using namespace std;
using namespace cv;

int main() {
    string imgPath = "./bus.jpg";
    string modelPath = "./yolov8n.onnx"; // can be swapped for yolov5s.onnx or yolov7-tiny.onnx

    Mat image = imread(imgPath);
    if (image.empty()) {
        cerr << "Error: Could not load image!" << endl;
        return -1;
    }

    // Random colors for drawing
    vector<Scalar> colors;
    srand(time(nullptr));
    for (int i = 0; i < 80; ++i) {
        colors.emplace_back(rand() % 256, rand() % 256, rand() % 256);
    }

    // Initialize the YOLOv8 model
    Yolov8 yolo;
    Net net;
    if (!yolo.readModel(net, modelPath, USE_CUDA)) {
        cerr << "Failed to load ONNX model." << endl;
        return -1;
    }
    cout << "Model loaded successfully." << endl;

    // Run inference
    Mat src = image.clone();
    auto start = getTickCount();
    vector<Detection> results = yolo.Detect(src, net);
    double fps = getTickFrequency() / (getTickCount() - start);
    cout << "Inference time: " << 1000.0 / fps << " ms"
         << " (" << fps << " FPS)" << endl;

    // Draw results
    yolo.drawPred(src, results, colors);
    imwrite("./result.jpg", src);
    imshow("YOLO Inference", src);
    waitKey(0);
    return 0;
}
```

The main function walks through the full pipeline: load image → build a color table → load model → run inference → compute FPS → visualize. Switching models only requires changing modelPath and instantiating the corresponding class.
Model Export and Build Notes
1. Exporting the ONNX model (Python)
Export the ONNX model with the official Ultralytics API:
```python
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.export(format='onnx', imgsz=640)
```

The generated .onnx file can be fed directly to the C++ code. Make sure to pass imgsz=640 so the export matches the input size assumed above.
2. Example build command
```shell
g++ -std=c++17 main.cpp yoloV8.cpp \
    `pkg-config --cflags --libs opencv4` \
    -o yolov8_infer
```

To enable CUDA acceleration, make sure that:
- OpenCV was built with CUDA support enabled
- the CUDA runtime (cudart) and related libraries are linked correctly
- optionally, drive the switch from the command line with -DUSE_CUDA=true (note that main.cpp hard-codes #define USE_CUDA false, which must be removed or guarded with #ifndef for the flag to take effect)
Performance Tuning Tips
| Item | Recommendation |
|---|---|
| OpenCV version | ≥ 4.7.0; 4.8+ recommended |
| Inference device | Enabling CUDA gives a large speedup (100+ FPS on an RTX 3060) |
| Image preprocessing | Use formatToSquare to preserve the aspect ratio |
| NMS threshold | For YOLOv8, set the NMS IoU to 0.7 to avoid missed detections |
In practice, YOLOv8n under OpenCV DNN reaches roughly 100–120 FPS on an RTX 3060, enough for most real-time applications. It trails TensorRT somewhat, but wins on simple deployment and broad compatibility.
This OpenCV-DNN-based unified YOLO deployment truly achieves "write once, run with many models". From embedded devices to servers, anywhere OpenCV runs, the YOLO family can be plugged in quickly.
More importantly, the framework is highly modular: supporting a new model only requires deriving from the Yolo class and implementing Detect, with no changes to the main logic. Extending it to YOLOv9 or other variants later would cost very little.
For scenarios that demand peak performance, TensorRT or ONNX Runtime can be layered on top of this design to squeeze more out of the hardware. For rapid prototyping and small-to-medium deployments, however, OpenCV DNN remains the most practical choice.