How to Call the MinerU API: A Hands-On Python Integration Tutorial with Code Examples - 智慧文博士

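1. Introduction

Does this sound familiar? A pile of PDF reports lands on your desk, with tables, charts, and text all mixed together, and pulling the information out by hand is slow and tedious. Or you need to find key figures in scanned documents quickly, but your OCR tool only recognizes characters and has no idea what the content means.

Today I want to share a tool built for exactly this kind of problem: MinerU, an intelligent document understanding model. This small 1.2B-parameter model, developed by Shanghai AI Laboratory, is designed specifically for document parsing. Despite its small size, it handles documents, tables, and charts quite well, and its hardware requirements are very low: an ordinary CPU is enough to run it smoothly.

In this tutorial I will walk you through calling the MinerU API from Python so that you can build document understanding into your own projects. Whether you want to automate office paperwork or extract data from academic papers, you should find a practical solution here.

By the end of this tutorial you will know:

The basic way to call the MinerU API
How to handle different kinds of document images
How to build a complete document processing workflow
Integration tips for real projects

No deep learning background is required. If you can write basic Python, you can follow along step by step.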

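2. Environment Setup and Quick Deployment

2.1 Getting API Access

First, make sure you can reach a MinerU service. If you are using the CSDN星图 platform, the image usually exposes an HTTP address once it has started. Write this address down; it will be your API endpoint.

If you deploy in another environment, make sure the service is running and listening on a port. By default, the MinerU service usually runs on port 7860 or a similar port.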
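Since the service address may come with or without a trailing slash, it can help to build the endpoint URL in one place. The sketch below is only an illustration: `build_api_url` is a hypothetical helper, and the `api/chat` path is simply the one this tutorial uses in its examples, so check it against your own deployment.

```python
from urllib.parse import urljoin


def build_api_url(base_address: str, path: str = "api/chat") -> str:
    """Join a service address and an endpoint path, tolerating stray slashes.

    The default path is the one used in this tutorial's examples; it is an
    assumption about the deployment, not a documented constant.
    """
    # Ensure exactly one trailing slash so urljoin appends instead of replacing
    base = base_address.strip().rstrip("/") + "/"
    return urljoin(base, path.lstrip("/"))


if __name__ == "__main__":
    print(build_api_url("http://localhost:7860"))
```

Keeping the address in a single variable also makes it easy to switch between a local service and a hosted one.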

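2.2 Installing the Required Python Libraries

Open a terminal or command prompt and install the libraries we need:

pip install requests pillow opencv-python numpy

Here is what each library is for:

requests: sends HTTP requests to the MinerU API
pillow (PIL): reads and processes image files in many formats
opencv-python: optional, for more advanced image preprocessing
numpy: handles image data as arrays

If you only need the basic features, requests and pillow are enough.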
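Before going further, you may want to confirm which of these libraries are actually importable. The following is a minimal sketch using only the standard library; `missing_packages` and the two dictionaries are names made up for this example.

```python
import importlib.util

# pip package name -> importable module name (they differ for some packages)
REQUIRED = {"requests": "requests", "pillow": "PIL"}
OPTIONAL = {"opencv-python": "cv2", "numpy": "numpy"}


def missing_packages(packages: dict) -> list:
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    print("Missing required packages:", missing_packages(REQUIRED))
    print("Missing optional packages:", missing_packages(OPTIONAL))
```

An empty list for the required packages means the basic examples in this tutorial should import cleanly.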

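2.3 Preparing Test Images

Collect a few images with different kinds of content to use as test material:

Plain text image: for example, a screenshot of a printed paragraph
Table image: a screenshot of an Excel sheet or a web table
Chart image: a bar chart, line chart, and so on
Mixed content: a document page containing both text and charts

Save these images in one folder, such as test_images/, so they are easy to reference later.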
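To check that the folder contains what you expect, a short listing script can help. This is a sketch under the assumption that the images live in `test_images/`; `list_test_images` is a hypothetical helper, and the extension set mirrors the formats used later in this tutorial.

```python
from pathlib import Path

# Extensions treated as images in this tutorial
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff"}


def list_test_images(folder: str) -> list:
    """Return the image files directly inside folder, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    for path in list_test_images("test_images"):
        print(path.name)
```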

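3. Basic API Calls

3.1 The Simplest Example

Let's start with the simplest possible example. Assuming your MinerU service is running at http://localhost:7860, this is how to send a basic request: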

import requests

# Address of the MinerU service API
API_URL = "http://localhost:7860/api/chat"


def ask_mineru_simple(image_path, question):
    """
    The simplest way to ask MinerU a question about an image.

    Args:
        image_path: path to the image file
        question: the question to ask
    """
    # 1. Read the image file
    with open(image_path, 'rb') as f:
        image_data = f.read()

    # 2. Prepare the request payload
    files = {
        'image': ('image.jpg', image_data, 'image/jpeg')
    }
    data = {
        'question': question,
        'temperature': 0.1,  # controls randomness; lower values give more deterministic answers
        'max_tokens': 512    # maximum number of tokens to return
    }

    # 3. Send the request (the timeout keeps a stalled service from blocking forever)
    response = requests.post(API_URL, files=files, data=data, timeout=60)

    # 4. Handle the response
    if response.status_code == 200:
        result = response.json()
        return result.get('answer', 'No answer received')
    else:
        return f"Request failed, status code: {response.status_code}"


# Usage example
if __name__ == "__main__":
    # Test a single image
    answer = ask_mineru_simple('test_images/report_page.jpg', 'What key data does this image contain?')
    print(f"MinerU's answer: {answer}")

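This function does four things:

Reads the local image file
Builds a request containing the image and the question
Sends it to the MinerU API
Parses the answer from the response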
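The example above labels every upload as a JPEG, whatever the actual file type. If your service is sensitive to the declared content type, one option is to derive it from the file extension. The sketch below uses only the standard library; `build_image_part` is a hypothetical helper, and whether the service inspects the content type at all is an assumption to verify.

```python
import mimetypes
from pathlib import Path


def build_image_part(image_path: str) -> tuple:
    """Return (filename, content_type) for an image upload, guessed from the extension."""
    content_type, _ = mimetypes.guess_type(image_path)
    if content_type is None or not content_type.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {image_path}")
    return Path(image_path).name, content_type


if __name__ == "__main__":
    name, content_type = build_image_part("test_images/sales_data.png")
    print(name, content_type)
```

With this helper, the upload entry could be built as `files = {'image': (name, image_data, content_type)}` instead of hard-coding `'image.jpg'` and `'image/jpeg'`.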

3.2 Handling Different Document Types

MinerU can handle many kinds of documents, but it pays to adjust the question slightly for each type:

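def process_document_by_type(image_path, doc_type):
    """
    Choose a suitable question template based on the document type.

    Args:
        image_path: path to the image
        doc_type: document type, one of 'text', 'table', 'chart', 'mixed', 'academic'
    """
    # Question templates for each document type
    question_templates = {
        'text': 'Please extract all the text in the image and keep the original formatting.',
        'table': 'Please recognize the data in the table and return it as a Markdown table.',
        'chart': 'Please describe the data trends and key information shown in this chart.',
        'mixed': 'Please summarize the main content of this image, including the text and the charts.',
        'academic': 'Please extract the title, authors, abstract, and key conclusions from this paper excerpt.'
    }

    # Look up the question, falling back to a generic one for unknown types
    question = question_templates.get(doc_type, 'Please describe the content of this image.')

    # Call the API
    return ask_mineru_simple(image_path, question)


# Usage example
results = {}
image_types = [
    ('contract.jpg', 'text'),
    ('sales_data.png', 'table'),
    ('growth_chart.jpg', 'chart'),
    ('research_paper.png', 'academic')
]

for image_file, doc_type in image_types:
    result = process_document_by_type(f'test_images/{image_file}', doc_type)
    results[image_file] = result
    print(f"Finished processing {image_file} ({doc_type})")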

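This way, the most suitable question is chosen automatically for each document type, which improves the accuracy of the extracted information.

4. Advanced Features and Practical Tips

4.1 Batch Processing

In real work we often need to process large numbers of documents. Here is a batch processing example: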

import os
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def batch_process_documents(image_folder, output_file='results.txt', max_workers=3):
    """
    Process every image in a folder.

    Args:
        image_folder: path to the folder of images
        output_file: file to write the results to
        max_workers: maximum number of concurrent requests; tune this to your hardware
    """
    # Collect all image files
    image_extensions = ['.jpg', '.jpeg', '.png', '.bmp', '.tiff']
    image_files = []
    for file in os.listdir(image_folder):
        if any(file.lower().endswith(ext) for ext in image_extensions):
            image_files.append(os.path.join(image_folder, file))

    print(f"Found {len(image_files)} image files")

    results = []
    processed_count = 0
    total_count = len(image_files)

    # Process concurrently with a thread pool
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit all tasks
        future_to_file = {
            executor.submit(ask_mineru_simple, img_file,
                            'Please extract and summarize the main content of this image.'): img_file
            for img_file in image_files
        }

        # Handle tasks as they complete
        for future in as_completed(future_to_file):
            img_file = future_to_file[future]
            name = os.path.basename(img_file)
            try:
                result = future.result()
                results.append(f"File: {name}\nResult: {result}\n{'='*50}\n")
            except Exception as e:
                results.append(f"File: {name}\nError: {e}\n{'='*50}\n")
            # Count failed files too, so the progress display reaches the total
            processed_count += 1
            print(f"Progress: {processed_count}/{total_count} - {name}")

    # Save the results
    with open(output_file, 'w', encoding='utf-8') as f:
        f.writelines(results)

    print(f"Batch processing finished; results saved to {output_file}")
    return results


# Usage example
if __name__ == "__main__":
    # Process every image in a folder
    batch_process_documents('documents_to_process/', 'extraction_results.txt')

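This batch function has several practical features:

Automatic image detection: supports the common image formats
Concurrent processing: several files are handled at once for better throughput
Progress display: shows progress in real time
Error handling: an exception on one file does not affect the others
Result saving: results are written to a text file automatically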
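The error isolation described above can be tried without a running service by swapping in a stand-in worker. The following sketch shows the same `as_completed` pattern; `run_isolated` and `fake_worker` are hypothetical names, and the fake worker merely simulates one failing file.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_isolated(worker, items, max_workers=3):
    """Run worker(item) concurrently; return ({item: result}, {item: error message})."""
    succeeded, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(worker, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                succeeded[item] = future.result()
            except Exception as exc:
                # A failure is recorded for this item only; the others keep running
                failed[item] = str(exc)
    return succeeded, failed


def fake_worker(name):
    """Stand-in for an API call: fails on .bmp files, succeeds otherwise."""
    if name.endswith(".bmp"):
        raise ValueError("unsupported format")
    return f"summary of {name}"


if __name__ == "__main__":
    ok, bad = run_isolated(fake_worker, ["a.png", "b.bmp", "c.jpg"])
    print(ok)
    print(bad)
```

Separating successes from failures also makes it straightforward to retry only the failed files later.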

4.2 Image Preprocessing

Sometimes the original image quality is poor, which hurts recognition. We can add a few preprocessing steps:

import os

import cv2
import numpy as np
from PIL import Image, ImageEnhance


def preprocess_image(image_path, output_path=None, enhance_contrast=True, remove_noise=True):
    """
    Preprocess an image to improve recognition accuracy.

    Args:
        image_path: path to the original image
        output_path: where to save the preprocessed image (optional)
        enhance_contrast: whether to boost contrast
        remove_noise: whether to remove noise
    """
    # Open the image with PIL
    img = Image.open(image_path)

    # Convert to grayscale if the image is in another mode
    if img.mode != 'L':
        img = img.convert('L')

    # Boost contrast
    if enhance_contrast:
        enhancer = ImageEnhance.Contrast(img)
        img = enhancer.enhance(1.5)  # 1.5x contrast

    # Convert to a numpy array for OpenCV processing
    img_array = np.array(img)

    # Remove noise
    if remove_noise:
        # Median filter to remove salt-and-pepper noise
        img_array = cv2.medianBlur(img_array, 3)
        # Gaussian blur to smooth the image
        img_array = cv2.GaussianBlur(img_array, (3, 3), 0)

    # Binarize with Otsu's method (especially effective for scanned documents)
    _, img_array = cv2.threshold(img_array, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Convert back to a PIL Image
    processed_img = Image.fromarray(img_array)

    # Save the processed image
    if output_path:
        processed_img.save(output_path)
        print(f"Preprocessing finished, saved to: {output_path}")

    return processed_img


def process_with_preprocessing(image_path, question):
    """
    Call MinerU on a preprocessed copy of the image.
    """
    # Preprocess the image into a temporary file
    temp_path = 'temp_processed.jpg'
    preprocess_image(image_path, temp_path)

    try:
        # Call the API
        result = ask_mineru_simple(temp_path, question)
        return result
    finally:
        # Clean up the temporary file
        if os.path.exists(temp_path):
            os.remove(temp_path)


# Usage example: a poor-quality scanned document
result = process_with_preprocessing(
    'poor_quality_scan.jpg',
    'Please extract all the text in this scanned document'
)
print(f"Recognition result after preprocessing: {result}")

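Preprocessing can noticeably improve recognition accuracy on low-quality images, especially:

Old scanned documents
Photos of documents taken with a phone
Low-contrast printed material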
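The binarization step relies on Otsu's method, which picks the gray level that best separates dark pixels from light ones by maximizing the between-class variance. The pure-Python sketch below illustrates the idea on a flat list of 0-255 pixel values; it is a teaching example, not OpenCV's implementation, and `otsu_threshold` is a name invented here.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels with value <= t form one class and pixels > t form the other.
    """
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(256):
        weight_low += histogram[t]
        if weight_low == 0:
            continue  # no pixels in the low class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break  # every pixel is already in the low class
        sum_low += t * histogram[t]
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance, best_t = variance, t
    return best_t


if __name__ == "__main__":
    dark_text = [20, 25, 30] * 10
    light_paper = [220, 230, 240] * 10
    print(otsu_threshold(dark_text + light_paper))
```

For a scan with dark text on light paper, the chosen threshold falls between the two clusters, which is why this step works well on documents and less well on images with smooth gradients.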

4.3 Building a Complete Document Processing Pipeline

Let's combine these pieces into a complete document processing system:

import csv
import json
import os
import time
from datetime import datetime
from pathlib import Path

from PIL import Image


class DocumentProcessor:
    """Document processing pipeline."""

    def __init__(self, api_url="http://localhost:7860/api/chat"):
        # Stored for reference; ask_mineru_simple still reads the module-level API_URL
        self.api_url = api_url
        self.results_cache = {}  # cache of processed results

    def detect_document_type(self, image_path):
        """
        Detect the document type automatically.
        A real project could use a more capable model; simple rules are used here.
        """
        # This could be extended to use an image classification model.
        # For now, guess from the file name (the Chinese keywords match Chinese file names).
        filename = Path(image_path).name.lower()

        if 'table' in filename or '数据' in filename:
            return 'table'
        elif 'chart' in filename or '图表' in filename:
            return 'chart'
        elif 'contract' in filename or '合同' in filename:
            return 'text'
        elif 'paper' in filename or '论文' in filename:
            return 'academic'
        else:
            # By default, let MinerU decide
            return 'auto'

    def process_document(self, image_path, question=None, use_cache=True):
        """
        Full processing flow for a single document.
        """
        # Check the cache
        cache_key = f"{image_path}_{question}"
        if use_cache and cache_key in self.results_cache:
            print(f"Using cached result: {image_path}")
            return self.results_cache[cache_key]

        # 1. Preprocess if needed, and send the preprocessed copy instead of the original
        source_path = image_path
        temp_path = None
        if self.needs_preprocessing(image_path):
            print(f"Preprocessing {image_path}...")
            temp_path = f"{Path(image_path).stem}_preprocessed.jpg"
            preprocess_image(image_path, temp_path)
            source_path = temp_path

        # 2. Detect the document type
        doc_type = self.detect_document_type(image_path)

        # 3. Build the question
        if question is None:
            if doc_type == 'auto':
                question = 'Please describe the content of this image in detail.'
            else:
                question_templates = {
                    'text': 'Please extract all the text and keep the paragraph structure.',
                    'table': 'Please recognize the table data and return it as a Markdown table.',
                    'chart': 'Please analyze the chart and describe the data trends and key points.',
                    'academic': 'Please extract the title, abstract, methods, and conclusions from this paper excerpt.'
                }
                question = question_templates.get(doc_type, 'Please describe the content of this image.')

        # 4. Call the API
        print(f"Processing document: {Path(image_path).name}")
        start_time = time.time()
        try:
            result = ask_mineru_simple(source_path, question)
        finally:
            # Remove the temporary preprocessed copy, if one was created
            if temp_path and os.path.exists(temp_path):
                os.remove(temp_path)
        processing_time = time.time() - start_time
        print(f"Done, took {processing_time:.2f} s")

        # 5. Save the result to the cache
        result_data = {
            'filename': Path(image_path).name,
            'doc_type': doc_type,
            'question': question,
            'answer': result,
            'processing_time': processing_time,
            'timestamp': datetime.now().isoformat()
        }
        self.results_cache[cache_key] = result_data
        return result_data

    def needs_preprocessing(self, image_path):
        """
        Decide whether preprocessing is needed.
        File size, resolution, and similar properties can all be used.
        """
        try:
            with Image.open(image_path) as img:
                width, height = img.size
                # Small images may benefit from preprocessing
                return width < 800 or height < 600
        except Exception:
            return False

    def export_results(self, output_format='json', output_file='document_results'):
        """
        Export the processed results.
        """
        if not self.results_cache:
            print("No results to export")
            return

        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        output_file = f"{output_file}_{timestamp}"

        if output_format == 'json':
            with open(f'{output_file}.json', 'w', encoding='utf-8') as f:
                json.dump(list(self.results_cache.values()), f, ensure_ascii=False, indent=2)
            print(f"Results exported to {output_file}.json")

        elif output_format == 'txt':
            with open(f'{output_file}.txt', 'w', encoding='utf-8') as f:
                for data in self.results_cache.values():
                    f.write(f"File: {data['filename']}\n")
                    f.write(f"Type: {data['doc_type']}\n")
                    f.write(f"Question: {data['question']}\n")
                    f.write(f"Answer: {data['answer']}\n")
                    f.write(f"Processing time: {data['processing_time']:.2f} s\n")
                    f.write(f"Timestamp: {data['timestamp']}\n")
                    f.write("=" * 60 + "\n\n")
            print(f"Results exported to {output_file}.txt")

        elif output_format == 'csv':
            with open(f'{output_file}.csv', 'w', newline='', encoding='utf-8') as f:
                writer = csv.writer(f)
                writer.writerow(['Filename', 'Document type', 'Question', 'Answer', 'Processing time', 'Timestamp'])
                for data in self.results_cache.values():
                    writer.writerow([
                        data['filename'], data['doc_type'], data['question'],
                        data['answer'], data['processing_time'], data['timestamp']
                    ])
            print(f"Results exported to {output_file}.csv")


# Usage example: the complete document processing flow
if __name__ == "__main__":
    # Create the processor
    processor = DocumentProcessor()

    # Process several documents
    documents = [
        'documents/sales_report.png',
        'documents/research_paper.jpg',
        'documents/contract_scan.jpg'
    ]

    for doc in documents:
        if os.path.exists(doc):
            result = processor.process_document(doc)
            print(f"\nResult summary: {result['filename']}")
            print(f"Document type: {result['doc_type']}")
            print(f"Answer length: {len(result['answer'])} characters")
            print("-" * 50)

    # Export the results
    processor.export_results(output_format='json')
    processor.export_results(output_format='txt')

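The DocumentProcessor class provides a complete solution, including:

Automatic document type detection
Automatic question generation
Result caching (to avoid reprocessing)
Export in several formats
Processing status tracking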
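One caveat about caching by file path: if a file is replaced with new content under the same name, a path-based key will still return the old result. A possible alternative, sketched below with the standard library, is to key the cache on the file's content hash plus the question. `make_cache_key` is a hypothetical helper and is not part of the class above.

```python
import hashlib


def make_cache_key(image_path: str, question) -> str:
    """Build a cache key from the file's bytes and the question text."""
    digest = hashlib.sha256()
    with open(image_path, "rb") as f:
        # Read in chunks so large scans are not loaded into memory at once
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    digest.update(b"\0")  # separator between file content and question
    digest.update((question or "").encode("utf-8"))
    return digest.hexdigest()
```

The trade-off is one extra read of each file per lookup, which is usually negligible next to the cost of a model call.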

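5. Real-World Use Cases

5.1 Case 1: Automated Contract Information Extraction

Suppose you are a legal assistant at a company and process a large number of scanned contracts every day, extracting key details such as the contracting parties, amounts, and dates.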

import time


def extract_contract_info(contract_image_path):
    """
    Extract key information from a scanned contract.
    """
    # The information to extract
    questions = [
        "What are the full names of the contracting parties?",
        "What is the total contract amount?",
        "When was the contract signed?",
        "How long is the contract valid?",
        "What are the main points of the breach-of-contract clause?"
    ]

    extracted_info = {}
    for question in questions:
        print(f"Extracting: {question}")
        answer = ask_mineru_simple(contract_image_path, question)
        extracted_info[question] = answer
        time.sleep(0.5)  # avoid sending requests too quickly

    # Structured output
    print("\n=== Contract Information ===")
    for q, a in extracted_info.items():
        print(f"Q: {q}")
        print(f"A: {a}\n")

    return extracted_info


# Usage example
contract_info = extract_contract_info('contracts/agreement_2024.jpg')

5.2 Case 2: Summarizing Academic Papers

Researchers need to skim large numbers of papers and pull out the core ideas:

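def generate_paper_summary(paper_image_path):
    """
    Generate a summary from an image of a paper.
    """
    # Extract the information step by step: (label, question)
    steps = [
        ("Title and authors", "Please extract the title of this paper and the names of all authors."),
        ("Abstract", "Please extract the abstract of the paper."),
        ("Methods", "What research method or experimental design does this paper use?"),
        ("Main conclusions", "What are the main conclusions or findings of the paper?"),
        ("Contributions", "What are the main innovations or contributions of this paper?")
    ]

    summary = {}
    for step_name, question in steps:
        print(f"Extracting: {step_name}...")
        answer = ask_mineru_simple(paper_image_path, question)
        summary[step_name] = answer

    # Print the combined summary
    print("\n=== Paper Summary ===")
    print(f"Title and authors: {summary['Title and authors']}")
    print(f"\nAbstract: {summary['Abstract']}")
    print(f"\nMethods: {summary['Methods']}")
    print(f"\nMain conclusions: {summary['Main conclusions']}")
    print(f"\nContributions: {summary['Contributions']}")

    return summary


# Usage example
paper_summary = generate_paper_summary('papers/ai_research.png')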

5.3 Case 3: Financial Statement Analysis

Finance staff need to extract data from images of statements and analyze it:

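import os
import time
from datetime import datetime


def analyze_financial_statement(statement_image_path):
    """
    Analyze an image of a financial statement.
    """
    analysis_questions = [
        "Please extract revenue, cost of sales, and net profit from the income statement.",
        "Please extract total assets, total liabilities, and net assets from the balance sheet.",
        "Calculate the gross margin and the net margin.",
        "Analyze the company's solvency (debt-to-asset ratio).",
        "Briefly summarize the company's financial position."
    ]

    analysis_results = []
    total = len(analysis_questions)
    for i, question in enumerate(analysis_questions, 1):
        print(f"Analysis step {i}/{total}...")
        result = ask_mineru_simple(statement_image_path, question)
        analysis_results.append(result)

        # Save the intermediate result
        with open(f'financial_analysis_step_{i}.txt', 'w', encoding='utf-8') as f:
            f.write(f"Question: {question}\n")
            f.write(f"Result: {result}\n")

        time.sleep(0.5)

    # Generate the analysis report
    report = f"""Financial Statement Analysis Report
Generated at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
Source file: {os.path.basename(statement_image_path)}

1. Key financial data:
{analysis_results[0]}

2. Balance sheet data:
{analysis_results[1]}

3. Profitability analysis:
{analysis_results[2]}

4. Solvency analysis:
{analysis_results[3]}

5. Overall assessment:
{analysis_results[4]}
"""

    with open('financial_analysis_report.txt', 'w', encoding='utf-8') as f:
        f.write(report)

    print("Financial statement analysis finished; the report has been saved.")
    return report


# Usage example
financial_report = analyze_financial_statement('financials/q3_statement.jpg')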
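Asking a small model to do arithmetic is less dependable than asking it to read numbers, so you may prefer to compute the ratios yourself once the figures have been extracted and checked. The sketch below uses the standard definitions of gross margin, net margin, and debt-to-asset ratio; `financial_ratios` is a hypothetical helper and the sample figures are invented for illustration.

```python
def financial_ratios(revenue, cost_of_sales, net_profit, total_assets, total_liabilities):
    """Compute basic ratios from figures that were extracted and verified beforehand."""
    if revenue <= 0 or total_assets <= 0:
        raise ValueError("revenue and total_assets must be positive")
    return {
        # (revenue - cost of sales) / revenue
        "gross_margin": (revenue - cost_of_sales) / revenue,
        # net profit / revenue
        "net_margin": net_profit / revenue,
        # total liabilities / total assets
        "debt_to_asset_ratio": total_liabilities / total_assets,
    }


if __name__ == "__main__":
    # Invented sample figures, not taken from any real statement
    ratios = financial_ratios(
        revenue=1000, cost_of_sales=600, net_profit=150,
        total_assets=2000, total_liabilities=800,
    )
    for name, value in ratios.items():
        print(f"{name}: {value:.1%}")
```

Comparing these locally computed values with the model's own answers is also a cheap consistency check on the extraction.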

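6. Common Problems and Solutions

6.1 Image Upload Problems

Problem: after uploading an image, the API returns an error or cannot recognize the content.

Solution: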

import os

from PIL import Image


def validate_and_prepare_image(image_path, max_size_mb=10):
    """
    Validate an image file and prepare it for upload.
    """
    # Check that the file exists
    if not os.path.exists(image_path):
        raise FileNotFoundError(f"Image file not found: {image_path}")

    # Check the file size
    file_size_mb = os.path.getsize(image_path) / (1024 * 1024)
    if file_size_mb > max_size_mb:
        # Compress the image
        print(f"Image too large ({file_size_mb:.1f}MB), compressing...")
        compressed_path = compress_image(image_path)
        return compressed_path

    # Check the image format
    try:
        with Image.open(image_path) as img:
            img.verify()  # verify the file's integrity
            print(f"Image verified: {image_path}, format: {img.format}, size: {img.size}")
        return image_path
    except Exception as e:
        raise ValueError(f"Image file is corrupted or in an unsupported format: {e}")


def compress_image(image_path, quality=85, max_dimension=2000):
    """
    Compress an image file.
    """
    with Image.open(image_path) as img:
        # Downscale if the longest side exceeds max_dimension
        if max(img.size) > max_dimension:
            ratio = max_dimension / max(img.size)
            new_size = (int(img.size[0] * ratio), int(img.size[1] * ratio))
            img = img.resize(new_size, Image.Resampling.LANCZOS)

        # JPEG has no alpha channel or palette, so convert such images to RGB first
        if img.mode not in ('RGB', 'L'):
            img = img.convert('RGB')

        # Build "<name>_compressed.jpg"; splitext only touches the final extension,
        # so dots elsewhere in the path are left alone
        base, _ = os.path.splitext(image_path)
        compressed_path = f"{base}_compressed.jpg"

        # Save as JPEG (compressed)
        img.save(compressed_path, 'JPEG', quality=quality, optimize=True)

    new_size_mb = os.path.getsize(compressed_path) / (1024 * 1024)
    print(f"Compression finished: {new_size_mb:.1f}MB")
    return compressed_path

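6.2 Improving Recognition Accuracy

Problem: some complex documents are not recognized accurately.

Solution:

Region-by-region recognition: split a large image into smaller regions and recognize each one separately
Multi-angle questioning: ask about the same content from different angles and combine the results
Post-processing checks: run logical checks on the recognized results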

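import time


def improve_recognition_accuracy(image_path, content_type):
    """
    A strategy for improving recognition accuracy: ask several phrasings and fuse the answers.
    """
    strategies = {
        'table': [
            "Please recognize all the data in this table",
            "Please return this data in table form",
            "Please extract every row and every column of the table"
        ],
        'chart': [
            "Please describe the data in this chart",
            "Please analyze the trend in the chart",
            "Please extract the axis information and data points of the chart"
        ],
        'text': [
            "Please extract all the text",
            "Please extract the text and keep the original formatting",
            "Please extract the text paragraph by paragraph"
        ]
    }

    questions = strategies.get(content_type, ["Please describe the content of the image"])
    results = []

    for question in questions:
        result = ask_mineru_simple(image_path, question)
        results.append(result)
        time.sleep(0.3)

    # A simple fusion strategy
    if len(results) > 1:
        # Take the longest answer (it usually contains the most information)
        best_result = max(results, key=len)
        return best_result
    else:
        return results[0] if results else ""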
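The function above covers multi-angle questioning. For the region-by-region strategy, the first step is deciding where to cut. The sketch below computes overlapping tile boxes in the `(left, upper, right, lower)` form that Pillow's `Image.crop` accepts; `tile_boxes` is a hypothetical helper, and the tile size and overlap are illustrative values rather than tuned recommendations.

```python
def tile_boxes(width, height, tile=1000, overlap=100):
    """Return (left, upper, right, lower) boxes that cover the image with overlap."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    step = tile - overlap

    def starts(length):
        # Start offsets along one axis; the last tile is aligned to the far edge
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


if __name__ == "__main__":
    for box in tile_boxes(2500, 800):
        print(box)
```

The overlap keeps a table row or line of text that falls on a boundary fully visible in at least one tile. Each box can then be cropped, saved, and sent as its own request.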

6.3 Speeding Up Processing

Problem: processing a large number of documents is slow.

Solution:

import threading
import time
from concurrent.futures import ThreadPoolExecutor


class OptimizedDocumentProcessor:
    """An optimized document processor."""

    def __init__(self, api_url, max_workers=5, batch_size=10):
        self.api_url = api_url
        self.max_workers = max_workers
        self.batch_size = batch_size
        self.lock = threading.Lock()
        self.results = []

    def process_batch(self, image_paths, questions=None):
        """Process a batch of documents."""
        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            futures = []
            for i, img_path in enumerate(image_paths):
                if questions and i < len(questions):
                    question = questions[i]
                else:
                    question = "Please extract the main content of the image"

                future = executor.submit(self._process_single, img_path, question)
                futures.append((img_path, future))

            # Collect the results
            for img_path, future in futures:
                try:
                    result = future.result(timeout=30)  # 30-second timeout
                    with self.lock:
                        self.results.append({'file': img_path, 'result': result, 'status': 'success'})
                except Exception as e:
                    with self.lock:
                        self.results.append({'file': img_path, 'result': str(e), 'status': 'failed'})

        return self.results

    def _process_single(self, image_path, question):
        """Process a single document (internal method) with retries."""
        max_retries = 3
        for attempt in range(max_retries):
            try:
                return ask_mineru_simple(image_path, question)
            except Exception:
                if attempt == max_retries - 1:
                    raise
                time.sleep(2 ** attempt)  # exponential backoff: 1 s, then 2 s
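Retry logic is easier to test when it is separated from the processor class. Below is a minimal sketch of a generic retry helper with exponential backoff; `call_with_retry` is a hypothetical name, and the injectable `sleep` parameter exists only so the delays can be observed without waiting. Note that retries only trigger when the wrapped function raises, so a function that returns an error string on HTTP failure would need to raise instead.

```python
import time


def call_with_retry(func, *args, max_retries=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func, retrying on any exception with delays of base_delay * 2**attempt."""
    for attempt in range(max_retries):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky():
        """Fails twice, then succeeds, to simulate a temporary network problem."""
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    print(call_with_retry(flaky, base_delay=0.1))
```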

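7. Summary

After working through this tutorial, you should know how to call the MinerU API from Python for document understanding. We started with the simplest API call, moved on to batch processing, image preprocessing, and a complete pipeline, and finished with several real-world use cases.

Here are the key takeaways:

Basic calls are simple: a few lines of code are enough to have MinerU interpret a document
Preprocessing matters: for poor-quality images, suitable preprocessing can noticeably improve recognition accuracy
Batch processing saves time: concurrency can greatly speed up the handling of large document sets
Question design is a skill: questions tailored to the document type give better results
The applications are broad: from contracts to papers, from financial statements to everyday documents, MinerU can help

In practice you will run into all sorts of specific situations. My suggestions:

Start with simple documents to get familiar with how the API responds
Design question templates that fit your specific needs
For important documents, use the "multi-angle questioning + result fusion" strategy
Remember to handle exceptional cases such as network timeouts and image format problems

MinerU is a small model, but it performs quite well on the specific task of document understanding. Its light weight in particular means it runs quickly on ordinary hardware, which is very practical for many real-world scenarios.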
