feat: optimize padding and formula fallback

fix: layout detection & format conversion robustness
Three targeted fixes for layout processing issues: 1. formula_number type mapping (layout_detector.py) - Changed formula_number region type from "formula" to "text" - Ensures Text Recognition prompt, preventing $$-wrapped output - Prevents malformed \tag{$$...\n$$} in merged formulas 2. Reading order (ocr_service.py) - Sort layout regions by (y1, x1) after detection - Ensures top-to-bottom, left-to-right processing order - Fixes paragraph ordering issues in output 3. Formula number cleaning (glm_postprocess.py) - clean_formula_number() now strips $$, $, \[...\] delimiters - Handles edge case where vLLM still returns math-mode wrapped content - Prevents delimiter leakage into \tag{} placeholders Also adds logging: - Warning when empty formula content is skipped - Warning when region crop is too small (< 10×10 px) - Warning when vLLM parallel call fails - Warning when format conversion fails Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-03-10 09:54:54 +08:00 · 2026-03-09 17:57:05 +08:00 · 2026-03-09 17:19:53 +08:00 · 2026-03-09 17:13:19 +08:00 · 2026-02-26 17:01:23 +08:00 · 2026-02-25 09:52:45 +08:00
30 changed files with 3595 additions and 535 deletions
--- a/.claude/settings.local.json
+++ b/.claude/settings.local.json
@@ -0,0 +1,14 @@
 {
  "permissions": {
    "allow": [
      "WebFetch(domain:deepwiki.com)",
      "WebFetch(domain:github.com)",
      "Read(//private/tmp/**)",
      "Bash(gh api repos/zai-org/GLM-OCR/contents/glmocr --jq '.[].name')",
      "WebFetch(domain:raw.githubusercontent.com)",
      "Bash(python -c \"\nfrom app.services.glm_postprocess import GLMResultFormatter, clean_repeated_content, clean_formula_number\nf = GLMResultFormatter\\(\\)\nprint\\('GLMResultFormatter OK'\\)\nprint\\('clean_formula_number:', clean_formula_number\\('\\(2.1\\)'\\)\\)\nregions = [\n    {'index': 0, 'label': 'text', 'native_label': 'doc_title', 'content': 'Introduction', 'bbox_2d': [10,10,990,50]},\n    {'index': 1, 'label': 'formula', 'native_label': 'display_formula', 'content': r'\\\\frac{a}{b}', 'bbox_2d': [10,60,990,200]},\n    {'index': 2, 'label': 'text', 'native_label': 'formula_number', 'content': '\\(1\\)', 'bbox_2d': [900,60,990,200]},\n]\nmd = f.process\\(regions\\)\nprint\\('process output:'\\)\nprint\\(md\\)\n\" 2>&1 | grep -v \"^$\")",
      "Bash(python3 -c \"\nfrom app.services.glm_postprocess import GLMResultFormatter, clean_repeated_content, clean_formula_number\nf = GLMResultFormatter\\(\\)\nprint\\('GLMResultFormatter OK'\\)\nprint\\('clean_formula_number:', clean_formula_number\\('\\(2.1\\)'\\)\\)\nregions = [\n    {'index': 0, 'label': 'text', 'native_label': 'doc_title', 'content': 'Introduction', 'bbox_2d': [10,10,990,50]},\n    {'index': 1, 'label': 'formula', 'native_label': 'display_formula', 'content': r'\\\\frac{a}{b}', 'bbox_2d': [10,60,990,200]},\n    {'index': 2, 'label': 'text', 'native_label': 'formula_number', 'content': '\\(1\\)', 'bbox_2d': [900,60,990,200]},\n]\nmd = f.process\\(regions\\)\nprint\\('process output:'\\)\nprint\\(repr\\(md\\)\\)\n\" 2>&1)",
      "Bash(ls .venv 2>/dev/null || ls venv 2>/dev/null || echo \"no venv found\" && find . -name \"activate\" -path \"*/bin/activate\" 2>/dev/null | head -3)"
    ]
  }
 }
--- a/.dockerignore
+++ b/.dockerignore
@@ -53,3 +53,14 @@ Thumbs.db
 test/
 # Claude Code / Development
 .claude/
 # Development and CI/CD
 .github/
 .gitpod.yml
 Makefile
 # Local development scripts
 scripts/local/
--- a/.gitignore
+++ b/.gitignore
@@ -72,4 +72,12 @@ uv.lock
 model/
-test/
+test/
 # Claude Code / Development
 .claude/
 # Test outputs and reports
 test_report/
 coverage_report/
 .coverage.json
--- a/4
+++ b/4
@@ -2,7 +2,7 @@
 # Optimized for RTX 5080 GPU deployment
 # Use NVIDIA CUDA base image with Python 3.10
-FROM nvidia/cuda:12.8.0-runtime-ubuntu24.04
+FROM nvidia/cuda:12.9.0-runtime-ubuntu24.04
 # Set environment variables
 ENV PYTHONUNBUFFERED=1 \
@@ -15,7 +15,7 @@ ENV PYTHONUNBUFFERED=1 \
    # Application config (override defaults for container)
    # Use 127.0.0.1 for --network host mode, or override with -e for bridge mode
    PP_DOCLAYOUT_MODEL_DIR=/root/.cache/modelscope/hub/models/PaddlePaddle/PP-DocLayoutV2 \
-    PADDLEOCR_VL_URL=http://127.0.0.1:8000/v1
+    PADDLEOCR_VL_URL=http://127.0.0.1:8001/v1
 # Set working directory
 WORKDIR /app
--- a/PORT_CONFIGURATION.md
+++ b/PORT_CONFIGURATION.md
@@ -0,0 +1,148 @@
 # 端口配置检查总结
 ## 搜索命令
 ```bash
 # 搜索所有 8000 端口引用
 rg "(127\.0\.0\.1|localhost):8000"
 # 或使用 grep
 grep -r -n -E "(127\.0\.0\.1|localhost):8000" . \
  --exclude-dir=.git \
  --exclude-dir=__pycache__ \
  --exclude-dir=.venv \
  --exclude="*.pyc"
 ```
 ## 当前端口配置 ✅
 ### PaddleOCR-VL 服务 (端口 8001)
 **代码文件** - 全部正确 ✅:
 - `app/core/config.py:25` → `http://127.0.0.1:8001/v1`
 - `app/services/ocr_service.py:492` → `http://localhost:8001/v1`
 - `app/core/dependencies.py:53` → `http://localhost:8001/v1` (fallback)
 - `Dockerfile:18` → `http://127.0.0.1:8001/v1`
 ### Mineru API 服务 (端口 8000)
 **代码文件** - 全部正确 ✅:
 - `app/core/config.py:28` → `http://127.0.0.1:8000/file_parse`
 - `app/services/ocr_service.py:489` → `http://127.0.0.1:8000/file_parse`
 - `app/core/dependencies.py:52` → `http://127.0.0.1:8000/file_parse` (fallback)
 ### 文档和示例文件
 以下文件包含示例命令，使用 `localhost:8000`，这些是文档用途，不影响实际运行：
 - `docs/*.md` - 各种 curl 示例
 - `README.md` - 配置示例 (使用 8080)
 - `docker-compose.yml` - 使用 8080
 - `openspec/changes/add-doc-processing-api/design.md` - 设计文档
 ## 验证服务端口
 ### 1. 检查 vLLM (PaddleOCR-VL)
 ```bash
 # 应该在 8001
 lsof -i:8001
 # 验证模型
 curl http://127.0.0.1:8001/v1/models
 ```
 ### 2. 检查 Mineru API
 ```bash
 # 应该在 8000
 lsof -i:8000
 # 验证健康状态
 curl http://127.0.0.1:8000/health
 ```
 ### 3. 检查你的 FastAPI 应用
 ```bash
 # 应该在 8053
 lsof -i:8053
 # 验证健康状态
 curl http://127.0.0.1:8053/health
 ```
 ## 修复历史
 ### 已修复的问题 ✅
 1. **app/services/ocr_service.py:492**
   - 从: `paddleocr_vl_url: str = "http://localhost:8000/v1"`
   - 到: `paddleocr_vl_url: str = "http://localhost:8001/v1"`
 2. **Dockerfile:18**
   - 从: `PADDLEOCR_VL_URL=http://127.0.0.1:8000/v1`
   - 到: `PADDLEOCR_VL_URL=http://127.0.0.1:8001/v1`
 3. **app/core/config.py:25**
   - 已经是正确的 8001
 ## 环境变量配置
 如果需要自定义端口，可以设置环境变量：
 ```bash
 # PaddleOCR-VL (默认 8001)
 export PADDLEOCR_VL_URL=http://127.0.0.1:8001/v1
 # Mineru API (默认 8000)
 export MINER_OCR_API_URL=http://127.0.0.1:8000/file_parse
 ```
 或在 `.env` 文件中：
 ```env
 PADDLEOCR_VL_URL=http://127.0.0.1:8001/v1
 MINER_OCR_API_URL=http://127.0.0.1:8000/file_parse
 ```
 ## Docker 部署注意事项
 在 Docker 容器中，使用：
 - `--network host`: 使用 `127.0.0.1`
 - `--network bridge`: 使用 `host.docker.internal` 或容器名
 示例：
 ```bash
 docker run \
  --network host \
  -e PADDLEOCR_VL_URL=http://127.0.0.1:8001/v1 \
  -e MINER_OCR_API_URL=http://127.0.0.1:8000/file_parse \
  doc-processer
 ```
 ## 快速验证脚本
 ```bash
 #!/bin/bash
 echo "检查端口配置..."
 # 检查代码中的配置
 echo -e "\n=== PaddleOCR-VL URLs (应该是 8001) ==="
 rg "paddleocr_vl.*8\d{3}" app/
 echo -e "\n=== Mineru API URLs (应该是 8000) ==="
 rg "miner.*8\d{3}" app/
 # 检查服务状态
 echo -e "\n=== 检查运行中的服务 ==="
 echo "Port 8000 (Mineru):"
 lsof -i:8000 | grep LISTEN || echo "  未运行"
 echo "Port 8001 (PaddleOCR-VL):"
 lsof -i:8001 | grep LISTEN || echo "  未运行"
 echo "Port 8053 (FastAPI):"
 lsof -i:8053 | grep LISTEN || echo "  未运行"
 ```
 保存为 `check_ports.sh`，然后运行：
 ```bash
 chmod +x check_ports.sh
 ./check_ports.sh
 ```
--- a/app/api/v1/endpoints/image.py
+++ b/app/api/v1/endpoints/image.py
@@ -1,52 +1,68 @@
 """Image OCR endpoint."""
-from fastapi import APIRouter, Depends, HTTPException
+import time
 import uuid
-from app.core.dependencies import get_image_processor, get_layout_detector, get_ocr_service, get_mineru_ocr_service
+from fastapi import APIRouter, Depends, HTTPException, Request, Response
 from app.core.dependencies import (
    get_image_processor,
    get_glmocr_endtoend_service,
 )
 from app.core.logging_config import get_logger, RequestIDAdapter
 from app.schemas.image import ImageOCRRequest, ImageOCRResponse
 from app.services.image_processor import ImageProcessor
-from app.services.layout_detector import LayoutDetector
+from app.services.ocr_service import GLMOCREndToEndService
 from app.services.ocr_service import OCRService, MineruOCRService
 router = APIRouter()
 logger = get_logger()
@router.post("/ocr", response_model=ImageOCRResponse)
 async def process_image_ocr(
    request: ImageOCRRequest,
    http_request: Request,
    response: Response,
    image_processor: ImageProcessor = Depends(get_image_processor),
-    layout_detector: LayoutDetector = Depends(get_layout_detector),
+    glmocr_service: GLMOCREndToEndService = Depends(get_glmocr_endtoend_service),
    mineru_service: MineruOCRService = Depends(get_mineru_ocr_service),
    paddle_service: OCRService = Depends(get_ocr_service),
 ) -> ImageOCRResponse:
    """Process an image and extract content as LaTeX, Markdown, and MathML.
    The processing pipeline:
-    1. Load and preprocess image (add 30% whitespace padding)
+    1. Load and preprocess image
-    2. Detect layout using DocLayout-YOLO
+    2. Detect layout regions using PP-DocLayoutV3
-    3. Based on layout:
+    3. Crop each region and recognize with GLM-OCR via vLLM (task-specific prompts)
-       - If plain text exists: use PP-DocLayoutV2 for mixed recognition
+    4. Aggregate region results into Markdown
-       - Otherwise: use PaddleOCR-VL with formula prompt
+    5. Convert to LaTeX, Markdown, and MathML formats
    4. Convert output to LaTeX, Markdown, and MathML formats
    Note: OMML conversion is not included due to performance overhead.
    Use the /convert/latex-to-omml endpoint to convert LaTeX to OMML separately.
    """
    request_id = http_request.headers.get("x-request-id", str(uuid.uuid4()))
    response.headers["x-request-id"] = request_id
-    image = image_processor.preprocess(
+    log = RequestIDAdapter(logger, {"request_id": request_id})
-        image_url=request.image_url,
+    log.request_id = request_id
        image_base64=request.image_base64,
    )
    try:
-        if request.model_name == "mineru":
+        log.info("Starting image OCR processing")
-            ocr_result = mineru_service.recognize(image)
+        start = time.time()
-        elif request.model_name == "paddle":
+
-            ocr_result = paddle_service.recognize(image)
+        image = image_processor.preprocess(
-        else:
+            image_url=request.image_url,
-            raise HTTPException(status_code=400, detail="Invalid model name")
+            image_base64=request.image_base64,
        )
        ocr_result = glmocr_service.recognize(image)
        log.info(f"OCR completed in {time.time() - start:.3f}s")
    except RuntimeError as e:
        log.error(f"OCR processing failed: {str(e)}", exc_info=True)
        raise HTTPException(status_code=503, detail=str(e))
    except Exception as e:
        log.error(f"Unexpected error during OCR processing: {str(e)}", exc_info=True)
        raise HTTPException(status_code=500, detail="Internal server error")
    return ImageOCRResponse(
        latex=ocr_result.get("latex", ""),
--- a/app/core/config.py
+++ b/app/core/config.py
@@ -3,9 +3,8 @@
 from functools import lru_cache
 from pathlib import Path
 from pydantic_settings import BaseSettings, SettingsConfigDict
 import torch
-from typing import Optional
+from pydantic_settings import BaseSettings, SettingsConfigDict
 class Settings(BaseSettings):
@@ -21,25 +20,54 @@ class Settings(BaseSettings):
    api_prefix: str = "/doc_process/v1"
    debug: bool = False
    # Base Host Settings (can be overridden via .env file)
    # Default: 127.0.0.1 (production)
    # Dev: Set BASE_HOST=100.115.184.74 in .env file
    base_host: str = "127.0.0.1"
    # PaddleOCR-VL Settings
-    paddleocr_vl_url: str = "http://127.0.0.1:8000/v1"
+    @property
    def paddleocr_vl_url(self) -> str:
        """Get PaddleOCR-VL URL based on base_host."""
        return f"http://{self.base_host}:8001/v1"
    # MinerOCR Settings
-    miner_ocr_api_url: str = "http://127.0.0.1:8000/file_parse"
+    @property
    def miner_ocr_api_url(self) -> str:
        """Get MinerOCR API URL based on base_host."""
        return f"http://{self.base_host}:8000/file_parse"
    # GLM OCR Settings
    @property
    def glm_ocr_url(self) -> str:
        """Get GLM OCR URL based on base_host."""
        return f"http://{self.base_host}:8002/v1"
    # padding ratio
    is_padding: bool = True
    padding_ratio: float = 0.1
    max_tokens: int = 4096
    # Model Paths
-    pp_doclayout_model_dir: Optional[str] = "/home/yoge/.cache/modelscope/hub/models/PaddlePaddle/PP-DocLayoutV2"
+    pp_doclayout_model_dir: str | None = (
        "/home/yoge/.cache/modelscope/hub/models/PaddlePaddle/PP-DocLayoutV3"
    )
    # Image Processing
    max_image_size_mb: int = 10
-    image_padding_ratio: float = 0.15  # 15% on each side = 30% total expansion
+    image_padding_ratio: float = 0.1  # 10% on each side = 20% total expansion
-    device: torch.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # cuda:0 or cpu
+    device: torch.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    # Server Settings
    host: str = "0.0.0.0"
    port: int = 8053
    # Logging Settings
    log_dir: str | None = None  # Defaults to /app/logs in container or ./logs locally
    log_level: str = "INFO"  # DEBUG, INFO, WARNING, ERROR, CRITICAL
    @property
    def pp_doclayout_dir(self) -> Path:
        """Get the PP-DocLayout model directory path."""
--- a/app/core/dependencies.py
+++ b/app/core/dependencies.py
@@ -2,7 +2,7 @@
 from app.services.image_processor import ImageProcessor
 from app.services.layout_detector import LayoutDetector
-from app.services.ocr_service import OCRService, MineruOCRService
+from app.services.ocr_service import GLMOCREndToEndService
 from app.services.converter import Converter
 from app.core.config import get_settings
@@ -31,28 +31,17 @@ def get_image_processor() -> ImageProcessor:
    return ImageProcessor()
 def get_ocr_service() -> OCRService:
    """Get an OCR service instance."""
    return OCRService(
        vl_server_url=get_settings().paddleocr_vl_url,
        layout_detector=get_layout_detector(),
        image_processor=get_image_processor(),
        converter=get_converter(),
    )
 def get_converter() -> Converter:
    """Get a DOCX converter instance."""
    return Converter()
-def get_mineru_ocr_service() -> MineruOCRService:
+def get_glmocr_endtoend_service() -> GLMOCREndToEndService:
-    """Get a MinerOCR service instance."""
+    """Get end-to-end GLM-OCR service (layout detection + per-region OCR)."""
    settings = get_settings()
-    api_url = getattr(settings, 'miner_ocr_api_url', 'http://127.0.0.1:8000/file_parse')
+    return GLMOCREndToEndService(
-    return MineruOCRService(
+        vl_server_url=settings.glm_ocr_url,
        api_url=api_url,
        converter=get_converter(),
        image_processor=get_image_processor(),
        converter=get_converter(),
        layout_detector=get_layout_detector(),
    )
--- a/app/core/logging_config.py
+++ b/app/core/logging_config.py
@@ -0,0 +1,157 @@
 """Logging configuration with rotation by day and size."""
 import logging
 import logging.handlers
 from pathlib import Path
 from typing import Any, Optional
 from app.core.config import get_settings
 class TimedRotatingAndSizeFileHandler(logging.handlers.TimedRotatingFileHandler):
    """File handler that rotates by both time (daily) and size (100MB)."""
    def __init__(
        self,
        filename: str,
        when: str = "midnight",
        interval: int = 1,
        backupCount: int = 30,
        maxBytes: int = 100 * 1024 * 1024,  # 100MB
        encoding: Optional[str] = None,
        delay: bool = False,
        utc: bool = False,
        atTime: Optional[Any] = None,
    ):
        """Initialize handler with both time and size rotation.
        Args:
            filename: Log file path
            when: When to rotate (e.g., 'midnight', 'H', 'M')
            interval: Rotation interval
            backupCount: Number of backup files to keep
            maxBytes: Maximum file size before rotation (in bytes)
            encoding: File encoding
            delay: Delay file opening until first emit
            utc: Use UTC time
            atTime: Time to rotate (for 'midnight' rotation)
        """
        super().__init__(
            filename=filename,
            when=when,
            interval=interval,
            backupCount=backupCount,
            encoding=encoding,
            delay=delay,
            utc=utc,
            atTime=atTime,
        )
        self.maxBytes = maxBytes
    def shouldRollover(self, record):
        """Check if rollover should occur based on time or size."""
        # Check time-based rotation first
        if super().shouldRollover(record):
            return True
        # Check size-based rotation
        if self.stream is None:
            self.stream = self._open()
        if self.maxBytes > 0:
            msg = "%s\n" % self.format(record)
            self.stream.seek(0, 2)  # Seek to end
            if self.stream.tell() + len(msg) >= self.maxBytes:
                return True
        return False
 def setup_logging(log_dir: Optional[str] = None) -> logging.Logger:
    """Setup application logging with rotation by day and size.
    Args:
        log_dir: Directory for log files. Defaults to /app/logs in container or ./logs locally.
    Returns:
        Configured logger instance.
    """
    settings = get_settings()
    # Determine log directory
    if log_dir is None:
        log_dir = Path("/app/logs") if Path("/app/logs").exists() else Path("./logs")
    else:
        log_dir = Path(log_dir)
    # Create log directory if it doesn't exist
    log_dir.mkdir(parents=True, exist_ok=True)
    # Create logger
    logger = logging.getLogger("doc_processer")
    logger.setLevel(logging.DEBUG if settings.debug else logging.INFO)
    # Remove existing handlers to avoid duplicates
    logger.handlers.clear()
    # Create custom formatter that handles missing request_id
    class RequestIDFormatter(logging.Formatter):
        """Formatter that handles request_id in log records."""
        def format(self, record):
            # Add request_id if not present
            if not hasattr(record, "request_id"):
                record.request_id = getattr(record, "request_id", "unknown")
            return super().format(record)
    formatter = RequestIDFormatter(
        fmt="%(asctime)s - %(name)s - %(levelname)s - [%(request_id)s] - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    # File handler with rotation by day and size
    # Rotates daily at midnight OR when file exceeds 100MB, keeps 30 days
    log_file = log_dir / "doc_processer.log"
    file_handler = TimedRotatingAndSizeFileHandler(
        filename=str(log_file),
        when="midnight",
        interval=1,
        backupCount=30,
        maxBytes=100 * 1024 * 1024,  # 100MB
        encoding="utf-8",
    )
    file_handler.setLevel(logging.DEBUG if settings.debug else logging.INFO)
    file_handler.setFormatter(formatter)
    # Console handler
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(formatter)
    # Add handlers
    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger
 # Global logger instance
 _logger: Optional[logging.Logger] = None
 def get_logger() -> logging.Logger:
    """Get the global logger instance."""
    global _logger
    if _logger is None:
        _logger = setup_logging()
    return _logger
 class RequestIDAdapter(logging.LoggerAdapter):
    """Logger adapter that adds request_id to log records."""
    def process(self, msg, kwargs):
        """Add request_id to extra if not present."""
        if "extra" not in kwargs:
            kwargs["extra"] = {}
        if "request_id" not in kwargs["extra"]:
            kwargs["extra"]["request_id"] = getattr(self, "request_id", "unknown")
        return msg, kwargs
--- a/app/main.py
+++ b/app/main.py
@@ -7,9 +7,13 @@ from fastapi import FastAPI
 from app.api.v1.router import api_router
 from app.core.config import get_settings
 from app.core.dependencies import init_layout_detector
 from app.core.logging_config import setup_logging
 settings = get_settings()
 # Initialize logging
 setup_logging()
@asynccontextmanager
 async def lifespan(app: FastAPI):
--- a/app/schemas/image.py
+++ b/app/schemas/image.py
@@ -7,6 +7,7 @@ class LayoutRegion(BaseModel):
    """A detected layout region in the document."""
    type: str = Field(..., description="Region type: text, formula, table, figure")
    native_label: str = Field("", description="Raw label before type mapping (e.g. doc_title, formula_number)")
    bbox: list[float] = Field(..., description="Bounding box [x1, y1, x2, y2]")
    confidence: float = Field(..., description="Detection confidence score")
    score: float = Field(..., description="Detection score")
--- a/app/services/converter.py
+++ b/app/services/converter.py
@@ -136,6 +136,7 @@ class Converter:
        """Get cached XSLT transform for MathML to mml: conversion."""
        if cls._mml_xslt_transform is None:
            from lxml import etree
            xslt_doc = etree.fromstring(MML_XSLT.encode("utf-8"))
            cls._mml_xslt_transform = etree.XSLT(xslt_doc)
        return cls._mml_xslt_transform
@@ -197,14 +198,17 @@ class Converter:
            return ConvertResult(latex="", mathml="", mml="")
        try:
            # Detect if formula is display (block) or inline
            is_display = self._is_display_formula(md_text)
            # Extract the LaTeX formula content (remove delimiters)
            latex_formula = self._extract_latex_formula(md_text)
            # Preprocess formula for better conversion (fix array specifiers, etc.)
            preprocessed_formula = self._preprocess_formula_for_conversion(latex_formula)
-            # Convert to MathML
+            # Convert to MathML (pass display flag to use correct delimiters)
-            mathml = self._latex_to_mathml(preprocessed_formula)
+            mathml = self._latex_to_mathml(preprocessed_formula, is_display=is_display)
            # Convert MathML to mml:math format (with namespace prefix)
            mml = self._mathml_to_mml(mathml)
@@ -238,18 +242,18 @@ class Converter:
        # Preprocess formula using the same preprocessing as export
        preprocessed = self._preprocess_formula_for_conversion(latex_formula.strip())
-        
+
        return self._latex_to_omml(preprocessed)
    def _preprocess_formula_for_conversion(self, latex_formula: str) -> str:
        """Preprocess LaTeX formula for any conversion (MathML, OMML, etc.).
        Applies the same preprocessing steps as preprocess_for_export to ensure
-        consistency across all conversion paths. This fixes common issues that 
+        consistency across all conversion paths. This fixes common issues that
        cause Pandoc conversion to fail.
-        Note: OCR number errors are fixed earlier in the pipeline (in ocr_service.py),
+        Note: OCR errors (number errors, command spacing) are fixed earlier in the
-        so we don't need to handle them here.
+        pipeline (in ocr_service.py), so we don't need to handle them here.
        Args:
            latex_formula: Pure LaTeX formula.
@@ -259,18 +263,38 @@ class Converter:
        """
        # 1. Convert matrix environments
        latex_formula = self._convert_matrix_environments(latex_formula)
-        
+
        # 2. Fix array column specifiers (remove spaces)
        latex_formula = self._fix_array_column_specifiers(latex_formula)
-        
+
        # 3. Fix brace spacing
        latex_formula = self._fix_brace_spacing(latex_formula)
-        
+
        # 4. Convert special environments (cases, aligned)
        latex_formula = self._convert_special_environments(latex_formula)
-        
+
        return latex_formula
    def _is_display_formula(self, text: str) -> bool:
        """Check if the formula is a display (block) formula.
        Args:
            text: Text containing LaTeX formula with delimiters.
        Returns:
            True if display formula ($$...$$ or \\[...\\]), False if inline.
        """
        text = text.strip()
        # Display math delimiters: $$...$$ or \[...\]
        if text.startswith("$$") and text.endswith("$$"):
            return True
        if text.startswith("\\[") and text.endswith("\\]"):
            return True
        # Inline math delimiters: $...$ or \(...\)
        return False
    def _extract_latex_formula(self, text: str) -> str:
        """Extract LaTeX formula from text by removing delimiters.
@@ -299,18 +323,30 @@ class Converter:
    @staticmethod
    @lru_cache(maxsize=256)
-    def _latex_to_mathml_cached(latex_formula: str) -> str:
+    def _latex_to_mathml_cached(latex_formula: str, is_display: bool = False) -> str:
        """Cached conversion of LaTeX formula to MathML.
        Uses Pandoc for conversion to ensure Word compatibility.
        Pandoc generates standard MathML that Word can properly import.
-        Uses LRU cache to avoid recomputing for repeated formulas.
+        Args:
            latex_formula: Pure LaTeX formula (without delimiters).
            is_display: True if display (block) formula, False if inline.
        Returns:
            Standard MathML representation.
        """
        # Use appropriate delimiters based on formula type
        # Display formulas use $$...$$, inline formulas use $...$
        if is_display:
            pandoc_input = f"$${latex_formula}$$"
        else:
            pandoc_input = f"${latex_formula}$"
        try:
            # Use Pandoc for Word-compatible MathML (primary method)
            mathml_html = pypandoc.convert_text(
-                f"${latex_formula}$",
+                pandoc_input,
                "html",
                format="markdown+tex_math_dollars",
                extra_args=["--mathml"],
@@ -321,24 +357,23 @@ class Converter:
                mathml = match.group(0)
                # Post-process for Word compatibility
                return Converter._postprocess_mathml_for_word(mathml)
-            
+
-            # If no match, return as-is
+            # If Pandoc didn't generate MathML (returned HTML instead), use fallback
-            return mathml_html.rstrip("\n")
+            # This happens when Pandoc's mathml output format is not available or fails
-            
+            raise ValueError("Pandoc did not generate MathML, got HTML instead")
        except Exception as pandoc_error:
            # Fallback: try latex2mathml (less Word-compatible)
            try:
                mathml = latex_to_mathml(latex_formula)
                return Converter._postprocess_mathml_for_word(mathml)
            except Exception as e:
-                raise RuntimeError(
+                raise RuntimeError(f"MathML conversion failed: {pandoc_error}. latex2mathml fallback also failed: {e}") from e
-                    f"MathML conversion failed: {pandoc_error}. latex2mathml fallback also failed: {e}"
+
                ) from e
    @staticmethod
    def _postprocess_mathml_for_word(mathml: str) -> str:
        """Post-process MathML to improve Word compatibility.
-        
+
        Applies transformations to make MathML more compatible and concise:
        - Remove <semantics> and <annotation> wrappers (Word doesn't need them)
        - Remove unnecessary attributes (form, stretchy, fence, columnalign, etc.)
@@ -346,32 +381,32 @@ class Converter:
        - Change display="inline" to display="block" for better rendering
        - Decode Unicode entities to actual characters (Word prefers this)
        - Ensure proper namespace
-        
+
        Args:
            mathml: MathML string.
-            
+
        Returns:
            Simplified, Word-compatible MathML string.
        """
        import re
-        
+
        # Step 1: Remove <semantics> and <annotation> wrappers
        # These often cause Word import issues
-        if '<semantics>' in mathml:
+        if "<semantics>" in mathml:
            # Extract content between <semantics> and <annotation>
-            match = re.search(r'<semantics>(.*?)<annotation', mathml, re.DOTALL)
+            match = re.search(r"<semantics>(.*?)<annotation", mathml, re.DOTALL)
            if match:
                content = match.group(1).strip()
-                
+
                # Get the math element attributes
                math_attrs = ""
-                math_match = re.search(r'<math([^>]*)>', mathml)
+                math_match = re.search(r"<math([^>]*)>", mathml)
                if math_match:
                    math_attrs = math_match.group(1)
-                
+
                # Rebuild without semantics
-                mathml = f'<math{math_attrs}>{content}</math>'
+                mathml = f"<math{math_attrs}>{content}</math>"
-        
+
        # Step 2: Remove unnecessary attributes that don't affect rendering
        # These are verbose and Word doesn't need them
        unnecessary_attrs = [
@@ -390,234 +425,231 @@ class Converter:
            r'\s+class="[^"]*"',
            r'\s+style="[^"]*"',
        ]
-        
+
        for attr_pattern in unnecessary_attrs:
-            mathml = re.sub(attr_pattern, '', mathml)
+            mathml = re.sub(attr_pattern, "", mathml)
-        
+
        # Step 3: Remove redundant single <mrow> wrapper at the top level
        # Pattern: <math ...><mrow>content</mrow></math>
        # Simplify to: <math ...>content</math>
-        mrow_pattern = r'(<math[^>]*>)\s*<mrow>(.*?)</mrow>\s*(</math>)'
+        mrow_pattern = r"(<math[^>]*>)\s*<mrow>(.*?)</mrow>\s*(</math>)"
        match = re.search(mrow_pattern, mathml, re.DOTALL)
        if match:
            # Check if there's only one mrow at the top level
            content = match.group(2)
            # Only remove if the content doesn't have other top-level elements
-            if not re.search(r'</[^>]+>\s*<[^/]', content):
+            if not re.search(r"</[^>]+>\s*<[^/]", content):
-                mathml = f'{match.group(1)}{content}{match.group(3)}'
+                mathml = f"{match.group(1)}{content}{match.group(3)}"
-        
+
        # Step 4: Change display to block for better Word rendering
        mathml = mathml.replace('display="inline"', 'display="block"')
-        
+
        # Step 5: If no display attribute, add it
-        if 'display=' not in mathml and '<math' in mathml:
+        if "display=" not in mathml and "<math" in mathml:
-            mathml = mathml.replace('<math', '<math display="block"', 1)
+            mathml = mathml.replace("<math", '<math display="block"', 1)
-        
+
        # Step 6: Ensure xmlns is present
-        if 'xmlns=' not in mathml and '<math' in mathml:
+        if "xmlns=" not in mathml and "<math" in mathml:
-            mathml = mathml.replace('<math', '<math xmlns="http://www.w3.org/1998/Math/MathML"', 1)
+            mathml = mathml.replace("<math", '<math xmlns="http://www.w3.org/1998/Math/MathML"', 1)
-        
+
        # Step 7: Decode common Unicode entities to actual characters (Word prefers this)
        unicode_map = {
            # Basic operators
-            '&#x0002B;': '+',
+            "&#x0002B;": "+",
-            '&#x0002D;': '-',
+            "&#x0002D;": "-",
-            '&#x0002A;': '*',
+            "&#x0002A;": "*",
-            '&#x0002F;': '/',
+            "&#x0002F;": "/",
-            '&#x0003D;': '=',
+            "&#x0003D;": "=",
-            '&#x0003C;': '<',
+            "&#x0003C;": "<",
-            '&#x0003E;': '>',
+            "&#x0003E;": ">",
-            '&#x00028;': '(',
+            "&#x00028;": "(",
-            '&#x00029;': ')',
+            "&#x00029;": ")",
-            '&#x0002C;': ',',
+            "&#x0002C;": ",",
-            '&#x0002E;': '.',
+            "&#x0002E;": ".",
-            '&#x0007C;': '|',
+            "&#x0007C;": "|",
-            '&#x00B0;': '°',
+            "&#x00B0;": "°",
-            '&#x00D7;': '×',  # times
+            "&#x00D7;": "×",  # times
-            '&#x00F7;': '÷',  # div
+            "&#x00F7;": "÷",  # div
-            '&#x00B1;': '±',  # pm
+            "&#x00B1;": "±",  # pm
-            '&#x2213;': '∓',  # mp
+            "&#x2213;": "∓",  # mp
            # Ellipsis symbols
-            '&#x02026;': '…',  # ldots (horizontal)
+            "&#x02026;": "…",  # ldots (horizontal)
-            '&#x022EE;': '⋮',  # vdots (vertical)
+            "&#x022EE;": "⋮",  # vdots (vertical)
-            '&#x022EF;': '⋯',  # cdots (centered)
+            "&#x022EF;": "⋯",  # cdots (centered)
-            '&#x022F0;': '⋰',  # iddots (diagonal up)
+            "&#x022F0;": "⋰",  # iddots (diagonal up)
-            '&#x022F1;': '⋱',  # ddots (diagonal down)
+            "&#x022F1;": "⋱",  # ddots (diagonal down)
            # Greek letters (lowercase)
-            '&#x03B1;': 'α',  # alpha
+            "&#x03B1;": "α",  # alpha
-            '&#x03B2;': 'β',  # beta
+            "&#x03B2;": "β",  # beta
-            '&#x03B3;': 'γ',  # gamma
+            "&#x03B3;": "γ",  # gamma
-            '&#x03B4;': 'δ',  # delta
+            "&#x03B4;": "δ",  # delta
-            '&#x03B5;': 'ε',  # epsilon
+            "&#x03B5;": "ε",  # epsilon
-            '&#x03B6;': 'ζ',  # zeta
+            "&#x03B6;": "ζ",  # zeta
-            '&#x03B7;': 'η',  # eta
+            "&#x03B7;": "η",  # eta
-            '&#x03B8;': 'θ',  # theta
+            "&#x03B8;": "θ",  # theta
-            '&#x03B9;': 'ι',  # iota
+            "&#x03B9;": "ι",  # iota
-            '&#x03BA;': 'κ',  # kappa
+            "&#x03BA;": "κ",  # kappa
-            '&#x03BB;': 'λ',  # lambda
+            "&#x03BB;": "λ",  # lambda
-            '&#x03BC;': 'μ',  # mu
+            "&#x03BC;": "μ",  # mu
-            '&#x03BD;': 'ν',  # nu
+            "&#x03BD;": "ν",  # nu
-            '&#x03BE;': 'ξ',  # xi
+            "&#x03BE;": "ξ",  # xi
-            '&#x03BF;': 'ο',  # omicron
+            "&#x03BF;": "ο",  # omicron
-            '&#x03C0;': 'π',  # pi
+            "&#x03C0;": "π",  # pi
-            '&#x03C1;': 'ρ',  # rho
+            "&#x03C1;": "ρ",  # rho
-            '&#x03C2;': 'ς',  # final sigma
+            "&#x03C2;": "ς",  # final sigma
-            '&#x03C3;': 'σ',  # sigma
+            "&#x03C3;": "σ",  # sigma
-            '&#x03C4;': 'τ',  # tau
+            "&#x03C4;": "τ",  # tau
-            '&#x03C5;': 'υ',  # upsilon
+            "&#x03C5;": "υ",  # upsilon
-            '&#x03C6;': 'φ',  # phi
+            "&#x03C6;": "φ",  # phi
-            '&#x03C7;': 'χ',  # chi
+            "&#x03C7;": "χ",  # chi
-            '&#x03C8;': 'ψ',  # psi
+            "&#x03C8;": "ψ",  # psi
-            '&#x03C9;': 'ω',  # omega
+            "&#x03C9;": "ω",  # omega
-            '&#x03D5;': 'ϕ',  # phi variant
+            "&#x03D5;": "ϕ",  # phi variant
            # Greek letters (uppercase)
-            '&#x0391;': 'Α',  # Alpha
+            "&#x0391;": "Α",  # Alpha
-            '&#x0392;': 'Β',  # Beta
+            "&#x0392;": "Β",  # Beta
-            '&#x0393;': 'Γ',  # Gamma
+            "&#x0393;": "Γ",  # Gamma
-            '&#x0394;': 'Δ',  # Delta
+            "&#x0394;": "Δ",  # Delta
-            '&#x0395;': 'Ε',  # Epsilon
+            "&#x0395;": "Ε",  # Epsilon
-            '&#x0396;': 'Ζ',  # Zeta
+            "&#x0396;": "Ζ",  # Zeta
-            '&#x0397;': 'Η',  # Eta
+            "&#x0397;": "Η",  # Eta
-            '&#x0398;': 'Θ',  # Theta
+            "&#x0398;": "Θ",  # Theta
-            '&#x0399;': 'Ι',  # Iota
+            "&#x0399;": "Ι",  # Iota
-            '&#x039A;': 'Κ',  # Kappa
+            "&#x039A;": "Κ",  # Kappa
-            '&#x039B;': 'Λ',  # Lambda
+            "&#x039B;": "Λ",  # Lambda
-            '&#x039C;': 'Μ',  # Mu
+            "&#x039C;": "Μ",  # Mu
-            '&#x039D;': 'Ν',  # Nu
+            "&#x039D;": "Ν",  # Nu
-            '&#x039E;': 'Ξ',  # Xi
+            "&#x039E;": "Ξ",  # Xi
-            '&#x039F;': 'Ο',  # Omicron
+            "&#x039F;": "Ο",  # Omicron
-            '&#x03A0;': 'Π',  # Pi
+            "&#x03A0;": "Π",  # Pi
-            '&#x03A1;': 'Ρ',  # Rho
+            "&#x03A1;": "Ρ",  # Rho
-            '&#x03A3;': 'Σ',  # Sigma
+            "&#x03A3;": "Σ",  # Sigma
-            '&#x03A4;': 'Τ',  # Tau
+            "&#x03A4;": "Τ",  # Tau
-            '&#x03A5;': 'Υ',  # Upsilon
+            "&#x03A5;": "Υ",  # Upsilon
-            '&#x03A6;': 'Φ',  # Phi
+            "&#x03A6;": "Φ",  # Phi
-            '&#x03A7;': 'Χ',  # Chi
+            "&#x03A7;": "Χ",  # Chi
-            '&#x03A8;': 'Ψ',  # Psi
+            "&#x03A8;": "Ψ",  # Psi
-            '&#x03A9;': 'Ω',  # Omega
+            "&#x03A9;": "Ω",  # Omega
            # Math symbols
-            '&#x2205;': '∅',  # emptyset
+            "&#x2205;": "∅",  # emptyset
-            '&#x2208;': '∈',  # in
+            "&#x2208;": "∈",  # in
-            '&#x2209;': '∉',  # notin
+            "&#x2209;": "∉",  # notin
-            '&#x220B;': '∋',  # ni
+            "&#x220B;": "∋",  # ni
-            '&#x220C;': '∌',  # nni
+            "&#x220C;": "∌",  # nni
-            '&#x2211;': '∑',  # sum
+            "&#x2211;": "∑",  # sum
-            '&#x220F;': '∏',  # prod
+            "&#x220F;": "∏",  # prod
-            '&#x221A;': '√',  # sqrt
+            "&#x221A;": "√",  # sqrt
-            '&#x221B;': '∛',  # cbrt
+            "&#x221B;": "∛",  # cbrt
-            '&#x221C;': '∜',  # fourthroot
+            "&#x221C;": "∜",  # fourthroot
-            '&#x221E;': '∞',  # infty
+            "&#x221E;": "∞",  # infty
-            '&#x2229;': '∩',  # cap
+            "&#x2229;": "∩",  # cap
-            '&#x222A;': '∪',  # cup
+            "&#x222A;": "∪",  # cup
-            '&#x222B;': '∫',  # int
+            "&#x222B;": "∫",  # int
-            '&#x222C;': '∬',  # iint
+            "&#x222C;": "∬",  # iint
-            '&#x222D;': '∭',  # iiint
+            "&#x222D;": "∭",  # iiint
-            '&#x222E;': '∮',  # oint
+            "&#x222E;": "∮",  # oint
-            '&#x2282;': '⊂',  # subset
+            "&#x2282;": "⊂",  # subset
-            '&#x2283;': '⊃',  # supset
+            "&#x2283;": "⊃",  # supset
-            '&#x2284;': '⊄',  # nsubset
+            "&#x2284;": "⊄",  # nsubset
-            '&#x2285;': '⊅',  # nsupset
+            "&#x2285;": "⊅",  # nsupset
-            '&#x2286;': '⊆',  # subseteq
+            "&#x2286;": "⊆",  # subseteq
-            '&#x2287;': '⊇',  # supseteq
+            "&#x2287;": "⊇",  # supseteq
-            '&#x2288;': '⊈',  # nsubseteq
+            "&#x2288;": "⊈",  # nsubseteq
-            '&#x2289;': '⊉',  # nsupseteq
+            "&#x2289;": "⊉",  # nsupseteq
-            '&#x2264;': '≤',  # leq
+            "&#x2264;": "≤",  # leq
-            '&#x2265;': '≥',  # geq
+            "&#x2265;": "≥",  # geq
-            '&#x2260;': '≠',  # neq
+            "&#x2260;": "≠",  # neq
-            '&#x2261;': '≡',  # equiv
+            "&#x2261;": "≡",  # equiv
-            '&#x2248;': '≈',  # approx
+            "&#x2248;": "≈",  # approx
-            '&#x2243;': '≃',  # simeq
+            "&#x2243;": "≃",  # simeq
-            '&#x2245;': '≅',  # cong
+            "&#x2245;": "≅",  # cong
-            '&#x2202;': '∂',  # partial
+            "&#x2202;": "∂",  # partial
-            '&#x2207;': '∇',  # nabla
+            "&#x2207;": "∇",  # nabla
-            '&#x2200;': '∀',  # forall
+            "&#x2200;": "∀",  # forall
-            '&#x2203;': '∃',  # exists
+            "&#x2203;": "∃",  # exists
-            '&#x2204;': '∄',  # nexists
+            "&#x2204;": "∄",  # nexists
-            '&#x00AC;': '¬',  # neg/lnot
+            "&#x00AC;": "¬",  # neg/lnot
-            '&#x2227;': '∧',  # wedge/land
+            "&#x2227;": "∧",  # wedge/land
-            '&#x2228;': '∨',  # vee/lor
+            "&#x2228;": "∨",  # vee/lor
-            '&#x2192;': '→',  # to/rightarrow
+            "&#x2192;": "→",  # to/rightarrow
-            '&#x2190;': '←',  # leftarrow
+            "&#x2190;": "←",  # leftarrow
-            '&#x2194;': '↔',  # leftrightarrow
+            "&#x2194;": "↔",  # leftrightarrow
-            '&#x21D2;': '⇒',  # Rightarrow
+            "&#x21D2;": "⇒",  # Rightarrow
-            '&#x21D0;': '⇐',  # Leftarrow
+            "&#x21D0;": "⇐",  # Leftarrow
-            '&#x21D4;': '⇔',  # Leftrightarrow
+            "&#x21D4;": "⇔",  # Leftrightarrow
-            '&#x2191;': '↑',  # uparrow
+            "&#x2191;": "↑",  # uparrow
-            '&#x2193;': '↓',  # downarrow
+            "&#x2193;": "↓",  # downarrow
-            '&#x21D1;': '⇑',  # Uparrow
+            "&#x21D1;": "⇑",  # Uparrow
-            '&#x21D3;': '⇓',  # Downarrow
+            "&#x21D3;": "⇓",  # Downarrow
-            '&#x2195;': '↕',  # updownarrow
+            "&#x2195;": "↕",  # updownarrow
-            '&#x21D5;': '⇕',  # Updownarrow
+            "&#x21D5;": "⇕",  # Updownarrow
-            '&#x2260;': '≠',  # ne
+            "&#x2260;": "≠",  # ne
-            '&#x226A;': '≪',  # ll
+            "&#x226A;": "≪",  # ll
-            '&#x226B;': '≫',  # gg
+            "&#x226B;": "≫",  # gg
-            '&#x2A7D;': '⩽',  # leqslant
+            "&#x2A7D;": "⩽",  # leqslant
-            '&#x2A7E;': '⩾',  # geqslant
+            "&#x2A7E;": "⩾",  # geqslant
-            '&#x22A5;': '⊥',  # perp
+            "&#x22A5;": "⊥",  # perp
-            '&#x2225;': '∥',  # parallel
+            "&#x2225;": "∥",  # parallel
-            '&#x2220;': '∠',  # angle
+            "&#x2220;": "∠",  # angle
-            '&#x25B3;': '△',  # triangle
+            "&#x25B3;": "△",  # triangle
-            '&#x25A1;': '□',  # square
+            "&#x25A1;": "□",  # square
-            '&#x25CA;': '◊',  # diamond
+            "&#x25CA;": "◊",  # diamond
-            '&#x2660;': '♠',  # spadesuit
+            "&#x2660;": "♠",  # spadesuit
-            '&#x2661;': '♡',  # heartsuit
+            "&#x2661;": "♡",  # heartsuit
-            '&#x2662;': '♢',  # diamondsuit
+            "&#x2662;": "♢",  # diamondsuit
-            '&#x2663;': '♣',  # clubsuit
+            "&#x2663;": "♣",  # clubsuit
-            '&#x2113;': 'ℓ',  # ell
+            "&#x2113;": "ℓ",  # ell
-            '&#x2118;': '℘',  # wp (Weierstrass p)
+            "&#x2118;": "℘",  # wp (Weierstrass p)
-            '&#x211C;': 'ℜ',  # Re (real part)
+            "&#x211C;": "ℜ",  # Re (real part)
-            '&#x2111;': 'ℑ',  # Im (imaginary part)
+            "&#x2111;": "ℑ",  # Im (imaginary part)
-            '&#x2135;': 'ℵ',  # aleph
+            "&#x2135;": "ℵ",  # aleph
-            '&#x2136;': 'ℶ',  # beth
+            "&#x2136;": "ℶ",  # beth
        }
-        
+
        for entity, char in unicode_map.items():
            mathml = mathml.replace(entity, char)
-        
+
        # Also handle decimal entity format (&#NNNN;) for common characters
        # Convert decimal to hex-based lookup
        decimal_patterns = [
-            (r'&#955;', 'λ'),    # lambda (decimal 955 = hex 03BB)
+            (r"&#955;", "λ"),  # lambda (decimal 955 = hex 03BB)
-            (r'&#8942;', '⋮'),   # vdots (decimal 8942 = hex 22EE)
+            (r"&#8942;", "⋮"),  # vdots (decimal 8942 = hex 22EE)
-            (r'&#8943;', '⋯'),   # cdots (decimal 8943 = hex 22EF)
+            (r"&#8943;", "⋯"),  # cdots (decimal 8943 = hex 22EF)
-            (r'&#8230;', '…'),   # ldots (decimal 8230 = hex 2026)
+            (r"&#8230;", "…"),  # ldots (decimal 8230 = hex 2026)
-            (r'&#8734;', '∞'),   # infty (decimal 8734 = hex 221E)
+            (r"&#8734;", "∞"),  # infty (decimal 8734 = hex 221E)
-            (r'&#8721;', '∑'),   # sum (decimal 8721 = hex 2211)
+            (r"&#8721;", "∑"),  # sum (decimal 8721 = hex 2211)
-            (r'&#8719;', '∏'),   # prod (decimal 8719 = hex 220F)
+            (r"&#8719;", "∏"),  # prod (decimal 8719 = hex 220F)
-            (r'&#8730;', '√'),   # sqrt (decimal 8730 = hex 221A)
+            (r"&#8730;", "√"),  # sqrt (decimal 8730 = hex 221A)
-            (r'&#8712;', '∈'),   # in (decimal 8712 = hex 2208)
+            (r"&#8712;", "∈"),  # in (decimal 8712 = hex 2208)
-            (r'&#8713;', '∉'),   # notin (decimal 8713 = hex 2209)
+            (r"&#8713;", "∉"),  # notin (decimal 8713 = hex 2209)
-            (r'&#8745;', '∩'),   # cap (decimal 8745 = hex 2229)
+            (r"&#8745;", "∩"),  # cap (decimal 8745 = hex 2229)
-            (r'&#8746;', '∪'),   # cup (decimal 8746 = hex 222A)
+            (r"&#8746;", "∪"),  # cup (decimal 8746 = hex 222A)
-            (r'&#8804;', '≤'),   # leq (decimal 8804 = hex 2264)
+            (r"&#8804;", "≤"),  # leq (decimal 8804 = hex 2264)
-            (r'&#8805;', '≥'),   # geq (decimal 8805 = hex 2265)
+            (r"&#8805;", "≥"),  # geq (decimal 8805 = hex 2265)
-            (r'&#8800;', '≠'),   # neq (decimal 8800 = hex 2260)
+            (r"&#8800;", "≠"),  # neq (decimal 8800 = hex 2260)
-            (r'&#8776;', '≈'),   # approx (decimal 8776 = hex 2248)
+            (r"&#8776;", "≈"),  # approx (decimal 8776 = hex 2248)
-            (r'&#8801;', '≡'),   # equiv (decimal 8801 = hex 2261)
+            (r"&#8801;", "≡"),  # equiv (decimal 8801 = hex 2261)
        ]
-        
+
        for pattern, char in decimal_patterns:
            mathml = mathml.replace(pattern, char)
-        
+
        # Step 8: Clean up extra whitespace
-        mathml = re.sub(r'>\s+<', '><', mathml)
+        mathml = re.sub(r">\s+<", "><", mathml)
-        
+
        return mathml
-    def _latex_to_mathml(self, latex_formula: str) -> str:
+    def _latex_to_mathml(self, latex_formula: str, is_display: bool = False) -> str:
        """Convert LaTeX formula to standard MathML.
        Args:
            latex_formula: Pure LaTeX formula (without delimiters).
            is_display: True if display (block) formula, False if inline.
        Returns:
            Standard MathML representation.
        """
-        return self._latex_to_mathml_cached(latex_formula)
+        return self._latex_to_mathml_cached(latex_formula, is_display=is_display)
    def _mathml_to_mml(self, mathml: str) -> str:
        """Convert standard MathML to mml:math format with namespace prefix.
--- a/app/services/glm_postprocess.py
+++ b/app/services/glm_postprocess.py
@@ -0,0 +1,428 @@
 """GLM-OCR postprocessing logic adapted for this project.
 Ported from glm-ocr/glmocr/postprocess/result_formatter.py and
 glm-ocr/glmocr/utils/result_postprocess_utils.py.
 Covers:
 - Repeated-content / hallucination detection
 - Per-region content cleaning and formatting (titles, bullets, formulas)
 - formula_number merging (→ \\tag{})
 - Hyphenated text-block merging (via wordfreq)
 - Missing bullet-point detection
 """
 from __future__ import annotations
 import logging
 import re
 import json
 logger = logging.getLogger(__name__)
 from collections import Counter
 from copy import deepcopy
 from typing import Any, Dict, List, Optional, Tuple
 try:
    from wordfreq import zipf_frequency
    _WORDFREQ_AVAILABLE = True
 except ImportError:
    _WORDFREQ_AVAILABLE = False
 # ---------------------------------------------------------------------------
 # result_postprocess_utils (ported)
 # ---------------------------------------------------------------------------
 def find_consecutive_repeat(s: str, min_unit_len: int = 10, min_repeats: int = 10) -> Optional[str]:
    """Detect and truncate a consecutively-repeated pattern.
    Returns the string with the repeat removed, or None if not found.
    """
    n = len(s)
    if n < min_unit_len * min_repeats:
        return None
    max_unit_len = n // min_repeats
    if max_unit_len < min_unit_len:
        return None
    pattern = re.compile(
        r"(.{" + str(min_unit_len) + "," + str(max_unit_len) + r"}?)\1{" + str(min_repeats - 1) + ",}",
        re.DOTALL,
    )
    match = pattern.search(s)
    if match:
        return s[: match.start()] + match.group(1)
    return None
 def clean_repeated_content(
    content: str,
    min_len: int = 10,
    min_repeats: int = 10,
    line_threshold: int = 10,
 ) -> str:
    """Remove hallucination-style repeated content (consecutive or line-level)."""
    stripped = content.strip()
    if not stripped:
        return content
    # 1. Consecutive repeat (multi-line aware)
    if len(stripped) > min_len * min_repeats:
        result = find_consecutive_repeat(stripped, min_unit_len=min_len, min_repeats=min_repeats)
        if result is not None:
            return result
    # 2. Line-level repeat
    lines = [line.strip() for line in content.split("\n") if line.strip()]
    total_lines = len(lines)
    if total_lines >= line_threshold and lines:
        common, count = Counter(lines).most_common(1)[0]
        if count >= line_threshold and (count / total_lines) >= 0.8:
            for i, line in enumerate(lines):
                if line == common:
                    consecutive = sum(1 for j in range(i, min(i + 3, len(lines))) if lines[j] == common)
                    if consecutive >= 3:
                        original_lines = content.split("\n")
                        non_empty_count = 0
                        for idx, orig_line in enumerate(original_lines):
                            if orig_line.strip():
                                non_empty_count += 1
                                if non_empty_count == i + 1:
                                    return "\n".join(original_lines[: idx + 1])
                        break
    return content
 def clean_formula_number(number_content: str) -> str:
    """Strip delimiters from a formula number string, e.g. '(1)' → '1'.
    Also strips math-mode delimiters ($$, $, \\[...\\]) that vLLM may add
    when the region is processed with a formula prompt.
    """
    s = number_content.strip()
    # Strip display math delimiters
    for start, end in [("$$", "$$"), (r"\[", r"\]"), ("$", "$"), (r"\(", r"\)")]:
        if s.startswith(start) and s.endswith(end) and len(s) > len(start) + len(end):
            s = s[len(start):-len(end)].strip()
            break
    # Strip CJK/ASCII parentheses
    if s.startswith("(") and s.endswith(")"):
        return s[1:-1]
    if s.startswith("（") and s.endswith("）"):
        return s[1:-1]
    return s
 # ---------------------------------------------------------------------------
 # GLMResultFormatter
 # ---------------------------------------------------------------------------
 # Label → canonical category mapping (mirrors GLM-OCR label_visualization_mapping)
 _LABEL_TO_CATEGORY: Dict[str, str] = {
    # text
    "abstract": "text",
    "algorithm": "text",
    "content": "text",
    "doc_title": "text",
    "figure_title": "text",
    "paragraph_title": "text",
    "reference_content": "text",
    "text": "text",
    "vertical_text": "text",
    "vision_footnote": "text",
    "seal": "text",
    "formula_number": "text",
    # table
    "table": "table",
    # formula
    "display_formula": "formula",
    "inline_formula": "formula",
    # image (skip OCR)
    "chart": "image",
    "image": "image",
 }
 class GLMResultFormatter:
    """Port of GLM-OCR's ResultFormatter for use in our pipeline.
    Accepts a list of region dicts (each with label, native_label, content,
    bbox_2d) and returns a final Markdown string.
    """
    # ------------------------------------------------------------------ #
    # Public entry-point
    # ------------------------------------------------------------------ #
    def process(self, regions: List[Dict[str, Any]]) -> str:
        """Run the full postprocessing pipeline and return Markdown.
        Args:
            regions: List of dicts with keys:
                - index        (int)  reading order from layout detection
                - label        (str)  mapped category: text/formula/table/figure
                - native_label (str)  raw PP-DocLayout label (e.g. doc_title)
                - content      (str)  raw OCR output from vLLM
                - bbox_2d      (list) [x1, y1, x2, y2] in 0-1000 normalised coords
        Returns:
            Markdown string.
        """
        # Sort by reading order
        items = sorted(deepcopy(regions), key=lambda x: x.get("index", 0))
        # Per-region cleaning + formatting
        processed: List[Dict] = []
        for item in items:
            item["native_label"] = item.get("native_label", item.get("label", "text"))
            item["label"] = self._map_label(item.get("label", "text"), item["native_label"])
            item["content"] = self._format_content(
                item.get("content") or "",
                item["label"],
                item["native_label"],
            )
            if not (item.get("content") or "").strip():
                continue
            processed.append(item)
        # Re-index
        for i, item in enumerate(processed):
            item["index"] = i
        # Structural merges
        processed = self._merge_formula_numbers(processed)
        processed = self._merge_text_blocks(processed)
        processed = self._format_bullet_points(processed)
        # Assemble Markdown
        parts: List[str] = []
        for item in processed:
            content = item.get("content") or ""
            if item["label"] == "image":
                parts.append(f"![](bbox={item.get('bbox_2d', [])})")
            elif content.strip():
                parts.append(content)
        return "\n\n".join(parts)
    # ------------------------------------------------------------------ #
    # Label mapping
    # ------------------------------------------------------------------ #
    def _map_label(self, label: str, native_label: str) -> str:
        return _LABEL_TO_CATEGORY.get(native_label, _LABEL_TO_CATEGORY.get(label, "text"))
    # ------------------------------------------------------------------ #
    # Content cleaning
    # ------------------------------------------------------------------ #
    def _clean_content(self, content: str) -> str:
        """Remove artefacts: leading/trailing \\t, repeated punctuation, long repeats."""
        if content is None:
            return ""
        content = re.sub(r"^(\\t)+", "", content).lstrip()
        content = re.sub(r"(\\t)+$", "", content).rstrip()
        content = re.sub(r"(\.)\1{2,}", r"\1\1\1", content)
        content = re.sub(r"(·)\1{2,}", r"\1\1\1", content)
        content = re.sub(r"(_)\1{2,}", r"\1\1\1", content)
        content = re.sub(r"(\\_)\1{2,}", r"\1\1\1", content)
        if len(content) >= 2048:
            content = clean_repeated_content(content)
        return content.strip()
    # ------------------------------------------------------------------ #
    # Per-region content formatting
    # ------------------------------------------------------------------ #
    def _format_content(self, content: Any, label: str, native_label: str) -> str:
        """Clean and format a single region's content."""
        if content is None:
            return ""
        content = self._clean_content(str(content))
        # Heading formatting
        if native_label == "doc_title":
            content = re.sub(r"^#+\s*", "", content)
            content = "# " + content
        elif native_label == "paragraph_title":
            if content.startswith("- ") or content.startswith("* "):
                content = content[2:].lstrip()
            content = re.sub(r"^#+\s*", "", content)
            content = "## " + content.lstrip()
        # Formula wrapping
        if label == "formula":
            content = content.strip()
            for s, e in [("$$", "$$"), (r"\[", r"\]"), (r"\(", r"\)")]:
                if content.startswith(s) and content.endswith(e):
                    content = content[len(s) : -len(e)].strip()
                    break
            if not content:
                logger.warning("Skipping formula region with empty content after stripping delimiters")
                return ""
            content = "$$\n" + content + "\n$$"
        # Text formatting
        if label == "text":
            if content.startswith("·") or content.startswith("•") or content.startswith("* "):
                content = "- " + content[1:].lstrip()
            match = re.match(r"^(\(|\（)(\d+|[A-Za-z])(\)|\）)(.*)$", content)
            if match:
                _, symbol, _, rest = match.groups()
                content = f"({symbol}) {rest.lstrip()}"
            match = re.match(r"^(\d+|[A-Za-z])(\.|\)|\）)(.*)$", content)
            if match:
                symbol, sep, rest = match.groups()
                sep = ")" if sep == "）" else sep
                content = f"{symbol}{sep} {rest.lstrip()}"
            # Single newline → double newline
            content = re.sub(r"(?<!\n)\n(?!\n)", "\n\n", content)
        return content
    # ------------------------------------------------------------------ #
    # Structural merges
    # ------------------------------------------------------------------ #
    def _merge_formula_numbers(self, items: List[Dict]) -> List[Dict]:
        """Merge formula_number region into adjacent formula with \\tag{}."""
        if not items:
            return items
        merged: List[Dict] = []
        skip: set = set()
        for i, block in enumerate(items):
            if i in skip:
                continue
            native = block.get("native_label", "")
            # Case 1: formula_number then formula
            if native == "formula_number":
                if i + 1 < len(items) and items[i + 1].get("label") == "formula":
                    num_clean = clean_formula_number(block.get("content", "").strip())
                    formula_content = items[i + 1].get("content", "")
                    merged_block = deepcopy(items[i + 1])
                    if formula_content.endswith("\n$$"):
                        merged_block["content"] = formula_content[:-3] + f" \\tag{{{num_clean}}}\n$$"
                    merged.append(merged_block)
                    skip.add(i + 1)
                continue  # always skip the formula_number block itself
            # Case 2: formula then formula_number
            if block.get("label") == "formula":
                if i + 1 < len(items) and items[i + 1].get("native_label") == "formula_number":
                    num_clean = clean_formula_number(items[i + 1].get("content", "").strip())
                    formula_content = block.get("content", "")
                    merged_block = deepcopy(block)
                    if formula_content.endswith("\n$$"):
                        merged_block["content"] = formula_content[:-3] + f" \\tag{{{num_clean}}}\n$$"
                    merged.append(merged_block)
                    skip.add(i + 1)
                    continue
            merged.append(block)
        for i, block in enumerate(merged):
            block["index"] = i
        return merged
    def _merge_text_blocks(self, items: List[Dict]) -> List[Dict]:
        """Merge hyphenated text blocks when the combined word is valid (wordfreq)."""
        if not items or not _WORDFREQ_AVAILABLE:
            return items
        merged: List[Dict] = []
        skip: set = set()
        for i, block in enumerate(items):
            if i in skip:
                continue
            if block.get("label") != "text":
                merged.append(block)
                continue
            content = block.get("content", "")
            if not isinstance(content, str) or not content.rstrip().endswith("-"):
                merged.append(block)
                continue
            content_stripped = content.rstrip()
            did_merge = False
            for j in range(i + 1, len(items)):
                if items[j].get("label") != "text":
                    continue
                next_content = items[j].get("content", "")
                if not isinstance(next_content, str):
                    continue
                next_stripped = next_content.lstrip()
                if next_stripped and next_stripped[0].islower():
                    words_before = content_stripped[:-1].split()
                    next_words = next_stripped.split()
                    if words_before and next_words:
                        merged_word = words_before[-1] + next_words[0]
                        if zipf_frequency(merged_word.lower(), "en") >= 2.5:
                            merged_block = deepcopy(block)
                            merged_block["content"] = content_stripped[:-1] + next_content.lstrip()
                            merged.append(merged_block)
                            skip.add(j)
                            did_merge = True
                break
            if not did_merge:
                merged.append(block)
        for i, block in enumerate(merged):
            block["index"] = i
        return merged
    def _format_bullet_points(self, items: List[Dict], left_align_threshold: float = 10.0) -> List[Dict]:
        """Add missing bullet prefix when a text block is sandwiched between two bullet items."""
        if len(items) < 3:
            return items
        for i in range(1, len(items) - 1):
            cur = items[i]
            prev = items[i - 1]
            nxt = items[i + 1]
            if cur.get("native_label") != "text":
                continue
            if prev.get("native_label") != "text" or nxt.get("native_label") != "text":
                continue
            cur_content = cur.get("content", "")
            if cur_content.startswith("- "):
                continue
            prev_content = prev.get("content", "")
            nxt_content = nxt.get("content", "")
            if not (prev_content.startswith("- ") and nxt_content.startswith("- ")):
                continue
            cur_bbox = cur.get("bbox_2d", [])
            prev_bbox = prev.get("bbox_2d", [])
            nxt_bbox = nxt.get("bbox_2d", [])
            if not (cur_bbox and prev_bbox and nxt_bbox):
                continue
            if (
                abs(cur_bbox[0] - prev_bbox[0]) <= left_align_threshold
                and abs(cur_bbox[0] - nxt_bbox[0]) <= left_align_threshold
            ):
                cur["content"] = "- " + cur_content
        return items
--- a/app/services/image_processor.py
+++ b/app/services/image_processor.py
@@ -104,7 +104,8 @@ class ImageProcessor:
        """Add whitespace padding around the image.
        Adds padding equal to padding_ratio * max(height, width) on each side.
-        This expands the image by approximately 30% total (15% on each side).
+        For small images (height < 80 or width < 500), uses reduced padding_ratio 0.2.
        This expands the image by approximately 30% total (15% on each side) for normal images.
        Args:
            image: Input image as numpy array in BGR format.
@@ -113,7 +114,9 @@ class ImageProcessor:
            Padded image as numpy array.
        """
        height, width = image.shape[:2]
-        padding = int(max(height, width) * self.padding_ratio)
+        # Use smaller padding ratio for small images to preserve detail
        padding_ratio = 0.2 if height < 80 or width < 500 else self.padding_ratio
        padding = int(max(height, width) * padding_ratio)
        # Add white padding on all sides
        padded_image = cv2.copyMakeBorder(
--- a/app/services/layout_detector.py
+++ b/app/services/layout_detector.py
@@ -1,9 +1,10 @@
-"""PP-DocLayoutV2 wrapper for document layout detection."""
+"""PP-DocLayoutV3 wrapper for document layout detection."""
 import numpy as np
 from app.schemas.image import LayoutInfo, LayoutRegion
 from app.core.config import get_settings
 from app.services.layout_postprocess import apply_layout_postprocess
 from paddleocr import LayoutDetection
 from typing import Optional
@@ -65,7 +66,9 @@ class LayoutDetector:
        # Formula types
        "display_formula": "formula",
        "inline_formula": "formula",
-        "formula_number": "formula",
+        # formula_number is a plain text annotation "(2.9)" next to a formula,
        # not a formula itself — use text prompt so vLLM returns plain text
        "formula_number": "text",
        # Table types
        "table": "table",
        # Figure types
@@ -87,11 +90,11 @@ class LayoutDetector:
    def _get_layout_detector(self):
        """Get or create LayoutDetection instance."""
        if LayoutDetector._layout_detector is None:
-            LayoutDetector._layout_detector = LayoutDetection(model_name="PP-DocLayoutV2")
+            LayoutDetector._layout_detector = LayoutDetection(model_name="PP-DocLayoutV3")
        return LayoutDetector._layout_detector
    def detect(self, image: np.ndarray) -> LayoutInfo:
-        """Detect layout of the image using PP-DocLayoutV2.
+        """Detect layout of the image using PP-DocLayoutV3.
        Args:
            image: Input image as numpy array.
@@ -116,6 +119,17 @@ class LayoutDetector:
        else:
            boxes = []
        # Apply GLM-OCR layout post-processing (NMS, containment, unclip, clamp)
        if boxes:
            h, w = image.shape[:2]
            boxes = apply_layout_postprocess(
                boxes,
                img_size=(w, h),
                layout_nms=True,
                layout_unclip_ratio=None,
                layout_merge_bboxes_mode="large",
            )
        for box in boxes:
            cls_id = box.get("cls_id")
            label = box.get("label") or self.CLS_ID_TO_LABEL.get(cls_id, "other")
@@ -125,15 +139,17 @@ class LayoutDetector:
            # Normalize label to region type
            region_type = self.LABEL_TO_TYPE.get(label, "text")
-            regions.append(LayoutRegion(
+            regions.append(
-                type=region_type,
+                LayoutRegion(
-                bbox=coordinate,
+                    type=region_type,
-                confidence=score,
+                    native_label=label,
-                score=score,
+                    bbox=coordinate,
-            ))
+                    confidence=score,
                    score=score,
                )
            )
-
+        mixed_recognition = any(region.type == "text" and region.score > 0.3 for region in regions)
        mixed_recognition = any(region.type == "text" and region.score > 0.85 for region in regions)
        return LayoutInfo(regions=regions, MixedRecognition=mixed_recognition)
@@ -144,14 +160,14 @@ if __name__ == "__main__":
    from app.services.image_processor import ImageProcessor
    from app.services.converter import Converter
    from app.services.ocr_service import OCRService
-    
+
    settings = get_settings()
-    
+
    # Initialize dependencies
    layout_detector = LayoutDetector()
    image_processor = ImageProcessor(padding_ratio=settings.image_padding_ratio)
    converter = Converter()
-    
+
    # Initialize OCR service
    ocr_service = OCRService(
        vl_server_url=settings.paddleocr_vl_url,
@@ -159,20 +175,20 @@ if __name__ == "__main__":
        image_processor=image_processor,
        converter=converter,
    )
-    
+
    # Load test image
-    image_path = "test/complex_formula.png"
+    image_path = "test/timeout.jpg"
    image = cv2.imread(image_path)
-    
+
    if image is None:
        print(f"Failed to load image: {image_path}")
    else:
        print(f"Image loaded: {image.shape}")
-        
+
        # Run OCR recognition
        result = ocr_service.recognize(image)
-        
+
        print("\n=== OCR Result ===")
        print(f"Markdown:\n{result['markdown']}")
        print(f"\nLaTeX:\n{result['latex']}")
-        print(f"\nMathML:\n{result['mathml']}")
+        print(f"\nMathML:\n{result['mathml']}")
--- a/app/services/layout_postprocess.py
+++ b/app/services/layout_postprocess.py
@@ -0,0 +1,343 @@
 """Layout post-processing utilities ported from GLM-OCR.
 Source: glm-ocr/glmocr/utils/layout_postprocess_utils.py
 Algorithms applied after PaddleOCR LayoutDetection.predict():
  1. NMS with dual IoU thresholds (same-class vs cross-class)
  2. Large-image-region filtering (remove image boxes that fill most of the page)
  3. Containment analysis (merge_bboxes_mode: keep large parent, remove contained child)
  4. Unclip ratio (optional bbox expansion)
  5. Invalid bbox skipping
 These steps run on top of PaddleOCR's built-in detection to replicate
 the quality of the GLM-OCR SDK's layout pipeline.
 """
 from __future__ import annotations
 from typing import Dict, List, Optional, Tuple, Union
 import numpy as np
 # ---------------------------------------------------------------------------
 # Primitive geometry helpers
 # ---------------------------------------------------------------------------
 def iou(box1: List[float], box2: List[float]) -> float:
    """Compute IoU of two bounding boxes [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = box1
    x1_p, y1_p, x2_p, y2_p = box2
    x1_i = max(x1, x1_p)
    y1_i = max(y1, y1_p)
    x2_i = min(x2, x2_p)
    y2_i = min(y2, y2_p)
    inter_area = max(0, x2_i - x1_i + 1) * max(0, y2_i - y1_i + 1)
    box1_area = (x2 - x1 + 1) * (y2 - y1 + 1)
    box2_area = (x2_p - x1_p + 1) * (y2_p - y1_p + 1)
    return inter_area / float(box1_area + box2_area - inter_area)
 def is_contained(box1: List[float], box2: List[float], overlap_threshold: float = 0.8) -> bool:
    """Return True if box1 is contained within box2 (overlap ratio >= threshold).
    box format: [cls_id, score, x1, y1, x2, y2]
    """
    _, _, x1, y1, x2, y2 = box1
    _, _, x1_p, y1_p, x2_p, y2_p = box2
    box1_area = (x2 - x1) * (y2 - y1)
    if box1_area <= 0:
        return False
    xi1 = max(x1, x1_p)
    yi1 = max(y1, y1_p)
    xi2 = min(x2, x2_p)
    yi2 = min(y2, y2_p)
    inter_area = max(0, xi2 - xi1) * max(0, yi2 - yi1)
    return (inter_area / box1_area) >= overlap_threshold
 # ---------------------------------------------------------------------------
 # NMS
 # ---------------------------------------------------------------------------
 def nms(
    boxes: np.ndarray,
    iou_same: float = 0.6,
    iou_diff: float = 0.98,
 ) -> List[int]:
    """NMS with separate IoU thresholds for same-class and cross-class overlaps.
    Args:
        boxes: Array of shape (N, 6+) — [cls_id, score, x1, y1, x2, y2, ...].
        iou_same: Suppression threshold for boxes of the same class.
        iou_diff: Suppression threshold for boxes of different classes.
    Returns:
        List of kept row indices.
    """
    scores = boxes[:, 1]
    indices = np.argsort(scores)[::-1].tolist()
    selected: List[int] = []
    while indices:
        current = indices[0]
        selected.append(current)
        current_class = int(boxes[current, 0])
        current_coords = boxes[current, 2:6].tolist()
        indices = indices[1:]
        kept = []
        for i in indices:
            box_class = int(boxes[i, 0])
            box_coords = boxes[i, 2:6].tolist()
            threshold = iou_same if current_class == box_class else iou_diff
            if iou(current_coords, box_coords) < threshold:
                kept.append(i)
        indices = kept
    return selected
 # ---------------------------------------------------------------------------
 # Containment analysis
 # ---------------------------------------------------------------------------
 # Labels whose regions should never be removed even when contained in another box
 _PRESERVE_LABELS = {"image", "seal", "chart"}
 def check_containment(
    boxes: np.ndarray,
    preserve_cls_ids: Optional[set] = None,
    category_index: Optional[int] = None,
    mode: Optional[str] = None,
 ) -> Tuple[np.ndarray, np.ndarray]:
    """Compute containment flags for each box.
    Args:
        boxes: Array of shape (N, 6+) — [cls_id, score, x1, y1, x2, y2, ...].
        preserve_cls_ids: Class IDs that must never be marked as contained.
        category_index: If set, apply mode only relative to this class.
        mode: 'large' or 'small' (only used with category_index).
    Returns:
        (contains_other, contained_by_other): boolean arrays of length N.
    """
    n = len(boxes)
    contains_other = np.zeros(n, dtype=int)
    contained_by_other = np.zeros(n, dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if preserve_cls_ids and int(boxes[i, 0]) in preserve_cls_ids:
                continue
            if category_index is not None and mode is not None:
                if mode == "large" and int(boxes[j, 0]) == category_index:
                    if is_contained(boxes[i].tolist(), boxes[j].tolist()):
                        contained_by_other[i] = 1
                        contains_other[j] = 1
                elif mode == "small" and int(boxes[i, 0]) == category_index:
                    if is_contained(boxes[i].tolist(), boxes[j].tolist()):
                        contained_by_other[i] = 1
                        contains_other[j] = 1
            else:
                if is_contained(boxes[i].tolist(), boxes[j].tolist()):
                    contained_by_other[i] = 1
                    contains_other[j] = 1
    return contains_other, contained_by_other
 # ---------------------------------------------------------------------------
 # Box expansion (unclip)
 # ---------------------------------------------------------------------------
 def unclip_boxes(
    boxes: np.ndarray,
    unclip_ratio: Union[float, Tuple[float, float], Dict, List, None],
 ) -> np.ndarray:
    """Expand bounding boxes by the given ratio.
    Args:
        boxes: Array of shape (N, 6+) — [cls_id, score, x1, y1, x2, y2, ...].
        unclip_ratio: Scalar, (w_ratio, h_ratio) tuple, or dict mapping cls_id to ratio.
    Returns:
        Expanded boxes array.
    """
    if unclip_ratio is None:
        return boxes
    if isinstance(unclip_ratio, dict):
        expanded = []
        for box in boxes:
            cls_id = int(box[0])
            if cls_id in unclip_ratio:
                w_ratio, h_ratio = unclip_ratio[cls_id]
                x1, y1, x2, y2 = box[2], box[3], box[4], box[5]
                cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
                nw, nh = (x2 - x1) * w_ratio, (y2 - y1) * h_ratio
                new_box = list(box)
                new_box[2], new_box[3] = cx - nw / 2, cy - nh / 2
                new_box[4], new_box[5] = cx + nw / 2, cy + nh / 2
                expanded.append(new_box)
            else:
                expanded.append(list(box))
        return np.array(expanded)
    # Scalar or tuple
    if isinstance(unclip_ratio, (int, float)):
        unclip_ratio = (float(unclip_ratio), float(unclip_ratio))
    w_ratio, h_ratio = unclip_ratio[0], unclip_ratio[1]
    widths = boxes[:, 4] - boxes[:, 2]
    heights = boxes[:, 5] - boxes[:, 3]
    cx = boxes[:, 2] + widths / 2
    cy = boxes[:, 3] + heights / 2
    nw, nh = widths * w_ratio, heights * h_ratio
    expanded = boxes.copy().astype(float)
    expanded[:, 2] = cx - nw / 2
    expanded[:, 3] = cy - nh / 2
    expanded[:, 4] = cx + nw / 2
    expanded[:, 5] = cy + nh / 2
    return expanded
 # ---------------------------------------------------------------------------
 # Main entry-point
 # ---------------------------------------------------------------------------
 def apply_layout_postprocess(
    boxes: List[Dict],
    img_size: Tuple[int, int],
    layout_nms: bool = True,
    layout_unclip_ratio: Union[float, Tuple, Dict, None] = None,
    layout_merge_bboxes_mode: Union[str, Dict, None] = "large",
 ) -> List[Dict]:
    """Apply GLM-OCR layout post-processing to PaddleOCR detection results.
    Args:
        boxes: PaddleOCR output — list of dicts with keys:
               cls_id, label, score, coordinate ([x1, y1, x2, y2]).
        img_size: (width, height) of the image.
        layout_nms: Apply dual-threshold NMS.
        layout_unclip_ratio: Optional bbox expansion ratio.
        layout_merge_bboxes_mode: Containment mode — 'large' (default), 'small',
                                  'union', or per-class dict.
    Returns:
        Filtered and ordered list of box dicts in the same PaddleOCR format.
    """
    if not boxes:
        return boxes
    img_width, img_height = img_size
    # --- Build working array [cls_id, score, x1, y1, x2, y2] -------------- #
    arr_rows = []
    for b in boxes:
        cls_id = b.get("cls_id", 0)
        score = b.get("score", 0.0)
        x1, y1, x2, y2 = b.get("coordinate", [0, 0, 0, 0])
        arr_rows.append([cls_id, score, x1, y1, x2, y2])
    boxes_array = np.array(arr_rows, dtype=float)
    all_labels: List[str] = [b.get("label", "") for b in boxes]
    # 1. NMS ---------------------------------------------------------------- #
    if layout_nms and len(boxes_array) > 1:
        kept = nms(boxes_array, iou_same=0.6, iou_diff=0.98)
        boxes_array = boxes_array[kept]
        all_labels = [all_labels[k] for k in kept]
    # 2. Filter large image regions ---------------------------------------- #
    if len(boxes_array) > 1:
        img_area = img_width * img_height
        area_thres = 0.82 if img_width > img_height else 0.93
        image_cls_ids = {
            int(boxes_array[i, 0])
            for i, lbl in enumerate(all_labels)
            if lbl == "image"
        }
        keep_mask = np.ones(len(boxes_array), dtype=bool)
        for i, lbl in enumerate(all_labels):
            if lbl == "image":
                x1, y1, x2, y2 = boxes_array[i, 2:6]
                x1 = max(0.0, x1); y1 = max(0.0, y1)
                x2 = min(float(img_width), x2); y2 = min(float(img_height), y2)
                if (x2 - x1) * (y2 - y1) > area_thres * img_area:
                    keep_mask[i] = False
        boxes_array = boxes_array[keep_mask]
        all_labels = [lbl for lbl, k in zip(all_labels, keep_mask) if k]
    # 3. Containment analysis (merge_bboxes_mode) -------------------------- #
    if layout_merge_bboxes_mode and len(boxes_array) > 1:
        preserve_cls_ids = {
            int(boxes_array[i, 0])
            for i, lbl in enumerate(all_labels)
            if lbl in _PRESERVE_LABELS
        }
        if isinstance(layout_merge_bboxes_mode, str):
            mode = layout_merge_bboxes_mode
            if mode in ("large", "small"):
                contains_other, contained_by_other = check_containment(
                    boxes_array, preserve_cls_ids
                )
                if mode == "large":
                    keep_mask = contained_by_other == 0
                else:
                    keep_mask = (contains_other == 0) | (contained_by_other == 1)
                boxes_array = boxes_array[keep_mask]
                all_labels = [lbl for lbl, k in zip(all_labels, keep_mask) if k]
        elif isinstance(layout_merge_bboxes_mode, dict):
            keep_mask = np.ones(len(boxes_array), dtype=bool)
            for category_index, mode in layout_merge_bboxes_mode.items():
                if mode in ("large", "small"):
                    contains_other, contained_by_other = check_containment(
                        boxes_array, preserve_cls_ids, int(category_index), mode
                    )
                    if mode == "large":
                        keep_mask &= contained_by_other == 0
                    else:
                        keep_mask &= (contains_other == 0) | (contained_by_other == 1)
            boxes_array = boxes_array[keep_mask]
            all_labels = [lbl for lbl, k in zip(all_labels, keep_mask) if k]
    if len(boxes_array) == 0:
        return []
    # 4. Unclip (bbox expansion) ------------------------------------------- #
    if layout_unclip_ratio is not None:
        boxes_array = unclip_boxes(boxes_array, layout_unclip_ratio)
    # 5. Clamp to image boundaries + skip invalid -------------------------- #
    result: List[Dict] = []
    for i, row in enumerate(boxes_array):
        cls_id = int(row[0])
        score = float(row[1])
        x1 = max(0.0, min(float(row[2]), img_width))
        y1 = max(0.0, min(float(row[3]), img_height))
        x2 = max(0.0, min(float(row[4]), img_width))
        y2 = max(0.0, min(float(row[5]), img_height))
        if x1 >= x2 or y1 >= y2:
            continue
        result.append({
            "cls_id": cls_id,
            "label": all_labels[i],
            "score": score,
            "coordinate": [int(x1), int(y1), int(x2), int(y2)],
        })
    return result
--- a/app/services/ocr_service.py
+++ b/app/services/ocr_service.py
@@ -1,19 +1,27 @@
 """PaddleOCR-VL client service for text and formula recognition."""
 import base64
 import logging
 import re
 import numpy as np
 import cv2
 import requests
 from io import BytesIO
 from app.core.config import get_settings
 from paddleocr import PaddleOCRVL
 from typing import Optional
 from app.services.layout_detector import LayoutDetector
 from app.services.image_processor import ImageProcessor
 from app.services.converter import Converter
 from abc import ABC, abstractmethod
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from io import BytesIO
 import cv2
 import numpy as np
 import requests
 from openai import OpenAI
 from paddleocr import PaddleOCRVL
 from PIL import Image as PILImage
 from app.core.config import get_settings
 from app.services.converter import Converter
 from app.services.glm_postprocess import GLMResultFormatter
 from app.services.image_processor import ImageProcessor
 from app.services.layout_detector import LayoutDetector
 settings = get_settings()
 logger = logging.getLogger(__name__)
 _COMMANDS_NEED_SPACE = {
    # operators / calculus
@@ -39,12 +47,23 @@ _COMMANDS_NEED_SPACE = {
    "log",
    "ln",
    "exp",
    # set relations (often glued by OCR)
    "in",
    "notin",
    "subset",
    "supset",
    "subseteq",
    "supseteq",
    "cap",
    "cup",
    # misc
    "partial",
    "nabla",
 }
 _MATH_SEGMENT_PATTERN = re.compile(r"\$\$.*?\$\$|\$.*?\$", re.DOTALL)
 # Match LaTeX commands: \command (greedy match all letters)
 # The splitting logic in _split_glued_command_token will handle \inX -> \in X
 _COMMAND_TOKEN_PATTERN = re.compile(r"\\[a-zA-Z]+")
 # stage2: differentials inside math segments
@@ -63,6 +82,7 @@ def _split_glued_command_token(token: str) -> str:
    Examples:
    - \\cdotdS -> \\cdot dS
    - \\intdx  -> \\int dx
    - \\inX    -> \\in X (stop at uppercase letter)
    """
    if not token.startswith("\\"):
        return token
@@ -72,8 +92,8 @@ def _split_glued_command_token(token: str) -> str:
        return token
    best = None
-    # longest prefix that is in whitelist
+    # Find longest prefix that is in whitelist
-    for i in range(1, len(body)):
+    for i in range(1, len(body) + 1):
        prefix = body[:i]
        if prefix in _COMMANDS_NEED_SPACE:
            best = prefix
@@ -90,42 +110,54 @@ def _split_glued_command_token(token: str) -> str:
 def _clean_latex_syntax_spaces(expr: str) -> str:
    """Clean unwanted spaces in LaTeX syntax (common OCR errors).
-    
+
    OCR often adds spaces in LaTeX syntax structures where they shouldn't be:
    - Subscripts: a _ {i 1} -> a_{i1}
    - Superscripts: x ^ {2 3} -> x^{23}
    - Fractions: \\frac { a } { b } -> \\frac{a}{b}
    - Commands: \\ alpha -> \\alpha
    - Braces: { a b } -> {ab} (within subscripts/superscripts)
-    
+
    This is safe because these spaces are always OCR errors - LaTeX doesn't
    need or want spaces in these positions.
-    
+
    Args:
        expr: LaTeX math expression.
-        
+
    Returns:
        Expression with LaTeX syntax spaces cleaned.
    """
    # Pattern 1: Spaces around _ and ^ (subscript/superscript operators)
    # a _ {i} -> a_{i}, x ^ {2} -> x^{2}
-    expr = re.sub(r'\s*_\s*', '_', expr)
+    expr = re.sub(r"\s*_\s*", "_", expr)
-    expr = re.sub(r'\s*\^\s*', '^', expr)
+    expr = re.sub(r"\s*\^\s*", "^", expr)
-    
+
    # Pattern 2: Spaces inside braces that follow _ or ^
    # _{i 1} -> _{i1}, ^{2 3} -> ^{23}
    # This is safe because spaces inside subscript/superscript braces are usually OCR errors
    # BUT: if content contains LaTeX commands (\in, \alpha, etc.), spaces after them
    # must be preserved as they serve as command terminators (\in X != \inX)
    def clean_subscript_superscript_braces(match):
        operator = match.group(1)  # _ or ^
-        content = match.group(2)   # content inside braces
+        content = match.group(2)  # content inside braces
-        # Remove spaces but preserve LaTeX commands (e.g., \alpha, \beta)
+        if "\\" not in content:
-        # Only remove spaces between non-backslash characters
+            # No LaTeX commands: safe to remove all spaces
-        cleaned = re.sub(r'(?<!\\)\s+(?!\\)', '', content)
+            cleaned = re.sub(r"\s+", "", content)
        else:
            # Contains LaTeX commands: remove spaces carefully
            # Keep spaces that follow a LaTeX command (e.g., \in X must keep the space)
            # Remove spaces everywhere else (e.g., x \in -> x\in is fine)
            # Strategy: remove spaces before \ and between non-command chars,
            # but preserve the space after \command when followed by a non-\ char
            cleaned = re.sub(r"\s+(?=\\)", "", content)  # remove space before \cmd
            cleaned = re.sub(
                r"(?<!\\)(?<![a-zA-Z])\s+", "", cleaned
            )  # remove space after non-letter non-\
        return f"{operator}{{{cleaned}}}"
-    
+
    # Match _{ ... } or ^{ ... }
-    expr = re.sub(r'([_^])\{([^}]+)\}', clean_subscript_superscript_braces, expr)
+    expr = re.sub(r"([_^])\{([^}]+)\}", clean_subscript_superscript_braces, expr)
-    
+
    # Pattern 3: Spaces inside \frac arguments
    # \frac { a } { b } -> \frac{a}{b}
    # \frac{ a + b }{ c } -> \frac{a+b}{c}
@@ -133,47 +165,46 @@ def _clean_latex_syntax_spaces(expr: str) -> str:
        numerator = match.group(1).strip()
        denominator = match.group(2).strip()
        return f"\\frac{{{numerator}}}{{{denominator}}}"
-    
+
-    expr = re.sub(r'\\frac\s*\{\s*([^}]+?)\s*\}\s*\{\s*([^}]+?)\s*\}', 
+    expr = re.sub(r"\\frac\s*\{\s*([^}]+?)\s*\}\s*\{\s*([^}]+?)\s*\}", clean_frac_braces, expr)
-                  clean_frac_braces, expr)
+
    # Pattern 4: Spaces after backslash in LaTeX commands
    # \ alpha -> \alpha, \ beta -> \beta
-    expr = re.sub(r'\\\s+([a-zA-Z]+)', r'\\\1', expr)
+    expr = re.sub(r"\\\s+([a-zA-Z]+)", r"\\\1", expr)
-    
+
    # Pattern 5: Spaces before/after braces in general contexts (conservative)
    # Only remove if the space is clearly wrong (e.g., after operators)
    # { x } in standalone context is kept as-is to avoid breaking valid spacing
    # But after operators like \sqrt{ x } -> \sqrt{x}
-    expr = re.sub(r'(\\[a-zA-Z]+)\s*\{\s*', r'\1{', expr)  # \sqrt { -> \sqrt{
+    expr = re.sub(r"(\\[a-zA-Z]+)\s*\{\s*", r"\1{", expr)  # \sqrt { -> \sqrt{
-    
+
    return expr
 def _postprocess_math(expr: str) -> str:
    """Postprocess a *math* expression (already inside $...$ or $$...$$).
-    
+
    Processing stages:
    0. Fix OCR number errors (spaces in numbers)
-    1. Split glued LaTeX commands (e.g., \\cdotdS -> \\cdot dS)
+    1. Split glued LaTeX commands (e.g., \\cdotdS -> \\cdot dS, \\inX -> \\in X)
    2. Clean LaTeX syntax spaces (e.g., a _ {i 1} -> a_{i1})
    3. Normalize differentials (DISABLED by default to avoid breaking variables)
-    
+
    Args:
        expr: LaTeX math expression without delimiters.
-        
+
    Returns:
        Processed LaTeX expression.
    """
    # stage0: fix OCR number errors (digits with spaces)
    expr = _fix_ocr_number_errors(expr)
-    
+
-    # stage1: split glued command tokens (e.g. \cdotdS)
+    # stage1: split glued command tokens (e.g. \cdotdS, \inX)
    expr = _COMMAND_TOKEN_PATTERN.sub(lambda m: _split_glued_command_token(m.group(0)), expr)
-    
+
    # stage2: clean LaTeX syntax spaces (OCR often adds unwanted spaces)
    expr = _clean_latex_syntax_spaces(expr)
-    
+
    # stage3: normalize differentials - DISABLED
    # This feature is disabled because it's too aggressive and can break:
    # - LaTeX commands containing 'd': \vdots, \lambda (via subscripts), \delta, etc.
@@ -186,40 +217,36 @@ def _postprocess_math(expr: str) -> str:
    #
    # If differential normalization is needed, implement a context-aware version:
    # expr = _normalize_differentials_contextaware(expr)
-    
+
    return expr
 def _normalize_differentials_contextaware(expr: str) -> str:
    """Context-aware differential normalization (optional, not used by default).
-    
+
    Only normalizes differentials in specific mathematical contexts:
    1. After integral symbols: \\int dx, \\iint dA, \\oint dr
    2. In fraction denominators: \\frac{dy}{dx}
    3. In explicit differential notation: f(x)dx (function followed by differential)
-    
+
    This avoids false positives like variable names, subscripts, or LaTeX commands.
-    
+
    Args:
        expr: LaTeX math expression.
-        
+
    Returns:
        Expression with differentials normalized in safe contexts only.
    """
    # Pattern 1: After integral commands
    # \int dx -> \int d x
-    integral_pattern = re.compile(
+    integral_pattern = re.compile(r"(\\i+nt|\\oint)\s*([^\\]*?)\s*d([a-zA-Z])(?![a-zA-Z])")
-        r'(\\i+nt|\\oint)\s*([^\\]*?)\s*d([a-zA-Z])(?![a-zA-Z])'
+    expr = integral_pattern.sub(r"\1 \2 d \3", expr)
-    )
+
    expr = integral_pattern.sub(r'\1 \2 d \3', expr)
    # Pattern 2: In fraction denominators
    # \frac{...}{dx} -> \frac{...}{d x}
-    frac_pattern = re.compile(
+    frac_pattern = re.compile(r"(\\frac\{[^}]*\}\{[^}]*?)d([a-zA-Z])(?![a-zA-Z])([^}]*\})")
-        r'(\\frac\{[^}]*\}\{[^}]*?)d([a-zA-Z])(?![a-zA-Z])([^}]*\})'
+    expr = frac_pattern.sub(r"\1d \2\3", expr)
-    )
+
    expr = frac_pattern.sub(r'\1d \2\3', expr)
    return expr
@@ -241,21 +268,21 @@ def _fix_ocr_number_errors(expr: str) -> str:
    """
    # Fix pattern 1: "digit space digit(s). digit(s)" → "digit digit(s).digit(s)"
    # Example: "2 2. 2" → "22.2"
-    expr = re.sub(r'(\d)\s+(\d+)\.\s*(\d+)', r'\1\2.\3', expr)
+    expr = re.sub(r"(\d)\s+(\d+)\.\s*(\d+)", r"\1\2.\3", expr)
-    
+
    # Fix pattern 2: "digit(s). space digit(s)" → "digit(s).digit(s)"
    # Example: "22. 2" → "22.2"
-    expr = re.sub(r'(\d+)\.\s+(\d+)', r'\1.\2', expr)
+    expr = re.sub(r"(\d+)\.\s+(\d+)", r"\1.\2", expr)
-    
+
    # Fix pattern 3: "digit space digit" (no decimal point, within same number context)
    # Be careful: only merge if followed by decimal point or comma/end
    # Example: "1 5 0" → "150" when followed by comma or end
-    expr = re.sub(r'(\d)\s+(\d)(?=\s*[,\)]|$)', r'\1\2', expr)
+    expr = re.sub(r"(\d)\s+(\d)(?=\s*[,\)]|$)", r"\1\2", expr)
-    
+
    # Fix pattern 4: Multiple spaces in decimal numbers
    # Example: "2  2  .  2" → "22.2"
-    expr = re.sub(r'(\d)\s+(\d)(?=\s*\.)', r'\1\2', expr)
+    expr = re.sub(r"(\d)\s+(\d)(?=\s*\.)", r"\1\2", expr)
-    
+
    return expr
@@ -272,7 +299,87 @@ def _postprocess_markdown(markdown_content: str) -> str:
            return f"${_postprocess_math(seg[1:-1])}$"
        return seg
-    return _MATH_SEGMENT_PATTERN.sub(_fix_segment, markdown_content)
+    markdown_content = _MATH_SEGMENT_PATTERN.sub(_fix_segment, markdown_content)
    # Apply markdown-level postprocessing (after LaTeX processing)
    markdown_content = _remove_false_heading_from_single_formula(markdown_content)
    return markdown_content
 def _remove_false_heading_from_single_formula(markdown_content: str) -> str:
    """Remove false heading markers from single-formula content.
    OCR sometimes incorrectly identifies a single formula as a heading by adding '#' prefix.
    This function detects and removes the heading marker when:
    1. The content contains only one formula (display or inline)
    2. The formula line starts with '#' (heading marker)
    3. No other non-formula text content exists
    Examples:
        Input:  "# $$E = mc^2$$"
        Output: "$$E = mc^2$$"
        Input:  "# $x = y$"
        Output: "$x = y$"
        Input:  "# Introduction\n$$E = mc^2$$"  (has text, keep heading)
        Output: "# Introduction\n$$E = mc^2$$"
    Args:
        markdown_content: Markdown text with potential false headings.
    Returns:
        Markdown text with false heading markers removed.
    """
    if not markdown_content or not markdown_content.strip():
        return markdown_content
    lines = markdown_content.split("\n")
    # Count formulas and heading lines
    formula_count = 0
    heading_lines = []
    has_non_formula_text = False
    for i, line in enumerate(lines):
        line_stripped = line.strip()
        if not line_stripped:
            continue
        # Check if line starts with heading marker
        heading_match = re.match(r"^(#{1,6})\s+(.+)$", line_stripped)
        if heading_match:
            heading_level = heading_match.group(1)
            content = heading_match.group(2)
            # Check if the heading content is a formula
            if re.fullmatch(r"\$\$?.+\$\$?", content):
                # This is a heading with a formula
                heading_lines.append((i, heading_level, content))
                formula_count += 1
            else:
                # This is a real heading with text
                has_non_formula_text = True
        elif re.fullmatch(r"\$\$?.+\$\$?", line_stripped):
            # Standalone formula line (not in a heading)
            formula_count += 1
        elif line_stripped and not re.match(r"^#+\s*$", line_stripped):
            # Non-empty, non-heading, non-formula line
            has_non_formula_text = True
    # Only remove heading markers if:
    # 1. There's exactly one formula
    # 2. That formula is in a heading line
    # 3. There's no other text content
    if formula_count == 1 and len(heading_lines) == 1 and not has_non_formula_text:
        # Remove the heading marker from the formula
        line_idx, heading_level, formula_content = heading_lines[0]
        lines[line_idx] = formula_content
    return "\n".join(lines)
 class OCRServiceBase(ABC):
@@ -284,8 +391,8 @@ class OCRServiceBase(ABC):
 class OCRService(OCRServiceBase):
    """Service for OCR using PaddleOCR-VL."""
-    _pipeline: Optional[PaddleOCRVL] = None
+    _pipeline: PaddleOCRVL | None = None
-    _layout_detector: Optional[LayoutDetector] = None
+    _layout_detector: LayoutDetector | None = None
    def __init__(
        self,
@@ -404,44 +511,213 @@ class OCRService(OCRServiceBase):
            return self._recognize_formula(image)
 class GLMOCRService(OCRServiceBase):
    """Service for OCR using GLM-4V model via vLLM."""
    def __init__(
        self,
        vl_server_url: str,
        image_processor: ImageProcessor,
        converter: Converter,
    ):
        """Initialize GLM OCR service.
        Args:
            vl_server_url: URL of the vLLM server for GLM-4V (default: http://127.0.0.1:8002/v1).
            image_processor: Image processor instance.
            converter: Converter instance for format conversion.
        """
        self.vl_server_url = vl_server_url or settings.glm_ocr_url
        self.image_processor = image_processor
        self.converter = converter
        self.openai_client = OpenAI(api_key="EMPTY", base_url=self.vl_server_url, timeout=3600)
    def _recognize_formula(self, image: np.ndarray) -> dict:
        """Recognize formula/math content using GLM-4V.
        Args:
            image: Input image as numpy array in BGR format.
        Returns:
            Dict with 'latex', 'markdown', 'mathml', 'mml' keys.
        Raises:
            RuntimeError: If recognition fails (preserves original exception for fallback handling).
        """
        # Add padding to image
        padded_image = self.image_processor.add_padding(image)
        # Encode image to base64
        success, encoded_image = cv2.imencode(".png", padded_image)
        if not success:
            raise RuntimeError("Failed to encode image")
        image_base64 = base64.b64encode(encoded_image.tobytes()).decode("utf-8")
        image_url = f"data:image/png;base64,{image_base64}"
        # Call OpenAI-compatible API with formula recognition prompt
        prompt = "Formula Recognition:"
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ]
        # Don't catch exceptions here - let them propagate for fallback handling
        response = self.openai_client.chat.completions.create(
            model="glm-ocr",
            messages=messages,
            temperature=0.0,
        )
        markdown_content = response.choices[0].message.content
        # Process LaTeX delimiters
        if markdown_content.startswith(r"\[") or markdown_content.startswith(r"\("):
            markdown_content = markdown_content.replace(r"\[", "$$").replace(r"\(", "$$")
            markdown_content = markdown_content.replace(r"\]", "$$").replace(r"\)", "$$")
        elif not markdown_content.startswith("$$") and not markdown_content.startswith("$"):
            markdown_content = f"$${markdown_content}$$"
        # Apply postprocessing
        markdown_content = _postprocess_markdown(markdown_content)
        convert_result = self.converter.convert_to_formats(markdown_content)
        return {
            "latex": convert_result.latex,
            "mathml": convert_result.mathml,
            "mml": convert_result.mml,
            "markdown": markdown_content,
        }
    def recognize(self, image: np.ndarray) -> dict:
        """Recognize content using GLM-4V.
        Args:
            image: Input image as numpy array in BGR format.
        Returns:
            Dict with 'latex', 'markdown', 'mathml', 'mml' keys.
        """
        return self._recognize_formula(image)
 class MineruOCRService(OCRServiceBase):
    """Service for OCR using local file_parse API."""
    def __init__(
        self,
        api_url: str = "http://127.0.0.1:8000/file_parse",
-        image_processor: Optional[ImageProcessor] = None,
+        image_processor: ImageProcessor | None = None,
-        converter: Optional[Converter] = None,
+        converter: Converter | None = None,
        glm_ocr_url: str = "http://localhost:8002/v1",
        layout_detector: LayoutDetector | None = None,
    ):
        """Initialize Local API service.
        Args:
            api_url: URL of the local file_parse API endpoint.
            converter: Optional converter instance for format conversion.
            glm_ocr_url: URL of the GLM-OCR vLLM server.
        """
        self.api_url = api_url
        self.image_processor = image_processor
        self.converter = converter
        self.glm_ocr_url = glm_ocr_url
        self.openai_client = OpenAI(api_key="EMPTY", base_url=glm_ocr_url, timeout=3600)
-    def recognize(self, image: np.ndarray) -> dict:
+    def _recognize_formula_with_paddleocr_vl(
-        """Recognize content using local file_parse API.
+        self, image: np.ndarray, prompt: str = "Formula Recognition:"
    ) -> str:
        """Recognize formula using PaddleOCR-VL API.
        Args:
            image: Input image as numpy array in BGR format.
            prompt: Recognition prompt (default: "Formula Recognition:")
        Returns:
            Recognized formula text (LaTeX format).
        """
        try:
            # Encode image to base64
            success, encoded_image = cv2.imencode(".png", image)
            if not success:
                raise RuntimeError("Failed to encode image")
            image_base64 = base64.b64encode(encoded_image.tobytes()).decode("utf-8")
            image_url = f"data:image/png;base64,{image_base64}"
            # Call OpenAI-compatible API
            messages = [
                {
                    "role": "user",
                    "content": [
                        {"type": "image_url", "image_url": {"url": image_url}},
                        {"type": "text", "text": prompt},
                    ],
                }
            ]
            response = self.openai_client.chat.completions.create(
                model="glm-ocr",
                messages=messages,
                temperature=0.0,
            )
            return response.choices[0].message.content
        except Exception as e:
            raise RuntimeError(f"PaddleOCR-VL formula recognition failed: {e}") from e
    def _extract_and_recognize_formulas(
        self, markdown_content: str, original_image: np.ndarray
    ) -> str:
        """Extract image references from markdown and recognize formulas.
        Args:
            markdown_content: Markdown content with potential image references.
            original_image: Original input image.
        Returns:
            Markdown content with formulas recognized by PaddleOCR-VL.
        """
        # Pattern to match image references: ![](images/xxx.png) or ![](images/xxx.jpg)
        image_pattern = re.compile(r"!\[\]\(images/[^)]+\)")
        if not image_pattern.search(markdown_content):
            return markdown_content
        formula_text = self._recognize_formula_with_paddleocr_vl(original_image)
        if formula_text.startswith(r"\[") or formula_text.startswith(r"\("):
            formula_text = formula_text.replace(r"\[", "$$").replace(r"\(", "$$")
            formula_text = formula_text.replace(r"\]", "$$").replace(r"\)", "$$")
        elif not formula_text.startswith("$$") and not formula_text.startswith("$"):
            formula_text = f"$${formula_text}$$"
        return formula_text
    def recognize(self, image_bytes: BytesIO) -> dict:
        """Recognize content using local file_parse API.
        Args:
            image_bytes: Input image as BytesIO object (already encoded as PNG).
        Returns:
            Dict with 'markdown', 'latex', 'mathml' keys.
        """
        try:
-            if self.image_processor:
+            # Decode image_bytes to numpy array for potential formula recognition
-                image = self.image_processor.add_padding(image)
+            image_bytes.seek(0)
            image_data = np.frombuffer(image_bytes.read(), dtype=np.uint8)
            original_image = cv2.imdecode(image_data, cv2.IMREAD_COLOR)
-            # Convert numpy array to image bytes
+            # Reset image_bytes for API request
-            success, encoded_image = cv2.imencode(".png", image)
+            image_bytes.seek(0)
            if not success:
                raise RuntimeError("Failed to encode image")
            image_bytes = BytesIO(encoded_image.tobytes())
            # Prepare multipart form data
            files = {"files": ("image.png", image_bytes, "image/png")}
@@ -464,7 +740,13 @@ class MineruOCRService(OCRServiceBase):
            }
            # Make API request
-            response = requests.post(self.api_url, files=files, data=data, headers={"accept": "application/json"}, timeout=30)
+            response = requests.post(
                self.api_url,
                files=files,
                data=data,
                headers={"accept": "application/json"},
                timeout=30,
            )
            response.raise_for_status()
            result = response.json()
@@ -474,6 +756,11 @@ class MineruOCRService(OCRServiceBase):
            if "results" in result and "image" in result["results"]:
                markdown_content = result["results"]["image"].get("md_content", "")
            if "![](images/" in markdown_content:
                markdown_content = self._extract_and_recognize_formulas(
                    markdown_content, original_image
                )
            # Apply postprocessing to fix OCR errors
            markdown_content = _postprocess_markdown(markdown_content)
@@ -500,9 +787,195 @@ class MineruOCRService(OCRServiceBase):
            raise RuntimeError(f"Recognition failed: {e}") from e
 # Task-specific prompts (from GLM-OCR SDK config.yaml)
 _TASK_PROMPTS: dict[str, str] = {
    "text": "Text Recognition:",
    "formula": "Formula Recognition:",
    "table": "Table Recognition:",
 }
 _DEFAULT_PROMPT = (
    "Recognize the text in the image and output in Markdown format. "
    "Preserve the original layout (headings/paragraphs/tables/formulas). "
    "Do not fabricate content that does not exist in the image."
 )
 class GLMOCREndToEndService(OCRServiceBase):
    """End-to-end OCR using GLM-OCR pipeline: layout detection → per-region OCR.
    Pipeline:
      1. Add padding (ImageProcessor)
      2. Detect layout regions (LayoutDetector → PP-DocLayoutV3)
      3. Crop each region and call vLLM with a task-specific prompt (parallel)
      4. GLMResultFormatter: clean, format titles/bullets/formulas, merge tags
      5. _postprocess_markdown: LaTeX math error correction
      6. Converter: markdown → latex/mathml/mml
    This replaces both GLMOCRService (formula-only) and MineruOCRService (mixed).
    """
    def __init__(
        self,
        vl_server_url: str,
        image_processor: ImageProcessor,
        converter: Converter,
        layout_detector: LayoutDetector,
        max_workers: int = 8,
    ):
        self.vl_server_url = vl_server_url or settings.glm_ocr_url
        self.image_processor = image_processor
        self.converter = converter
        self.layout_detector = layout_detector
        self.max_workers = max_workers
        self.openai_client = OpenAI(api_key="EMPTY", base_url=self.vl_server_url, timeout=3600)
        self._formatter = GLMResultFormatter()
    def _encode_region(self, image: np.ndarray) -> str:
        """Convert BGR numpy array to base64 JPEG string."""
        rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        pil_img = PILImage.fromarray(rgb)
        buf = BytesIO()
        pil_img.save(buf, format="JPEG")
        return base64.b64encode(buf.getvalue()).decode("utf-8")
    def _call_vllm(self, image: np.ndarray, prompt: str) -> str:
        """Send image + prompt to vLLM and return raw content string."""
        img_b64 = self._encode_region(image)
        data_url = f"data:image/jpeg;base64,{img_b64}"
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": data_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ]
        response = self.openai_client.chat.completions.create(
            model="glm-ocr",
            messages=messages,
            temperature=0.01,
            max_tokens=settings.max_tokens,
        )
        return response.choices[0].message.content.strip()
    def _normalize_bbox(self, bbox: list[float], img_w: int, img_h: int) -> list[int]:
        """Convert pixel bbox [x1,y1,x2,y2] to 0-1000 normalised coords."""
        x1, y1, x2, y2 = bbox
        return [
            int(x1 / img_w * 1000),
            int(y1 / img_h * 1000),
            int(x2 / img_w * 1000),
            int(y2 / img_h * 1000),
        ]
    def recognize(self, image: np.ndarray) -> dict:
        """Full pipeline: padding → layout → per-region OCR → postprocess → markdown.
        Args:
            image: Input image as numpy array in BGR format.
        Returns:
            Dict with 'markdown', 'latex', 'mathml', 'mml' keys.
        """
        # 1. Padding
        padded = self.image_processor.add_padding(image)
        img_h, img_w = padded.shape[:2]
        # 2. Layout detection
        layout_info = self.layout_detector.detect(padded)
        # Sort regions in reading order: top-to-bottom, left-to-right
        layout_info.regions.sort(key=lambda r: (r.bbox[1], r.bbox[0]))
        # 3. OCR: per-region (parallel) or full-image fallback
        if not layout_info.regions:
            # No layout detected → assume it's a formula, use formula recognition
            logger.info("No layout regions detected, treating image as formula")
            raw_content = self._call_vllm(padded, _TASK_PROMPTS["formula"])
            # Format as display formula markdown
            formatted_content = raw_content.strip()
            if not (formatted_content.startswith("$$") and formatted_content.endswith("$$")):
                formatted_content = f"$$\n{formatted_content}\n$$"
            markdown_content = formatted_content
        else:
            # Build task list for non-figure regions
            tasks = []
            for idx, region in enumerate(layout_info.regions):
                if region.type == "figure":
                    continue
                x1, y1, x2, y2 = (int(c) for c in region.bbox)
                cropped = padded[y1:y2, x1:x2]
                if cropped.size == 0 or cropped.shape[0] < 10 or cropped.shape[1] < 10:
                    logger.warning(
                        "Skipping region idx=%d (label=%s): crop too small %s",
                        idx,
                        region.native_label,
                        cropped.shape[:2],
                    )
                    continue
                prompt = _TASK_PROMPTS.get(region.type, _DEFAULT_PROMPT)
                tasks.append((idx, region, cropped, prompt))
            if not tasks:
                raw_content = self._call_vllm(padded, _DEFAULT_PROMPT)
                markdown_content = self._formatter._clean_content(raw_content)
            else:
                # Parallel OCR calls
                raw_results: dict[int, str] = {}
                with ThreadPoolExecutor(max_workers=min(self.max_workers, len(tasks))) as ex:
                    future_map = {
                        ex.submit(self._call_vllm, cropped, prompt): idx
                        for idx, region, cropped, prompt in tasks
                    }
                    for future in as_completed(future_map):
                        idx = future_map[future]
                        try:
                            raw_results[idx] = future.result()
                        except Exception as e:
                            logger.warning("vLLM call failed for region idx=%d: %s", idx, e)
                            raw_results[idx] = ""
                # Build structured region dicts for GLMResultFormatter
                region_dicts = []
                for idx, region, _cropped, _prompt in tasks:
                    region_dicts.append(
                        {
                            "index": idx,
                            "label": region.type,
                            "native_label": region.native_label,
                            "content": raw_results.get(idx, ""),
                            "bbox_2d": self._normalize_bbox(region.bbox, img_w, img_h),
                        }
                    )
                # 4. GLM-OCR postprocessing: clean, format, merge, bullets
                markdown_content = self._formatter.process(region_dicts)
        # 5. LaTeX math error correction (our existing pipeline)
        markdown_content = _postprocess_markdown(markdown_content)
        # 6. Format conversion
        latex, mathml, mml = "", "", ""
        if markdown_content and self.converter:
            try:
                fmt = self.converter.convert_to_formats(markdown_content)
                latex, mathml, mml = fmt.latex, fmt.mathml, fmt.mml
            except RuntimeError as e:
                logger.warning("Format conversion failed, returning empty latex/mathml/mml: %s", e)
        return {"markdown": markdown_content, "latex": latex, "mathml": mathml, "mml": mml}
 if __name__ == "__main__":
    mineru_service = MineruOCRService()
-    image = cv2.imread("test/complex_formula.png")
+    image = cv2.imread("test/formula2.jpg")
    image_numpy = np.array(image)
-    ocr_result = mineru_service.recognize(image_numpy)
+    # Encode image to bytes (as done in API layer)
    success, encoded_image = cv2.imencode(".png", image_numpy)
    if not success:
        raise RuntimeError("Failed to encode image")
    image_bytes = BytesIO(encoded_image.tobytes())
    image_bytes.seek(0)
    ocr_result = mineru_service.recognize(image_bytes)
    print(ocr_result)
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -17,6 +17,8 @@ services:
      # Mount pre-downloaded models (adjust paths as needed)
      - ./models/DocLayout:/app/models/DocLayout:ro
      - ./models/PP-DocLayout:/app/models/PP-DocLayout:ro
      # Mount logs directory to persist logs across container restarts
      - ./logs:/app/logs
    deploy:
      resources:
        reservations:
@@ -47,6 +49,8 @@ services:
    volumes:
      - ./models/DocLayout:/app/models/DocLayout:ro
      - ./models/PP-DocLayout:/app/models/PP-DocLayout:ro
      # Mount logs directory to persist logs across container restarts
      - ./logs:/app/logs
    profiles:
      - cpu
    restart: unless-stopped
--- a/docs/LATEX_POSTPROCESSING_COMPLETE.md
+++ b/docs/LATEX_POSTPROCESSING_COMPLETE.md
@@ -0,0 +1,380 @@
 # LaTeX 后处理完整方案总结
 ## 功能概述
 实现了一个安全、智能的 LaTeX 后处理管道，修复 OCR 识别的常见错误。
 ## 处理管道
 ```
 输入: a _ {i 1} + \ vdots
 ↓ Stage 0: 数字错误修复
  修复: 2 2. 2 → 22.2
  结果: a _ {i 1} + \ vdots
 ↓ Stage 1: 拆分粘连命令
  修复: \intdx → \int dx
  结果: a _ {i 1} + \vdots
 ↓ Stage 2: 清理 LaTeX 语法空格 ← 新增
  修复: a _ {i 1} → a_{i1}
  修复: \ vdots → \vdots
  结果: a_{i1}+\vdots
 ↓ Stage 3: 微分规范化 (已禁用)
  跳过
  结果: a_{i1}+\vdots
 输出: a_{i1}+\vdots ✅
 ```
 ## Stage 详解
 ### Stage 0: 数字错误修复 ✅
 **目的**: 修复 OCR 数字识别错误
 **示例**:
 - `2 2. 2` → `22.2`
 - `1 5 0` → `150`
 - `3 0. 4` → `30.4`
 **安全性**: ✅ 高（只处理数字和小数点）
 ---
 ### Stage 1: 拆分粘连命令 ✅
 **目的**: 修复 OCR 命令粘连错误
 **示例**:
 - `\intdx` → `\int dx`
 - `\cdotdS` → `\cdot dS`
 - `\sumdx` → `\sum dx`
 **方法**: 基于白名单的智能拆分
 **白名单**:
 ```python
 _COMMANDS_NEED_SPACE = {
    "cdot", "times", "div", "pm", "mp",
    "int", "iint", "iiint", "oint", "sum", "prod", "lim",
    "sin", "cos", "tan", "cot", "sec", "csc",
    "log", "ln", "exp",
    "partial", "nabla",
 }
 ```
 **安全性**: ✅ 高（白名单机制）
 ---
 ### Stage 2: 清理 LaTeX 语法空格 ✅ 新增
 **目的**: 清理 OCR 在 LaTeX 语法中插入的不必要空格
 **清理规则**:
 #### 1. 下标/上标操作符空格
 ```latex
 a _ {i 1}  →  a_{i1}
 x ^ {2 3}  →  x^{23}
 ```
 #### 2. 大括号内部空格（智能）
 ```latex
 a_{i 1}     →  a_{i1}       (移除空格)
 y_{\alpha}  →  y_{\alpha}   (保留命令)
 ```
 #### 3. 分式空格
 ```latex
 \frac { a } { b }  →  \frac{a}{b}
 ```
 #### 4. 命令反斜杠后空格
 ```latex
 \ alpha  →  \alpha
 \ beta   →  \beta
 ```
 #### 5. 命令后大括号前空格
 ```latex
 \sqrt { x }  →  \sqrt{x}
 \sin { x }   →  \sin{x}
 ```
 **安全性**: ✅ 高（只清理明确的语法位置）
 ---
 ### Stage 3: 微分规范化 ❌ 已禁用
 **原计划**: 规范化微分符号 `dx → d x`
 **为什么禁用**:
 - ❌ 无法区分微分和变量名
 - ❌ 会破坏 LaTeX 命令（`\vdots` → `\vd ots`）
 - ❌ 误判率太高
 - ✅ 收益小（`dx` 本身就是有效的 LaTeX）
 **状态**: 禁用，提供可选的上下文感知版本
 ---
 ## 解决的问题
 ### 问题 1: LaTeX 命令被拆分 ✅ 已解决
 **原问题**:
 ```latex
 \vdots     →  \vd ots      ❌
 \lambda_1  →  \lambd a_1   ❌
 ```
 **解决方案**: 禁用 Stage 3 微分规范化
 **结果**:
 ```latex
 \vdots     →  \vdots       ✅
 \lambda_1  →  \lambda_1    ✅
 ```
 ### 问题 2: 语法空格错误 ✅ 已解决
 **原问题**:
 ```latex
 a _ {i 1}  (OCR 识别结果)
 ```
 **解决方案**: 新增 Stage 2 空格清理
 **结果**:
 ```latex
 a _ {i 1}  →  a_{i1}  ✅
 ```
 ### 问题 3: Unicode 实体未转换 ✅ 已解决（之前）
 **原问题**:
 ```
 MathML 中 &#x03BB; 未转换为 λ
 ```
 **解决方案**: 扩展 Unicode 实体映射表
 **结果**:
 ```
 &#x03BB; → λ  ✅
 &#x022EE; → ⋮  ✅
 ```
 ---
 ## 完整测试用例
 ### 测试 1: 下标空格（用户需求）
 ```latex
 输入:  a _ {i 1}
 输出:  a_{i1}  ✅
 ```
 ### 测试 2: 上标空格
 ```latex
 输入:  x ^ {2 3}
 输出:  x^{23}  ✅
 ```
 ### 测试 3: 分式空格
 ```latex
 输入:  \frac { a } { b }
 输出:  \frac{a}{b}  ✅
 ```
 ### 测试 4: 命令空格
 ```latex
 输入:  \ alpha + \ beta
 输出:  \alpha+\beta  ✅
 ```
 ### 测试 5: LaTeX 命令保护
 ```latex
 输入:  \vdots
 输出:  \vdots  ✅ (不被破坏)
 输入:  \lambda_{1}
 输出:  \lambda_{1}  ✅ (不被破坏)
 ```
 ### 测试 6: 复杂组合
 ```latex
 输入:  \frac { a _ {i 1} } { \ sqrt { x ^ {2} } }
 输出:  \frac{a_{i1}}{\sqrt{x^{2}}}  ✅
 ```
 ---
 ## 安全性保证
 ### ✅ 保护机制
 1. **白名单机制** (Stage 1)
   - 只拆分已知命令
   - 不处理未知命令
 2. **语法位置检查** (Stage 2)
   - 只清理明确的语法位置
   - 不处理模糊的空格
 3. **命令保护** (Stage 2)
   - 保留反斜杠后的内容
   - 使用 `(?<!\\)` 负向后查找
 4. **禁用危险功能** (Stage 3)
   - 微分规范化已禁用
   - 避免误判
 ### ⚠️ 潜在边界情况
 #### 1. 运算符空格被移除
 ```latex
 输入:  a + b
 输出:  a+b  (空格被移除)
 ```
 **评估**: 可接受（LaTeX 渲染效果相同）
 #### 2. 命令间空格被移除
 ```latex
 输入:  \alpha \beta
 输出:  \alpha\beta  (空格被移除)
 ```
 **评估**: 可能需要调整（如果这是问题）
 **解决方案**（可选）:
 ```python
 # 保留命令后的空格
 expr = re.sub(r'(\\[a-zA-Z]+)\s+(\\[a-zA-Z]+)', r'\1 \2', expr)
 ```
 ---
 ## 性能分析
 | Stage | 操作数 | 时间估算 |
 |-------|-------|---------|
 | 0 | 4 个正则表达式 | < 0.5ms |
 | 1 | 1 个正则表达式 + 白名单查找 | < 1ms |
 | 2 | 5 个正则表达式 | < 1ms |
 | 3 | 已禁用 | 0ms |
 | **总计** | | **< 3ms** |
 **结论**: ✅ 性能影响可忽略
 ---
 ## 文档和工具
 ### 📄 文档
 1. `docs/LATEX_SPACE_CLEANING.md` - 空格清理详解
 2. `docs/LATEX_PROTECTION_FINAL_FIX.md` - 命令保护方案
 3. `docs/DISABLE_DIFFERENTIAL_NORMALIZATION.md` - 微分规范化禁用说明
 4. `docs/DIFFERENTIAL_PATTERN_BUG_FIX.md` - 初始 Bug 修复
 5. `docs/LATEX_RENDERING_FIX_REPORT.md` - Unicode 实体映射修复
 ### 🧪 测试工具
 1. `test_latex_space_cleaning.py` - 空格清理测试
 2. `test_disabled_differential_norm.py` - 微分规范化禁用测试
 3. `test_differential_bug_fix.py` - Bug 修复验证
 4. `diagnose_latex_rendering.py` - 渲染问题诊断
 ---
 ## 部署检查清单
 - [x] Stage 0: 数字错误修复 - 保留 ✅
 - [x] Stage 1: 拆分粘连命令 - 保留 ✅
 - [x] Stage 2: 清理语法空格 - **新增** ✅
 - [x] Stage 3: 微分规范化 - 禁用 ✅
 - [x] Unicode 实体映射 - 已扩展 ✅
 - [x] 代码无语法错误 - 已验证 ✅
 - [ ] 服务重启 - **待完成**
 - [ ] 功能测试 - **待完成**
 ---
 ## 部署步骤
 1. **✅ 代码已完成**
   - `app/services/ocr_service.py` 已更新
   - `app/services/converter.py` 已更新
 2. **✅ 测试准备**
   - 测试脚本已创建
   - 文档已完善
 3. **🔄 重启服务**
   ```bash
   # 重启 FastAPI 服务
   ```
 4. **🧪 功能验证**
   ```bash
   # 运行测试
   python test_latex_space_cleaning.py
   # 测试 API
   curl -X POST "http://localhost:8000/api/v1/image/ocr" \
     -H "Content-Type: application/json" \
     -d '{"image_base64": "...", "model_name": "paddle"}'
   ```
 5. **✅ 验证结果**
   - 检查 `a _ {i 1}` → `a_{i1}`
   - 检查 `\vdots` 不被破坏
   - 检查 `\lambda_{1}` 不被破坏
 ---
 ## 总结
 | 功能 | 状态 | 优先级 |
 |-----|------|--------|
 | 数字错误修复 | ✅ 保留 | 必需 |
 | 粘连命令拆分 | ✅ 保留 | 必需 |
 | **语法空格清理** | ✅ **新增** | **重要** |
 | 微分规范化 | ❌ 禁用 | 可选 |
 | LaTeX 命令保护 | ✅ 完成 | 必需 |
 | Unicode 实体映射 | ✅ 完成 | 必需 |
 ### 三大改进
 1. **禁用微分规范化** → 保护所有 LaTeX 命令
 2. **新增空格清理** → 修复 OCR 语法错误  
 3. **扩展 Unicode 映射** → 支持所有数学符号
 ### 设计原则
 ✅ **Do No Harm** - 不确定的不要改  
 ✅ **Fix Clear Errors** - 只修复明确的错误  
 ✅ **Whitelist Over Blacklist** - 基于白名单处理  
 ---
 ## 下一步
 **立即行动**:
 1. 重启服务
 2. 测试用户示例: `a _ {i 1}` → `a_{i1}`
 3. 验证 LaTeX 命令不被破坏
 **后续优化**（如需要）:
 1. 根据实际使用调整空格清理规则
 2. 收集更多 OCR 错误模式
 3. 添加配置选项（细粒度控制）
 🎉 **完成！现在的后处理管道既安全又智能！**
--- a/docs/REMOVE_FALSE_HEADING.md
+++ b/docs/REMOVE_FALSE_HEADING.md
@@ -0,0 +1,366 @@
 # 移除单公式假标题功能
 ## 功能概述
 OCR 识别时，有时会错误地将单个公式识别为标题格式（在公式前添加 `#`）。
 新增功能：自动检测并移除单公式内容的假标题标记。
 ## 问题背景
 ### OCR 错误示例
 当图片中只有一个数学公式时，OCR 可能错误识别为：
 ```markdown
 # $$E = mc^2$$
 ```
 但实际应该是：
 ```markdown
 $$E = mc^2$$
 ```
 ### 产生原因
 1. **视觉误判**: OCR 将公式的位置或样式误判为标题
 2. **布局分析错误**: 检测到公式居中或突出显示，误认为是标题
 3. **字体大小**: 大号公式被识别为标题级别的文本
 ## 解决方案
 ### 处理逻辑
 **移除标题标记的条件**（必须**同时满足**）:
 1. ✅ 内容中只有**一个公式**（display 或 inline）
 2. ✅ 该公式在以 `#` 开头的行（标题行）
 3. ✅ 没有其他文本内容（除了空行）
 **保留标题标记的情况**:
 1. ❌ 有真实的文本内容（如 `# Introduction`）
 2. ❌ 有多个公式
 3. ❌ 公式不在标题行
 ### 实现位置
 **文件**: `app/services/ocr_service.py`
 **函数**: `_remove_false_heading_from_single_formula()`
 **集成点**: 在 `_postprocess_markdown()` 的最后阶段
 ### 处理流程
 ```
 输入 Markdown
    ↓
 LaTeX 语法后处理
    ↓
 移除单公式假标题 ← 新增
    ↓
 输出 Markdown
 ```
 ## 使用示例
 ### 示例 1: 移除假标题 ✅
 ```markdown
 输入:  # $$E = mc^2$$
 输出:  $$E = mc^2$$
 说明:  只有一个公式且在标题中，移除 #
 ```
 ### 示例 2: 保留真标题 ❌
 ```markdown
 输入:  # Introduction
       $$E = mc^2$$
 输出:  # Introduction
       $$E = mc^2$$
 说明:  有文本内容，保留标题
 ```
 ### 示例 3: 多个公式 ❌
 ```markdown
 输入:  # $$x = y$$
       $$a = b$$
 输出:  # $$x = y$$
       $$a = b$$
 说明:  有多个公式，保留标题
 ```
 ### 示例 4: 无标题公式 →
 ```markdown
 输入:  $$E = mc^2$$
 输出:  $$E = mc^2$$
 说明:  本身就没有标题，无需修改
 ```
 ## 详细测试用例
 ### 类别 1: 应该移除标题 ✅
 | 输入 | 输出 | 说明 |
 |-----|------|------|
 | `# $$E = mc^2$$` | `$$E = mc^2$$` | 单个 display 公式 |
 | `# $x = y$` | `$x = y$` | 单个 inline 公式 |
 | `## $$\frac{a}{b}$$` | `$$\frac{a}{b}$$` | 二级标题 |
 | `### $$\lambda_{1}$$` | `$$\lambda_{1}$$` | 三级标题 |
 ### 类别 2: 应该保留标题（有文本） ❌
 | 输入 | 输出 | 说明 |
 |-----|------|------|
 | `# Introduction\n$$E = mc^2$$` | 不变 | 标题有文本 |
 | `# Title\nText\n$$x=y$$` | 不变 | 有段落文本 |
 | `$$E = mc^2$$\n# Summary` | 不变 | 后面有文本标题 |
 ### 类别 3: 应该保留标题（多个公式） ❌
 | 输入 | 输出 | 说明 |
 |-----|------|------|
 | `# $$x = y$$\n$$a = b$$` | 不变 | 两个公式 |
 | `$$x = y$$\n# $$a = b$$` | 不变 | 两个公式 |
 ### 类别 4: 无需修改 →
 | 输入 | 输出 | 说明 |
 |-----|------|------|
 | `$$E = mc^2$$` | 不变 | 无标题标记 |
 | `$x = y$` | 不变 | 无标题标记 |
 | 空字符串 | 不变 | 空内容 |
 ## 算法实现
 ### 步骤 1: 分析内容
 ```python
 for each line:
    if line starts with '#':
        if line content is a formula:
            count as heading_formula
        else:
            mark as has_text_content
    elif line is a formula:
        count as standalone_formula
    elif line has text:
        mark as has_text_content
 ```
 ### 步骤 2: 决策
 ```python
 if (total_formulas == 1 AND 
    heading_formulas == 1 AND 
    NOT has_text_content):
    remove heading marker
 else:
    keep as-is
 ```
 ### 步骤 3: 执行
 ```python
 if should_remove:
    replace "# $$formula$$" with "$$formula$$"
 ```
 ## 正则表达式说明
 ### 检测标题行
 ```python
 heading_match = re.match(r'^(#{1,6})\s+(.+)$', line_stripped)
 ```
 - `^(#{1,6})` - 1-6 个 `#` 符号（Markdown 标题级别）
 - `\s+` - 至少一个空格
 - `(.+)$` - 标题内容
 ### 检测公式
 ```python
 re.fullmatch(r'\$\$?.+\$\$?', content)
 ```
 - `\$\$?` - `$` 或 `$$`（inline 或 display）
 - `.+` - 公式内容
 - `\$\$?` - 结束的 `$` 或 `$$`
 ## 边界情况处理
 ### 1. 空行
 ```markdown
 输入:  # $$E = mc^2$$
 输出:  $$E = mc^2$$
 说明:  空行不影响判断
 ```
 ### 2. 前后空行
 ```markdown
 输入:  
       # $$E = mc^2$$
 输出:  
       $$E = mc^2$$
 说明:  保留空行结构
 ```
 ### 3. 复杂公式
 ```markdown
 输入:  # $$\int_{0}^{\infty} e^{-x^2} dx = \frac{\sqrt{\pi}}{2}$$
 输出:  $$\int_{0}^{\infty} e^{-x^2} dx = \frac{\sqrt{\pi}}{2}$$
 说明:  复杂公式也能正确处理
 ```
 ## 安全性分析
 ### ✅ 安全保证
 1. **保守策略**: 只在明确的情况下移除标题
 2. **多重条件**: 必须同时满足 3 个条件
 3. **保留真标题**: 有文本内容的标题不会被移除
 4. **保留结构**: 多公式场景保持原样
 ### ⚠️ 已考虑的风险
 #### 风险 1: 误删有意义的标题
 **场景**: 用户真的想要 `# $$formula$$` 格式
 **缓解**: 
 - 仅在单公式场景下触发
 - 如果有任何文本，保留标题
 - 这种真实需求极少（通常标题会有文字说明）
 #### 风险 2: 多级标题判断
 **场景**: `##`, `###` 等不同级别
 **处理**: 支持所有级别（`#{1,6}`）
 #### 风险 3: 公式类型混合
 **场景**: Display (`$$`) 和 inline (`$`) 混合
 **处理**: 两种类型都能正确识别和计数
 ## 性能影响
 | 操作 | 复杂度 | 时间 |
 |-----|-------|------|
 | 分行 | O(n) | < 0.1ms |
 | 遍历行 | O(n) | < 0.5ms |
 | 正则匹配 | O(m) | < 0.5ms |
 | 替换 | O(1) | < 0.1ms |
 | **总计** | **O(n)** | **< 1ms** |
 **评估**: ✅ 性能影响可忽略
 ## 与其他功能的关系
 ### 处理顺序
 ```
 1. OCR 识别 → Markdown 输出
 2. LaTeX 数学公式后处理
   - 数字错误修复
   - 命令拆分
   - 语法空格清理
 3. Markdown 级别后处理
   - 移除单公式假标题 ← 本功能
 ```
 ### 为什么放在最后
 - 需要看到完整的 Markdown 结构
 - 需要 LaTeX 公式已经被清理干净
 - 避免影响前面的处理步骤
 ## 配置选项（未来扩展）
 如果需要更细粒度的控制：
 ```python
 def _remove_false_heading_from_single_formula(
    markdown_content: str,
    enabled: bool = True,
    max_heading_level: int = 6,
    preserve_if_has_text: bool = True,
 ) -> str:
    """Configurable heading removal."""
    # ...
 ```
 ## 测试验证
 ```bash
 python test_remove_false_heading.py
 ```
 **关键测试**:
 - ✅ `# $$E = mc^2$$` → `$$E = mc^2$$`
 - ✅ `# Introduction\n$$E = mc^2$$` → 不变
 - ✅ `# $$x = y$$\n$$a = b$$` → 不变
 ## 部署检查
 - [x] 函数实现完成
 - [x] 集成到处理管道
 - [x] 无语法错误
 - [x] 测试用例覆盖
 - [x] 文档完善
 - [ ] 服务重启
 - [ ] 功能验证
 ## 向后兼容性
 **影响**: ✅ 正向改进
 - **之前**: 单公式可能带有错误的 `#` 标记
 - **之后**: 自动移除假标题，Markdown 更干净
 - **兼容性**: 不影响有真实文本的标题
 ## 总结
 | 方面 | 状态 |
 |-----|------|
 | 用户需求 | ✅ 实现 |
 | 单公式假标题 | ✅ 移除 |
 | 真标题保护 | ✅ 保留 |
 | 多公式场景 | ✅ 保留 |
 | 安全性 | ✅ 高（保守策略） |
 | 性能 | ✅ < 1ms |
 | 测试覆盖 | ✅ 完整 |
 **状态**: ✅ **实现完成，等待测试验证**
 **下一步**: 重启服务，测试只包含单个公式的图片！
--- a/docs/REMOVE_FALSE_HEADING_SUMMARY.md
+++ b/docs/REMOVE_FALSE_HEADING_SUMMARY.md
@@ -0,0 +1,132 @@
 # 移除单公式假标题 - 快速指南
 ## 问题
 OCR 识别单个公式时，可能错误添加标题标记：
 ```markdown
 ❌ 错误识别:  # $$E = mc^2$$
 ✅ 应该是:    $$E = mc^2$$
 ```
 ## 解决方案
 **自动移除假标题标记**
 ### 移除条件（必须同时满足）
 1. ✅ 只有**一个**公式
 2. ✅ 该公式在标题行（以 `#` 开头）
 3. ✅ 没有其他文本内容
 ### 保留标题的情况
 1. ❌ 有文本内容：`# Introduction\n$$E = mc^2$$`
 2. ❌ 多个公式：`# $$x = y$$\n$$a = b$$`
 3. ❌ 公式不在标题中：`$$E = mc^2$$`
 ## 示例
 ### ✅ 移除假标题
 ```markdown
 输入:  # $$E = mc^2$$
 输出:  $$E = mc^2$$
 ```
 ```markdown
 输入:  ## $$\frac{a}{b}$$
 输出:  $$\frac{a}{b}$$
 ```
 ### ❌ 保留真标题
 ```markdown
 输入:  # Introduction
       $$E = mc^2$$
 输出:  # Introduction
       $$E = mc^2$$
 ```
 ### ❌ 保留多公式场景
 ```markdown
 输入:  # $$x = y$$
       $$a = b$$
 输出:  # $$x = y$$
       $$a = b$$
 ```
 ## 实现
 **文件**: `app/services/ocr_service.py`
 **函数**: `_remove_false_heading_from_single_formula()`
 **位置**: Markdown 后处理的最后阶段
 ## 处理流程
 ```
 OCR 识别
    ↓
 LaTeX 公式后处理
    ↓
 移除单公式假标题 ← 新增
    ↓
 输出 Markdown
 ```
 ## 安全性
 ### ✅ 保护机制
 - **保守策略**: 只在明确的单公式场景下移除
 - **多重条件**: 必须同时满足 3 个条件
 - **保留真标题**: 有文本的标题不会被移除
 ### 不会误删
 - ✅ 带文字的标题：`# Introduction`
 - ✅ 多公式场景：`# $$x=y$$\n$$a=b$$`
 - ✅ 标题 + 公式：`# Title\n$$x=y$$`
 ## 测试
 ```bash
 python test_remove_false_heading.py
 ```
 **关键测试**:
 - ✅ `# $$E = mc^2$$` → `$$E = mc^2$$`
 - ✅ `# Intro\n$$E=mc^2$$` → 不变（保留标题）
 - ✅ `# $$x=y$$\n$$a=b$$` → 不变（多公式）
 ## 性能
 - **时间复杂度**: O(n)，n 为行数
 - **处理时间**: < 1ms
 - **影响**: ✅ 可忽略
 ## 部署
 1. ✅ 代码已完成
 2. ✅ 测试已覆盖
 3. 🔄 重启服务
 4. 🧪 测试验证
 ## 总结
 | 方面 | 状态 |
 |-----|------|
 | 移除假标题 | ✅ 实现 |
 | 保护真标题 | ✅ 保证 |
 | 保护多公式 | ✅ 保证 |
 | 安全性 | ✅ 高 |
 | 性能 | ✅ 优 |
 **状态**: ✅ **完成**
 **下一步**: 重启服务，测试单公式图片识别！
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -19,7 +19,7 @@ dependencies = [
    "numpy==2.2.6",
    "pillow==12.0.0",
    "python-docx==1.2.0",
-    "paddleocr==3.3.2",
+    "paddleocr==3.4.0",
    "doclayout-yolo==0.0.4",
    "latex2mathml==3.78.1",
    "paddle==1.2.0",
@@ -27,11 +27,13 @@ dependencies = [
    "paddlepaddle",
    "paddleocr[doc-parser]",
    "safetensors",
-    "lxml>=5.0.0"
+    "lxml>=5.0.0",
    "openai",
    "wordfreq",
 ]
-[tool.uv.sources]
+# [tool.uv.sources]
-paddlepaddle = { path = "wheels/paddlepaddle-3.4.0.dev20251224-cp310-cp310-linux_x86_64.whl" }
+# paddlepaddle = { path = "wheels/paddlepaddle-3.4.0.dev20251224-cp310-cp310-linux_x86_64.whl" }
 [project.optional-dependencies]
 dev = [
--- a/test_latex_space_cleaning.py
+++ b/test_latex_space_cleaning.py
@@ -1,154 +0,0 @@
 """Test LaTeX syntax space cleaning functionality.
 Tests the _clean_latex_syntax_spaces() function which removes
 unwanted spaces in LaTeX syntax that are common OCR errors.
 """
 import re
 def _clean_latex_syntax_spaces(expr: str) -> str:
    """Clean unwanted spaces in LaTeX syntax (common OCR errors)."""
    # Pattern 1: Spaces around _ and ^
    expr = re.sub(r'\s*_\s*', '_', expr)
    expr = re.sub(r'\s*\^\s*', '^', expr)
    # Pattern 2: Spaces inside braces that follow _ or ^
    def clean_subscript_superscript_braces(match):
        operator = match.group(1)
        content = match.group(2)
        # Remove spaces but preserve LaTeX commands
        cleaned = re.sub(r'(?<!\\)\s+(?!\\)', '', content)
        return f"{operator}{{{cleaned}}}"
    expr = re.sub(r'([_^])\{([^}]+)\}', clean_subscript_superscript_braces, expr)
    # Pattern 3: Spaces inside \frac arguments
    def clean_frac_braces(match):
        numerator = match.group(1).strip()
        denominator = match.group(2).strip()
        return f"\\frac{{{numerator}}}{{{denominator}}}"
    expr = re.sub(r'\\frac\s*\{\s*([^}]+?)\s*\}\s*\{\s*([^}]+?)\s*\}', 
                  clean_frac_braces, expr)
    # Pattern 4: Spaces after backslash
    expr = re.sub(r'\\\s+([a-zA-Z]+)', r'\\\1', expr)
    # Pattern 5: Spaces after LaTeX commands before braces
    expr = re.sub(r'(\\[a-zA-Z]+)\s*\{\s*', r'\1{', expr)
    return expr
 # Test cases
 test_cases = [
    # Subscripts with spaces
    (r"a _ {i 1}", r"a_{i1}", "subscript with spaces"),
    (r"x _ { n }", r"x_{n}", "subscript with spaces around"),
    (r"a_{i 1}", r"a_{i1}", "subscript braces with spaces"),
    (r"y _ { i j k }", r"y_{ijk}", "subscript multiple spaces"),
    # Superscripts with spaces
    (r"x ^ {2 3}", r"x^{23}", "superscript with spaces"),
    (r"a ^ { n }", r"a^{n}", "superscript with spaces around"),
    (r"e^{ 2 x }", r"e^{2x}", "superscript expression with spaces"),
    # Fractions with spaces
    (r"\frac { a } { b }", r"\frac{a}{b}", "fraction with spaces"),
    (r"\frac{ x + y }{ z }", r"\frac{x+y}{z}", "fraction expression with spaces"),
    (r"\frac { 1 } { 2 }", r"\frac{1}{2}", "fraction numbers with spaces"),
    # LaTeX commands with spaces
    (r"\ alpha", r"\alpha", "command with space after backslash"),
    (r"\ beta + \ gamma", r"\beta+\gamma", "multiple commands with spaces"),
    (r"\sqrt { x }", r"\sqrt{x}", "sqrt with space before brace"),
    (r"\sin { x }", r"\sin{x}", "sin with space"),
    # Combined cases
    (r"a _ {i 1} + b ^ {2 3}", r"a_{i1}+b^{23}", "subscript and superscript"),
    (r"\frac { a _ {i} } { b ^ {2} }", r"\frac{a_{i}}{b^{2}}", "fraction with sub/superscripts"),
    (r"x _ { \alpha }", r"x_{\alpha}", "subscript with LaTeX command"),
    (r"y ^ { \beta + 1 }", r"y^{\beta+1}", "superscript with expression"),
    # Edge cases - should preserve necessary spaces
    (r"a + b", r"a+b", "arithmetic operators (space removed)"),
    (r"\int x dx", r"\intxdx", "integral (spaces removed - might be too aggressive)"),
    (r"f(x) = x^2", r"f(x)=x^2", "function definition (spaces removed)"),
    # LaTeX commands should be preserved
    (r"\lambda_{1}", r"\lambda_{1}", "lambda with subscript (already clean)"),
    (r"\vdots", r"\vdots", "vdots (should not be affected)"),
    (r"\alpha \beta \gamma", r"\alpha\beta\gamma", "Greek letters (spaces removed between commands)"),
 ]
 print("=" * 80)
 print("LaTeX Syntax Space Cleaning Test")
 print("=" * 80)
 passed = 0
 failed = 0
 warnings = 0
 for original, expected, description in test_cases:
    result = _clean_latex_syntax_spaces(original)
    if result == expected:
        status = "✅ PASS"
        passed += 1
    else:
        status = "❌ FAIL"
        failed += 1
        # Check if it's close but not exact
        if result.replace(" ", "") == expected.replace(" ", ""):
            status = "⚠️  CLOSE"
            warnings += 1
    print(f"{status} {description:40s}")
    print(f"     Input:    {original}")
    print(f"     Expected: {expected}")
    print(f"     Got:      {result}")
    if result != expected:
        print(f"     >>> Mismatch!")
    print()
 print("=" * 80)
 print("USER'S SPECIFIC EXAMPLE")
 print("=" * 80)
 user_example = r"a _ {i 1}"
 expected_output = r"a_{i1}"
 result = _clean_latex_syntax_spaces(user_example)
 print(f"Input:    {user_example}")
 print(f"Expected: {expected_output}")
 print(f"Got:      {result}")
 print(f"Status:   {'✅ CORRECT' if result == expected_output else '❌ INCORRECT'}")
 print("\n" + "=" * 80)
 print("SUMMARY")
 print("=" * 80)
 print(f"Total tests: {len(test_cases)}")
 print(f"✅ Passed: {passed}")
 print(f"❌ Failed: {failed}")
 print(f"⚠️  Close: {warnings}")
 if failed == 0:
    print("\n✅ All tests passed!")
 else:
    print(f"\n⚠️  {failed} test(s) failed")
 print("\n" + "=" * 80)
 print("IMPORTANT NOTES")
 print("=" * 80)
 print("""
 1. ✅ Subscript/superscript spaces: a _ {i 1} -> a_{i1}
 2. ✅ Fraction spaces: \\frac { a } { b } -> \\frac{a}{b}
 3. ✅ Command spaces: \\ alpha -> \\alpha
 4. ⚠️  This might remove some intentional spaces in expressions
 5. ⚠️  LaTeX commands inside braces are preserved (e.g., _{\\alpha})
 If any edge cases are broken, the patterns can be adjusted to be more conservative.
 """)
 print("=" * 80)
--- a/tests/api/v1/endpoints/test_image_endpoint.py
+++ b/tests/api/v1/endpoints/test_image_endpoint.py
@@ -0,0 +1,98 @@
 import numpy as np
 import pytest
 from fastapi import FastAPI
 from fastapi.testclient import TestClient
 from app.api.v1.endpoints.image import router
 from app.core.dependencies import get_glmocr_endtoend_service, get_image_processor
 class _FakeImageProcessor:
    def preprocess(self, image_url=None, image_base64=None):
        return np.zeros((8, 8, 3), dtype=np.uint8)
 class _FakeOCRService:
    def __init__(self, result=None, error=None):
        self._result = result or {"markdown": "md", "latex": "tex", "mathml": "mml", "mml": "xml"}
        self._error = error
    def recognize(self, image):
        if self._error:
            raise self._error
        return self._result
 def _build_client(image_processor=None, ocr_service=None):
    app = FastAPI()
    app.include_router(router)
    app.dependency_overrides[get_image_processor] = lambda: image_processor or _FakeImageProcessor()
    app.dependency_overrides[get_glmocr_endtoend_service] = lambda: ocr_service or _FakeOCRService()
    return TestClient(app)
 def test_image_endpoint_requires_exactly_one_of_image_url_or_image_base64():
    client = _build_client()
    missing = client.post("/ocr", json={})
    both = client.post("/ocr", json={"image_url": "https://example.com/a.png", "image_base64": "abc"})
    assert missing.status_code == 422
    assert both.status_code == 422
 def test_image_endpoint_returns_503_for_runtime_error():
    client = _build_client(ocr_service=_FakeOCRService(error=RuntimeError("backend unavailable")))
    response = client.post("/ocr", json={"image_url": "https://example.com/a.png"})
    assert response.status_code == 503
    assert response.json()["detail"] == "backend unavailable"
 def test_image_endpoint_returns_500_for_unexpected_error():
    client = _build_client(ocr_service=_FakeOCRService(error=ValueError("boom")))
    response = client.post("/ocr", json={"image_url": "https://example.com/a.png"})
    assert response.status_code == 500
    assert response.json()["detail"] == "Internal server error"
 def test_image_endpoint_returns_ocr_payload():
    client = _build_client()
    response = client.post("/ocr", json={"image_base64": "ZmFrZQ=="})
    assert response.status_code == 200
    assert response.json() == {
        "latex": "tex",
        "markdown": "md",
        "mathml": "mml",
        "mml": "xml",
        "layout_info": {"regions": [], "MixedRecognition": False},
        "recognition_mode": "",
    }
 def test_image_endpoint_real_e2e_with_env_services():
    from app.main import app
    image_url = (
        "https://static.texpixel.com/formula/012dab3e-fb31-4ecd-90fc-6957458ee309.png"
        "?Expires=1773049821&OSSAccessKeyId=TMP.3KnrJUz7aXHoU9rLTAih4MAyPGd9zyGRHiqg9AyH6TY6NKtzqT2yr4qo7Vwf8fMRFCBrWXiCFrbBwC3vn7U6mspV2NeU1K"
        "&Signature=oynhP0OLIgFI0Sv3z2CWeHPT2Ck%3D"
    )
    with TestClient(app) as client:
        response = client.post(
            "/doc_process/v1/image/ocr",
            json={"image_url": image_url},
            headers={"x-request-id": "test-e2e"},
        )
    assert response.status_code == 200, response.text
    payload = response.json()
    assert isinstance(payload["markdown"], str)
    assert payload["markdown"].strip()
    assert set(payload) >= {"markdown", "latex", "mathml", "mml"}
--- a/tests/core/test_dependencies.py
+++ b/tests/core/test_dependencies.py
@@ -0,0 +1,10 @@
 import pytest
 from app.core import dependencies
 def test_get_glmocr_endtoend_service_raises_when_layout_detector_missing(monkeypatch):
    monkeypatch.setattr(dependencies, "_layout_detector", None)
    with pytest.raises(RuntimeError, match="Layout detector not initialized"):
        dependencies.get_glmocr_endtoend_service()
--- a/tests/schemas/test_image.py
+++ b/tests/schemas/test_image.py
@@ -0,0 +1,31 @@
 from app.schemas.image import ImageOCRRequest, LayoutRegion
 def test_layout_region_native_label_defaults_to_empty_string():
    region = LayoutRegion(
        type="text",
        bbox=[0, 0, 10, 10],
        confidence=0.9,
        score=0.9,
    )
    assert region.native_label == ""
 def test_layout_region_exposes_native_label_when_provided():
    region = LayoutRegion(
        type="text",
        native_label="doc_title",
        bbox=[0, 0, 10, 10],
        confidence=0.9,
        score=0.9,
    )
    assert region.native_label == "doc_title"
 def test_image_ocr_request_requires_exactly_one_input():
    request = ImageOCRRequest(image_url="https://example.com/test.png")
    assert request.image_url == "https://example.com/test.png"
    assert request.image_base64 is None
--- a/tests/services/test_glm_postprocess.py
+++ b/tests/services/test_glm_postprocess.py
@@ -0,0 +1,199 @@
 from app.services.glm_postprocess import (
    GLMResultFormatter,
    clean_formula_number,
    clean_repeated_content,
    find_consecutive_repeat,
 )
 def test_find_consecutive_repeat_truncates_when_threshold_met():
    repeated = "abcdefghij" * 10 + "tail"
    assert find_consecutive_repeat(repeated) == "abcdefghij"
 def test_find_consecutive_repeat_returns_none_when_below_threshold():
    assert find_consecutive_repeat("abcdefghij" * 9) is None
 def test_clean_repeated_content_handles_consecutive_and_line_level_repeats():
    assert clean_repeated_content("abcdefghij" * 10 + "tail") == "abcdefghij"
    line_repeated = "\n".join(["same line"] * 10 + ["other"])
    assert clean_repeated_content(line_repeated, line_threshold=10) == "same line\n"
    assert clean_repeated_content("normal text") == "normal text"
 def test_clean_formula_number_strips_wrapping_parentheses():
    assert clean_formula_number("(1)") == "1"
    assert clean_formula_number("（2.1）") == "2.1"
    assert clean_formula_number("3") == "3"
 def test_clean_content_removes_literal_tabs_and_long_repeat_noise():
    formatter = GLMResultFormatter()
    noisy = r"\t\t" + ("·" * 5) + ("abcdefghij" * 205) + r"\t"
    cleaned = formatter._clean_content(noisy)
    assert cleaned.startswith("···")
    assert cleaned.endswith("abcdefghij")
    assert r"\t" not in cleaned
 def test_format_content_handles_titles_formula_text_and_newlines():
    formatter = GLMResultFormatter()
    assert formatter._format_content("Intro", "text", "doc_title") == "# Intro"
    assert formatter._format_content("- Section", "text", "paragraph_title") == "## Section"
    assert formatter._format_content(r"\[x+y\]", "formula", "display_formula") == "$$\nx+y\n$$"
    assert formatter._format_content("· item\nnext", "text", "text") == "- item\n\nnext"
 def test_merge_formula_numbers_merges_before_and_after_formula():
    formatter = GLMResultFormatter()
    before = formatter._merge_formula_numbers(
        [
            {"index": 0, "label": "text", "native_label": "formula_number", "content": "(1)"},
            {"index": 1, "label": "formula", "native_label": "display_formula", "content": "$$\nx+y\n$$"},
        ]
    )
    after = formatter._merge_formula_numbers(
        [
            {"index": 0, "label": "formula", "native_label": "display_formula", "content": "$$\nx+y\n$$"},
            {"index": 1, "label": "text", "native_label": "formula_number", "content": "(2)"},
        ]
    )
    untouched = formatter._merge_formula_numbers(
        [{"index": 0, "label": "text", "native_label": "formula_number", "content": "(3)"}]
    )
    assert before == [
        {
            "index": 0,
            "label": "formula",
            "native_label": "display_formula",
            "content": "$$\nx+y \\tag{1}\n$$",
        }
    ]
    assert after == [
        {
            "index": 0,
            "label": "formula",
            "native_label": "display_formula",
            "content": "$$\nx+y \\tag{2}\n$$",
        }
    ]
    assert untouched == []
 def test_merge_text_blocks_joins_hyphenated_words_when_wordfreq_accepts(monkeypatch):
    formatter = GLMResultFormatter()
    monkeypatch.setattr("app.services.glm_postprocess._WORDFREQ_AVAILABLE", True)
    monkeypatch.setattr("app.services.glm_postprocess.zipf_frequency", lambda word, lang: 3.0)
    merged = formatter._merge_text_blocks(
        [
            {"index": 0, "label": "text", "native_label": "text", "content": "inter-"},
            {"index": 1, "label": "text", "native_label": "text", "content": "national"},
        ]
    )
    assert merged == [
        {"index": 0, "label": "text", "native_label": "text", "content": "international"}
    ]
 def test_merge_text_blocks_skips_invalid_merge(monkeypatch):
    formatter = GLMResultFormatter()
    monkeypatch.setattr("app.services.glm_postprocess._WORDFREQ_AVAILABLE", True)
    monkeypatch.setattr("app.services.glm_postprocess.zipf_frequency", lambda word, lang: 1.0)
    merged = formatter._merge_text_blocks(
        [
            {"index": 0, "label": "text", "native_label": "text", "content": "inter-"},
            {"index": 1, "label": "text", "native_label": "text", "content": "National"},
        ]
    )
    assert merged == [
        {"index": 0, "label": "text", "native_label": "text", "content": "inter-"},
        {"index": 1, "label": "text", "native_label": "text", "content": "National"},
    ]
 def test_format_bullet_points_infers_missing_middle_bullet():
    formatter = GLMResultFormatter()
    items = [
        {"native_label": "text", "content": "- first", "bbox_2d": [10, 0, 50, 10]},
        {"native_label": "text", "content": "second", "bbox_2d": [12, 12, 52, 22]},
        {"native_label": "text", "content": "- third", "bbox_2d": [11, 24, 51, 34]},
    ]
    formatted = formatter._format_bullet_points(items)
    assert formatted[1]["content"] == "- second"
 def test_format_bullet_points_skips_when_bbox_missing():
    formatter = GLMResultFormatter()
    items = [
        {"native_label": "text", "content": "- first", "bbox_2d": [10, 0, 50, 10]},
        {"native_label": "text", "content": "second", "bbox_2d": []},
        {"native_label": "text", "content": "- third", "bbox_2d": [11, 24, 51, 34]},
    ]
    formatted = formatter._format_bullet_points(items)
    assert formatted[1]["content"] == "second"
 def test_process_runs_full_pipeline_and_skips_empty_content():
    formatter = GLMResultFormatter()
    regions = [
        {
            "index": 0,
            "label": "text",
            "native_label": "doc_title",
            "content": "Doc Title",
            "bbox_2d": [0, 0, 100, 30],
        },
        {
            "index": 1,
            "label": "text",
            "native_label": "formula_number",
            "content": "(1)",
            "bbox_2d": [80, 50, 100, 60],
        },
        {
            "index": 2,
            "label": "formula",
            "native_label": "display_formula",
            "content": "x+y",
            "bbox_2d": [0, 40, 100, 80],
        },
        {
            "index": 3,
            "label": "figure",
            "native_label": "image",
            "content": "figure placeholder",
            "bbox_2d": [0, 80, 100, 120],
        },
        {
            "index": 4,
            "label": "text",
            "native_label": "text",
            "content": "",
            "bbox_2d": [0, 120, 100, 150],
        },
    ]
    output = formatter.process(regions)
    assert "# Doc Title" in output
    assert "$$\nx+y \\tag{1}\n$$" in output
    assert "![](bbox=[0, 80, 100, 120])" in output
--- a/tests/services/test_layout_detector.py
+++ b/tests/services/test_layout_detector.py
@@ -0,0 +1,46 @@
 import numpy as np
 from app.services.layout_detector import LayoutDetector
 class _FakePredictor:
    def __init__(self, boxes):
        self._boxes = boxes
    def predict(self, image):
        return [{"boxes": self._boxes}]
 def test_detect_applies_postprocess_and_keeps_native_label(monkeypatch):
    raw_boxes = [
        {"cls_id": 22, "label": "text", "score": 0.95, "coordinate": [0, 0, 100, 100]},
        {"cls_id": 22, "label": "text", "score": 0.90, "coordinate": [10, 10, 20, 20]},
        {"cls_id": 6, "label": "doc_title", "score": 0.99, "coordinate": [0, 0, 80, 20]},
    ]
    detector = LayoutDetector.__new__(LayoutDetector)
    detector._get_layout_detector = lambda: _FakePredictor(raw_boxes)
    calls = {}
    def fake_apply_layout_postprocess(boxes, img_size, layout_nms, layout_unclip_ratio, layout_merge_bboxes_mode):
        calls["args"] = {
            "boxes": boxes,
            "img_size": img_size,
            "layout_nms": layout_nms,
            "layout_unclip_ratio": layout_unclip_ratio,
            "layout_merge_bboxes_mode": layout_merge_bboxes_mode,
        }
        return [boxes[0], boxes[2]]
    monkeypatch.setattr("app.services.layout_detector.apply_layout_postprocess", fake_apply_layout_postprocess)
    image = np.zeros((200, 100, 3), dtype=np.uint8)
    info = detector.detect(image)
    assert calls["args"]["img_size"] == (100, 200)
    assert calls["args"]["layout_nms"] is True
    assert calls["args"]["layout_merge_bboxes_mode"] == "large"
    assert [region.native_label for region in info.regions] == ["text", "doc_title"]
    assert [region.type for region in info.regions] == ["text", "text"]
    assert info.MixedRecognition is True
--- a/tests/services/test_layout_postprocess.py
+++ b/tests/services/test_layout_postprocess.py
@@ -0,0 +1,151 @@
 import math
 import numpy as np
 from app.services.layout_postprocess import (
    apply_layout_postprocess,
    check_containment,
    iou,
    is_contained,
    nms,
    unclip_boxes,
 )
 def _raw_box(cls_id, score, x1, y1, x2, y2, label="text"):
    return {
        "cls_id": cls_id,
        "label": label,
        "score": score,
        "coordinate": [x1, y1, x2, y2],
    }
 def test_iou_handles_full_none_and_partial_overlap():
    assert iou([0, 0, 9, 9], [0, 0, 9, 9]) == 1.0
    assert iou([0, 0, 9, 9], [20, 20, 29, 29]) == 0.0
    assert math.isclose(iou([0, 0, 9, 9], [5, 5, 14, 14]), 1 / 7, rel_tol=1e-6)
 def test_nms_keeps_highest_score_for_same_class_overlap():
    boxes = np.array(
        [
            [0, 0.95, 0, 0, 10, 10],
            [0, 0.80, 1, 1, 11, 11],
        ],
        dtype=float,
    )
    kept = nms(boxes, iou_same=0.6, iou_diff=0.98)
    assert kept == [0]
 def test_nms_keeps_cross_class_overlap_boxes_below_diff_threshold():
    boxes = np.array(
        [
            [0, 0.95, 0, 0, 10, 10],
            [1, 0.90, 1, 1, 11, 11],
        ],
        dtype=float,
    )
    kept = nms(boxes, iou_same=0.6, iou_diff=0.98)
    assert kept == [0, 1]
 def test_nms_returns_single_box_index():
    boxes = np.array([[0, 0.95, 0, 0, 10, 10]], dtype=float)
    assert nms(boxes) == [0]
 def test_is_contained_uses_overlap_threshold():
    outer = [0, 0.9, 0, 0, 10, 10]
    inner = [0, 0.9, 2, 2, 8, 8]
    partial = [0, 0.9, 6, 6, 12, 12]
    assert is_contained(inner, outer) is True
    assert is_contained(partial, outer) is False
    assert is_contained(partial, outer, overlap_threshold=0.3) is True
 def test_check_containment_respects_preserve_class_ids():
    boxes = np.array(
        [
            [0, 0.9, 0, 0, 100, 100],
            [1, 0.8, 10, 10, 30, 30],
            [2, 0.7, 15, 15, 25, 25],
        ],
        dtype=float,
    )
    contains_other, contained_by_other = check_containment(boxes, preserve_cls_ids={1})
    assert contains_other.tolist() == [1, 1, 0]
    assert contained_by_other.tolist() == [0, 0, 1]
 def test_unclip_boxes_supports_scalar_tuple_dict_and_none():
    boxes = np.array(
        [
            [0, 0.9, 10, 10, 20, 20],
            [1, 0.8, 30, 30, 50, 40],
        ],
        dtype=float,
    )
    scalar = unclip_boxes(boxes, 2.0)
    assert scalar[:, 2:6].tolist() == [[5.0, 5.0, 25.0, 25.0], [20.0, 25.0, 60.0, 45.0]]
    tuple_ratio = unclip_boxes(boxes, (2.0, 3.0))
    assert tuple_ratio[:, 2:6].tolist() == [[5.0, 0.0, 25.0, 30.0], [20.0, 20.0, 60.0, 50.0]]
    per_class = unclip_boxes(boxes, {1: (1.5, 2.0)})
    assert per_class[:, 2:6].tolist() == [[10.0, 10.0, 20.0, 20.0], [25.0, 25.0, 55.0, 45.0]]
    assert np.array_equal(unclip_boxes(boxes, None), boxes)
 def test_apply_layout_postprocess_large_mode_removes_contained_small_box():
    boxes = [
        _raw_box(0, 0.95, 0, 0, 100, 100, "text"),
        _raw_box(0, 0.90, 10, 10, 20, 20, "text"),
    ]
    result = apply_layout_postprocess(boxes, img_size=(120, 120), layout_merge_bboxes_mode="large")
    assert [box["coordinate"] for box in result] == [[0, 0, 100, 100]]
 def test_apply_layout_postprocess_preserves_contained_image_like_boxes():
    boxes = [
        _raw_box(0, 0.95, 0, 0, 100, 100, "text"),
        _raw_box(1, 0.90, 10, 10, 20, 20, "image"),
        _raw_box(2, 0.90, 25, 25, 35, 35, "seal"),
        _raw_box(3, 0.90, 40, 40, 50, 50, "chart"),
    ]
    result = apply_layout_postprocess(boxes, img_size=(120, 120), layout_merge_bboxes_mode="large")
    assert {box["label"] for box in result} == {"text", "image", "seal", "chart"}
 def test_apply_layout_postprocess_clamps_skips_invalid_and_filters_large_image():
    boxes = [
        _raw_box(0, 0.95, -10, -5, 40, 50, "text"),
        _raw_box(1, 0.90, 10, 10, 10, 50, "text"),
        _raw_box(2, 0.85, 0, 0, 100, 90, "image"),
    ]
    result = apply_layout_postprocess(
        boxes,
        img_size=(100, 90),
        layout_nms=False,
        layout_merge_bboxes_mode=None,
    )
    assert result == [
        {"cls_id": 0, "label": "text", "score": 0.95, "coordinate": [0, 0, 40, 50]}
    ]
--- a/tests/services/test_ocr_service.py
+++ b/tests/services/test_ocr_service.py
@@ -0,0 +1,124 @@
 import base64
 from types import SimpleNamespace
 import cv2
 import numpy as np
 from app.schemas.image import LayoutInfo, LayoutRegion
 from app.services.ocr_service import GLMOCREndToEndService
 class _FakeConverter:
    def convert_to_formats(self, markdown):
        return SimpleNamespace(
            latex=f"LATEX::{markdown}",
            mathml=f"MATHML::{markdown}",
            mml=f"MML::{markdown}",
        )
 class _FakeImageProcessor:
    def add_padding(self, image):
        return image
 class _FakeLayoutDetector:
    def __init__(self, regions):
        self._regions = regions
    def detect(self, image):
        return LayoutInfo(regions=self._regions, MixedRecognition=bool(self._regions))
 def _build_service(regions=None):
    return GLMOCREndToEndService(
        vl_server_url="http://127.0.0.1:8002/v1",
        image_processor=_FakeImageProcessor(),
        converter=_FakeConverter(),
        layout_detector=_FakeLayoutDetector(regions or []),
        max_workers=2,
    )
 def test_encode_region_returns_decodable_base64_jpeg():
    service = _build_service()
    image = np.zeros((8, 12, 3), dtype=np.uint8)
    image[:, :] = [0, 128, 255]
    encoded = service._encode_region(image)
    decoded = cv2.imdecode(np.frombuffer(base64.b64decode(encoded), dtype=np.uint8), cv2.IMREAD_COLOR)
    assert decoded.shape[:2] == image.shape[:2]
 def test_call_vllm_builds_messages_and_returns_content():
    service = _build_service()
    captured = {}
    def create(**kwargs):
        captured.update(kwargs)
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content=" recognized content \n"))]
        )
    service.openai_client = SimpleNamespace(
        chat=SimpleNamespace(completions=SimpleNamespace(create=create))
    )
    result = service._call_vllm(np.zeros((4, 4, 3), dtype=np.uint8), "Formula Recognition:")
    assert result == "recognized content"
    assert captured["model"] == "glm-ocr"
    assert captured["max_tokens"] == 1024
    assert captured["messages"][0]["content"][0]["type"] == "image_url"
    assert captured["messages"][0]["content"][0]["image_url"]["url"].startswith("data:image/jpeg;base64,")
    assert captured["messages"][0]["content"][1] == {"type": "text", "text": "Formula Recognition:"}
 def test_normalize_bbox_scales_coordinates_to_1000():
    service = _build_service()
    assert service._normalize_bbox([0, 0, 200, 100], 200, 100) == [0, 0, 1000, 1000]
    assert service._normalize_bbox([50, 25, 150, 75], 200, 100) == [250, 250, 750, 750]
 def test_recognize_falls_back_to_full_image_when_no_layout_regions(monkeypatch):
    service = _build_service(regions=[])
    image = np.zeros((20, 30, 3), dtype=np.uint8)
    monkeypatch.setattr(service, "_call_vllm", lambda image, prompt: "raw text")
    result = service.recognize(image)
    assert result["markdown"] == "raw text"
    assert result["latex"] == "LATEX::raw text"
    assert result["mathml"] == "MATHML::raw text"
    assert result["mml"] == "MML::raw text"
 def test_recognize_skips_figures_keeps_order_and_postprocesses(monkeypatch):
    regions = [
        LayoutRegion(type="text", native_label="doc_title", bbox=[0, 0, 10, 10], confidence=0.9, score=0.9),
        LayoutRegion(type="figure", native_label="image", bbox=[10, 10, 20, 20], confidence=0.8, score=0.8),
        LayoutRegion(type="formula", native_label="display_formula", bbox=[20, 20, 40, 40], confidence=0.95, score=0.95),
    ]
    service = _build_service(regions=regions)
    image = np.zeros((40, 40, 3), dtype=np.uint8)
    calls = []
    def fake_call_vllm(cropped, prompt):
        calls.append(prompt)
        if prompt == "Text Recognition:":
            return "Title"
        return "x + y"
    monkeypatch.setattr(service, "_call_vllm", fake_call_vllm)
    result = service.recognize(image)
    assert calls == ["Text Recognition:", "Formula Recognition:"]
    assert result["markdown"] == "# Title\n\n$$\nx + y\n$$"
    assert result["latex"] == "LATEX::# Title\n\n$$\nx + y\n$$"
    assert result["mathml"] == "MATHML::# Title\n\n$$\nx + y\n$$"
    assert result["mml"] == "MML::# Title\n\n$$\nx + y\n$$"
Author	SHA1	Message	Date
liuyuanchuang	f8173f7c0a	feat: optimize padding and formula fallback	2026-03-10 09:54:54 +08:00
liuyuanchuang	cff14904bf	fix: layout detection & format conversion robustness Three targeted fixes for layout processing issues: 1. formula_number type mapping (layout_detector.py) - Changed formula_number region type from "formula" to "text" - Ensures Text Recognition prompt, preventing $$-wrapped output - Prevents malformed \tag{$$...\n$$} in merged formulas 2. Reading order (ocr_service.py) - Sort layout regions by (y1, x1) after detection - Ensures top-to-bottom, left-to-right processing order - Fixes paragraph ordering issues in output 3. Formula number cleaning (glm_postprocess.py) - clean_formula_number() now strips $$, $, \[...\] delimiters - Handles edge case where vLLM still returns math-mode wrapped content - Prevents delimiter leakage into \tag{} placeholders Also adds logging: - Warning when empty formula content is skipped - Warning when region crop is too small (< 10×10 px) - Warning when vLLM parallel call fails - Warning when format conversion fails Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-03-09 17:57:05 +08:00
liuyuanchuang	bd1c118cb2	chore: update ignore file	2026-03-09 17:19:53 +08:00
liuyuanchuang	6dfaf9668b	feat add glm-ocr core	2026-03-09 17:13:19 +08:00
liuyuanchuang	d74130914c	feat: use padding mode	2026-02-26 17:01:23 +08:00
liuyuanchuang	fd91819af0	feat: no padding image	2026-02-25 09:52:45 +08:00
liuyuanchuang	a568149164	fix: update paddle-ocr url	2026-02-09 22:26:31 +08:00
liuyuanchuang	f64bf25f67	fix: image variable not defined	2026-02-09 22:23:52 +08:00
liuyuanchuang	8114abc27a	feat: rm csv file	2026-02-09 22:19:12 +08:00
liuyuanchuang	7799e39298	fix: image as element	2026-02-09 22:18:30 +08:00
liuyuanchuang	5504bbbf1e	fix:glm max tokens	2026-02-07 21:38:41 +08:00
liuyuanchuang	1a4d54ce34	fix: post hanlde for ocr	2026-02-07 21:28:46 +08:00
liuyuanchuang	f514f98142	feat: add padding	2026-02-07 16:53:09 +08:00
liuyuanchuang	d86107976a	feat: update threshold	2026-02-07 13:26:57 +08:00
liuyuanchuang	de66ae24af	build: update package	2026-02-07 09:58:00 +08:00
liuyuanchuang	2a962a6271	feat: update dockerfile	2026-02-07 09:40:34 +08:00
liuyuanchuang	fa10d8194a	fix: downgrade threshold	2026-02-07 09:34:15 +08:00
liuyuanchuang	05a39d8b2e	fix: update type comment	2026-02-07 09:27:51 +08:00
liuyuanchuang	aec030b071	feat: add log	2026-02-07 09:26:45 +08:00
liuyuanchuang	23e2160668	fix: get setting param	2026-02-07 09:11:43 +08:00
liuyuanchuang	f0ad0a4c77	feat: add glm ocr	2026-02-06 15:06:50 +08:00
liuyuanchuang	c372a4afbe	fix: update port in dockerfile	2026-02-05 22:20:01 +08:00
liuyuanchuang	36172ba4ff	fix: update port	2026-02-05 22:08:04 +08:00
liuyuanchuang	a3ca04856f	fix: rm space	2026-02-05 21:50:12 +08:00
liuyuanchuang	eb68843e2c	feat: update model name	2026-02-05 21:26:23 +08:00
liuyuanchuang	c93eba2839	refact: add log	2026-02-05 20:50:04 +08:00
liuyuanchuang	15986c8966	feat: update paddleocr-vl port	2026-02-05 20:43:24 +08:00
liuyuanchuang	4de9aefa68	feat: add paddleocr-vl	2026-02-05 20:33:43 +08:00
liuyuanchuang	767006ee38	Merge branch 'feature/converter'	2026-02-05 18:00:20 +08:00
liuyuanchuang	83e9bf0fb1	feat: add rm fake title	2026-02-05 17:59:54 +08:00
YogeLiu	d841e7321a	Merge pull request 'feature/converter' (#1 ) from feature/converter into main Reviewed-on: #1	2026-02-05 13:48:21 +08:00