doc_processer/openspec/changes/add-doc-processing-api/proposal.md
liuyuanchuang 874fd383cc init repo
2025-12-29 17:34:58 +08:00

Change: Add Document Processing API

Why

DocProcesser needs a FastAPI backend to accept images (via URL or base64) and convert them to LaTeX/Markdown/MathML, plus a markdown-to-DOCX conversion endpoint. This establishes the core functionality of the project.

What Changes

  • BREAKING: Initial project setup (new FastAPI project structure)
  • Add image-to-OCR API endpoint (POST /doc_process/v1/image/ocr)
    • Accept image_url or image_base64 input
    • Preprocess with OpenCV (30% whitespace padding)
    • Use DocLayout-YOLO for layout detection
    • Route to PaddleOCR-VL (with PP-DocLayoutV2) for text/formula recognition
    • If a plain_text element is detected, use PP-DocLayoutV2 to recognize the image as mixed_recognition; otherwise call the PaddleOCR-VL API directly with the "Formula Recognition" prompt as formula_recognition
    • Following the markdown_2_docx reference code, convert the Markdown output to LaTeX and MathML for mixed_recognition, and convert the LaTeX output to Markdown and MathML for formula_recognition
    • Return LaTeX, Markdown, and MathML outputs
  • Add markdown-to-DOCX API endpoint (POST /doc_process/v1/convert/docx)
  • Add Dockerfile for GPU-enabled deployment (RTX 5080, port 8053)
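
The input handling and routing rules in the list above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: `OcrRequest`, `padded_size`, and `choose_mode` are hypothetical names, the "exactly one of the two inputs" rule is an assumption, and "30% whitespace padding" is read here as adding 30% of each side length on every edge, which the proposal does not specify.

```python
import base64
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class OcrRequest:
    """Hypothetical request body for POST /doc_process/v1/image/ocr."""
    image_url: Optional[str] = None
    image_base64: Optional[str] = None

    def validate(self) -> None:
        # Assumption: exactly one of the two inputs must be provided.
        if (self.image_url is None) == (self.image_base64 is None):
            raise ValueError("provide exactly one of image_url or image_base64")
        if self.image_base64 is not None:
            # validate=True rejects characters outside the base64 alphabet.
            base64.b64decode(self.image_base64, validate=True)


def padded_size(width: int, height: int, ratio: float = 0.30) -> Tuple[int, int]:
    """Return the canvas size after padding (assumed: ratio * side on each edge)."""
    pad_x = round(width * ratio)
    pad_y = round(height * ratio)
    return width + 2 * pad_x, height + 2 * pad_y


def choose_mode(layout_labels: Sequence[str]) -> str:
    """Pick the recognition path from the labels returned by layout detection."""
    if "plain_text" in layout_labels:
        return "mixed_recognition"
    return "formula_recognition"
```

For example, a layout result containing a `plain_text` region selects `mixed_recognition`, while a formula-only image falls through to `formula_recognition`. The label `"formula"` used in any such call is a placeholder, since the proposal names only `plain_text`.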

Impact

  • Affected specs: image-ocr, markdown-docx
  • Affected code: New project structure under app/
  • External dependencies:
    • DocLayout-YOLO (pre-downloaded model, not fetched in container)
    • PaddleOCR-VL with vLLM backend (external service at localhost:8080)
    • markdown_2_docx library
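
The deployment values mentioned in this proposal (port 8053, the PaddleOCR-VL service at localhost:8080, a pre-downloaded DocLayout-YOLO model) could be collected in one settings object. The sketch below is an assumption about how that might look: the class, the function, and the environment variable names are hypothetical, the `http://` scheme is assumed, and the model path is left unset because the proposal does not give one.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical runtime settings; defaults mirror the values in the proposal."""
    service_port: int = 8053
    paddleocr_vl_url: str = "http://localhost:8080"  # scheme is an assumption
    doclayout_model_path: Optional[str] = None       # not specified in the proposal


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables (variable names are hypothetical)."""
    return Settings(
        service_port=int(env.get("DOC_PROCESSER_PORT", "8053")),
        paddleocr_vl_url=env.get("PADDLEOCR_VL_URL", "http://localhost:8080"),
        doclayout_model_path=env.get("DOCLAYOUT_MODEL_PATH"),
    )
```

Reading the model path from the environment keeps the container from fetching weights at startup, which matches the note that the model is pre-downloaded.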