Change: Add Document Processing API
Why
DocProcesser needs a FastAPI backend to accept images (via URL or base64) and convert them to LaTeX/Markdown/MathML, plus a markdown-to-DOCX conversion endpoint. This establishes the core functionality of the project.
What Changes
- BREAKING: Initial project setup (new FastAPI project structure)
- Add image-to-OCR API endpoint (`POST /doc_process/v1/image/ocr`)
  - Accept `image_url` or `image_base64` input
  - Preprocess with OpenCV (30% whitespace padding)
  - Use DocLayout-YOLO for layout detection
  - Route to PaddleOCR-VL (with PP-DocLayoutV2) for text/formula recognition
    - If a `plain_text` element exists, use PP-DocLayoutV2 to recognize the image as mixed_recognition; otherwise call the PaddleOCR-VL API directly with the prompt "Formula Recognition" as formula_recognition.
    - Following the markdown_2_docx code, convert the Markdown to LaTeX and MathML for mixed_recognition, and convert the LaTeX to Markdown and MathML for formula_recognition.
  - Return LaTeX, Markdown, and MathML outputs
- Add markdown-to-DOCX API endpoint (`POST /doc_process/v1/convert/docx`)
  - Accept markdown content
  - Refer to the markdown_2_docx library for conversion (http://github.com/YogeLiu/markdown_2_docxdd)
  - Return DOCX file
- Add Dockerfile for GPU-enabled deployment (RTX 5080, port 8053)
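
The input handling and routing rules for the OCR endpoint can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the project's implementation: `OcrRequest`, `padded_size`, and `choose_route` are hypothetical names, the "30% whitespace padding" is assumed to mean a margin of 30% of each dimension on every side, and the layout labels are assumed to arrive as a list of strings from the DocLayout-YOLO step.

```python
import base64
from dataclasses import dataclass
from typing import List, Optional, Tuple

PADDING_PERCENT = 30  # "30% whitespace padding" from the proposal


@dataclass
class OcrRequest:
    """Hypothetical request body: exactly one of the two fields is set."""
    image_url: Optional[str] = None
    image_base64: Optional[str] = None

    def validate(self) -> None:
        # Reject requests that set both fields or neither.
        if (self.image_url is None) == (self.image_base64 is None):
            raise ValueError("provide exactly one of image_url or image_base64")
        if self.image_base64 is not None:
            # validate=True raises binascii.Error (a ValueError subclass)
            # on characters outside the base64 alphabet.
            base64.b64decode(self.image_base64, validate=True)


def padded_size(width: int, height: int) -> Tuple[int, int]:
    """Canvas size after padding each side by 30% of that dimension.

    Assumed interpretation; the proposal does not say how the 30% is applied.
    """
    pad_x = width * PADDING_PERCENT // 100
    pad_y = height * PADDING_PERCENT // 100
    return width + 2 * pad_x, height + 2 * pad_y


def choose_route(layout_labels: List[str]) -> str:
    """Pick the recognition path from the detected layout labels."""
    if "plain_text" in layout_labels:
        # Mixed content: PP-DocLayoutV2 path, Markdown is the primary output.
        return "mixed_recognition"
    # Formula only: direct PaddleOCR-VL call, LaTeX is the primary output.
    return "formula_recognition"
```

In a FastAPI handler the same checks would more likely live in a Pydantic model; the dataclass is used here only to keep the sketch dependency-free.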
Impact
- Affected specs: `image-ocr`, `markdown-docx`
- Affected code: New project structure under `app/`
- External dependencies:
  - DocLayout-YOLO (pre-downloaded model, not fetched in container)
  - PaddleOCR-VL with vLLM backend (external service at localhost:8080)
  - markdown_2_docx library
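
One way to wire these dependencies into the service is a small settings loader, sketched below. The environment variable names (`PADDLEOCR_VL_URL`, `DOCLAYOUT_MODEL_PATH`, `PORT`) and the default model path are assumptions made for illustration; only the `localhost:8080` address and port 8053 come from this proposal.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    paddleocr_vl_url: str       # external PaddleOCR-VL (vLLM) service
    doclayout_model_path: Path  # pre-downloaded DocLayout-YOLO weights
    port: int                   # port the API listens on


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables (names are hypothetical)."""
    return Settings(
        paddleocr_vl_url=env.get("PADDLEOCR_VL_URL", "http://localhost:8080"),
        # Hypothetical default path; the proposal does not name the file.
        doclayout_model_path=Path(
            env.get("DOCLAYOUT_MODEL_PATH", "models/doclayout_yolo.pt")
        ),
        port=int(env.get("PORT", "8053")),
    )


def require_local_model(path: Path) -> Path:
    """Fail fast if the weights are missing instead of downloading them."""
    if not path.is_file():
        raise FileNotFoundError(f"DocLayout-YOLO model not found at {path}")
    return path
```

Checking for the model file at startup matches the constraint that the model is pre-downloaded and not fetched inside the container.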