# Change: Add Document Processing API

## Why

DocProcesser needs a FastAPI backend that accepts images (via URL or base64) and converts them to LaTeX, Markdown, and MathML, plus an endpoint that converts Markdown to DOCX. This establishes the core functionality of the project.

## What Changes

- **BREAKING**: Initial project setup (new FastAPI project structure)
- Add image-to-OCR API endpoint (`POST /doc_process/v1/image/ocr`)
  - Accept `image_url` or `image_base64` input
  - Preprocess with OpenCV (30% whitespace padding)
  - Use DocLayout-YOLO for layout detection
  - Route to PaddleOCR-VL (with PP-DocLayoutV2) for text/formula recognition
    - If a `plain_text` element exists, use PP-DocLayoutV2 to recognize the image (`mixed_recognition`); otherwise call the PaddleOCR-VL API directly with the Formula Recognition prompt (`formula_recognition`)
    - Following the markdown_2_docx code, convert the Markdown to LaTeX and MathML for `mixed_recognition`, and convert the LaTeX to Markdown and MathML for `formula_recognition`
  - Return LaTeX, Markdown, and MathML outputs
- Add markdown-to-DOCX API endpoint (`POST /doc_process/v1/convert/docx`)
  - Accept markdown content
  - Reference the markdown_2_docx library for conversion, available at http://github.com/YogeLiu/markdown_2_docxdd
  - Return DOCX file
- Add Dockerfile for GPU-enabled deployment (RTX 5080, port 8053)

## Impact

- Affected specs: `image-ocr`, `markdown-docx`
- Affected code: New project structure under `app/`
- External dependencies:
  - DocLayout-YOLO (pre-downloaded model, not fetched in container)
  - PaddleOCR-VL with vLLM backend (external service at localhost:8080)
  - markdown_2_docx library
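
The input handling, padding, and routing rules listed for the image-to-OCR endpoint can be sketched in plain Python as follows. This is only an illustrative outline, not the proposed implementation: `OcrRequest`, `padded_size`, and `choose_route` are hypothetical names, and two behaviors are assumptions not stated in this proposal: that exactly one of `image_url` / `image_base64` must be supplied, and that "30% whitespace padding" means 30% of each dimension added on every side.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class OcrRequest:
    """Hypothetical request body for POST /doc_process/v1/image/ocr."""

    image_url: Optional[str] = None
    image_base64: Optional[str] = None

    def validate(self) -> None:
        # Assumption: exactly one of the two inputs must be provided.
        if bool(self.image_url) == bool(self.image_base64):
            raise ValueError("provide exactly one of image_url or image_base64")


def padded_size(width: int, height: int, ratio: float = 0.30) -> Tuple[int, int]:
    """Return the canvas size after whitespace padding.

    Assumption: the ratio applies to each dimension and is added on every side.
    """
    pad_x = round(width * ratio)
    pad_y = round(height * ratio)
    return width + 2 * pad_x, height + 2 * pad_y


def choose_route(layout_labels: Sequence[str]) -> str:
    """Pick the recognition path from the detected layout element labels."""
    # A `plain_text` element means the image mixes text and formulas.
    if "plain_text" in layout_labels:
        return "mixed_recognition"
    # Otherwise treat the whole image as a formula.
    return "formula_recognition"
```

In a real handler the label list would come from DocLayout-YOLO, and the returned route name would select either the PP-DocLayoutV2 path or the direct PaddleOCR-VL call with the Formula Recognition prompt.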