openspec/changes/add-doc-processing-api/tasks.md

## 1. Project Scaffolding

- [x] 1.1 Create FastAPI project structure (`app/`, `api/`, `core/`, `services/`, `schemas/`)
- [x] 1.2 Use uv to manage dependencies (fastapi, uvicorn, opencv-python, python-multipart, pydantic, httpx)
- [x] 1.3 Create `app/main.py` with FastAPI app initialization
- [x] 1.4 Create `app/core/config.py` with Pydantic Settings
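
As a rough illustration of task 1.1, the layout could be scaffolded with a short script. The nesting below (everything under `app/`, with `api/v1/endpoints/` inferred from the endpoint paths listed later in this file) and the `scaffold` helper name are assumptions, not the project's actual tooling.

```python
from pathlib import Path

# Package directories, inferred from the paths referenced in this task list
# (assumed nesting; adjust to the real project layout).
PACKAGE_DIRS = [
    "app",
    "app/api",
    "app/api/v1",
    "app/api/v1/endpoints",
    "app/core",
    "app/services",
    "app/schemas",
]


def scaffold(root: Path) -> list[Path]:
    """Create each package directory with an empty __init__.py.

    Returns the package directories, whether newly created or already present.
    """
    packages = []
    for rel in PACKAGE_DIRS:
        pkg = root / rel
        pkg.mkdir(parents=True, exist_ok=True)
        (pkg / "__init__.py").touch(exist_ok=True)
        packages.append(pkg)
    return packages
```

Calling `scaffold(Path("."))` from the repository root is idempotent, so it is safe to re-run after adding a directory to `PACKAGE_DIRS`.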

## 2. Image OCR API

- [x] 2.1 Create request/response schemas in `app/schemas/image.py`
- [x] 2.2 Implement image preprocessing service with OpenCV padding (`app/services/image_processor.py`)
- [x] 2.3 Implement DocLayout-YOLO wrapper (`app/services/layout_detector.py`)
- [x] 2.4 Implement PaddleOCR-VL client (`app/services/ocr_service.py`)
- [x] 2.5 Create image OCR endpoint (`app/api/v1/endpoints/image.py`)
- [x] 2.6 Wire up router and test endpoint
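
Task 2.2 mentions OpenCV padding without specifying the scheme. One plausible reading, sketched below under that assumption, is padding each image onto a square canvas before layout detection. The helper computes border widths that could then be passed to `cv2.copyMakeBorder`; the name `square_padding` and the centered placement are hypothetical.

```python
def square_padding(height: int, width: int) -> tuple[int, int, int, int]:
    """Return (top, bottom, left, right) border widths that center an image
    of the given size on a square canvas of side max(height, width)."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    side = max(height, width)
    pad_v = side - height  # total vertical padding
    pad_h = side - width   # total horizontal padding
    top, left = pad_v // 2, pad_h // 2
    # Any odd leftover pixel goes to the bottom/right edge.
    return top, pad_v - top, left, pad_h - left


# Possible use with OpenCV (requires opencv-python; kept as a comment so the
# sketch itself has no third-party dependency):
#   top, bottom, left, right = square_padding(*img.shape[:2])
#   padded = cv2.copyMakeBorder(img, top, bottom, left, right,
#                               cv2.BORDER_CONSTANT, value=(255, 255, 255))
```

Keeping the arithmetic separate from the OpenCV call makes the padding rule easy to unit-test without loading an image.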

## 3. Markdown to DOCX API

- [x] 3.1 Create request/response schemas in `app/schemas/convert.py`
- [x] 3.2 Integrate markdown_2_docx library (`app/services/docx_converter.py`)
- [x] 3.3 Create conversion endpoint (`app/api/v1/endpoints/convert.py`)
- [x] 3.4 Wire up router and test endpoint
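
When testing the conversion endpoint (task 3.4), a cheap structural check on the returned bytes can catch obviously broken output. A DOCX file is a ZIP archive that contains `[Content_Types].xml` and `word/document.xml`, so the sketch below only inspects the container. It does not call `markdown_2_docx`, whose API is not described here, and `looks_like_docx` is a hypothetical helper name.

```python
import io
import zipfile


def looks_like_docx(data: bytes) -> bool:
    """Return True if data is a ZIP archive holding the core DOCX parts."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        # Not a ZIP container at all, so it cannot be a DOCX file.
        return False
    return {"[Content_Types].xml", "word/document.xml"} <= names
```

This only validates the packaging, not the document content, so it complements rather than replaces opening the result in a word processor.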

## 4. Deployment

- [x] 4.1 Create Dockerfile with CUDA base image for RTX 5080
- [x] 4.2 Create docker-compose.yml (optional, for local development)
- [x] 4.3 Document deployment steps in README

## 5. Validation

- [ ] 5.1 Test image OCR endpoint with sample images
- [ ] 5.2 Test markdown to DOCX conversion
- [ ] 5.3 Verify Docker build and GPU access
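
For task 5.3, one lightweight way to confirm GPU visibility inside the container is to look for `nvidia-smi` and ask it to list devices. This is a sketch that assumes the NVIDIA container runtime exposes `nvidia-smi` in the image; the `gpu_visible` name is illustrative.

```python
import shutil
import subprocess


def gpu_visible(timeout: float = 10.0) -> bool:
    """Return True if nvidia-smi is on PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per device, e.g. "GPU 0: ...".
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout
```

Running `python -c "from check_gpu import gpu_visible; print(gpu_visible())"` inside the container (with the function saved in a hypothetical `check_gpu.py`) gives a quick yes/no answer before the heavier OCR tests.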