1. Project Scaffolding
- 1.1 Create FastAPI project structure (app/, api/, core/, services/, schemas/)
- 1.2 Use uv to manage dependencies (fastapi, uvicorn, opencv-python, python-multipart, pydantic, httpx)
- 1.3 Create app/main.py with FastAPI app initialization
- 1.4 Create app/core/config.py with Pydantic Settings
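
The layout in 1.1 can be made concrete with a small scaffolding script. This is only a sketch: it assumes that api/, core/, services/, and schemas/ all live under app/ (as the file paths in the later tasks suggest), and the `scaffold` helper and its constants are hypothetical names, not part of the project.

```python
from pathlib import Path

# Package directories inferred from the paths named in this task list (an assumption).
PACKAGE_DIRS = [
    "app",
    "app/api",
    "app/api/v1",
    "app/api/v1/endpoints",
    "app/core",
    "app/services",
    "app/schemas",
]

# Modules that tasks 1.3 and 1.4 refer to; they are created empty here.
MODULE_FILES = [
    "app/main.py",
    "app/core/config.py",
]


def scaffold(root: Path) -> list[Path]:
    """Create the package tree under root and return the files that were newly created."""
    created: list[Path] = []
    for rel in PACKAGE_DIRS:
        package = root / rel
        package.mkdir(parents=True, exist_ok=True)
        init_file = package / "__init__.py"
        if not init_file.exists():
            init_file.touch()
            created.append(init_file)
    for rel in MODULE_FILES:
        module = root / rel
        if not module.exists():
            module.touch()
            created.append(module)
    return created
```

Calling `scaffold(Path.cwd())` from the repository root creates the tree. Running it a second time leaves existing files untouched and returns an empty list.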
2. Image OCR API
- 2.1 Create request/response schemas in app/schemas/image.py
- 2.2 Implement image preprocessing service with OpenCV padding (app/services/image_processor.py)
- 2.3 Implement DocLayout-YOLO wrapper (app/services/layout_detector.py)
- 2.4 Implement PaddleOCR-VL client (app/services/ocr_service.py)
- 2.5 Create image OCR endpoint (app/api/v1/endpoints/image.py)
- 2.6 Wire up router and test endpoint
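
The task list does not say what kind of padding task 2.2 applies. As one possible reading, the sketch below computes centered borders that bring an image to a target aspect ratio without cropping or scaling. The function name `compute_padding` and the default 1:1 ratio are assumptions made for illustration.

```python
def compute_padding(
    width: int, height: int, ratio_w: int = 1, ratio_h: int = 1
) -> tuple[int, int, int, int]:
    """Return (top, bottom, left, right) borders that pad an image to ratio_w:ratio_h.

    Only one axis is padded, and any odd pixel goes to the bottom or right edge.
    """
    if min(width, height, ratio_w, ratio_h) <= 0:
        raise ValueError("all arguments must be positive integers")

    if width * ratio_h < height * ratio_w:
        # Image is narrower than the target ratio: widen the canvas.
        new_width = -(-height * ratio_w // ratio_h)  # ceiling division
        extra = new_width - width
        left = extra // 2
        return (0, 0, left, extra - left)

    # Image is wider than (or already matches) the target ratio: grow it vertically.
    new_height = -(-width * ratio_h // ratio_w)  # ceiling division
    extra = new_height - height
    top = extra // 2
    return (top, extra - top, 0, 0)
```

The returned tuple is ordered to match the `top, bottom, left, right` arguments of OpenCV's `cv2.copyMakeBorder`, so the service could pass it straight through, for example with `cv2.BORDER_CONSTANT` and a fill color of its choosing.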
3. Markdown to DOCX API
- 3.1 Create request/response schemas in app/schemas/convert.py
- 3.2 Integrate markdown_2_docx library (app/services/docx_converter.py)
- 3.3 Create conversion endpoint (app/api/v1/endpoints/convert.py)
- 3.4 Wire up router and test endpoint
4. Deployment
- 4.1 Create Dockerfile with CUDA base image for RTX 5080 (Blackwell architecture, which needs CUDA 12.8 or newer)
- 4.2 Create docker-compose.yml (optional, for local development)
- 4.3 Document deployment steps in README
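
Once the image is built, it helps to confirm that the GPU is actually visible from inside the container. The sketch below is one way to do that, assuming `nvidia-smi` is on the container's PATH (it normally is when the container is started with GPU access, such as Docker's `--gpus all` option). The helper name `list_gpus` is hypothetical.

```python
import shutil
import subprocess


def list_gpus(executable: str = "nvidia-smi") -> list[str]:
    """Return the GPU names reported by nvidia-smi, or an empty list if none are visible."""
    if shutil.which(executable) is None:
        # The driver tools are not available, so no GPU can be reported.
        return []
    try:
        result = subprocess.run(
            [executable, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # Covers non-zero exit codes, timeouts, and failures to launch the tool.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

An empty list from inside the running container usually means the container was started without GPU access or the host driver is missing. A non-empty list gives the device names to record in the README.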
5. Validation
- 5.1 Test image OCR endpoint with sample images
- 5.2 Test markdown to DOCX conversion
- 5.3 Verify Docker build and GPU access
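
For 5.2, a cheap structural check can catch a broken conversion before anyone opens the file in Word. A DOCX file is a ZIP package, so the sketch below only verifies that the response bytes form a ZIP archive containing `[Content_Types].xml` and the conventional main part `word/document.xml`. It does not validate the document content, and `looks_like_docx` is a hypothetical helper name.

```python
import io
import zipfile

# Parts expected in a typical DOCX package; the main-part path is a convention.
REQUIRED_PARTS = ("[Content_Types].xml", "word/document.xml")


def looks_like_docx(data: bytes) -> bool:
    """Return True if data is a ZIP archive containing the core parts of a DOCX file."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        return False
    return all(part in names for part in REQUIRED_PARTS)
```

In an endpoint test, the body returned by the conversion endpoint can be passed directly to `looks_like_docx`.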