liuyuanchuang 874fd383cc init repo
2025-12-29 17:34:58 +08:00

1. Project Scaffolding

  • 1.1 Create FastAPI project structure (app/, api/, core/, services/, schemas/)
  • 1.2 Use uv to manage dependencies (fastapi, uvicorn, opencv-python, python-multipart, pydantic, pydantic-settings, httpx)
  • 1.3 Create app/main.py with FastAPI app initialization
  • 1.4 Create app/core/config.py with Pydantic Settings

2. Image OCR API

  • 2.1 Create request/response schemas in app/schemas/image.py
  • 2.2 Implement image preprocessing service with OpenCV padding (app/services/image_processor.py)
  • 2.3 Implement DocLayout-YOLO wrapper (app/services/layout_detector.py)
  • 2.4 Implement PaddleOCR-VL client (app/services/ocr_service.py)
  • 2.5 Create image OCR endpoint (app/api/v1/endpoints/image.py)
  • 2.6 Wire up router and test endpoint

3. Markdown to DOCX API

  • 3.1 Create request/response schemas in app/schemas/convert.py
  • 3.2 Integrate markdown_2_docx library (app/services/docx_converter.py)
  • 3.3 Create conversion endpoint (app/api/v1/endpoints/convert.py)
  • 3.4 Wire up router and test endpoint

4. Deployment

  • 4.1 Create a Dockerfile with a CUDA base image that supports the RTX 5080 (Blackwell architecture, CUDA 12.8 or newer)
  • 4.2 Create docker-compose.yml (optional, for local development)
  • 4.3 Document deployment steps in README

5. Validation

  • 5.1 Test image OCR endpoint with sample images
  • 5.2 Test markdown to DOCX conversion
  • 5.3 Verify Docker build and GPU access
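For the GPU half of 5.3, a small helper run inside the container can confirm that the driver is visible. This sketch shells out to `nvidia-smi` using its real `--query-gpu` and `--format` options. The function name `visible_gpus` and the script name `check_gpu.py` are hypothetical.

```python
import shutil
import subprocess


def visible_gpus() -> list[str]:
    """Return the names of GPUs nvidia-smi can see, or [] if none.

    An empty list means nvidia-smi is missing (no driver exposed to the
    container), it failed to reach the driver, or it timed out.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # Prints the GPU names, or "none" if nothing is visible.
    gpus = visible_gpus()
    print("Visible GPUs:", ", ".join(gpus) if gpus else "none")
```

A possible invocation is `docker run --rm --gpus all <image> python check_gpu.py`, which requires the NVIDIA Container Toolkit on the host. A non-empty result only shows that the driver is exposed. The layout and OCR models still need framework builds that support the RTX 5080's architecture.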