init repo

This commit is contained in:
liuyuanchuang
2025-12-29 17:34:58 +08:00
commit 874fd383cc
36 changed files with 2641 additions and 0 deletions

openspec/AGENTS.md

@@ -0,0 +1,456 @@
# OpenSpec Instructions
Instructions for AI coding assistants using OpenSpec for spec-driven development.
## TL;DR Quick Checklist
- Search existing work: `openspec spec list --long`, `openspec list` (use `rg` only for full-text search)
- Decide scope: new capability vs modify existing capability
- Pick a unique `change-id`: kebab-case, verb-led (`add-`, `update-`, `remove-`, `refactor-`)
- Scaffold: `proposal.md`, `tasks.md`, `design.md` (only if needed), and delta specs per affected capability
- Write deltas: use `## ADDED|MODIFIED|REMOVED|RENAMED Requirements`; include at least one `#### Scenario:` per requirement
- Validate: `openspec validate [change-id] --strict` and fix issues
- Request approval: Do not start implementation until proposal is approved
## Three-Stage Workflow
### Stage 1: Creating Changes
Create proposal when you need to:
- Add features or functionality
- Make breaking changes (API, schema)
- Change architecture or patterns
- Optimize performance (changes behavior)
- Update security patterns
Triggers (examples):
- "Help me create a change proposal"
- "Help me plan a change"
- "Help me create a proposal"
- "I want to create a spec proposal"
- "I want to create a spec"
Loose matching guidance:
- Contains one of: `proposal`, `change`, `spec`
- With one of: `create`, `plan`, `make`, `start`, `help`
Skip proposal for:
- Bug fixes (restore intended behavior)
- Typos, formatting, comments
- Dependency updates (non-breaking)
- Configuration changes
- Tests for existing behavior
**Workflow**
1. Review `openspec/project.md`, `openspec list`, and `openspec list --specs` to understand current context.
2. Choose a unique verb-led `change-id` and scaffold `proposal.md`, `tasks.md`, optional `design.md`, and spec deltas under `openspec/changes/<id>/`.
3. Draft spec deltas using `## ADDED|MODIFIED|REMOVED Requirements` with at least one `#### Scenario:` per requirement.
4. Run `openspec validate <id> --strict` and resolve any issues before sharing the proposal.
### Stage 2: Implementing Changes
Track these steps as TODOs and complete them one by one.
1. **Read proposal.md** - Understand what's being built
2. **Read design.md** (if exists) - Review technical decisions
3. **Read tasks.md** - Get implementation checklist
4. **Implement tasks sequentially** - Complete in order
5. **Confirm completion** - Ensure every item in `tasks.md` is finished before updating statuses
6. **Update checklist** - After all work is done, set every task to `- [x]` so the list reflects reality
7. **Approval gate** - Do not start implementation until the proposal is reviewed and approved
### Stage 3: Archiving Changes
After deployment, create separate PR to:
- Move `changes/[name]/` → `changes/archive/YYYY-MM-DD-[name]/`
- Update `specs/` if capabilities changed
- Use `openspec archive <change-id> --skip-specs --yes` for tooling-only changes (always pass the change ID explicitly)
- Run `openspec validate --strict` to confirm the archived change passes checks
## Before Any Task
**Context Checklist:**
- [ ] Read relevant specs in `specs/[capability]/spec.md`
- [ ] Check pending changes in `changes/` for conflicts
- [ ] Read `openspec/project.md` for conventions
- [ ] Run `openspec list` to see active changes
- [ ] Run `openspec list --specs` to see existing capabilities
**Before Creating Specs:**
- Always check if capability already exists
- Prefer modifying existing specs over creating duplicates
- Use `openspec show [spec]` to review current state
- If request is ambiguous, ask 1-2 clarifying questions before scaffolding
### Search Guidance
- Enumerate specs: `openspec spec list --long` (or `--json` for scripts)
- Enumerate changes: `openspec list` (or `openspec change list --json` - deprecated but available)
- Show details:
- Spec: `openspec show <spec-id> --type spec` (use `--json` for filters)
- Change: `openspec show <change-id> --json --deltas-only`
- Full-text search (use ripgrep): `rg -n "Requirement:|Scenario:" openspec/specs`
## Quick Start
### CLI Commands
```bash
# Essential commands
openspec list # List active changes
openspec list --specs # List specifications
openspec show [item] # Display change or spec
openspec validate [item] # Validate changes or specs
openspec archive <change-id> [--yes|-y] # Archive after deployment (add --yes for non-interactive runs)
# Project management
openspec init [path] # Initialize OpenSpec
openspec update [path] # Update instruction files
# Interactive mode
openspec show # Prompts for selection
openspec validate # Bulk validation mode
# Debugging
openspec show [change] --json --deltas-only
openspec validate [change] --strict
```
### Command Flags
- `--json` - Machine-readable output
- `--type change|spec` - Disambiguate items
- `--strict` - Comprehensive validation
- `--no-interactive` - Disable prompts
- `--skip-specs` - Archive without spec updates
- `--yes`/`-y` - Skip confirmation prompts (non-interactive archive)
## Directory Structure
```
openspec/
├── project.md # Project conventions
├── specs/ # Current truth - what IS built
│ └── [capability]/ # Single focused capability
│ ├── spec.md # Requirements and scenarios
│ └── design.md # Technical patterns
├── changes/ # Proposals - what SHOULD change
│ ├── [change-name]/
│ │ ├── proposal.md # Why, what, impact
│ │ ├── tasks.md # Implementation checklist
│ │ ├── design.md # Technical decisions (optional; see criteria)
│ │ └── specs/ # Delta changes
│ │ └── [capability]/
│ │ └── spec.md # ADDED/MODIFIED/REMOVED
│ └── archive/ # Completed changes
```
## Creating Change Proposals
### Decision Tree
```
New request?
├─ Bug fix restoring spec behavior? → Fix directly
├─ Typo/format/comment? → Fix directly
├─ New feature/capability? → Create proposal
├─ Breaking change? → Create proposal
├─ Architecture change? → Create proposal
└─ Unclear? → Create proposal (safer)
```
### Proposal Structure
1. **Create directory:** `changes/[change-id]/` (kebab-case, verb-led, unique)
2. **Write proposal.md:**
```markdown
# Change: [Brief description of change]
## Why
[1-2 sentences on problem/opportunity]
## What Changes
- [Bullet list of changes]
- [Mark breaking changes with **BREAKING**]
## Impact
- Affected specs: [list capabilities]
- Affected code: [key files/systems]
```
3. **Create spec deltas:** `specs/[capability]/spec.md`
```markdown
## ADDED Requirements
### Requirement: New Feature
The system SHALL provide...
#### Scenario: Success case
- **WHEN** user performs action
- **THEN** expected result
## MODIFIED Requirements
### Requirement: Existing Feature
[Complete modified requirement]
## REMOVED Requirements
### Requirement: Old Feature
**Reason**: [Why removing]
**Migration**: [How to handle]
```
If multiple capabilities are affected, create multiple delta files under `changes/[change-id]/specs/<capability>/spec.md`—one per capability.
4. **Create tasks.md:**
```markdown
## 1. Implementation
- [ ] 1.1 Create database schema
- [ ] 1.2 Implement API endpoint
- [ ] 1.3 Add frontend component
- [ ] 1.4 Write tests
```
5. **Create design.md when needed:**
Create `design.md` if any of the following apply; otherwise omit it:
- Cross-cutting change (multiple services/modules) or a new architectural pattern
- New external dependency or significant data model changes
- Security, performance, or migration complexity
- Ambiguity that benefits from technical decisions before coding
Minimal `design.md` skeleton:
```markdown
## Context
[Background, constraints, stakeholders]
## Goals / Non-Goals
- Goals: [...]
- Non-Goals: [...]
## Decisions
- Decision: [What and why]
- Alternatives considered: [Options + rationale]
## Risks / Trade-offs
- [Risk] → Mitigation
## Migration Plan
[Steps, rollback]
## Open Questions
- [...]
```
## Spec File Format
### Critical: Scenario Formatting
**CORRECT** (use #### headers):
```markdown
#### Scenario: User login success
- **WHEN** valid credentials provided
- **THEN** return JWT token
```
**WRONG** (don't use bullets or bold):
```markdown
- **Scenario: User login** ❌
**Scenario**: User login ❌
### Scenario: User login ❌
```
Every requirement MUST have at least one scenario.
### Requirement Wording
- Use SHALL/MUST for normative requirements (avoid should/may unless intentionally non-normative)
### Delta Operations
- `## ADDED Requirements` - New capabilities
- `## MODIFIED Requirements` - Changed behavior
- `## REMOVED Requirements` - Deprecated features
- `## RENAMED Requirements` - Name changes
Headers matched with `trim(header)` - whitespace ignored.
#### When to use ADDED vs MODIFIED
- ADDED: Introduces a new capability or sub-capability that can stand alone as a requirement. Prefer ADDED when the change is orthogonal (e.g., adding "Slash Command Configuration") rather than altering the semantics of an existing requirement.
- MODIFIED: Changes the behavior, scope, or acceptance criteria of an existing requirement. Always paste the full, updated requirement content (header + all scenarios). The archiver will replace the entire requirement with what you provide here; partial deltas will drop previous details.
- RENAMED: Use when only the name changes. If you also change behavior, use RENAMED (name) plus MODIFIED (content) referencing the new name.
Common pitfall: Using MODIFIED to add a new concern without including the previous text. This causes loss of detail at archive time. If you aren't explicitly changing the existing requirement, add a new requirement under ADDED instead.
Authoring a MODIFIED requirement correctly:
1) Locate the existing requirement in `openspec/specs/<capability>/spec.md`.
2) Copy the entire requirement block (from `### Requirement: ...` through its scenarios).
3) Paste it under `## MODIFIED Requirements` and edit to reflect the new behavior.
4) Ensure the header text matches exactly (whitespace-insensitive) and keep at least one `#### Scenario:`.
Example for RENAMED:
```markdown
## RENAMED Requirements
- FROM: `### Requirement: Login`
- TO: `### Requirement: User Authentication`
```
## Troubleshooting
### Common Errors
**"Change must have at least one delta"**
- Check `changes/[name]/specs/` exists with .md files
- Verify files have operation prefixes (## ADDED Requirements)
**"Requirement must have at least one scenario"**
- Check scenarios use `#### Scenario:` format (4 hashtags)
- Don't use bullet points or bold for scenario headers
**Silent scenario parsing failures**
- Exact format required: `#### Scenario: Name`
- Debug with: `openspec show [change] --json --deltas-only`
### Validation Tips
```bash
# Always use strict mode for comprehensive checks
openspec validate [change] --strict
# Debug delta parsing
openspec show [change] --json | jq '.deltas'
# Check specific requirement
openspec show [spec] --json -r 1
```
## Happy Path Script
```bash
# 1) Explore current state
openspec spec list --long
openspec list
# Optional full-text search:
# rg -n "Requirement:|Scenario:" openspec/specs
# rg -n "^#|Requirement:" openspec/changes
# 2) Choose change id and scaffold
CHANGE=add-two-factor-auth
mkdir -p openspec/changes/$CHANGE/specs/auth
printf "## Why\n...\n\n## What Changes\n- ...\n\n## Impact\n- ...\n" > openspec/changes/$CHANGE/proposal.md
printf "## 1. Implementation\n- [ ] 1.1 ...\n" > openspec/changes/$CHANGE/tasks.md
# 3) Add deltas (example)
cat > openspec/changes/$CHANGE/specs/auth/spec.md << 'EOF'
## ADDED Requirements
### Requirement: Two-Factor Authentication
Users MUST provide a second factor during login.
#### Scenario: OTP required
- **WHEN** valid credentials are provided
- **THEN** an OTP challenge is required
EOF
# 4) Validate
openspec validate $CHANGE --strict
```
## Multi-Capability Example
```
openspec/changes/add-2fa-notify/
├── proposal.md
├── tasks.md
└── specs/
├── auth/
│ └── spec.md # ADDED: Two-Factor Authentication
└── notifications/
└── spec.md # ADDED: OTP email notification
```
auth/spec.md
```markdown
## ADDED Requirements
### Requirement: Two-Factor Authentication
...
```
notifications/spec.md
```markdown
## ADDED Requirements
### Requirement: OTP Email Notification
...
```
## Best Practices
### Simplicity First
- Default to <100 lines of new code
- Single-file implementations until proven insufficient
- Avoid frameworks without clear justification
- Choose boring, proven patterns
### Complexity Triggers
Only add complexity with:
- Performance data showing current solution too slow
- Concrete scale requirements (>1000 users, >100MB data)
- Multiple proven use cases requiring abstraction
### Clear References
- Use `file.ts:42` format for code locations
- Reference specs as `specs/auth/spec.md`
- Link related changes and PRs
### Capability Naming
- Use verb-noun: `user-auth`, `payment-capture`
- Single purpose per capability
- 10-minute understandability rule
- Split if description needs "AND"
### Change ID Naming
- Use kebab-case, short and descriptive: `add-two-factor-auth`
- Prefer verb-led prefixes: `add-`, `update-`, `remove-`, `refactor-`
- Ensure uniqueness; if taken, append `-2`, `-3`, etc.
## Tool Selection Guide
| Task | Tool | Why |
|------|------|-----|
| Find files by pattern | Glob | Fast pattern matching |
| Search code content | Grep | Optimized regex search |
| Read specific files | Read | Direct file access |
| Explore unknown scope | Task | Multi-step investigation |
## Error Recovery
### Change Conflicts
1. Run `openspec list` to see active changes
2. Check for overlapping specs
3. Coordinate with change owners
4. Consider combining proposals
### Validation Failures
1. Run with `--strict` flag
2. Check JSON output for details
3. Verify spec file format
4. Ensure scenarios properly formatted
### Missing Context
1. Read project.md first
2. Check related specs
3. Review recent archives
4. Ask for clarification
## Quick Reference
### Stage Indicators
- `changes/` - Proposed, not yet built
- `specs/` - Built and deployed
- `archive/` - Completed changes
### File Purposes
- `proposal.md` - Why and what
- `tasks.md` - Implementation steps
- `design.md` - Technical decisions
- `spec.md` - Requirements and behavior
### CLI Essentials
```bash
openspec list # What's in progress?
openspec show [item] # View details
openspec validate --strict # Is it correct?
openspec archive <change-id> [--yes|-y] # Mark complete (add --yes for automation)
```
Remember: Specs are truth. Changes are proposals. Keep them in sync.

design.md

@@ -0,0 +1,107 @@
## Context
This is the initial implementation of the DocProcesser service. The system integrates multiple external models and services:
- DocLayout-YOLO for document layout analysis
- PaddleOCR-VL with PP-DocLayoutV2 for text and formula recognition (deployed via vLLM)
- markdown_2_docx for document conversion
Target deployment: Ubuntu machine with RTX 5080 GPU (16GB VRAM), Python 3.11.0.
## Goals / Non-Goals
**Goals:**
- Clean FastAPI project structure following best practices
- Image preprocessing with OpenCV (30% padding)
- Layout-aware OCR routing using DocLayout-YOLO
- Text and formula recognition via PaddleOCR-VL
- Markdown to DOCX conversion
- GPU-enabled Docker deployment
**Non-Goals:**
- Authentication/authorization (can be added later)
- Rate limiting
- Persistent storage
- Training or fine-tuning models
## Decisions
### Project Structure
Follow FastAPI best practices with modular organization:
```
app/
├── api/
│ └── v1/
│ ├── endpoints/
│ │ ├── image.py # Image OCR endpoint
│ │ └── convert.py # Markdown to DOCX endpoint
│ └── router.py
├── core/
│ └── config.py # Settings and environment config
├── model/
│   ├── DocLayout
│   └── PP-DocLayout
├── services/
│ ├── image_processor.py # OpenCV preprocessing
│ ├── layout_detector.py # DocLayout-YOLO wrapper
│ ├── ocr_service.py # PaddleOCR-VL client
│ └── docx_converter.py # markdown_2_docx wrapper
├── schemas/
│ ├── image.py # Request/response models for image OCR
│ └── convert.py # Request/response models for conversion
└── main.py # FastAPI app initialization
```
**Rationale:** Separation of concerns between API layer, business logic (services), and data models (schemas).
### Image Preprocessing
- Use OpenCV `cv2.copyMakeBorder()` to add 30% whitespace padding
- Padding color: white `[255, 255, 255]`
- This matches DocLayout-YOLO's demo.py pattern
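A minimal sketch of this preprocessing step, assuming a 0.15 ratio per side as described above; `pad_image` is an illustrative name, not necessarily the function in `image_processor.py`:
```python
import cv2
import numpy as np


def pad_image(image: np.ndarray, ratio: float = 0.15) -> np.ndarray:
    """Add white padding on every side, sized from the larger dimension (~30% total expansion)."""
    pad = int(max(image.shape[:2]) * ratio)
    # copyMakeBorder adds `pad` pixels to the top, bottom, left, and right edges.
    return cv2.copyMakeBorder(
        image, pad, pad, pad, pad,
        borderType=cv2.BORDER_CONSTANT,
        value=[255, 255, 255],  # white padding, matching the DocLayout-YOLO demo pattern
    )
```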
### Layout Detection Flow
1. DocLayout-YOLO detects layout regions (plain text, formulas, tables, figures)
2. If plain-text regions are detected, route to PaddleOCR-VL with PP-DocLayoutV2; otherwise route to PaddleOCR-VL with a formula prompt
3. PaddleOCR-VL combined with PP-DocLayoutV2 handles mixed-content recognition internally; PaddleOCR-VL combined with the formula prompt handles formula-only recognition
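A rough sketch of this routing decision, under the assumption that the detector returns region dicts with a `type` field and that the two recognition paths are injected as callables (all names here are illustrative):
```python
from typing import Callable

PLAIN_TEXT_LABELS = {"plain text", "title"}  # assumed DocLayout-YOLO class names


def route_recognition(
    regions: list[dict],
    mixed_recognition: Callable[[], dict],
    formula_recognition: Callable[[], dict],
) -> dict:
    """Pick the recognition path based on detected layout regions."""
    if any(region.get("type") in PLAIN_TEXT_LABELS for region in regions):
        # Plain text present: PaddleOCR-VL + PP-DocLayoutV2 handles the mixed content internally.
        return mixed_recognition()
    # No plain text: send the image to PaddleOCR-VL with a formula prompt instead.
    return formula_recognition()
```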
### External Service Integration
- PaddleOCR-VL: Connect to vLLM server at configurable URL (default: `http://localhost:8080/v1`)
- DocLayout-YOLO: Load model from pre-downloaded path (not downloaded in container)
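A hedged sketch of calling the vLLM server through its OpenAI-compatible chat-completions API; the model id, prompt, and data-URI format are assumptions about the deployment, not values taken from this repository:
```python
import httpx

PADDLEOCR_VL_URL = "http://localhost:8080/v1"  # configurable, per the bullet above


def recognize_image(image_b64: str, prompt: str) -> str:
    """Send one base64-encoded image to the PaddleOCR-VL vLLM server and return its text output."""
    payload = {
        "model": "PaddleOCR-VL",  # placeholder; must match the model name served by vLLM
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    }
    response = httpx.post(f"{PADDLEOCR_VL_URL}/chat/completions", json=payload, timeout=60.0)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```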
### Docker Strategy
- Base image: NVIDIA CUDA with Python 3.11
- Pre-install OpenCV dependencies (`libgl1-mesa-glx`, `libglib2.0-0`)
- Mount model directory for DocLayout-YOLO weights
- Expose port 8053
- Use Uvicorn with multiple workers
## Risks / Trade-offs
| Risk | Mitigation |
| --------------------------------- | ------------------------------------------------------------------ |
| PaddleOCR-VL service unavailable | Health check endpoint, retry logic with exponential backoff |
| Large image memory consumption | Configure max image size, resize before processing |
| DocLayout-YOLO model loading time | Load model once at startup, keep in memory |
| GPU memory contention | DocLayout-YOLO uses GPU; PaddleOCR-VL runs on separate vLLM server |
## Configuration
Environment variables:
- `PADDLEOCR_VL_URL`: vLLM server URL (default: `http://localhost:8000/v1`)
- `DOCLAYOUT_MODEL_PATH`: Path to DocLayout-YOLO weights
- `PP_DOCLAYOUT_MODEL_DIR`: Path to PP-DocLayoutV3 model directory
- `MAX_IMAGE_SIZE_MB`: Maximum upload size (default: 10)
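A minimal `app/core/config.py` sketch reading these variables with pydantic-settings (defaults mirror the list above except where marked illustrative; field names are matched to the environment variables case-insensitively):
```python
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """Environment-driven configuration; each field maps to one variable above."""

    paddleocr_vl_url: str = "http://localhost:8000/v1"   # PADDLEOCR_VL_URL
    doclayout_model_path: str = "models/doclayout.pt"    # DOCLAYOUT_MODEL_PATH (illustrative default)
    pp_doclayout_model_dir: str = "models/pp-doclayout"  # PP_DOCLAYOUT_MODEL_DIR (illustrative default)
    max_image_size_mb: int = 10                          # MAX_IMAGE_SIZE_MB


settings = Settings()  # reads values from the environment at import time
```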
## Open Questions
- Should we add async queue for large batch processing? (Defer to future change)
- Do we need WebSocket for progress updates? (Defer to future change)

proposal.md

@@ -0,0 +1,31 @@
# Change: Add Document Processing API
## Why
DocProcesser needs a FastAPI backend to accept images (via URL or base64) and convert them to LaTeX/Markdown/MathML, plus a markdown-to-DOCX conversion endpoint. This establishes the core functionality of the project.
## What Changes
- **BREAKING**: Initial project setup (new FastAPI project structure)
- Add image-to-OCR API endpoint (`POST /doc_process/v1/image/ocr`)
- Accept `image_url` or `image_base64` input
- Preprocess with OpenCV (30% whitespace padding)
- Use DocLayout-YOLO for layout detection
- Route to PaddleOCR-VL (with PP-DocLayoutV2) for text/formula recognition
- If a `plain_text` element exists, use PaddleOCR-VL with PP-DocLayoutV2 to recognize the image as `mixed_recognition`; otherwise call the PaddleOCR-VL API directly with a formula-recognition prompt as `formula_recognition`
- Following the markdown_2_docx code, convert the Markdown output to LaTeX and MathML for `mixed_recognition`, and convert the LaTeX output to Markdown and MathML for `formula_recognition`
- Return LaTeX, Markdown, and MathML outputs
- Add markdown-to-DOCX API endpoint (`POST /doc_process/v1/convert/docx`)
- Accept markdown content
- Use the markdown_2_docx library for conversion; the repository address is http://github.com/YogeLiu/markdown_2_docxdd
- Return DOCX file
- Add Dockerfile for GPU-enabled deployment (RTX 5080, port 8053)
## Impact
- Affected specs: `image-ocr`, `markdown-docx`
- Affected code: New project structure under `app/`
- External dependencies:
- DocLayout-YOLO (pre-downloaded model, not fetched in container)
- PaddleOCR-VL with vLLM backend (external service at localhost:8080)
- markdown_2_docx library

specs/image-ocr/spec.md

@@ -0,0 +1,137 @@
## ADDED Requirements
### Requirement: Image Input Acceptance
The system SHALL accept images via `POST /api/v1/image/ocr` endpoint with either:
- `image_url`: A publicly accessible URL to the image
- `image_base64`: Base64-encoded image data
The system SHALL return an error if neither input is provided or if both are provided simultaneously.
#### Scenario: Image URL provided
- **WHEN** a valid `image_url` is provided in the request body
- **THEN** the system SHALL download the image and process it
- **AND** return OCR results in the response
#### Scenario: Base64 image provided
- **WHEN** a valid `image_base64` string is provided in the request body
- **THEN** the system SHALL decode the image and process it
- **AND** return OCR results in the response
#### Scenario: Invalid input
- **WHEN** neither `image_url` nor `image_base64` is provided
- **THEN** the system SHALL return HTTP 422 with validation error
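An illustrative (non-normative) Pydantic schema enforcing the exactly-one-input rule; FastAPI surfaces the raised `ValueError` as the HTTP 422 described above. The class name is a placeholder, not necessarily what `app/schemas/image.py` uses:
```python
from pydantic import BaseModel, model_validator


class ImageOCRRequest(BaseModel):
    """Request body for the image OCR endpoint; exactly one input field must be set."""

    image_url: str | None = None
    image_base64: str | None = None

    @model_validator(mode="after")
    def require_exactly_one_input(self) -> "ImageOCRRequest":
        if bool(self.image_url) == bool(self.image_base64):
            # Both missing or both present are rejected with a validation error (HTTP 422).
            raise ValueError("Provide exactly one of image_url or image_base64")
        return self
```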
---
### Requirement: Image Preprocessing with Padding
The system SHALL preprocess all input images by adding 30% whitespace padding around the image borders using OpenCV.
The padding calculation: `padding = int(max(height, width) * 0.15)` on each side (totaling 30% expansion).
The padding color SHALL be white (`RGB: 255, 255, 255`).
#### Scenario: Image padding applied
- **WHEN** an image of dimensions 1000x800 pixels is received
- **THEN** the system SHALL add approximately 150 pixels of white padding on each side
- **AND** the resulting image dimensions SHALL be approximately 1300x1100 pixels
---
### Requirement: Layout Detection with DocLayout-YOLO
The system SHALL use DocLayout-YOLO model to detect document layout regions including:
- Plain text blocks
- Formulas/equations
- Tables
- Figures
The model SHALL be loaded from a pre-configured local path (not downloaded at runtime).
#### Scenario: Layout detection success
- **WHEN** a padded image is passed to DocLayout-YOLO
- **THEN** the system SHALL return detected regions with bounding boxes and class labels
- **AND** confidence scores for each detection
#### Scenario: Model not available
- **WHEN** the DocLayout-YOLO model file is not found at the configured path
- **THEN** the system SHALL fail startup with a clear error message
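A small non-normative sketch of the startup guard implied by the second scenario; only the path check is shown, because the exact DocLayout-YOLO loading API is not specified here:
```python
from pathlib import Path


def ensure_doclayout_weights(model_path: str) -> Path:
    """Fail fast at startup when the pre-downloaded DocLayout-YOLO weights are missing."""
    path = Path(model_path)
    if not path.is_file():
        raise RuntimeError(
            f"DocLayout-YOLO model not found at '{path}'. "
            "Set DOCLAYOUT_MODEL_PATH to a pre-downloaded weights file."
        )
    return path
```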
---
### Requirement: OCR Processing with PaddleOCR-VL
The system SHALL send images to PaddleOCR-VL (via vLLM backend) for text and formula recognition.
PaddleOCR-VL SHALL be configured with PP-DocLayoutV2 for document layout understanding.
The system SHALL handle both plain text and formula/math content.
#### Scenario: Plain text recognition
- **WHEN** DocLayout-YOLO detects plain text regions
- **THEN** the system SHALL send the image to PaddleOCR-VL
- **AND** return recognized text content
#### Scenario: Formula recognition
- **WHEN** DocLayout-YOLO detects formula/equation regions
- **THEN** the system SHALL send the image to PaddleOCR-VL
- **AND** return formula content in LaTeX format
#### Scenario: Mixed content handling
- **WHEN** DocLayout-YOLO detects both text and formula regions
- **THEN** the system SHALL process all regions via PaddleOCR-VL with PP-DocLayoutV3
- **AND** return combined results preserving document structure
#### Scenario: PaddleOCR-VL service unavailable
- **WHEN** the PaddleOCR-VL vLLM server is unreachable
- **THEN** the system SHALL return HTTP 503 with service unavailable error
---
### Requirement: Multi-Format Output
The system SHALL return OCR results in multiple formats:
- `latex`: LaTeX representation of the content
- `markdown`: Markdown representation of the content
- `mathml`: MathML representation for mathematical content
#### Scenario: Successful OCR response
- **WHEN** image processing completes successfully
- **THEN** the response SHALL include:
- `latex`: string containing LaTeX output
- `markdown`: string containing Markdown output
- `mathml`: string containing MathML output (empty string if no math detected)
- **AND** HTTP status code SHALL be 200
#### Scenario: Response structure
- **WHEN** the OCR endpoint returns successfully
- **THEN** the response body SHALL be JSON with structure:
```json
{
"latex": "...",
"markdown": "...",
"mathml": "...",
"layout_info": {
"regions": [
{"type": "text|formula|table|figure", "bbox": [x1, y1, x2, y2], "confidence": 0.95}
]
}
}
```

specs/markdown-docx/spec.md

@@ -0,0 +1,93 @@
## ADDED Requirements
### Requirement: Markdown Input Acceptance
The system SHALL accept markdown content via `POST /api/v1/convert/docx` endpoint.
The request body SHALL contain:
- `markdown`: string containing the markdown content to convert
#### Scenario: Valid markdown provided
- **WHEN** valid markdown content is provided in the request body
- **THEN** the system SHALL process and convert it to DOCX format
#### Scenario: Empty markdown
- **WHEN** an empty `markdown` string is provided
- **THEN** the system SHALL return HTTP 422 with validation error
---
### Requirement: DOCX Conversion
The system SHALL convert markdown content to DOCX format using the markdown_2_docx library.
The conversion SHALL preserve:
- Headings (H1-H6)
- Paragraphs
- Bold and italic formatting
- Lists (ordered and unordered)
- Code blocks
- Tables
- Images (if embedded as base64 or accessible URLs)
#### Scenario: Basic markdown conversion
- **WHEN** markdown with headings, paragraphs, and formatting is provided
- **THEN** the system SHALL generate a valid DOCX file
- **AND** the DOCX SHALL preserve the document structure
#### Scenario: Complex markdown with tables
- **WHEN** markdown containing tables is provided
- **THEN** the system SHALL convert tables to Word table format
- **AND** preserve table structure and content
#### Scenario: Markdown with math formulas
- **WHEN** markdown containing LaTeX math expressions is provided
- **THEN** the system SHALL convert math to OMML (Office Math Markup Language) format
- **AND** render correctly in Microsoft Word
---
### Requirement: DOCX File Response
The system SHALL return the generated DOCX file as a binary download.
The response SHALL include:
- Content-Type: `application/vnd.openxmlformats-officedocument.wordprocessingml.document`
- Content-Disposition: `attachment; filename="output.docx"`
#### Scenario: Successful conversion response
- **WHEN** markdown conversion completes successfully
- **THEN** the response SHALL be the DOCX file binary
- **AND** HTTP status code SHALL be 200
- **AND** appropriate headers for file download SHALL be set
#### Scenario: Custom filename
- **WHEN** an optional `filename` parameter is provided in the request
- **THEN** the Content-Disposition header SHALL use the provided filename
- **AND** append `.docx` extension if not present
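A non-normative sketch of building this download response in FastAPI; the converter itself is out of scope here, and only the headers and filename handling follow the requirement:
```python
from fastapi import Response

DOCX_MEDIA_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def docx_response(docx_bytes: bytes, filename: str = "output") -> Response:
    """Wrap generated DOCX bytes in a binary download response with the required headers."""
    if not filename.endswith(".docx"):
        filename += ".docx"  # append the extension if the caller omitted it
    return Response(
        content=docx_bytes,
        media_type=DOCX_MEDIA_TYPE,
        headers={"Content-Disposition": f'attachment; filename="{filename}"'},
    )
```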
---
### Requirement: Error Handling
The system SHALL provide clear error responses for conversion failures.
#### Scenario: Conversion failure
- **WHEN** markdown_2_docx fails to convert the content
- **THEN** the system SHALL return HTTP 500 with error details
- **AND** the error message SHALL describe the failure reason
#### Scenario: Malformed markdown
- **WHEN** severely malformed markdown is provided
- **THEN** the system SHALL attempt best-effort conversion
- **AND** log a warning about potential formatting issues

tasks.md

@@ -0,0 +1,34 @@
## 1. Project Scaffolding
- [x] 1.1 Create FastAPI project structure (`app/`, `api/`, `core/`, `services/`, `schemas/`)
- [x] 1.2 Use uv to manage dependencies (fastapi, uvicorn, opencv-python, python-multipart, pydantic, httpx)
- [x] 1.3 Create `app/main.py` with FastAPI app initialization
- [x] 1.4 Create `app/core/config.py` with Pydantic Settings
## 2. Image OCR API
- [x] 2.1 Create request/response schemas in `app/schemas/image.py`
- [x] 2.2 Implement image preprocessing service with OpenCV padding (`app/services/image_processor.py`)
- [x] 2.3 Implement DocLayout-YOLO wrapper (`app/services/layout_detector.py`)
- [x] 2.4 Implement PaddleOCR-VL client (`app/services/ocr_service.py`)
- [x] 2.5 Create image OCR endpoint (`app/api/v1/endpoints/image.py`)
- [x] 2.6 Wire up router and test endpoint
## 3. Markdown to DOCX API
- [x] 3.1 Create request/response schemas in `app/schemas/convert.py`
- [x] 3.2 Integrate markdown_2_docx library (`app/services/docx_converter.py`)
- [x] 3.3 Create conversion endpoint (`app/api/v1/endpoints/convert.py`)
- [x] 3.4 Wire up router and test endpoint
## 4. Deployment
- [x] 4.1 Create Dockerfile with CUDA base image for RTX 5080
- [x] 4.2 Create docker-compose.yml (optional, for local development)
- [x] 4.3 Document deployment steps in README
## 5. Validation
- [ ] 5.1 Test image OCR endpoint with sample images
- [ ] 5.2 Test markdown to DOCX conversion
- [ ] 5.3 Verify Docker build and GPU access

openspec/project.md

@@ -0,0 +1,42 @@
# Project Context
## Purpose
This project is DocProcesser, which processes images into LaTeX, Markdown, MathML, OMML, etc.
It is a FastAPI web project: it accepts requests from upstream, processes the image locally or forwards it to third-party services, then returns the result to the upstream caller.
## Tech Stack
- python
- fastapi
## Project Conventions
### Code Style
[Describe your code style preferences, formatting rules, and naming conventions]
### Architecture Patterns
[Document your architectural decisions and patterns]
### Testing Strategy
[Explain your testing approach and requirements]
### Git Workflow
[Describe your branching strategy and commit conventions]
## Domain Context
- DocLayout
A YOLO model that recognizes document layout (Book, Paper, Newspapers); it is used to determine whether an image contains plain text.
## Important Constraints
[List any technical, business, or regulatory constraints]
## External Dependencies
[Document key external services, APIs, or systems]