init repo

This commit is contained in:
liuyuanchuang
2025-12-29 17:34:58 +08:00
commit 874fd383cc
36 changed files with 2641 additions and 0 deletions

openspec/AGENTS.md

@@ -0,0 +1,456 @@
# OpenSpec Instructions
Instructions for AI coding assistants using OpenSpec for spec-driven development.
## TL;DR Quick Checklist
- Search existing work: `openspec spec list --long`, `openspec list` (use `rg` only for full-text search)
- Decide scope: new capability vs modify existing capability
- Pick a unique `change-id`: kebab-case, verb-led (`add-`, `update-`, `remove-`, `refactor-`)
- Scaffold: `proposal.md`, `tasks.md`, `design.md` (only if needed), and delta specs per affected capability
- Write deltas: use `## ADDED|MODIFIED|REMOVED|RENAMED Requirements`; include at least one `#### Scenario:` per requirement
- Validate: `openspec validate [change-id] --strict` and fix issues
- Request approval: Do not start implementation until proposal is approved
## Three-Stage Workflow
### Stage 1: Creating Changes
Create proposal when you need to:
- Add features or functionality
- Make breaking changes (API, schema)
- Change architecture or patterns
- Optimize performance (changes behavior)
- Update security patterns
Triggers (examples):
- "Help me create a change proposal"
- "Help me plan a change"
- "Help me create a proposal"
- "I want to create a spec proposal"
- "I want to create a spec"
Loose matching guidance:
- Contains one of: `proposal`, `change`, `spec`
- With one of: `create`, `plan`, `make`, `start`, `help`
Skip proposal for:
- Bug fixes (restore intended behavior)
- Typos, formatting, comments
- Dependency updates (non-breaking)
- Configuration changes
- Tests for existing behavior
**Workflow**
1. Review `openspec/project.md`, `openspec list`, and `openspec list --specs` to understand current context.
2. Choose a unique verb-led `change-id` and scaffold `proposal.md`, `tasks.md`, optional `design.md`, and spec deltas under `openspec/changes/<id>/`.
3. Draft spec deltas using `## ADDED|MODIFIED|REMOVED Requirements` with at least one `#### Scenario:` per requirement.
4. Run `openspec validate <id> --strict` and resolve any issues before sharing the proposal.
### Stage 2: Implementing Changes
Track these steps as TODOs and complete them one by one.
1. **Read proposal.md** - Understand what's being built
2. **Read design.md** (if exists) - Review technical decisions
3. **Read tasks.md** - Get implementation checklist
4. **Implement tasks sequentially** - Complete in order
5. **Confirm completion** - Ensure every item in `tasks.md` is finished before updating statuses
6. **Update checklist** - After all work is done, set every task to `- [x]` so the list reflects reality
7. **Approval gate** - Do not start implementation until the proposal is reviewed and approved
### Stage 3: Archiving Changes
After deployment, create separate PR to:
- Move `changes/[name]/` → `changes/archive/YYYY-MM-DD-[name]/`
- Update `specs/` if capabilities changed
- Use `openspec archive <change-id> --skip-specs --yes` for tooling-only changes (always pass the change ID explicitly)
- Run `openspec validate --strict` to confirm the archived change passes checks
## Before Any Task
**Context Checklist:**
- [ ] Read relevant specs in `specs/[capability]/spec.md`
- [ ] Check pending changes in `changes/` for conflicts
- [ ] Read `openspec/project.md` for conventions
- [ ] Run `openspec list` to see active changes
- [ ] Run `openspec list --specs` to see existing capabilities
**Before Creating Specs:**
- Always check if capability already exists
- Prefer modifying existing specs over creating duplicates
- Use `openspec show [spec]` to review current state
- If request is ambiguous, ask 1-2 clarifying questions before scaffolding
### Search Guidance
- Enumerate specs: `openspec spec list --long` (or `--json` for scripts)
- Enumerate changes: `openspec list` (or `openspec change list --json` - deprecated but available)
- Show details:
- Spec: `openspec show <spec-id> --type spec` (use `--json` for filters)
- Change: `openspec show <change-id> --json --deltas-only`
- Full-text search (use ripgrep): `rg -n "Requirement:|Scenario:" openspec/specs`
## Quick Start
### CLI Commands
```bash
# Essential commands
openspec list # List active changes
openspec list --specs # List specifications
openspec show [item] # Display change or spec
openspec validate [item] # Validate changes or specs
openspec archive <change-id> [--yes|-y] # Archive after deployment (add --yes for non-interactive runs)
# Project management
openspec init [path] # Initialize OpenSpec
openspec update [path] # Update instruction files
# Interactive mode
openspec show # Prompts for selection
openspec validate # Bulk validation mode
# Debugging
openspec show [change] --json --deltas-only
openspec validate [change] --strict
```
### Command Flags
- `--json` - Machine-readable output
- `--type change|spec` - Disambiguate items
- `--strict` - Comprehensive validation
- `--no-interactive` - Disable prompts
- `--skip-specs` - Archive without spec updates
- `--yes`/`-y` - Skip confirmation prompts (non-interactive archive)
## Directory Structure
```
openspec/
├── project.md # Project conventions
├── specs/ # Current truth - what IS built
│ └── [capability]/ # Single focused capability
│ ├── spec.md # Requirements and scenarios
│ └── design.md # Technical patterns
├── changes/ # Proposals - what SHOULD change
│ ├── [change-name]/
│ │ ├── proposal.md # Why, what, impact
│ │ ├── tasks.md # Implementation checklist
│ │ ├── design.md # Technical decisions (optional; see criteria)
│ │ └── specs/ # Delta changes
│ │ └── [capability]/
│ │ └── spec.md # ADDED/MODIFIED/REMOVED
│ └── archive/ # Completed changes
```
## Creating Change Proposals
### Decision Tree
```
New request?
├─ Bug fix restoring spec behavior? → Fix directly
├─ Typo/format/comment? → Fix directly
├─ New feature/capability? → Create proposal
├─ Breaking change? → Create proposal
├─ Architecture change? → Create proposal
└─ Unclear? → Create proposal (safer)
```
### Proposal Structure
1. **Create directory:** `changes/[change-id]/` (kebab-case, verb-led, unique)
2. **Write proposal.md:**
```markdown
# Change: [Brief description of change]
## Why
[1-2 sentences on problem/opportunity]
## What Changes
- [Bullet list of changes]
- [Mark breaking changes with **BREAKING**]
## Impact
- Affected specs: [list capabilities]
- Affected code: [key files/systems]
```
3. **Create spec deltas:** `specs/[capability]/spec.md`
```markdown
## ADDED Requirements
### Requirement: New Feature
The system SHALL provide...
#### Scenario: Success case
- **WHEN** user performs action
- **THEN** expected result
## MODIFIED Requirements
### Requirement: Existing Feature
[Complete modified requirement]
## REMOVED Requirements
### Requirement: Old Feature
**Reason**: [Why removing]
**Migration**: [How to handle]
```
If multiple capabilities are affected, create multiple delta files under `changes/[change-id]/specs/<capability>/spec.md`—one per capability.
4. **Create tasks.md:**
```markdown
## 1. Implementation
- [ ] 1.1 Create database schema
- [ ] 1.2 Implement API endpoint
- [ ] 1.3 Add frontend component
- [ ] 1.4 Write tests
```
5. **Create design.md when needed:**
Create `design.md` if any of the following apply; otherwise omit it:
- Cross-cutting change (multiple services/modules) or a new architectural pattern
- New external dependency or significant data model changes
- Security, performance, or migration complexity
- Ambiguity that benefits from technical decisions before coding
Minimal `design.md` skeleton:
```markdown
## Context
[Background, constraints, stakeholders]
## Goals / Non-Goals
- Goals: [...]
- Non-Goals: [...]
## Decisions
- Decision: [What and why]
- Alternatives considered: [Options + rationale]
## Risks / Trade-offs
- [Risk] → Mitigation
## Migration Plan
[Steps, rollback]
## Open Questions
- [...]
```
## Spec File Format
### Critical: Scenario Formatting
**CORRECT** (use #### headers):
```markdown
#### Scenario: User login success
- **WHEN** valid credentials provided
- **THEN** return JWT token
```
**WRONG** (don't use bullets or bold):
```markdown
- **Scenario: User login** ❌
**Scenario**: User login ❌
### Scenario: User login ❌
```
Every requirement MUST have at least one scenario.
### Requirement Wording
- Use SHALL/MUST for normative requirements (avoid should/may unless intentionally non-normative)
### Delta Operations
- `## ADDED Requirements` - New capabilities
- `## MODIFIED Requirements` - Changed behavior
- `## REMOVED Requirements` - Deprecated features
- `## RENAMED Requirements` - Name changes
Headers matched with `trim(header)` - whitespace ignored.
#### When to use ADDED vs MODIFIED
- ADDED: Introduces a new capability or sub-capability that can stand alone as a requirement. Prefer ADDED when the change is orthogonal (e.g., adding "Slash Command Configuration") rather than altering the semantics of an existing requirement.
- MODIFIED: Changes the behavior, scope, or acceptance criteria of an existing requirement. Always paste the full, updated requirement content (header + all scenarios). The archiver will replace the entire requirement with what you provide here; partial deltas will drop previous details.
- RENAMED: Use when only the name changes. If you also change behavior, use RENAMED (name) plus MODIFIED (content) referencing the new name.
Common pitfall: Using MODIFIED to add a new concern without including the previous text. This causes loss of detail at archive time. If you aren't explicitly changing the existing requirement, add a new requirement under ADDED instead.
Authoring a MODIFIED requirement correctly:
1) Locate the existing requirement in `openspec/specs/<capability>/spec.md`.
2) Copy the entire requirement block (from `### Requirement: ...` through its scenarios).
3) Paste it under `## MODIFIED Requirements` and edit to reflect the new behavior.
4) Ensure the header text matches exactly (whitespace-insensitive) and keep at least one `#### Scenario:`.
Example for RENAMED:
```markdown
## RENAMED Requirements
- FROM: `### Requirement: Login`
- TO: `### Requirement: User Authentication`
```
## Troubleshooting
### Common Errors
**"Change must have at least one delta"**
- Check `changes/[name]/specs/` exists with .md files
- Verify files have operation prefixes (## ADDED Requirements)
**"Requirement must have at least one scenario"**
- Check scenarios use `#### Scenario:` format (4 hashtags)
- Don't use bullet points or bold for scenario headers
**Silent scenario parsing failures**
- Exact format required: `#### Scenario: Name`
- Debug with: `openspec show [change] --json --deltas-only`
### Validation Tips
```bash
# Always use strict mode for comprehensive checks
openspec validate [change] --strict
# Debug delta parsing
openspec show [change] --json | jq '.deltas'
# Check specific requirement
openspec show [spec] --json -r 1
```
## Happy Path Script
```bash
# 1) Explore current state
openspec spec list --long
openspec list
# Optional full-text search:
# rg -n "Requirement:|Scenario:" openspec/specs
# rg -n "^#|Requirement:" openspec/changes
# 2) Choose change id and scaffold
CHANGE=add-two-factor-auth
mkdir -p openspec/changes/$CHANGE/specs/auth
printf "## Why\n...\n\n## What Changes\n- ...\n\n## Impact\n- ...\n" > openspec/changes/$CHANGE/proposal.md
printf "## 1. Implementation\n- [ ] 1.1 ...\n" > openspec/changes/$CHANGE/tasks.md
# 3) Add deltas (example)
cat > openspec/changes/$CHANGE/specs/auth/spec.md << 'EOF'
## ADDED Requirements
### Requirement: Two-Factor Authentication
Users MUST provide a second factor during login.
#### Scenario: OTP required
- **WHEN** valid credentials are provided
- **THEN** an OTP challenge is required
EOF
# 4) Validate
openspec validate $CHANGE --strict
```
## Multi-Capability Example
```
openspec/changes/add-2fa-notify/
├── proposal.md
├── tasks.md
└── specs/
├── auth/
│ └── spec.md # ADDED: Two-Factor Authentication
└── notifications/
└── spec.md # ADDED: OTP email notification
```
auth/spec.md
```markdown
## ADDED Requirements
### Requirement: Two-Factor Authentication
...
```
notifications/spec.md
```markdown
## ADDED Requirements
### Requirement: OTP Email Notification
...
```
## Best Practices
### Simplicity First
- Default to <100 lines of new code
- Single-file implementations until proven insufficient
- Avoid frameworks without clear justification
- Choose boring, proven patterns
### Complexity Triggers
Only add complexity with:
- Performance data showing current solution too slow
- Concrete scale requirements (>1000 users, >100MB data)
- Multiple proven use cases requiring abstraction
### Clear References
- Use `file.ts:42` format for code locations
- Reference specs as `specs/auth/spec.md`
- Link related changes and PRs
### Capability Naming
- Use verb-noun: `user-auth`, `payment-capture`
- Single purpose per capability
- 10-minute understandability rule
- Split if description needs "AND"
### Change ID Naming
- Use kebab-case, short and descriptive: `add-two-factor-auth`
- Prefer verb-led prefixes: `add-`, `update-`, `remove-`, `refactor-`
- Ensure uniqueness; if taken, append `-2`, `-3`, etc.
## Tool Selection Guide
| Task | Tool | Why |
|------|------|-----|
| Find files by pattern | Glob | Fast pattern matching |
| Search code content | Grep | Optimized regex search |
| Read specific files | Read | Direct file access |
| Explore unknown scope | Task | Multi-step investigation |
## Error Recovery
### Change Conflicts
1. Run `openspec list` to see active changes
2. Check for overlapping specs
3. Coordinate with change owners
4. Consider combining proposals
### Validation Failures
1. Run with `--strict` flag
2. Check JSON output for details
3. Verify spec file format
4. Ensure scenarios properly formatted
### Missing Context
1. Read project.md first
2. Check related specs
3. Review recent archives
4. Ask for clarification
## Quick Reference
### Stage Indicators
- `changes/` - Proposed, not yet built
- `specs/` - Built and deployed
- `archive/` - Completed changes
### File Purposes
- `proposal.md` - Why and what
- `tasks.md` - Implementation steps
- `design.md` - Technical decisions
- `spec.md` - Requirements and behavior
### CLI Essentials
```bash
openspec list # What's in progress?
openspec show [item] # View details
openspec validate --strict # Is it correct?
openspec archive <change-id> [--yes|-y] # Mark complete (add --yes for automation)
```
Remember: Specs are truth. Changes are proposals. Keep them in sync.

design.md

@@ -0,0 +1,107 @@
## Context
This is the initial implementation of the DocProcesser service. The system integrates multiple external models and services:
- DocLayout-YOLO for document layout analysis
- PaddleOCR-VL with PP-DocLayoutV2 for text and formula recognition (deployed via vLLM)
- markdown_2_docx for document conversion
Target deployment: Ubuntu machine with RTX 5080 GPU (16GB VRAM), Python 3.11.0.
## Goals / Non-Goals
**Goals:**
- Clean FastAPI project structure following best practices
- Image preprocessing with OpenCV (30% padding)
- Layout-aware OCR routing using DocLayout-YOLO
- Text and formula recognition via PaddleOCR-VL
- Markdown to DOCX conversion
- GPU-enabled Docker deployment
**Non-Goals:**
- Authentication/authorization (can be added later)
- Rate limiting
- Persistent storage
- Training or fine-tuning models
## Decisions
### Project Structure
Follow FastAPI best practices with modular organization:
```
app/
├── api/
│ └── v1/
│ ├── endpoints/
│ │ ├── image.py # Image OCR endpoint
│ │ └── convert.py # Markdown to DOCX endpoint
│ └── router.py
├── core/
│ └── config.py # Settings and environment config
├── model/
│   ├── DocLayout
│   └── PP-DocLayout
├── services/
│ ├── image_processor.py # OpenCV preprocessing
│ ├── layout_detector.py # DocLayout-YOLO wrapper
│ ├── ocr_service.py # PaddleOCR-VL client
│ └── docx_converter.py # markdown_2_docx wrapper
├── schemas/
│ ├── image.py # Request/response models for image OCR
│ └── convert.py # Request/response models for conversion
└── main.py # FastAPI app initialization
```
**Rationale:** Separation of concerns between API layer, business logic (services), and data models (schemas).
### Image Preprocessing
- Use OpenCV `cv2.copyMakeBorder()` to add 30% whitespace padding
- Padding color: white `[255, 255, 255]`
- This matches DocLayout-YOLO's demo.py pattern
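A minimal sketch of this preprocessing step, assuming a 0.15 ratio per side as described above; `pad_image` is an illustrative name, not necessarily the function in `image_processor.py`:
```python
import cv2
import numpy as np


def pad_image(image: np.ndarray, ratio: float = 0.15) -> np.ndarray:
    """Add white padding on every side, sized from the larger dimension (~30% total expansion)."""
    pad = int(max(image.shape[:2]) * ratio)
    # copyMakeBorder adds `pad` pixels to the top, bottom, left, and right edges.
    return cv2.copyMakeBorder(
        image, pad, pad, pad, pad,
        borderType=cv2.BORDER_CONSTANT,
        value=[255, 255, 255],  # white padding, matching the DocLayout-YOLO demo pattern
    )
```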
### Layout Detection Flow
1. DocLayout-YOLO detects layout regions (plain text, formulas, tables, figures)
2. If plain-text regions are detected, route to PaddleOCR-VL with PP-DocLayoutV2; otherwise route to PaddleOCR-VL with a formula prompt
3. PaddleOCR-VL combined with PP-DocLayoutV2 handles mixed-content recognition internally; PaddleOCR-VL combined with the formula prompt handles formula-only recognition
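A rough sketch of this routing decision, under the assumption that the detector returns region dicts with a `type` field and that the two recognition paths are injected as callables (all names here are illustrative):
```python
from typing import Callable

PLAIN_TEXT_LABELS = {"plain text", "title"}  # assumed DocLayout-YOLO class names


def route_recognition(
    regions: list[dict],
    mixed_recognition: Callable[[], dict],
    formula_recognition: Callable[[], dict],
) -> dict:
    """Pick the recognition path based on detected layout regions."""
    if any(region.get("type") in PLAIN_TEXT_LABELS for region in regions):
        # Plain text present: PaddleOCR-VL + PP-DocLayoutV2 handles the mixed content internally.
        return mixed_recognition()
    # No plain text: send the image to PaddleOCR-VL with a formula prompt instead.
    return formula_recognition()
```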
### External Service Integration
- PaddleOCR-VL: Connect to vLLM server at configurable URL (default: `http://localhost:8080/v1`)
- DocLayout-YOLO: Load model from pre-downloaded path (not downloaded in container)
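A hedged sketch of calling the vLLM server through its OpenAI-compatible chat-completions API; the model id, prompt, and data-URI format are assumptions about the deployment, not values taken from this repository:
```python
import httpx

PADDLEOCR_VL_URL = "http://localhost:8080/v1"  # configurable, per the bullet above


def recognize_image(image_b64: str, prompt: str) -> str:
    """Send one base64-encoded image to the PaddleOCR-VL vLLM server and return its text output."""
    payload = {
        "model": "PaddleOCR-VL",  # placeholder; must match the model name served by vLLM
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    }
    response = httpx.post(f"{PADDLEOCR_VL_URL}/chat/completions", json=payload, timeout=60.0)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```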
### Docker Strategy
- Base image: NVIDIA CUDA with Python 3.11
- Pre-install OpenCV dependencies (`libgl1-mesa-glx`, `libglib2.0-0`)
- Mount model directory for DocLayout-YOLO weights
- Expose port 8053
- Use Uvicorn with multiple workers
## Risks / Trade-offs
| Risk | Mitigation |
| --------------------------------- | ------------------------------------------------------------------ |
| PaddleOCR-VL service unavailable | Health check endpoint, retry logic with exponential backoff |
| Large image memory consumption | Configure max image size, resize before processing |
| DocLayout-YOLO model loading time | Load model once at startup, keep in memory |
| GPU memory contention | DocLayout-YOLO uses GPU; PaddleOCR-VL runs on separate vLLM server |
## Configuration
Environment variables:
- `PADDLEOCR_VL_URL`: vLLM server URL (default: `http://localhost:8000/v1`)
- `DOCLAYOUT_MODEL_PATH`: Path to DocLayout-YOLO weights
- `PP_DOCLAYOUT_MODEL_DIR`: Path to PP-DocLayoutV3 model directory
- `MAX_IMAGE_SIZE_MB`: Maximum upload size (default: 10)
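A minimal `app/core/config.py` sketch reading these variables with pydantic-settings (defaults mirror the list above except where marked illustrative; field names are matched to the environment variables case-insensitively):
```python
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """Environment-driven configuration; each field maps to one variable above."""

    paddleocr_vl_url: str = "http://localhost:8000/v1"   # PADDLEOCR_VL_URL
    doclayout_model_path: str = "models/doclayout.pt"    # DOCLAYOUT_MODEL_PATH (illustrative default)
    pp_doclayout_model_dir: str = "models/pp-doclayout"  # PP_DOCLAYOUT_MODEL_DIR (illustrative default)
    max_image_size_mb: int = 10                          # MAX_IMAGE_SIZE_MB


settings = Settings()  # reads values from the environment at import time
```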
## Open Questions
- Should we add async queue for large batch processing? (Defer to future change)
- Do we need WebSocket for progress updates? (Defer to future change)

proposal.md

@@ -0,0 +1,31 @@
# Change: Add Document Processing API
## Why
DocProcesser needs a FastAPI backend to accept images (via URL or base64) and convert them to LaTeX/Markdown/MathML, plus a markdown-to-DOCX conversion endpoint. This establishes the core functionality of the project.
## What Changes
- **BREAKING**: Initial project setup (new FastAPI project structure)
- Add image-to-OCR API endpoint (`POST /doc_process/v1/image/ocr`)
- Accept `image_url` or `image_base64` input
- Preprocess with OpenCV (30% whitespace padding)
- Use DocLayout-YOLO for layout detection
- Route to PaddleOCR-VL (with PP-DocLayoutV2) for text/formula recognition
- If a `plain_text` element exists, use PaddleOCR-VL with PP-DocLayoutV2 to recognize the image as `mixed_recognition`; otherwise call the PaddleOCR-VL API directly with a formula-recognition prompt as `formula_recognition`
- Following the markdown_2_docx code, convert the Markdown output to LaTeX and MathML for `mixed_recognition`, and convert the LaTeX output to Markdown and MathML for `formula_recognition`
- Return LaTeX, Markdown, and MathML outputs
- Add markdown-to-DOCX API endpoint (`POST /doc_process/v1/convert/docx`)
- Accept markdown content
- Use the markdown_2_docx library for conversion; the repository address is http://github.com/YogeLiu/markdown_2_docxdd
- Return DOCX file
- Add Dockerfile for GPU-enabled deployment (RTX 5080, port 8053)
## Impact
- Affected specs: `image-ocr`, `markdown-docx`
- Affected code: New project structure under `app/`
- External dependencies:
- DocLayout-YOLO (pre-downloaded model, not fetched in container)
- PaddleOCR-VL with vLLM backend (external service at localhost:8080)
- markdown_2_docx library

specs/image-ocr/spec.md

@@ -0,0 +1,137 @@
## ADDED Requirements
### Requirement: Image Input Acceptance
The system SHALL accept images via `POST /api/v1/image/ocr` endpoint with either:
- `image_url`: A publicly accessible URL to the image
- `image_base64`: Base64-encoded image data
The system SHALL return an error if neither input is provided or if both are provided simultaneously.
#### Scenario: Image URL provided
- **WHEN** a valid `image_url` is provided in the request body
- **THEN** the system SHALL download the image and process it
- **AND** return OCR results in the response
#### Scenario: Base64 image provided
- **WHEN** a valid `image_base64` string is provided in the request body
- **THEN** the system SHALL decode the image and process it
- **AND** return OCR results in the response
#### Scenario: Invalid input
- **WHEN** neither `image_url` nor `image_base64` is provided
- **THEN** the system SHALL return HTTP 422 with validation error
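An illustrative (non-normative) Pydantic schema enforcing the exactly-one-input rule; FastAPI surfaces the raised `ValueError` as the HTTP 422 described above. The class name is a placeholder, not necessarily what `app/schemas/image.py` uses:
```python
from pydantic import BaseModel, model_validator


class ImageOCRRequest(BaseModel):
    """Request body for the image OCR endpoint; exactly one input field must be set."""

    image_url: str | None = None
    image_base64: str | None = None

    @model_validator(mode="after")
    def require_exactly_one_input(self) -> "ImageOCRRequest":
        if bool(self.image_url) == bool(self.image_base64):
            # Both missing or both present are rejected with a validation error (HTTP 422).
            raise ValueError("Provide exactly one of image_url or image_base64")
        return self
```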
---
### Requirement: Image Preprocessing with Padding
The system SHALL preprocess all input images by adding 30% whitespace padding around the image borders using OpenCV.
The padding calculation: `padding = int(max(height, width) * 0.15)` on each side (totaling 30% expansion).
The padding color SHALL be white (`RGB: 255, 255, 255`).
#### Scenario: Image padding applied
- **WHEN** an image of dimensions 1000x800 pixels is received
- **THEN** the system SHALL add approximately 150 pixels of white padding on each side
- **AND** the resulting image dimensions SHALL be approximately 1300x1100 pixels
---
### Requirement: Layout Detection with DocLayout-YOLO
The system SHALL use DocLayout-YOLO model to detect document layout regions including:
- Plain text blocks
- Formulas/equations
- Tables
- Figures
The model SHALL be loaded from a pre-configured local path (not downloaded at runtime).
#### Scenario: Layout detection success
- **WHEN** a padded image is passed to DocLayout-YOLO
- **THEN** the system SHALL return detected regions with bounding boxes and class labels
- **AND** confidence scores for each detection
#### Scenario: Model not available
- **WHEN** the DocLayout-YOLO model file is not found at the configured path
- **THEN** the system SHALL fail startup with a clear error message
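A small non-normative sketch of the startup guard implied by the second scenario; only the path check is shown, because the exact DocLayout-YOLO loading API is not specified here:
```python
from pathlib import Path


def ensure_doclayout_weights(model_path: str) -> Path:
    """Fail fast at startup when the pre-downloaded DocLayout-YOLO weights are missing."""
    path = Path(model_path)
    if not path.is_file():
        raise RuntimeError(
            f"DocLayout-YOLO model not found at '{path}'. "
            "Set DOCLAYOUT_MODEL_PATH to a pre-downloaded weights file."
        )
    return path
```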
---
### Requirement: OCR Processing with PaddleOCR-VL
The system SHALL send images to PaddleOCR-VL (via vLLM backend) for text and formula recognition.
PaddleOCR-VL SHALL be configured with PP-DocLayoutV2 for document layout understanding.
The system SHALL handle both plain text and formula/math content.
#### Scenario: Plain text recognition
- **WHEN** DocLayout-YOLO detects plain text regions
- **THEN** the system SHALL send the image to PaddleOCR-VL
- **AND** return recognized text content
#### Scenario: Formula recognition
- **WHEN** DocLayout-YOLO detects formula/equation regions
- **THEN** the system SHALL send the image to PaddleOCR-VL
- **AND** return formula content in LaTeX format
#### Scenario: Mixed content handling
- **WHEN** DocLayout-YOLO detects both text and formula regions
- **THEN** the system SHALL process all regions via PaddleOCR-VL with PP-DocLayoutV3
- **AND** return combined results preserving document structure
#### Scenario: PaddleOCR-VL service unavailable
- **WHEN** the PaddleOCR-VL vLLM server is unreachable
- **THEN** the system SHALL return HTTP 503 with service unavailable error
---
### Requirement: Multi-Format Output
The system SHALL return OCR results in multiple formats:
- `latex`: LaTeX representation of the content
- `markdown`: Markdown representation of the content
- `mathml`: MathML representation for mathematical content
#### Scenario: Successful OCR response
- **WHEN** image processing completes successfully
- **THEN** the response SHALL include:
- `latex`: string containing LaTeX output
- `markdown`: string containing Markdown output
- `mathml`: string containing MathML output (empty string if no math detected)
- **AND** HTTP status code SHALL be 200
#### Scenario: Response structure
- **WHEN** the OCR endpoint returns successfully
- **THEN** the response body SHALL be JSON with structure:
```json
{
"latex": "...",
"markdown": "...",
"mathml": "...",
"layout_info": {
"regions": [
{"type": "text|formula|table|figure", "bbox": [x1, y1, x2, y2], "confidence": 0.95}
]
}
}
```

specs/markdown-docx/spec.md

@@ -0,0 +1,93 @@
## ADDED Requirements
### Requirement: Markdown Input Acceptance
The system SHALL accept markdown content via `POST /api/v1/convert/docx` endpoint.
The request body SHALL contain:
- `markdown`: string containing the markdown content to convert
#### Scenario: Valid markdown provided
- **WHEN** valid markdown content is provided in the request body
- **THEN** the system SHALL process and convert it to DOCX format
#### Scenario: Empty markdown
- **WHEN** an empty `markdown` string is provided
- **THEN** the system SHALL return HTTP 422 with validation error
---
### Requirement: DOCX Conversion
The system SHALL convert markdown content to DOCX format using the markdown_2_docx library.
The conversion SHALL preserve:
- Headings (H1-H6)
- Paragraphs
- Bold and italic formatting
- Lists (ordered and unordered)
- Code blocks
- Tables
- Images (if embedded as base64 or accessible URLs)
#### Scenario: Basic markdown conversion
- **WHEN** markdown with headings, paragraphs, and formatting is provided
- **THEN** the system SHALL generate a valid DOCX file
- **AND** the DOCX SHALL preserve the document structure
#### Scenario: Complex markdown with tables
- **WHEN** markdown containing tables is provided
- **THEN** the system SHALL convert tables to Word table format
- **AND** preserve table structure and content
#### Scenario: Markdown with math formulas
- **WHEN** markdown containing LaTeX math expressions is provided
- **THEN** the system SHALL convert math to OMML (Office Math Markup Language) format
- **AND** render correctly in Microsoft Word
---
### Requirement: DOCX File Response
The system SHALL return the generated DOCX file as a binary download.
The response SHALL include:
- Content-Type: `application/vnd.openxmlformats-officedocument.wordprocessingml.document`
- Content-Disposition: `attachment; filename="output.docx"`
#### Scenario: Successful conversion response
- **WHEN** markdown conversion completes successfully
- **THEN** the response SHALL be the DOCX file binary
- **AND** HTTP status code SHALL be 200
- **AND** appropriate headers for file download SHALL be set
#### Scenario: Custom filename
- **WHEN** an optional `filename` parameter is provided in the request
- **THEN** the Content-Disposition header SHALL use the provided filename
- **AND** append `.docx` extension if not present
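A non-normative sketch of building this download response in FastAPI; the converter itself is out of scope here, and only the headers and filename handling follow the requirement:
```python
from fastapi import Response

DOCX_MEDIA_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def docx_response(docx_bytes: bytes, filename: str = "output") -> Response:
    """Wrap generated DOCX bytes in a binary download response with the required headers."""
    if not filename.endswith(".docx"):
        filename += ".docx"  # append the extension if the caller omitted it
    return Response(
        content=docx_bytes,
        media_type=DOCX_MEDIA_TYPE,
        headers={"Content-Disposition": f'attachment; filename="{filename}"'},
    )
```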
---
### Requirement: Error Handling
The system SHALL provide clear error responses for conversion failures.
#### Scenario: Conversion failure
- **WHEN** markdown_2_docx fails to convert the content
- **THEN** the system SHALL return HTTP 500 with error details
- **AND** the error message SHALL describe the failure reason
#### Scenario: Malformed markdown
- **WHEN** severely malformed markdown is provided
- **THEN** the system SHALL attempt best-effort conversion
- **AND** log a warning about potential formatting issues

tasks.md

@@ -0,0 +1,34 @@
## 1. Project Scaffolding
- [x] 1.1 Create FastAPI project structure (`app/`, `api/`, `core/`, `services/`, `schemas/`)
- [x] 1.2 Use uv to manage dependencies (fastapi, uvicorn, opencv-python, python-multipart, pydantic, httpx)
- [x] 1.3 Create `app/main.py` with FastAPI app initialization
- [x] 1.4 Create `app/core/config.py` with Pydantic Settings
## 2. Image OCR API
- [x] 2.1 Create request/response schemas in `app/schemas/image.py`
- [x] 2.2 Implement image preprocessing service with OpenCV padding (`app/services/image_processor.py`)
- [x] 2.3 Implement DocLayout-YOLO wrapper (`app/services/layout_detector.py`)
- [x] 2.4 Implement PaddleOCR-VL client (`app/services/ocr_service.py`)
- [x] 2.5 Create image OCR endpoint (`app/api/v1/endpoints/image.py`)
- [x] 2.6 Wire up router and test endpoint
## 3. Markdown to DOCX API
- [x] 3.1 Create request/response schemas in `app/schemas/convert.py`
- [x] 3.2 Integrate markdown_2_docx library (`app/services/docx_converter.py`)
- [x] 3.3 Create conversion endpoint (`app/api/v1/endpoints/convert.py`)
- [x] 3.4 Wire up router and test endpoint
## 4. Deployment
- [x] 4.1 Create Dockerfile with CUDA base image for RTX 5080
- [x] 4.2 Create docker-compose.yml (optional, for local development)
- [x] 4.3 Document deployment steps in README
## 5. Validation
- [ ] 5.1 Test image OCR endpoint with sample images
- [ ] 5.2 Test markdown to DOCX conversion
- [ ] 5.3 Verify Docker build and GPU access

openspec/project.md

@@ -0,0 +1,42 @@
# Project Context
## Purpose
This project is DocProcesser, which processes images into LaTeX, Markdown, MathML, OMML, etc.
It is a FastAPI web project: it accepts requests from upstream, processes the image locally or forwards it to third-party services, then returns the result to the upstream caller.
## Tech Stack
- python
- fastapi
## Project Conventions
### Code Style
[Describe your code style preferences, formatting rules, and naming conventions]
### Architecture Patterns
[Document your architectural decisions and patterns]
### Testing Strategy
[Explain your testing approach and requirements]
### Git Workflow
[Describe your branching strategy and commit conventions]
## Domain Context
- DocLayout
A YOLO model that recognizes document layout (Book, Paper, Newspapers); it is used to determine whether an image contains plain text.
## Important Constraints
[List any technical, business, or regulatory constraints]
## External Dependencies
[Document key external services, APIs, or systems]