init repo
456
openspec/AGENTS.md
Normal file
@@ -0,0 +1,456 @@
|
||||
# OpenSpec Instructions
|
||||
|
||||
Instructions for AI coding assistants using OpenSpec for spec-driven development.
|
||||
|
||||
## TL;DR Quick Checklist
|
||||
|
||||
- Search existing work: `openspec spec list --long`, `openspec list` (use `rg` only for full-text search)
|
||||
- Decide scope: new capability vs modify existing capability
|
||||
- Pick a unique `change-id`: kebab-case, verb-led (`add-`, `update-`, `remove-`, `refactor-`)
|
||||
- Scaffold: `proposal.md`, `tasks.md`, `design.md` (only if needed), and delta specs per affected capability
|
||||
- Write deltas: use `## ADDED|MODIFIED|REMOVED|RENAMED Requirements`; include at least one `#### Scenario:` per requirement
|
||||
- Validate: `openspec validate [change-id] --strict` and fix issues
|
||||
- Request approval: Do not start implementation until proposal is approved
|
||||
|
||||
## Three-Stage Workflow
|
||||
|
||||
### Stage 1: Creating Changes
|
||||
Create proposal when you need to:
|
||||
- Add features or functionality
|
||||
- Make breaking changes (API, schema)
|
||||
- Change architecture or patterns
|
||||
- Optimize performance (changes behavior)
|
||||
- Update security patterns
|
||||
|
||||
Triggers (examples):
|
||||
- "Help me create a change proposal"
|
||||
- "Help me plan a change"
|
||||
- "Help me create a proposal"
|
||||
- "I want to create a spec proposal"
|
||||
- "I want to create a spec"
|
||||
|
||||
Loose matching guidance:
|
||||
- Contains one of: `proposal`, `change`, `spec`
|
||||
- With one of: `create`, `plan`, `make`, `start`, `help`
|
||||
|
||||
Skip proposal for:
|
||||
- Bug fixes (restore intended behavior)
|
||||
- Typos, formatting, comments
|
||||
- Dependency updates (non-breaking)
|
||||
- Configuration changes
|
||||
- Tests for existing behavior
|
||||
|
||||
**Workflow**
|
||||
1. Review `openspec/project.md`, `openspec list`, and `openspec list --specs` to understand current context.
|
||||
2. Choose a unique verb-led `change-id` and scaffold `proposal.md`, `tasks.md`, optional `design.md`, and spec deltas under `openspec/changes/<id>/`.
|
||||
3. Draft spec deltas using `## ADDED|MODIFIED|REMOVED Requirements` with at least one `#### Scenario:` per requirement.
|
||||
4. Run `openspec validate <id> --strict` and resolve any issues before sharing the proposal.
|
||||
|
||||
### Stage 2: Implementing Changes
|
||||
Track these steps as TODOs and complete them one by one.
|
||||
1. **Read proposal.md** - Understand what's being built
|
||||
2. **Read design.md** (if exists) - Review technical decisions
|
||||
3. **Read tasks.md** - Get implementation checklist
|
||||
4. **Implement tasks sequentially** - Complete in order
|
||||
5. **Confirm completion** - Ensure every item in `tasks.md` is finished before updating statuses
|
||||
6. **Update checklist** - After all work is done, set every task to `- [x]` so the list reflects reality
|
||||
7. **Approval gate** - Do not start implementation until the proposal is reviewed and approved
|
||||
|
||||
### Stage 3: Archiving Changes
|
||||
After deployment, create separate PR to:
|
||||
- Move `changes/[name]/` → `changes/archive/YYYY-MM-DD-[name]/`
|
||||
- Update `specs/` if capabilities changed
|
||||
- Use `openspec archive <change-id> --skip-specs --yes` for tooling-only changes (always pass the change ID explicitly)
|
||||
- Run `openspec validate --strict` to confirm the archived change passes checks
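The archive naming convention above can be sketched in Python. This only illustrates the `YYYY-MM-DD-[name]` directory layout; it is not how `openspec archive` is implemented, and `archive_path` is a hypothetical helper.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_path(change_id: str, on: date, root: str = "openspec/changes") -> PurePosixPath:
    """Return <root>/archive/YYYY-MM-DD-<change-id> for a completed change."""
    # date.isoformat() yields YYYY-MM-DD, matching the archive directory prefix.
    return PurePosixPath(root) / "archive" / f"{on.isoformat()}-{change_id}"


print(archive_path("add-two-factor-auth", date(2025, 1, 31)))
```

In practice, prefer the `openspec archive <change-id>` command, which also handles spec updates.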
|
||||
|
||||
## Before Any Task
|
||||
|
||||
**Context Checklist:**
|
||||
- [ ] Read relevant specs in `specs/[capability]/spec.md`
|
||||
- [ ] Check pending changes in `changes/` for conflicts
|
||||
- [ ] Read `openspec/project.md` for conventions
|
||||
- [ ] Run `openspec list` to see active changes
|
||||
- [ ] Run `openspec list --specs` to see existing capabilities
|
||||
|
||||
**Before Creating Specs:**
|
||||
- Always check if capability already exists
|
||||
- Prefer modifying existing specs over creating duplicates
|
||||
- Use `openspec show [spec]` to review current state
|
||||
- If request is ambiguous, ask 1–2 clarifying questions before scaffolding
|
||||
|
||||
### Search Guidance
|
||||
- Enumerate specs: `openspec spec list --long` (or `--json` for scripts)
|
||||
- Enumerate changes: `openspec list` (or `openspec change list --json` - deprecated but available)
|
||||
- Show details:
|
||||
  - Spec: `openspec show <spec-id> --type spec` (use `--json` for filters)
  - Change: `openspec show <change-id> --json --deltas-only`
|
||||
- Full-text search (use ripgrep): `rg -n "Requirement:|Scenario:" openspec/specs`
|
||||
|
||||
## Quick Start
|
||||
|
||||
### CLI Commands
|
||||
|
||||
```bash
|
||||
# Essential commands
|
||||
openspec list # List active changes
|
||||
openspec list --specs # List specifications
|
||||
openspec show [item] # Display change or spec
|
||||
openspec validate [item] # Validate changes or specs
|
||||
openspec archive <change-id> [--yes|-y] # Archive after deployment (add --yes for non-interactive runs)
|
||||
|
||||
# Project management
|
||||
openspec init [path] # Initialize OpenSpec
|
||||
openspec update [path] # Update instruction files
|
||||
|
||||
# Interactive mode
|
||||
openspec show # Prompts for selection
|
||||
openspec validate # Bulk validation mode
|
||||
|
||||
# Debugging
|
||||
openspec show [change] --json --deltas-only
|
||||
openspec validate [change] --strict
|
||||
```
|
||||
|
||||
### Command Flags
|
||||
|
||||
- `--json` - Machine-readable output
|
||||
- `--type change|spec` - Disambiguate items
|
||||
- `--strict` - Comprehensive validation
|
||||
- `--no-interactive` - Disable prompts
|
||||
- `--skip-specs` - Archive without spec updates
|
||||
- `--yes`/`-y` - Skip confirmation prompts (non-interactive archive)
|
||||
|
||||
## Directory Structure
|
||||
|
||||
```
|
||||
openspec/
├── project.md              # Project conventions
├── specs/                  # Current truth - what IS built
│   └── [capability]/       # Single focused capability
│       ├── spec.md         # Requirements and scenarios
│       └── design.md       # Technical patterns
├── changes/                # Proposals - what SHOULD change
│   ├── [change-name]/
│   │   ├── proposal.md     # Why, what, impact
│   │   ├── tasks.md        # Implementation checklist
│   │   ├── design.md       # Technical decisions (optional; see criteria)
│   │   └── specs/          # Delta changes
│   │       └── [capability]/
│   │           └── spec.md # ADDED/MODIFIED/REMOVED
│   └── archive/            # Completed changes
|
||||
```
|
||||
|
||||
## Creating Change Proposals
|
||||
|
||||
### Decision Tree
|
||||
|
||||
```
|
||||
New request?
|
||||
├─ Bug fix restoring spec behavior? → Fix directly
|
||||
├─ Typo/format/comment? → Fix directly
|
||||
├─ New feature/capability? → Create proposal
|
||||
├─ Breaking change? → Create proposal
|
||||
├─ Architecture change? → Create proposal
|
||||
└─ Unclear? → Create proposal (safer)
|
||||
```
|
||||
|
||||
### Proposal Structure
|
||||
|
||||
1. **Create directory:** `changes/[change-id]/` (kebab-case, verb-led, unique)
|
||||
|
||||
2. **Write proposal.md:**
|
||||
```markdown
|
||||
# Change: [Brief description of change]
|
||||
|
||||
## Why
|
||||
[1-2 sentences on problem/opportunity]
|
||||
|
||||
## What Changes
|
||||
- [Bullet list of changes]
|
||||
- [Mark breaking changes with **BREAKING**]
|
||||
|
||||
## Impact
|
||||
- Affected specs: [list capabilities]
|
||||
- Affected code: [key files/systems]
|
||||
```
|
||||
|
||||
3. **Create spec deltas:** `specs/[capability]/spec.md`
|
||||
```markdown
|
||||
## ADDED Requirements
|
||||
### Requirement: New Feature
|
||||
The system SHALL provide...
|
||||
|
||||
#### Scenario: Success case
|
||||
- **WHEN** user performs action
|
||||
- **THEN** expected result
|
||||
|
||||
## MODIFIED Requirements
|
||||
### Requirement: Existing Feature
|
||||
[Complete modified requirement]
|
||||
|
||||
## REMOVED Requirements
|
||||
### Requirement: Old Feature
|
||||
**Reason**: [Why removing]
|
||||
**Migration**: [How to handle]
|
||||
```
|
||||
If multiple capabilities are affected, create multiple delta files under `changes/[change-id]/specs/<capability>/spec.md`—one per capability.
|
||||
|
||||
4. **Create tasks.md:**
|
||||
```markdown
|
||||
## 1. Implementation
|
||||
- [ ] 1.1 Create database schema
|
||||
- [ ] 1.2 Implement API endpoint
|
||||
- [ ] 1.3 Add frontend component
|
||||
- [ ] 1.4 Write tests
|
||||
```
|
||||
|
||||
5. **Create design.md when needed:**
|
||||
Create `design.md` if any of the following apply; otherwise omit it:
|
||||
- Cross-cutting change (multiple services/modules) or a new architectural pattern
|
||||
- New external dependency or significant data model changes
|
||||
- Security, performance, or migration complexity
|
||||
- Ambiguity that benefits from technical decisions before coding
|
||||
|
||||
Minimal `design.md` skeleton:
|
||||
```markdown
|
||||
## Context
|
||||
[Background, constraints, stakeholders]
|
||||
|
||||
## Goals / Non-Goals
|
||||
- Goals: [...]
|
||||
- Non-Goals: [...]
|
||||
|
||||
## Decisions
|
||||
- Decision: [What and why]
|
||||
- Alternatives considered: [Options + rationale]
|
||||
|
||||
## Risks / Trade-offs
|
||||
- [Risk] → Mitigation
|
||||
|
||||
## Migration Plan
|
||||
[Steps, rollback]
|
||||
|
||||
## Open Questions
|
||||
- [...]
|
||||
```
|
||||
|
||||
## Spec File Format
|
||||
|
||||
### Critical: Scenario Formatting
|
||||
|
||||
**CORRECT** (use #### headers):
|
||||
```markdown
|
||||
#### Scenario: User login success
|
||||
- **WHEN** valid credentials provided
|
||||
- **THEN** return JWT token
|
||||
```
|
||||
|
||||
**WRONG** (don't use bullets or bold):
|
||||
```markdown
|
||||
- **Scenario: User login** ❌
|
||||
**Scenario**: User login ❌
|
||||
### Scenario: User login ❌
|
||||
```
|
||||
|
||||
Every requirement MUST have at least one scenario.
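A rough pre-check for this rule can be sketched in Python. It is a hypothetical helper, not the validator that `openspec validate` runs: it only looks for `### Requirement:` headers and four-hash `#### Scenario:` headers, and it does not distinguish delta sections (so REMOVED requirements would be flagged too).

```python
import re

REQUIREMENT = re.compile(r"^### Requirement:\s*(.+?)\s*$")
SCENARIO = re.compile(r"^#### Scenario:\s*\S")


def requirements_without_scenarios(spec_text: str) -> list[str]:
    """Return names of requirements that have no `#### Scenario:` header."""
    missing: list[str] = []
    current = None        # name of the requirement currently being scanned
    has_scenario = False
    for line in spec_text.splitlines():
        match = REQUIREMENT.match(line)
        if match:
            # A new requirement starts; record the previous one if it had no scenario.
            if current is not None and not has_scenario:
                missing.append(current)
            current, has_scenario = match.group(1), False
        elif current is not None and SCENARIO.match(line):
            has_scenario = True
    if current is not None and not has_scenario:
        missing.append(current)
    return missing
```

Bulleted, bold, or three-hash scenario lines do not match `SCENARIO`, so a requirement that only uses those forms is reported as missing a scenario.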
|
||||
|
||||
### Requirement Wording
|
||||
- Use SHALL/MUST for normative requirements (avoid should/may unless intentionally non-normative)
|
||||
|
||||
### Delta Operations
|
||||
|
||||
- `## ADDED Requirements` - New capabilities
|
||||
- `## MODIFIED Requirements` - Changed behavior
|
||||
- `## REMOVED Requirements` - Deprecated features
|
||||
- `## RENAMED Requirements` - Name changes
|
||||
|
||||
Headers matched with `trim(header)` - whitespace ignored.
|
||||
|
||||
#### When to use ADDED vs MODIFIED
|
||||
- ADDED: Introduces a new capability or sub-capability that can stand alone as a requirement. Prefer ADDED when the change is orthogonal (e.g., adding "Slash Command Configuration") rather than altering the semantics of an existing requirement.
|
||||
- MODIFIED: Changes the behavior, scope, or acceptance criteria of an existing requirement. Always paste the full, updated requirement content (header + all scenarios). The archiver will replace the entire requirement with what you provide here; partial deltas will drop previous details.
|
||||
- RENAMED: Use when only the name changes. If you also change behavior, use RENAMED (name) plus MODIFIED (content) referencing the new name.
|
||||
|
||||
Common pitfall: Using MODIFIED to add a new concern without including the previous text. This causes loss of detail at archive time. If you aren’t explicitly changing the existing requirement, add a new requirement under ADDED instead.
|
||||
|
||||
Authoring a MODIFIED requirement correctly:
|
||||
1) Locate the existing requirement in `openspec/specs/<capability>/spec.md`.
|
||||
2) Copy the entire requirement block (from `### Requirement: ...` through its scenarios).
|
||||
3) Paste it under `## MODIFIED Requirements` and edit to reflect the new behavior.
|
||||
4) Ensure the header text matches exactly (whitespace-insensitive) and keep at least one `#### Scenario:`.
|
||||
|
||||
Example for RENAMED:
|
||||
```markdown
|
||||
## RENAMED Requirements
|
||||
- FROM: `### Requirement: Login`
|
||||
- TO: `### Requirement: User Authentication`
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Errors
|
||||
|
||||
**"Change must have at least one delta"**
|
||||
- Check `changes/[name]/specs/` exists with .md files
|
||||
- Verify files have operation prefixes (## ADDED Requirements)
|
||||
|
||||
**"Requirement must have at least one scenario"**
|
||||
- Check scenarios use `#### Scenario:` format (4 hashtags)
|
||||
- Don't use bullet points or bold for scenario headers
|
||||
|
||||
**Silent scenario parsing failures**
|
||||
- Exact format required: `#### Scenario: Name`
|
||||
- Debug with: `openspec show [change] --json --deltas-only`
|
||||
|
||||
### Validation Tips
|
||||
|
||||
```bash
|
||||
# Always use strict mode for comprehensive checks
|
||||
openspec validate [change] --strict
|
||||
|
||||
# Debug delta parsing
|
||||
openspec show [change] --json | jq '.deltas'
|
||||
|
||||
# Check specific requirement
|
||||
openspec show [spec] --json -r 1
|
||||
```
|
||||
|
||||
## Happy Path Script
|
||||
|
||||
```bash
|
||||
# 1) Explore current state
|
||||
openspec spec list --long
|
||||
openspec list
|
||||
# Optional full-text search:
|
||||
# rg -n "Requirement:|Scenario:" openspec/specs
|
||||
# rg -n "^#|Requirement:" openspec/changes
|
||||
|
||||
# 2) Choose change id and scaffold
|
||||
CHANGE=add-two-factor-auth
|
||||
mkdir -p "openspec/changes/$CHANGE/specs/auth"   # no braces: a single-item {…} is not expanded by bash
|
||||
printf "## Why\n...\n\n## What Changes\n- ...\n\n## Impact\n- ...\n" > openspec/changes/$CHANGE/proposal.md
|
||||
printf "## 1. Implementation\n- [ ] 1.1 ...\n" > openspec/changes/$CHANGE/tasks.md
|
||||
|
||||
# 3) Add deltas (example)
|
||||
cat > openspec/changes/$CHANGE/specs/auth/spec.md << 'EOF'
|
||||
## ADDED Requirements
|
||||
### Requirement: Two-Factor Authentication
|
||||
Users MUST provide a second factor during login.
|
||||
|
||||
#### Scenario: OTP required
|
||||
- **WHEN** valid credentials are provided
|
||||
- **THEN** an OTP challenge is required
|
||||
EOF
|
||||
|
||||
# 4) Validate
|
||||
openspec validate $CHANGE --strict
|
||||
```
|
||||
|
||||
## Multi-Capability Example
|
||||
|
||||
```
|
||||
openspec/changes/add-2fa-notify/
├── proposal.md
├── tasks.md
└── specs/
    ├── auth/
    │   └── spec.md   # ADDED: Two-Factor Authentication
    └── notifications/
        └── spec.md   # ADDED: OTP email notification
|
||||
```
|
||||
|
||||
auth/spec.md
|
||||
```markdown
|
||||
## ADDED Requirements
|
||||
### Requirement: Two-Factor Authentication
|
||||
...
|
||||
```
|
||||
|
||||
notifications/spec.md
|
||||
```markdown
|
||||
## ADDED Requirements
|
||||
### Requirement: OTP Email Notification
|
||||
...
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Simplicity First
|
||||
- Default to <100 lines of new code
|
||||
- Single-file implementations until proven insufficient
|
||||
- Avoid frameworks without clear justification
|
||||
- Choose boring, proven patterns
|
||||
|
||||
### Complexity Triggers
|
||||
Only add complexity with:
|
||||
- Performance data showing current solution too slow
|
||||
- Concrete scale requirements (>1000 users, >100MB data)
|
||||
- Multiple proven use cases requiring abstraction
|
||||
|
||||
### Clear References
|
||||
- Use `file.ts:42` format for code locations
|
||||
- Reference specs as `specs/auth/spec.md`
|
||||
- Link related changes and PRs
|
||||
|
||||
### Capability Naming
|
||||
- Use verb-noun: `user-auth`, `payment-capture`
|
||||
- Single purpose per capability
|
||||
- 10-minute understandability rule
|
||||
- Split if description needs "AND"
|
||||
|
||||
### Change ID Naming
|
||||
- Use kebab-case, short and descriptive: `add-two-factor-auth`
|
||||
- Prefer verb-led prefixes: `add-`, `update-`, `remove-`, `refactor-`
|
||||
- Ensure uniqueness; if taken, append `-2`, `-3`, etc.
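These naming rules can be sketched as a small Python check. The helper names are hypothetical, and treating the four verb prefixes as mandatory is an assumption made for the example; the guidance above only says to prefer them.

```python
import re

# Prefixes listed in the guidance above; enforcing them strictly is an assumption.
VERB_PREFIXES = ("add-", "update-", "remove-", "refactor-")
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_valid_change_id(change_id: str) -> bool:
    """True if the id is kebab-case and starts with one of the verb prefixes."""
    return bool(KEBAB_CASE.match(change_id)) and change_id.startswith(VERB_PREFIXES)


def unique_change_id(change_id: str, taken: set[str]) -> str:
    """Append -2, -3, ... until the id is not already in use."""
    if change_id not in taken:
        return change_id
    n = 2
    while f"{change_id}-{n}" in taken:
        n += 1
    return f"{change_id}-{n}"
```

The set of taken ids could come from the output of `openspec list`; how that output is parsed is left out here.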
|
||||
|
||||
## Tool Selection Guide
|
||||
|
||||
| Task | Tool | Why |
|
||||
|------|------|-----|
|
||||
| Find files by pattern | Glob | Fast pattern matching |
|
||||
| Search code content | Grep | Optimized regex search |
|
||||
| Read specific files | Read | Direct file access |
|
||||
| Explore unknown scope | Task | Multi-step investigation |
|
||||
|
||||
## Error Recovery
|
||||
|
||||
### Change Conflicts
|
||||
1. Run `openspec list` to see active changes
|
||||
2. Check for overlapping specs
|
||||
3. Coordinate with change owners
|
||||
4. Consider combining proposals
|
||||
|
||||
### Validation Failures
|
||||
1. Run with `--strict` flag
|
||||
2. Check JSON output for details
|
||||
3. Verify spec file format
|
||||
4. Ensure scenarios properly formatted
|
||||
|
||||
### Missing Context
|
||||
1. Read project.md first
|
||||
2. Check related specs
|
||||
3. Review recent archives
|
||||
4. Ask for clarification
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Stage Indicators
|
||||
- `changes/` - Proposed, not yet built
|
||||
- `specs/` - Built and deployed
|
||||
- `archive/` - Completed changes
|
||||
|
||||
### File Purposes
|
||||
- `proposal.md` - Why and what
|
||||
- `tasks.md` - Implementation steps
|
||||
- `design.md` - Technical decisions
|
||||
- `spec.md` - Requirements and behavior
|
||||
|
||||
### CLI Essentials
|
||||
```bash
|
||||
openspec list # What's in progress?
|
||||
openspec show [item] # View details
|
||||
openspec validate --strict # Is it correct?
|
||||
openspec archive <change-id> [--yes|-y] # Mark complete (add --yes for automation)
|
||||
```
|
||||
|
||||
Remember: Specs are truth. Changes are proposals. Keep them in sync.
|
||||
107
openspec/changes/add-doc-processing-api/design.md
Normal file
@@ -0,0 +1,107 @@
|
||||
## Context
|
||||
|
||||
This is the initial implementation of the DocProcesser service. The system integrates multiple external models and services:
|
||||
|
||||
- DocLayout-YOLO for document layout analysis
|
||||
- PaddleOCR-VL with PP-DocLayoutV2 for text and formula recognition (deployed via vLLM)
|
||||
- markdown_2_docx for document conversion
|
||||
|
||||
Target deployment: Ubuntu machine with RTX 5080 GPU (16GB VRAM), Python 3.11.0.
|
||||
|
||||
## Goals / Non-Goals
|
||||
|
||||
**Goals:**
|
||||
|
||||
- Clean FastAPI project structure following best practices
|
||||
- Image preprocessing with OpenCV (30% padding)
|
||||
- Layout-aware OCR routing using DocLayout-YOLO
|
||||
- Text and formula recognition via PaddleOCR-VL
|
||||
- Markdown to DOCX conversion
|
||||
- GPU-enabled Docker deployment
|
||||
|
||||
**Non-Goals:**
|
||||
|
||||
- Authentication/authorization (can be added later)
|
||||
- Rate limiting
|
||||
- Persistent storage
|
||||
- Training or fine-tuning models
|
||||
|
||||
## Decisions
|
||||
|
||||
### Project Structure
|
||||
|
||||
Follow FastAPI best practices with modular organization:
|
||||
|
||||
```
|
||||
app/
├── api/
│   └── v1/
│       ├── endpoints/
│       │   ├── image.py          # Image OCR endpoint
│       │   └── convert.py        # Markdown to DOCX endpoint
│       └── router.py
├── core/
│   └── config.py                 # Settings and environment config
├── model/
│   ├── DocLayout
│   └── PP-DocLayout
├── services/
│   ├── image_processor.py        # OpenCV preprocessing
│   ├── layout_detector.py        # DocLayout-YOLO wrapper
│   ├── ocr_service.py            # PaddleOCR-VL client
│   └── docx_converter.py         # markdown_2_docx wrapper
├── schemas/
│   ├── image.py                  # Request/response models for image OCR
│   └── convert.py                # Request/response models for conversion
└── main.py                       # FastAPI app initialization
|
||||
```
|
||||
|
||||
**Rationale:** Separation of concerns between API layer, business logic (services), and data models (schemas).
|
||||
|
||||
### Image Preprocessing
|
||||
|
||||
- Use OpenCV `cv2.copyMakeBorder()` to add 30% whitespace padding
|
||||
- Padding color: white `[255, 255, 255]`
|
||||
- This matches DocLayout-YOLO's demo.py pattern
|
||||
|
||||
### Layout Detection Flow
|
||||
|
||||
1. DocLayout-YOLO detects layout regions (plain text, formulas, tables, figures)
|
||||
2. If a plain-text region exists, route the image to PaddleOCR-VL with PP-DocLayoutV2; otherwise route it to PaddleOCR-VL with a prompt
|
||||
3. PaddleOCR-VL combined with PP-DocLayoutV2 handles mixed-content recognition internally; PaddleOCR-VL combined with a prompt handles formula recognition
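The routing decision in this flow can be sketched in Python. This is a minimal illustration, not the service's actual code: `choose_recognition_mode` is a hypothetical name, and the `plain_text` label and the two mode names are taken from the proposal, so the real DocLayout-YOLO class labels may differ.

```python
PLAIN_TEXT = "plain_text"  # assumed label; the detector's real class names may differ


def choose_recognition_mode(region_labels: list[str]) -> str:
    """Pick the OCR route from the class labels returned by layout detection."""
    if PLAIN_TEXT in region_labels:
        return "mixed_recognition"    # PaddleOCR-VL with PP-DocLayoutV2
    return "formula_recognition"      # PaddleOCR-VL with a prompt
```

Keeping the decision in one pure function makes it easy to unit test without loading either model.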
|
||||
|
||||
### External Service Integration
|
||||
|
||||
- PaddleOCR-VL: Connect to vLLM server at configurable URL (default: `http://localhost:8080/v1`)
|
||||
- DocLayout-YOLO: Load model from pre-downloaded path (not downloaded in container)
|
||||
|
||||
### Docker Strategy
|
||||
|
||||
- Base image: NVIDIA CUDA with Python 3.11
|
||||
- Pre-install OpenCV dependencies (`libgl1-mesa-glx`, `libglib2.0-0`)
|
||||
- Mount model directory for DocLayout-YOLO weights
|
||||
- Expose port 8053
|
||||
- Use Uvicorn with multiple workers
|
||||
|
||||
## Risks / Trade-offs
|
||||
|
||||
| Risk | Mitigation |
|
||||
| --------------------------------- | ------------------------------------------------------------------ |
|
||||
| PaddleOCR-VL service unavailable | Health check endpoint, retry logic with exponential backoff |
|
||||
| Large image memory consumption | Configure max image size, resize before processing |
|
||||
| DocLayout-YOLO model loading time | Load model once at startup, keep in memory |
|
||||
| GPU memory contention | DocLayout-YOLO uses GPU; PaddleOCR-VL runs on separate vLLM server |
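The retry mitigation in the first row could look roughly like the following Python sketch. The function name, the attempt count, the base delay, and the choice of `ConnectionError` as the retryable error are all assumptions for illustration; an HTTP client such as httpx raises its own exception types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on ConnectionError with delays base, 2*base, 4*base, ..."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries; the API layer can map this to HTTP 503
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so that tests can record the delays instead of waiting.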
|
||||
|
||||
## Configuration
|
||||
|
||||
Environment variables:
|
||||
|
||||
- `PADDLEOCR_VL_URL`: vLLM server URL (default: `http://localhost:8080/v1`)
|
||||
- `DOCLAYOUT_MODEL_PATH`: Path to DocLayout-YOLO weights
|
||||
- `PP_DOCLAYOUT_MODEL_DIR`: Path to PP-DocLayoutV2 model directory
|
||||
- `MAX_IMAGE_SIZE_MB`: Maximum upload size (default: 10)
|
||||
|
||||
## Open Questions
|
||||
|
||||
- Should we add async queue for large batch processing? (Defer to future change)
|
||||
- Do we need WebSocket for progress updates? (Defer to future change)
|
||||
31
openspec/changes/add-doc-processing-api/proposal.md
Normal file
@@ -0,0 +1,31 @@
|
||||
# Change: Add Document Processing API
|
||||
|
||||
## Why
|
||||
|
||||
DocProcesser needs a FastAPI backend to accept images (via URL or base64) and convert them to LaTeX/Markdown/MathML, plus a markdown-to-DOCX conversion endpoint. This establishes the core functionality of the project.
|
||||
|
||||
## What Changes
|
||||
|
||||
- **BREAKING**: Initial project setup (new FastAPI project structure)
|
||||
- Add image-to-OCR API endpoint (`POST /doc_process/v1/image/ocr`)
|
||||
- Accept `image_url` or `image_base64` input
|
||||
- Preprocess with OpenCV (30% whitespace padding)
|
||||
- Use DocLayout-YOLO for layout detection
|
||||
- Route to PaddleOCR-VL (with PP-DocLayoutV2) for text/formula recognition
|
||||
- If a `plain_text` element exists, use PP-DocLayoutV2 to recognize the image as `mixed_recognition`; otherwise call the PaddleOCR-VL API directly with the Formula Recognition prompt as `formula_recognition`
|
||||
- Reference the markdown_2_docx code to convert Markdown to LaTeX and MathML for `mixed_recognition`, and LaTeX to Markdown and MathML for `formula_recognition`
|
||||
- Return LaTeX, Markdown, and MathML outputs
|
||||
- Add markdown-to-DOCX API endpoint (`POST /doc_process/v1/convert/docx`)
|
||||
- Accept markdown content
|
||||
- Use the markdown_2_docx library for conversion (source: http://github.com/YogeLiu/markdown_2_docxdd)
|
||||
- Return DOCX file
|
||||
- Add Dockerfile for GPU-enabled deployment (RTX 5080, port 8053)
|
||||
|
||||
## Impact
|
||||
|
||||
- Affected specs: `image-ocr`, `markdown-docx`
|
||||
- Affected code: New project structure under `app/`
|
||||
- External dependencies:
|
||||
- DocLayout-YOLO (pre-downloaded model, not fetched in container)
|
||||
- PaddleOCR-VL with vLLM backend (external service at localhost:8080)
|
||||
- markdown_2_docx library
|
||||
137
openspec/changes/add-doc-processing-api/specs/image-ocr/spec.md
Normal file
@@ -0,0 +1,137 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Image Input Acceptance
|
||||
|
||||
The system SHALL accept images via `POST /api/v1/image/ocr` endpoint with either:
|
||||
|
||||
- `image_url`: A publicly accessible URL to the image
|
||||
- `image_base64`: Base64-encoded image data
|
||||
|
||||
The system SHALL return an error if neither input is provided or if both are provided simultaneously.
|
||||
|
||||
#### Scenario: Image URL provided
|
||||
|
||||
- **WHEN** a valid `image_url` is provided in the request body
|
||||
- **THEN** the system SHALL download the image and process it
|
||||
- **AND** return OCR results in the response
|
||||
|
||||
#### Scenario: Base64 image provided
|
||||
|
||||
- **WHEN** a valid `image_base64` string is provided in the request body
|
||||
- **THEN** the system SHALL decode the image and process it
|
||||
- **AND** return OCR results in the response
|
||||
|
||||
#### Scenario: Invalid input
|
||||
|
||||
- **WHEN** neither `image_url` nor `image_base64` is provided
|
||||
- **THEN** the system SHALL return HTTP 422 with validation error
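The "exactly one input" rule from this requirement can be sketched as plain Python. `select_image_source` is a hypothetical helper; in the FastAPI service the same check would more likely live in a request schema validator, with the error surfaced as the HTTP 422 described above.

```python
from typing import Optional


def select_image_source(image_url: Optional[str] = None,
                        image_base64: Optional[str] = None) -> str:
    """Return which field to use, or raise ValueError if not exactly one is set."""
    provided = [name for name, value in
                (("image_url", image_url), ("image_base64", image_base64)) if value]
    if len(provided) != 1:
        # Covers both "neither provided" and "both provided".
        raise ValueError("provide exactly one of image_url or image_base64")
    return provided[0]
```

Empty strings are treated the same as missing values here, which is a design choice of this sketch rather than something the requirement states.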
|
||||
|
||||
---
|
||||
|
||||
### Requirement: Image Preprocessing with Padding
|
||||
|
||||
The system SHALL preprocess all input images by adding 30% whitespace padding around the image borders using OpenCV.
|
||||
|
||||
The padding is calculated as `padding = int(max(height, width) * 0.15)` and applied to each side, so the total expansion is 30% of the larger dimension.
|
||||
|
||||
The padding color SHALL be white (`RGB: 255, 255, 255`).
|
||||
|
||||
#### Scenario: Image padding applied
|
||||
|
||||
- **WHEN** an image of dimensions 1000x800 pixels is received
|
||||
- **THEN** the system SHALL add approximately 150 pixels of white padding on each side
|
||||
- **AND** the resulting image dimensions SHALL be approximately 1300x1100 pixels
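The arithmetic in this scenario can be checked with a short Python sketch. `padded_size` is a hypothetical helper that applies the formula from the requirement and does no image processing itself.

```python
def padded_size(width: int, height: int, ratio: float = 0.15) -> tuple[int, int, int]:
    """Return (padding, new_width, new_height) for uniform padding on all four sides."""
    padding = int(max(height, width) * ratio)
    # The same padding is added on the left/right and on the top/bottom.
    return padding, width + 2 * padding, height + 2 * padding


print(padded_size(1000, 800))  # (150, 1300, 1100)
```

With OpenCV, the resulting `padding` value would be passed as the top, bottom, left, and right arguments of `cv2.copyMakeBorder` with `cv2.BORDER_CONSTANT` and a white fill value.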
|
||||
|
||||
---
|
||||
|
||||
### Requirement: Layout Detection with DocLayout-YOLO
|
||||
|
||||
The system SHALL use DocLayout-YOLO model to detect document layout regions including:
|
||||
|
||||
- Plain text blocks
|
||||
- Formulas/equations
|
||||
- Tables
|
||||
- Figures
|
||||
|
||||
The model SHALL be loaded from a pre-configured local path (not downloaded at runtime).
|
||||
|
||||
#### Scenario: Layout detection success
|
||||
|
||||
- **WHEN** a padded image is passed to DocLayout-YOLO
|
||||
- **THEN** the system SHALL return detected regions with bounding boxes and class labels
|
||||
- **AND** confidence scores for each detection
|
||||
|
||||
#### Scenario: Model not available
|
||||
|
||||
- **WHEN** the DocLayout-YOLO model file is not found at the configured path
|
||||
- **THEN** the system SHALL fail startup with a clear error message
|
||||
|
||||
---
|
||||
|
||||
### Requirement: OCR Processing with PaddleOCR-VL
|
||||
|
||||
The system SHALL send images to PaddleOCR-VL (via vLLM backend) for text and formula recognition.
|
||||
|
||||
PaddleOCR-VL SHALL be configured with PP-DocLayoutV2 for document layout understanding.
|
||||
|
||||
The system SHALL handle both plain text and formula/math content.
|
||||
|
||||
#### Scenario: Plain text recognition
|
||||
|
||||
- **WHEN** DocLayout-YOLO detects plain text regions
|
||||
- **THEN** the system SHALL send the image to PaddleOCR-VL
|
||||
- **AND** return recognized text content
|
||||
|
||||
#### Scenario: Formula recognition
|
||||
|
||||
- **WHEN** DocLayout-YOLO detects formula/equation regions
|
||||
- **THEN** the system SHALL send the image to PaddleOCR-VL
|
||||
- **AND** return formula content in LaTeX format
|
||||
|
||||
#### Scenario: Mixed content handling
|
||||
|
||||
- **WHEN** DocLayout-YOLO detects both text and formula regions
|
||||
- **THEN** the system SHALL process all regions via PaddleOCR-VL with PP-DocLayoutV2
|
||||
- **AND** return combined results preserving document structure
|
||||
|
||||
#### Scenario: PaddleOCR-VL service unavailable
|
||||
|
||||
- **WHEN** the PaddleOCR-VL vLLM server is unreachable
|
||||
- **THEN** the system SHALL return HTTP 503 with service unavailable error
|
||||
|
||||
---
|
||||
|
||||
### Requirement: Multi-Format Output
|
||||
|
||||
The system SHALL return OCR results in multiple formats:
|
||||
|
||||
- `latex`: LaTeX representation of the content
|
||||
- `markdown`: Markdown representation of the content
|
||||
- `mathml`: MathML representation for mathematical content
|
||||
|
||||
#### Scenario: Successful OCR response
|
||||
|
||||
- **WHEN** image processing completes successfully
|
||||
- **THEN** the response SHALL include:
|
||||
- `latex`: string containing LaTeX output
|
||||
- `markdown`: string containing Markdown output
|
||||
- `mathml`: string containing MathML output (empty string if no math detected)
|
||||
- **AND** HTTP status code SHALL be 200
|
||||
|
||||
#### Scenario: Response structure
|
||||
|
||||
- **WHEN** the OCR endpoint returns successfully
|
||||
- **THEN** the response body SHALL be JSON with structure:
|
||||
|
||||
```json
|
||||
{
|
||||
"latex": "...",
|
||||
"markdown": "...",
|
||||
"mathml": "...",
|
||||
"layout_info": {
|
||||
"regions": [
|
||||
{"type": "text|formula|table|figure", "bbox": [x1, y1, x2, y2], "confidence": 0.95}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
@@ -0,0 +1,93 @@
|
||||
## ADDED Requirements
|
||||
|
||||
### Requirement: Markdown Input Acceptance
|
||||
|
||||
The system SHALL accept markdown content via `POST /api/v1/convert/docx` endpoint.
|
||||
|
||||
The request body SHALL contain:
|
||||
- `markdown`: string containing the markdown content to convert
|
||||
|
||||
#### Scenario: Valid markdown provided
|
||||
|
||||
- **WHEN** valid markdown content is provided in the request body
|
||||
- **THEN** the system SHALL process and convert it to DOCX format
|
||||
|
||||
#### Scenario: Empty markdown
|
||||
|
||||
- **WHEN** an empty `markdown` string is provided
|
||||
- **THEN** the system SHALL return HTTP 422 with validation error
|
||||
|
||||
---
|
||||
|
||||
### Requirement: DOCX Conversion
|
||||
|
||||
The system SHALL convert markdown content to DOCX format using the markdown_2_docx library.
|
||||
|
||||
The conversion SHALL preserve:
|
||||
- Headings (H1-H6)
|
||||
- Paragraphs
|
||||
- Bold and italic formatting
|
||||
- Lists (ordered and unordered)
|
||||
- Code blocks
|
||||
- Tables
|
||||
- Images (if embedded as base64 or accessible URLs)
|
||||
|
||||
#### Scenario: Basic markdown conversion
|
||||
|
||||
- **WHEN** markdown with headings, paragraphs, and formatting is provided
|
||||
- **THEN** the system SHALL generate a valid DOCX file
|
||||
- **AND** the DOCX SHALL preserve the document structure
|
||||
|
||||
#### Scenario: Complex markdown with tables
|
||||
|
||||
- **WHEN** markdown containing tables is provided
|
||||
- **THEN** the system SHALL convert tables to Word table format
|
||||
- **AND** preserve table structure and content
|
||||
|
||||
#### Scenario: Markdown with math formulas
|
||||
|
||||
- **WHEN** markdown containing LaTeX math expressions is provided
|
||||
- **THEN** the system SHALL convert math to OMML (Office Math Markup Language) format
|
||||
- **AND** render correctly in Microsoft Word
|
||||
|
||||
---
|
||||
|
||||
### Requirement: DOCX File Response
|
||||
|
||||
The system SHALL return the generated DOCX file as a binary download.
|
||||
|
||||
The response SHALL include:
|
||||
- Content-Type: `application/vnd.openxmlformats-officedocument.wordprocessingml.document`
|
||||
- Content-Disposition: `attachment; filename="output.docx"`
|
||||
|
||||
#### Scenario: Successful conversion response
|
||||
|
||||
- **WHEN** markdown conversion completes successfully
|
||||
- **THEN** the response SHALL be the DOCX file binary
|
||||
- **AND** HTTP status code SHALL be 200
|
||||
- **AND** appropriate headers for file download SHALL be set
|
||||
|
||||
#### Scenario: Custom filename
|
||||
|
||||
- **WHEN** an optional `filename` parameter is provided in the request
|
||||
- **THEN** the Content-Disposition header SHALL use the provided filename
|
||||
- **AND** append `.docx` extension if not present
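
The header rules above can be sketched with a small helper. The function name `build_download_headers` and the decision to strip double quotes from the supplied filename are assumptions made for illustration; the spec only fixes the Content-Type value, the default `output.docx` name, and the `.docx` suffix rule.

```python
DOCX_MEDIA_TYPE = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)


def build_download_headers(filename=None):
    """Build response headers for a DOCX download (hypothetical helper)."""
    # Fall back to the default name when no filename (or a blank one) is given.
    name = (filename or "").strip() or "output"
    # Assumption: drop double quotes so the quoted header value stays well-formed.
    name = name.replace('"', "")
    # Append the extension only if it is not already present (case-insensitive).
    if not name.lower().endswith(".docx"):
        name += ".docx"
    return {
        "Content-Type": DOCX_MEDIA_TYPE,
        "Content-Disposition": f'attachment; filename="{name}"',
    }
```

For example, `build_download_headers("report")` names the attachment `report.docx`, while `build_download_headers("report.docx")` leaves the name unchanged.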
|
||||
|
||||
---
|
||||
|
||||
### Requirement: Error Handling
|
||||
|
||||
The system SHALL provide clear error responses for conversion failures.
|
||||
|
||||
#### Scenario: Conversion failure
|
||||
|
||||
- **WHEN** markdown_2_docx fails to convert the content
|
||||
- **THEN** the system SHALL return HTTP 500 with error details
|
||||
- **AND** the error message SHALL describe the failure reason
|
||||
|
||||
#### Scenario: Malformed markdown
|
||||
|
||||
- **WHEN** severely malformed markdown is provided
|
||||
- **THEN** the system SHALL attempt best-effort conversion
|
||||
- **AND** log a warning about potential formatting issues
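
One way to picture the conversion-failure scenario is a thin wrapper around the converter call. This is a sketch under stated assumptions: `run_conversion` and the `convert` callable are hypothetical names, and the real `markdown_2_docx` interface is not shown in this spec, so the converter is passed in rather than imported.

```python
import logging

logger = logging.getLogger("docx_converter")


def run_conversion(markdown, convert):
    """Call `convert(markdown)` and map the outcome to (status_code, body).

    `convert` is a stand-in for whatever function wraps the conversion library.
    """
    try:
        docx_bytes = convert(markdown)
    except Exception as exc:
        # Log the full traceback and surface the failure reason to the client.
        logger.exception("DOCX conversion failed")
        return 500, {"detail": f"Conversion failed: {exc}"}
    return 200, docx_bytes
```

Passing the converter as an argument keeps the wrapper testable without the real library installed; the best-effort handling of malformed markdown would sit inside the converter itself.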
|
||||
|
||||
34
openspec/changes/add-doc-processing-api/tasks.md
Normal file
@@ -0,0 +1,34 @@
|
||||
## 1. Project Scaffolding
|
||||
|
||||
- [x] 1.1 Create FastAPI project structure (`app/`, `api/`, `core/`, `services/`, `schemas/`)
|
||||
- [x] 1.2 Use uv to manage dependencies (fastapi, uvicorn, opencv-python, python-multipart, pydantic, httpx)
|
||||
- [x] 1.3 Create `app/main.py` with FastAPI app initialization
|
||||
- [x] 1.4 Create `app/core/config.py` with Pydantic Settings
|
||||
|
||||
## 2. Image OCR API
|
||||
|
||||
- [x] 2.1 Create request/response schemas in `app/schemas/image.py`
|
||||
- [x] 2.2 Implement image preprocessing service with OpenCV padding (`app/services/image_processor.py`)
|
||||
- [x] 2.3 Implement DocLayout-YOLO wrapper (`app/services/layout_detector.py`)
|
||||
- [x] 2.4 Implement PaddleOCR-VL client (`app/services/ocr_service.py`)
|
||||
- [x] 2.5 Create image OCR endpoint (`app/api/v1/endpoints/image.py`)
|
||||
- [x] 2.6 Wire up router and test endpoint
|
||||
|
||||
## 3. Markdown to DOCX API
|
||||
|
||||
- [x] 3.1 Create request/response schemas in `app/schemas/convert.py`
|
||||
- [x] 3.2 Integrate markdown_2_docx library (`app/services/docx_converter.py`)
|
||||
- [x] 3.3 Create conversion endpoint (`app/api/v1/endpoints/convert.py`)
|
||||
- [x] 3.4 Wire up router and test endpoint
|
||||
|
||||
## 4. Deployment
|
||||
|
||||
- [x] 4.1 Create Dockerfile with CUDA base image for RTX 5080
|
||||
- [x] 4.2 Create docker-compose.yml (optional, for local development)
|
||||
- [x] 4.3 Document deployment steps in README
|
||||
|
||||
## 5. Validation
|
||||
|
||||
- [ ] 5.1 Test image OCR endpoint with sample images
|
||||
- [ ] 5.2 Test markdown to DOCX conversion
|
||||
- [ ] 5.3 Verify Docker build and GPU access
|
||||
42
openspec/project.md
Normal file
@@ -0,0 +1,42 @@
|
||||
# Project Context
|
||||
|
||||
## Purpose
|
||||
|
||||
This project is DocProcesser, which converts images to LaTeX, Markdown, MathML, OMML, etc.
|
||||
It is a FastAPI web project: it accepts requests from upstream, processes the image itself or sends it to a third-party service, and returns the result to upstream.
|
||||
|
||||
## Tech Stack
|
||||
|
||||
- python
|
||||
- fastapi
|
||||
|
||||
## Project Conventions
|
||||
|
||||
### Code Style
|
||||
|
||||
[Describe your code style preferences, formatting rules, and naming conventions]
|
||||
|
||||
### Architecture Patterns
|
||||
|
||||
[Document your architectural decisions and patterns]
|
||||
|
||||
### Testing Strategy
|
||||
|
||||
[Explain your testing approach and requirements]
|
||||
|
||||
### Git Workflow
|
||||
|
||||
[Describe your branching strategy and commit conventions]
|
||||
|
||||
## Domain Context
|
||||
|
||||
- DocLayout
|
||||
A YOLO model that recognizes document layouts (books, papers, newspapers); it is used to detect whether an image contains plain text.
|
||||
|
||||
## Important Constraints
|
||||
|
||||
[List any technical, business, or regulatory constraints]
|
||||
|
||||
## External Dependencies
|
||||
|
||||
[Document key external services, APIs, or systems]
|
||||