init repo
openspec/changes/add-doc-processing-api/specs/image-ocr/spec.md
@@ -0,0 +1,137 @@
## ADDED Requirements

### Requirement: Image Input Acceptance

The system SHALL accept images via the `POST /api/v1/image/ocr` endpoint with exactly one of:

- `image_url`: A publicly accessible URL to the image
- `image_base64`: Base64-encoded image data

The system SHALL return an error if neither input is provided or if both are provided simultaneously.

#### Scenario: Image URL provided

- **WHEN** a valid `image_url` is provided in the request body
- **THEN** the system SHALL download the image and process it
- **AND** return OCR results in the response

#### Scenario: Base64 image provided

- **WHEN** a valid `image_base64` string is provided in the request body
- **THEN** the system SHALL decode the image and process it
- **AND** return OCR results in the response

#### Scenario: Invalid input

- **WHEN** neither `image_url` nor `image_base64` is provided, or both are provided
- **THEN** the system SHALL return HTTP 422 with a validation error

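The exactly-one-input rule above could be checked along these lines. This is a minimal sketch, not the service's actual validator: `select_image_source` and `InputValidationError` are hypothetical names, and mapping the both-provided case to HTTP 422 is an assumption (the requirement only says "an error").

```python
from typing import Optional, Tuple


class InputValidationError(ValueError):
    """Hypothetical error type; a web layer would translate it to HTTP 422."""

    status_code = 422


def select_image_source(
    image_url: Optional[str] = None,
    image_base64: Optional[str] = None,
) -> Tuple[str, str]:
    """Return ("url", value) or ("base64", value); reject zero or two inputs."""
    has_url = bool(image_url)
    has_b64 = bool(image_base64)
    # Equal flags mean either both inputs or neither input was supplied.
    if has_url == has_b64:
        raise InputValidationError(
            "Provide exactly one of image_url or image_base64"
        )
    if has_url:
        return ("url", image_url)
    return ("base64", image_base64)
```
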
---

### Requirement: Image Preprocessing with Padding

The system SHALL preprocess all input images by adding whitespace padding around the image borders using OpenCV, expanding each dimension by 30% of the image's longer side.

The padding calculation: `padding = int(max(height, width) * 0.15)` pixels on each of the four sides (15% per side, totaling a 30% expansion relative to the longer side).

The padding color SHALL be white (`RGB: 255, 255, 255`).

#### Scenario: Image padding applied

- **WHEN** an image of dimensions 1000x800 pixels is received
- **THEN** the system SHALL add approximately 150 pixels of white padding on each side
- **AND** the resulting image dimensions SHALL be approximately 1300x1100 pixels

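The arithmetic in the padding scenario can be checked with a small sketch. The helper names `padding_for` and `padded_size` are hypothetical; only the formula comes from the requirement. The OpenCV call in the comment is one plausible way to apply the border, not necessarily how the service does it.

```python
from typing import Tuple


def padding_for(height: int, width: int, ratio: float = 0.15) -> int:
    """Per-side padding in pixels: int(max(height, width) * 0.15)."""
    return int(max(height, width) * ratio)


def padded_size(height: int, width: int, ratio: float = 0.15) -> Tuple[int, int]:
    """Return (height, width) after adding the same padding on all four sides."""
    pad = padding_for(height, width, ratio)
    return height + 2 * pad, width + 2 * pad


# With OpenCV (assumption: `img` is an image array loaded elsewhere), the
# border itself could be applied with:
#   cv2.copyMakeBorder(img, pad, pad, pad, pad,
#                      cv2.BORDER_CONSTANT, value=(255, 255, 255))

if __name__ == "__main__":
    # A 1000x800 (width x height) image, as in the scenario above.
    print(padding_for(800, 1000))
    print(padded_size(800, 1000))
```
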
---

### Requirement: Layout Detection with DocLayout-YOLO

The system SHALL use the DocLayout-YOLO model to detect document layout regions, including:

- Plain text blocks
- Formulas/equations
- Tables
- Figures

The model SHALL be loaded from a pre-configured local path (not downloaded at runtime).

#### Scenario: Layout detection success

- **WHEN** a padded image is passed to DocLayout-YOLO
- **THEN** the system SHALL return detected regions with bounding boxes and class labels
- **AND** confidence scores for each detection

#### Scenario: Model not available

- **WHEN** the DocLayout-YOLO model file is not found at the configured path
- **THEN** the system SHALL fail startup with a clear error message

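The "Model not available" scenario amounts to a fail-fast check at startup. A minimal sketch, assuming the configured path arrives as a string; `resolve_model_path` and `ModelNotFoundError` are hypothetical names, and actually loading the DocLayout-YOLO weights is out of scope here.

```python
from pathlib import Path


class ModelNotFoundError(RuntimeError):
    """Hypothetical startup error raised when the model file is missing."""


def resolve_model_path(configured_path: str) -> Path:
    """Validate the local model path before any model loading is attempted."""
    path = Path(configured_path)
    if not path.is_file():
        # Fail startup with a message that names the offending path.
        raise ModelNotFoundError(
            f"DocLayout-YOLO model file not found at configured path: {path}"
        )
    return path
```
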
---

### Requirement: OCR Processing with PaddleOCR-VL

The system SHALL send images to PaddleOCR-VL (via a vLLM backend) for text and formula recognition.

PaddleOCR-VL SHALL be configured with PP-DocLayoutV2 for document layout understanding.

The system SHALL handle both plain text and formula/math content.

#### Scenario: Plain text recognition

- **WHEN** DocLayout-YOLO detects plain text regions
- **THEN** the system SHALL send the image to PaddleOCR-VL
- **AND** return recognized text content

#### Scenario: Formula recognition

- **WHEN** DocLayout-YOLO detects formula/equation regions
- **THEN** the system SHALL send the image to PaddleOCR-VL
- **AND** return formula content in LaTeX format

#### Scenario: Mixed content handling

- **WHEN** DocLayout-YOLO detects both text and formula regions
- **THEN** the system SHALL process all regions via PaddleOCR-VL with PP-DocLayoutV2
- **AND** return combined results preserving document structure

#### Scenario: PaddleOCR-VL service unavailable

- **WHEN** the PaddleOCR-VL vLLM server is unreachable
- **THEN** the system SHALL return HTTP 503 with a service-unavailable error

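One way to honor the service-unavailable scenario is to translate connection failures into a 503 at the boundary where the backend is called. The sketch below is an illustration under assumptions: `call_ocr_backend`, the `recognize` callable, and the `detail`/`result` keys are all hypothetical, and the real client for the vLLM server is not shown.

```python
from typing import Callable, Dict, Tuple


def call_ocr_backend(
    recognize: Callable[[bytes], str],
    image: bytes,
) -> Tuple[int, Dict[str, str]]:
    """Run the recognizer and map an unreachable backend to HTTP 503."""
    try:
        result = recognize(image)
    except (ConnectionError, TimeoutError) as exc:
        # ConnectionRefusedError and ConnectionResetError are subclasses of
        # ConnectionError, so a refused or dropped connection lands here too.
        return 503, {"detail": f"PaddleOCR-VL service unavailable: {exc}"}
    return 200, {"result": result}
```
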
---

### Requirement: Multi-Format Output

The system SHALL return OCR results in multiple formats:

- `latex`: LaTeX representation of the content
- `markdown`: Markdown representation of the content
- `mathml`: MathML representation for mathematical content

#### Scenario: Successful OCR response

- **WHEN** image processing completes successfully
- **THEN** the response SHALL include:
  - `latex`: string containing LaTeX output
  - `markdown`: string containing Markdown output
  - `mathml`: string containing MathML output (empty string if no math detected)
- **AND** HTTP status code SHALL be 200

#### Scenario: Response structure

- **WHEN** the OCR endpoint returns successfully
- **THEN** the response body SHALL be JSON with structure:

```json
{
  "latex": "...",
  "markdown": "...",
  "mathml": "...",
  "layout_info": {
    "regions": [
      {"type": "text|formula|table|figure", "bbox": [x1, y1, x2, y2], "confidence": 0.95}
    ]
  }
}
```
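
The response structure above could be modeled as follows. This is a sketch using only the standard library; the class names `Region` and `OcrResponse` are hypothetical, while the field names mirror the JSON keys in the spec.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Region:
    type: str            # one of: text, formula, table, figure
    bbox: List[float]    # [x1, y1, x2, y2]
    confidence: float


@dataclass
class OcrResponse:
    latex: str
    markdown: str
    mathml: str = ""     # empty string when no math was detected
    regions: List[Region] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize into the JSON shape described in the spec."""
        body = {
            "latex": self.latex,
            "markdown": self.markdown,
            "mathml": self.mathml,
            "layout_info": {"regions": [asdict(r) for r in self.regions]},
        }
        return json.dumps(body)
```
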
@@ -0,0 +1,93 @@
## ADDED Requirements

### Requirement: Markdown Input Acceptance

The system SHALL accept markdown content via the `POST /api/v1/convert/docx` endpoint.

The request body SHALL contain:
- `markdown`: string containing the markdown content to convert

#### Scenario: Valid markdown provided

- **WHEN** valid markdown content is provided in the request body
- **THEN** the system SHALL process and convert it to DOCX format

#### Scenario: Empty markdown

- **WHEN** an empty `markdown` string is provided
- **THEN** the system SHALL return HTTP 422 with a validation error

---

### Requirement: DOCX Conversion

The system SHALL convert markdown content to DOCX format using the markdown_2_docx library.

The conversion SHALL preserve:
- Headings (H1-H6)
- Paragraphs
- Bold and italic formatting
- Lists (ordered and unordered)
- Code blocks
- Tables
- Images (if embedded as base64 or accessible URLs)

#### Scenario: Basic markdown conversion

- **WHEN** markdown with headings, paragraphs, and formatting is provided
- **THEN** the system SHALL generate a valid DOCX file
- **AND** the DOCX SHALL preserve the document structure

#### Scenario: Complex markdown with tables

- **WHEN** markdown containing tables is provided
- **THEN** the system SHALL convert tables to Word table format
- **AND** preserve table structure and content

#### Scenario: Markdown with math formulas

- **WHEN** markdown containing LaTeX math expressions is provided
- **THEN** the system SHALL convert math to OMML (Office Math Markup Language) format
- **AND** render correctly in Microsoft Word

---

### Requirement: DOCX File Response

The system SHALL return the generated DOCX file as a binary download.

The response SHALL include:
- Content-Type: `application/vnd.openxmlformats-officedocument.wordprocessingml.document`
- Content-Disposition: `attachment; filename="output.docx"`

#### Scenario: Successful conversion response

- **WHEN** markdown conversion completes successfully
- **THEN** the response SHALL be the DOCX file binary
- **AND** HTTP status code SHALL be 200
- **AND** appropriate headers for file download SHALL be set

#### Scenario: Custom filename

- **WHEN** an optional `filename` parameter is provided in the request
- **THEN** the Content-Disposition header SHALL use the provided filename
- **AND** append `.docx` extension if not present

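The download headers and the custom-filename rule could be sketched like this. `docx_filename` and `download_headers` are hypothetical helpers. Treating the extension check as case-insensitive and dropping double quotes from the name are assumptions the spec does not state, and non-ASCII filenames would need extra handling that is not shown.

```python
from typing import Dict, Optional

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def docx_filename(requested: Optional[str] = None, default: str = "output.docx") -> str:
    """Use the requested name, appending .docx when the extension is missing."""
    name = (requested or "").strip()
    if not name:
        return default
    if not name.lower().endswith(".docx"):
        name += ".docx"
    return name


def download_headers(requested: Optional[str] = None) -> Dict[str, str]:
    """Build the Content-Type and Content-Disposition headers for the download."""
    # Assumption: double quotes are removed so the quoted-string stays valid.
    safe_name = docx_filename(requested).replace('"', "")
    return {
        "Content-Type": DOCX_MIME,
        "Content-Disposition": f'attachment; filename="{safe_name}"',
    }
```
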
---

### Requirement: Error Handling

The system SHALL provide clear error responses for conversion failures.

#### Scenario: Conversion failure

- **WHEN** markdown_2_docx fails to convert the content
- **THEN** the system SHALL return HTTP 500 with error details
- **AND** the error message SHALL describe the failure reason

#### Scenario: Malformed markdown

- **WHEN** severely malformed markdown is provided
- **THEN** the system SHALL attempt best-effort conversion
- **AND** log a warning about potential formatting issues