ADDED Requirements
Requirement: Image Input Acceptance
The system SHALL accept images via POST /api/v1/image/ocr endpoint with either:
- image_url: A publicly accessible URL to the image
- image_base64: Base64-encoded image data
The system SHALL return an error if neither input is provided or if both are provided simultaneously.
Scenario: Image URL provided
- WHEN a valid image_url is provided in the request body
- THEN the system SHALL download the image and process it
- AND return OCR results in the response
Scenario: Base64 image provided
- WHEN a valid image_base64 string is provided in the request body
- THEN the system SHALL decode the image and process it
- AND return OCR results in the response
Scenario: Invalid input
- WHEN neither image_url nor image_base64 is provided
- THEN the system SHALL return HTTP 422 with validation error
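The exactly-one-input rule in this requirement can be sketched as follows. This is a minimal illustration using only the Python standard library, not the service's actual validation code: the function name `select_image_source` and the use of `ValueError` are assumptions, and a real endpoint would translate the error into the HTTP 422 response.

```python
import base64
import binascii
from typing import Optional, Tuple, Union


def select_image_source(
    image_url: Optional[str] = None,
    image_base64: Optional[str] = None,
) -> Tuple[str, Union[str, bytes]]:
    """Return ("url", url) or ("bytes", decoded_bytes); raise ValueError otherwise."""
    has_url = bool(image_url)
    has_b64 = bool(image_base64)
    # Reject both "neither provided" and "both provided".
    if has_url == has_b64:
        raise ValueError("Provide exactly one of image_url or image_base64")
    if has_url:
        return ("url", image_url)
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        data = base64.b64decode(image_base64, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image_base64 is not valid Base64") from exc
    return ("bytes", data)
```

Downloading the URL and decoding the bytes into an image are left out of this sketch.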
Requirement: Image Preprocessing with Padding
The system SHALL preprocess all input images by adding white padding around the image borders using OpenCV.
The padding on each side is computed as padding = int(max(height, width) * 0.15), so the longer dimension grows by 30% in total.
The padding color SHALL be white (RGB: 255, 255, 255).
Scenario: Image padding applied
- WHEN an image of dimensions 1000x800 pixels is received
- THEN the system SHALL add approximately 150 pixels of white padding on each side
- AND the resulting image dimensions SHALL be approximately 1300x1100 pixels
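The padding arithmetic can be checked with a small sketch. The helper name `padded_size` is hypothetical, and the example reads "1000x800" as width x height, which is an assumption about the scenario's notation.

```python
from typing import Tuple


def padded_size(height: int, width: int, ratio: float = 0.15) -> Tuple[int, int, int]:
    """Return (padding, new_height, new_width) for equal padding on all four sides."""
    padding = int(max(height, width) * ratio)
    return padding, height + 2 * padding, width + 2 * padding


# Scenario values: a 1000x800 image (width x height assumed).
padding, new_height, new_width = padded_size(height=800, width=1000)
print(padding, new_width, new_height)  # 150 1300 1100
```

With OpenCV, padding of this size could be applied with `cv2.copyMakeBorder` using `cv2.BORDER_CONSTANT` and a white fill value. That call is not shown here so that the sketch stays dependency-free.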
Requirement: Layout Detection with DocLayout-YOLO
The system SHALL use the DocLayout-YOLO model to detect document layout regions including:
- Plain text blocks
- Formulas/equations
- Tables
- Figures
The model SHALL be loaded from a pre-configured local path (not downloaded at runtime).
Scenario: Layout detection success
- WHEN a padded image is passed to DocLayout-YOLO
- THEN the system SHALL return detected regions with bounding boxes and class labels
- AND confidence scores for each detection
Scenario: Model not available
- WHEN the DocLayout-YOLO model file is not found at the configured path
- THEN the system SHALL fail startup with a clear error message
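A fail-fast startup check for the model file might look like the sketch below. The function name `require_model_file` and the choice of `RuntimeError` are assumptions. Loading the DocLayout-YOLO weights would happen only after this check passes.

```python
from pathlib import Path


def require_model_file(model_path: str) -> Path:
    """Return the model path if the file exists, otherwise raise so startup aborts."""
    path = Path(model_path)
    if not path.is_file():
        raise RuntimeError(
            f"DocLayout-YOLO model file not found at configured path: {path}"
        )
    return path
```

Calling this once during application startup, before any request is served, keeps a missing model from surfacing later as a per-request failure.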
Requirement: OCR Processing with PaddleOCR-VL
The system SHALL send images to PaddleOCR-VL (via vLLM backend) for text and formula recognition.
PaddleOCR-VL SHALL be configured with PP-DocLayoutV2 for document layout understanding.
The system SHALL handle both plain text and formula/math content.
Scenario: Plain text recognition
- WHEN DocLayout-YOLO detects plain text regions
- THEN the system SHALL send the image to PaddleOCR-VL
- AND return recognized text content
Scenario: Formula recognition
- WHEN DocLayout-YOLO detects formula/equation regions
- THEN the system SHALL send the image to PaddleOCR-VL
- AND return formula content in LaTeX format
Scenario: Mixed content handling
- WHEN DocLayout-YOLO detects both text and formula regions
- THEN the system SHALL process all regions via PaddleOCR-VL with PP-DocLayoutV2
- AND return combined results preserving document structure
Scenario: PaddleOCR-VL service unavailable
- WHEN the PaddleOCR-VL vLLM server is unreachable
- THEN the system SHALL return HTTP 503 with a service unavailable error
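One way to map an unreachable backend to HTTP 503 is sketched below. The wrapper `call_ocr_backend` and the error body shape are hypothetical. The sketch catches only Python's built-in `ConnectionError` and `TimeoutError`, so a real HTTP client's own exception types would need to be added to the `except` clause.

```python
from typing import Callable, Dict, Tuple


def call_ocr_backend(send_request: Callable[[], Dict]) -> Tuple[int, Dict]:
    """Run the backend call and translate connectivity failures into HTTP 503."""
    try:
        return 200, send_request()
    except (ConnectionError, TimeoutError) as exc:
        # The backend could not be reached: report it as unavailable
        # rather than as an internal server error.
        return 503, {"detail": f"PaddleOCR-VL service unavailable: {exc}"}
```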
Requirement: Multi-Format Output
The system SHALL return OCR results in multiple formats:
- latex: LaTeX representation of the content
- markdown: Markdown representation of the content
- mathml: MathML representation for mathematical content
Scenario: Successful OCR response
- WHEN image processing completes successfully
- THEN the response SHALL include:
  - latex: string containing LaTeX output
  - markdown: string containing Markdown output
  - mathml: string containing MathML output (empty string if no math detected)
- AND HTTP status code SHALL be 200
Scenario: Response structure
- WHEN the OCR endpoint returns successfully
- THEN the response body SHALL be JSON with structure:
{
  "latex": "...",
  "markdown": "...",
  "mathml": "...",
  "layout_info": {
    "regions": [
      {"type": "text|formula|table|figure", "bbox": [x1, y1, x2, y2], "confidence": 0.95}
    ]
  }
}
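The response structure above could be assembled as in the following sketch, which uses only the standard library. The class names `Region` and `OcrResponse` and the `to_json` method are illustrative assumptions. Only the JSON field names come from this specification.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Region:
    type: str          # "text", "formula", "table", or "figure"
    bbox: List[float]  # [x1, y1, x2, y2]
    confidence: float


@dataclass
class OcrResponse:
    latex: str
    markdown: str
    mathml: str = ""   # empty string when no math is detected
    regions: List[Region] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to the JSON layout given in the Response structure scenario."""
        body = {
            "latex": self.latex,
            "markdown": self.markdown,
            "mathml": self.mathml,
            "layout_info": {"regions": [asdict(r) for r in self.regions]},
        }
        return json.dumps(body)


response = OcrResponse(
    latex="x^2",
    markdown="$x^2$",
    regions=[Region(type="formula", bbox=[10, 20, 110, 60], confidence=0.95)],
)
print(response.to_json())
```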