## ADDED Requirements

### Requirement: Image Input Acceptance

The system SHALL accept images via the `POST /api/v1/image/ocr` endpoint with either:

- `image_url`: A publicly accessible URL to the image
- `image_base64`: Base64-encoded image data

The system SHALL return an error if neither input is provided or if both are provided simultaneously.

#### Scenario: Image URL provided

- **WHEN** a valid `image_url` is provided in the request body
- **THEN** the system SHALL download the image and process it
- **AND** return OCR results in the response

#### Scenario: Base64 image provided

- **WHEN** a valid `image_base64` string is provided in the request body
- **THEN** the system SHALL decode the image and process it
- **AND** return OCR results in the response

#### Scenario: Invalid input

- **WHEN** neither `image_url` nor `image_base64` is provided
- **THEN** the system SHALL return HTTP 422 with a validation error

---

### Requirement: Image Preprocessing with Padding

The system SHALL preprocess all input images by adding 30% whitespace padding around the image borders using OpenCV.

The padding calculation: `padding = int(max(height, width) * 0.15)` on each side, so each dimension grows by 30% of the longer side in total.

The padding color SHALL be white (`RGB: 255, 255, 255`).

#### Scenario: Image padding applied

- **WHEN** an image of dimensions 1000x800 pixels is received
- **THEN** the system SHALL add approximately 150 pixels of white padding on each side
- **AND** the resulting image dimensions SHALL be approximately 1300x1100 pixels

---

### Requirement: Layout Detection with DocLayout-YOLO

The system SHALL use the DocLayout-YOLO model to detect document layout regions including:

- Plain text blocks
- Formulas/equations
- Tables
- Figures

The model SHALL be loaded from a pre-configured local path (not downloaded at runtime).
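The local-path rule can be enforced with a fail-fast check before the model is loaded. The following is a minimal sketch using only the Python standard library; `resolve_model_path` and `ModelNotFoundError` are hypothetical names chosen for illustration, and the actual DocLayout-YOLO loading call is deliberately left out.

```python
from pathlib import Path


class ModelNotFoundError(RuntimeError):
    """Raised at startup when the layout model file is missing (hypothetical name)."""


def resolve_model_path(configured_path: str) -> Path:
    """Return the configured path if it points to an existing file.

    No download is attempted: a missing file is treated as a
    configuration error and surfaces as a clear exception.
    """
    path = Path(configured_path).expanduser()
    if not path.is_file():
        raise ModelNotFoundError(
            f"DocLayout-YOLO model file not found at configured path: {path}"
        )
    return path
```

Calling this once during application startup, and letting the exception propagate, would stop the service from starting with a readable message instead of failing on the first request.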
#### Scenario: Layout detection success

- **WHEN** a padded image is passed to DocLayout-YOLO
- **THEN** the system SHALL return detected regions with bounding boxes and class labels
- **AND** confidence scores for each detection

#### Scenario: Model not available

- **WHEN** the DocLayout-YOLO model file is not found at the configured path
- **THEN** the system SHALL fail startup with a clear error message

---

### Requirement: OCR Processing with PaddleOCR-VL

The system SHALL send images to PaddleOCR-VL (via vLLM backend) for text and formula recognition.

PaddleOCR-VL SHALL be configured with PP-DocLayoutV2 for document layout understanding.

The system SHALL handle both plain text and formula/math content.

#### Scenario: Plain text recognition

- **WHEN** DocLayout-YOLO detects plain text regions
- **THEN** the system SHALL send the image to PaddleOCR-VL
- **AND** return recognized text content

#### Scenario: Formula recognition

- **WHEN** DocLayout-YOLO detects formula/equation regions
- **THEN** the system SHALL send the image to PaddleOCR-VL
- **AND** return formula content in LaTeX format

#### Scenario: Mixed content handling

- **WHEN** DocLayout-YOLO detects both text and formula regions
- **THEN** the system SHALL process all regions via PaddleOCR-VL with PP-DocLayoutV2
- **AND** return combined results preserving document structure

#### Scenario: PaddleOCR-VL service unavailable

- **WHEN** the PaddleOCR-VL vLLM server is unreachable
- **THEN** the system SHALL return HTTP 503 with a service unavailable error

---

### Requirement: Multi-Format Output

The system SHALL return OCR results in multiple formats:

- `latex`: LaTeX representation of the content
- `markdown`: Markdown representation of the content
- `mathml`: MathML representation for mathematical content

#### Scenario: Successful OCR response

- **WHEN** image processing completes successfully
- **THEN** the response SHALL include:
  - `latex`: string containing LaTeX output
  - `markdown`: string containing Markdown output
  - `mathml`: string containing MathML output (empty string if no math detected)
- **AND** HTTP status code SHALL be 200

#### Scenario: Response structure

- **WHEN** the OCR endpoint returns successfully
- **THEN** the response body SHALL be JSON with structure:

```json
{
  "latex": "...",
  "markdown": "...",
  "mathml": "...",
  "layout_info": {
    "regions": [
      {"type": "text|formula|table|figure", "bbox": [x1, y1, x2, y2], "confidence": 0.95}
    ]
  }
}
```
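The response structure above could be modeled with plain dataclasses so that field names and defaults stay in one place. This is a sketch under stated assumptions, not the service's actual schema code: the class names `Region`, `LayoutInfo`, and `OcrResponse` are hypothetical, and the sample values in the usage lines are illustrative only.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Region:
    type: str          # one of "text", "formula", "table", "figure"
    bbox: List[float]  # [x1, y1, x2, y2]
    confidence: float


@dataclass
class LayoutInfo:
    regions: List[Region] = field(default_factory=list)


@dataclass
class OcrResponse:
    latex: str
    markdown: str
    mathml: str = ""  # empty string when no math was detected
    layout_info: LayoutInfo = field(default_factory=LayoutInfo)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists of dataclasses
        return json.dumps(asdict(self))


# Illustrative usage with made-up values
response = OcrResponse(
    latex="Hello",
    markdown="Hello",
    layout_info=LayoutInfo([Region("text", [10, 20, 300, 60], 0.95)]),
)
print(response.to_json())
```

Defaulting `mathml` to an empty string mirrors the rule that the field is always present, even when no mathematical content is detected.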