---
title: OCR Accuracy
description: Understanding TexPixel recognition accuracy and how to get the best results
slug: ocr-accuracy
date: 2026-03-25
tags: [accuracy, tips]
order: 5
---
# OCR Accuracy
TexPixel achieves industry-leading accuracy on mathematical formula recognition — but accuracy isn't uniform across all input types. This guide explains what affects accuracy and how to maximize it.
## Accuracy by Formula Type
| Formula Type | Typical Accuracy |
|---|---|
| Printed formulas (textbooks, papers) | 95–99% |
| Clean handwritten formulas | 88–95% |
| Scanned documents (300 DPI+) | 93–98% |
| Photos of whiteboards | 82–92% |
| Low-resolution images (< 72 DPI) | 60–80% |

These are approximate ranges. Individual results depend heavily on image quality.
## Factors That Affect Accuracy
### Image Quality
The single biggest factor. A blurry, low-resolution, or poorly lit image will always produce worse results than a clean scan.
- **Resolution** — 150 DPI or higher is recommended. 300 DPI is ideal for documents.
- **Contrast** — dark ink on a white background gives the clearest signal to the model.
- **Sharpness** — avoid motion blur or out-of-focus shots.
### Formula Complexity
Simple single-line equations are recognized with near-perfect accuracy. More complex structures may have occasional errors:
- Multi-line equation systems
- Large matrices (6×6 or larger)
- Heavily nested fractions (3+ levels deep)
- Non-standard notation or custom symbols
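
If you want to flag results that fall into the "heavily nested fractions" category for a manual check, one option is to measure fraction nesting in the recognized output. This sketch assumes the output is a LaTeX string and that fractions use `\frac`, `\dfrac`, or `\tfrac` with braced arguments. The helper names are hypothetical and are not part of any TexPixel API.

```python
FRAC_COMMANDS = {"frac", "dfrac", "tfrac"}


def max_frac_depth(latex: str) -> int:
    """Return the deepest level of nested \\frac-style commands with braced arguments."""
    # Each frame tracks one open brace group:
    # [is_fraction_argument, fraction_arguments_still_expected_inside_this_group]
    frames = [[False, 0]]
    depth = max_depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            # Read a control word (\frac) or skip a single escaped character (\{).
            j = i + 1
            while j < len(latex) and latex[j].isalpha():
                j += 1
            if latex[i + 1:j] in FRAC_COMMANDS:
                frames[-1][1] = 2  # numerator and denominator follow
            i = max(j, i + 2)
            continue
        if ch == "{":
            is_arg = frames[-1][1] > 0
            if is_arg:
                frames[-1][1] -= 1
                depth += 1
                max_depth = max(max_depth, depth)
            frames.append([is_arg, 0])
        elif ch == "}" and len(frames) > 1:
            if frames.pop()[0]:
                depth -= 1
        i += 1
    return max_depth


def needs_review(latex: str, threshold: int = 3) -> bool:
    """Flag output whose fraction nesting reaches the 3+ levels this guide mentions."""
    return max_frac_depth(latex) >= threshold


print(max_frac_depth(r"\frac{1}{1+\frac{1}{x}}"))  # 2
print(needs_review(r"\frac{1}{1+\frac{1}{1+\frac{1}{x}}}"))  # True
```

Unbraced forms such as `\frac12` are ignored by this sketch, so treat the result as a lower bound.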
### Handwriting Style
Printed (typeset) formulas are recognized more accurately than handwritten ones, but TexPixel handles handwriting well when:
- Letters are clearly formed and not connected (print style, not cursive)
- Similar-looking symbols are written so they can be told apart, for example a variable x that is clearly different in size and shape from the multiplication sign ×
- Spacing between symbols is consistent
### What Reduces Accuracy
- **Rotated images** — formulas at an angle are harder to parse
- **Overlapping elements** — crossed-out work, annotations, or arrows near symbols
- **Pencil on paper** — low contrast; try increasing image brightness/contrast before uploading
- **Multiple formulas in one image** — crop to the specific formula you need
- **Decorative fonts** — calligraphic or stylized mathematical writing
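
For low-contrast input such as pencil on paper, a simple linear contrast stretch is one way to raise contrast before uploading. The sketch below is an illustration only: it works on a plain grid of 8-bit grayscale values (0 is black, 255 is white) so that it stays self-contained, and the function name is hypothetical. In practice you would more likely use a photo editor or an image library.

```python
from typing import List


def stretch_contrast(pixels: List[List[int]]) -> List[List[int]]:
    """Linearly rescale grayscale values so the darkest becomes 0 and the lightest 255."""
    lo = min(min(row) for row in pixels)
    hi = max(max(row) for row in pixels)
    if hi == lo:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in pixels]
    span = hi - lo
    # Integer arithmetic keeps every result inside 0..255.
    return [[(v - lo) * 255 // span for v in row] for row in pixels]


# Faint pencil strokes (100 to 150) on grayish paper (200).
faint = [
    [100, 150],
    [200, 200],
]
print(stretch_contrast(faint))  # [[0, 127], [255, 255]]
```

The darkest stroke becomes pure black and the paper becomes pure white, which moves the image closer to the dark-ink-on-white input that works best.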
## Improving Results
If you're getting errors, try these steps in order:
1. **Increase image resolution** — scan at 300 DPI instead of 150 DPI
2. **Improve contrast** — use a photo editor to increase brightness and contrast
3. **Crop tightly** — remove surrounding text and whitespace
4. **Straighten the image** — correct rotation before uploading
5. **Re-photograph** — better lighting, closer distance, sharper focus
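
The "crop tightly" step can be sketched as finding the bounding box of the dark pixels and keeping only that region. This is a minimal illustration on a plain grid of 8-bit grayscale values. The function names, the darkness threshold of 128, and the `margin` parameter are assumptions made for the example, not TexPixel settings.

```python
from typing import List, Optional, Tuple


def ink_bounding_box(
    pixels: List[List[int]], threshold: int = 128, margin: int = 0
) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom), inclusive, around pixels darker than threshold."""
    height, width = len(pixels), len(pixels[0])
    rows = [y for y in range(height) if any(v < threshold for v in pixels[y])]
    cols = [x for x in range(width) if any(row[x] < threshold for row in pixels)]
    if not rows:
        return None  # no ink found
    return (
        max(cols[0] - margin, 0),
        max(rows[0] - margin, 0),
        min(cols[-1] + margin, width - 1),
        min(rows[-1] + margin, height - 1),
    )


def crop_to_ink(
    pixels: List[List[int]], threshold: int = 128, margin: int = 0
) -> List[List[int]]:
    """Crop the grid to its ink bounding box; return an unchanged copy if no ink is found."""
    box = ink_bounding_box(pixels, threshold, margin)
    if box is None:
        return [row[:] for row in pixels]
    left, top, right, bottom = box
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]


W = 255  # white background
page = [
    [W, W, W, W, W],
    [W, 0, W, W, W],
    [W, W, 0, 0, W],
    [W, W, W, W, W],
]
print(ink_bounding_box(page))  # (1, 1, 3, 2)
print(crop_to_ink(page))       # [[0, 255, 255], [255, 0, 0]]
```

A small nonzero `margin` keeps a thin white border around the formula, which avoids clipping strokes that touch the edge of the box.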
## Reporting Errors
Found a formula type that TexPixel consistently gets wrong? Let us know — accuracy feedback directly improves the model over time.
Contact us at: [support@texpixel.com](mailto:support@texpixel.com)
---
[Upload a formula and test accuracy →](/app)