| title | description | slug | date | tags | order |
|---|---|---|---|---|---|
| PDF Extraction | Extract and convert formulas from PDF documents automatically with TexPixel | pdf-extraction | 2026-03-25 | | 6 |
PDF Extraction
TexPixel can process entire PDF documents and extract every formula from every page — automatically. This is useful for textbooks, research papers, or any multi-page document with mathematical content.
How to Extract from a PDF
- Click the upload zone or drag and drop your PDF file.
- TexPixel detects all pages and identifies formula regions.
- Each recognized formula is listed in the result panel.
- Copy individual formulas or export the entire document as DOCX.
What Gets Extracted
TexPixel identifies formulas in PDFs regardless of whether they were:
- Typeset in LaTeX (rendered as vector math)
- Embedded as images (scanned pages)
- A mix of both
For vector PDFs (generated from LaTeX or Word), recognition accuracy is typically 95% or higher. For scanned or image-based PDFs, accuracy depends on image quality, and the same guidelines apply as for regular image uploads.
Supported PDF Types
| Type | Description | Accuracy |
|---|---|---|
| Vector PDF | Created from LaTeX, Word, or typesetting tools | 95–99% |
| Scanned PDF (high quality) | 300 DPI scan of printed text | 90–97% |
| Scanned PDF (low quality) | < 150 DPI or poor contrast | 60–80% |
| Photo PDF | Photographed pages embedded as images | 75–90% |
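To see which scanned-PDF row a file is likely to fall under, you can estimate the scan resolution from the page image's pixel width and the paper size. The sketch below is illustrative arithmetic only: the function names are hypothetical, it is not part of TexPixel, and the table does not say how scans between 150 and 300 DPI behave, so that range is reported as undocumented.

```python
# Illustrative helper, not a TexPixel API. Paper widths are standard sizes.
A4_WIDTH_IN = 8.27      # A4 paper width in inches
LETTER_WIDTH_IN = 8.5   # US Letter paper width in inches


def effective_dpi(pixel_width: int, page_width_in: float) -> float:
    """Return the resolution implied by an image width and a paper width."""
    if pixel_width <= 0 or page_width_in <= 0:
        raise ValueError("pixel_width and page_width_in must be positive")
    return pixel_width / page_width_in


def scan_tier(dpi: float) -> str:
    """Map a resolution to the scanned-PDF rows of the table above."""
    dpi = round(dpi)  # rounding absorbs the inexact paper width (8.27 in)
    if dpi >= 300:
        return "high quality (90-97%)"
    if dpi < 150:
        return "low quality (60-80%)"
    return "between the documented tiers"


if __name__ == "__main__":
    # An A4 page scanned 2480 pixels wide works out to roughly 300 DPI.
    print(scan_tier(effective_dpi(2480, A4_WIDTH_IN)))
```

Contrast also matters for the low-quality row, and this check does not measure it.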
File Limits
- Max file size: 20 MB
- Max pages: 50 per upload (Pro plan: unlimited)
- Processing time: ~2–5 seconds per page
For documents exceeding these limits, split the PDF into smaller chunks before uploading.
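A minimal sketch of how the split could be planned before uploading, assuming you already know the page count. The names `within_limits` and `plan_chunks` are hypothetical and not part of TexPixel. Reading "20 MB" as 20 × 1024 × 1024 bytes is also an assumption.

```python
from typing import List, Tuple

MAX_FILE_BYTES = 20 * 1024 * 1024  # 20 MB; binary megabytes are an assumption
MAX_PAGES = 50                     # per-upload page limit from the list above


def within_limits(file_bytes: int, page_count: int) -> bool:
    """Return True if a PDF fits both the size limit and the page limit."""
    return file_bytes <= MAX_FILE_BYTES and page_count <= MAX_PAGES


def plan_chunks(page_count: int, max_pages: int = MAX_PAGES) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive page ranges of at most max_pages pages each."""
    if page_count < 0 or max_pages <= 0:
        raise ValueError("page_count must be >= 0 and max_pages must be > 0")
    return [
        (start, min(start + max_pages - 1, page_count))
        for start in range(1, page_count + 1, max_pages)
    ]


if __name__ == "__main__":
    # A 120-page document needs three uploads.
    print(plan_chunks(120))  # [(1, 50), (51, 100), (101, 120)]
```

Splitting by page count does not guarantee that each chunk is under the size limit, so check the size of each resulting file as well.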
Exporting PDF Results
After extraction, you can export in several ways:
- Copy individual formula — click any recognized formula to copy its LaTeX
- DOCX export — download the full document with formulas as native Word equations
- Batch copy — copy all formulas as a list (Pro feature)
Tips for Better PDF Results
- Use the original PDF, not a re-scanned copy — vector PDFs give the best results
- Avoid password-protected PDFs — these cannot be processed
- Crop pages if a PDF has wide margins with no content — smaller pages process faster
- Split by chapter for very large documents to stay within page limits
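If you are not sure whether a PDF is password-protected, a rough local check is to look for an `/Encrypt` entry, which encrypted PDFs reference from their trailer. The sketch below is a heuristic, not how TexPixel itself detects protection. The function names are hypothetical, and a plain byte search can give false positives if the same text appears elsewhere in the file.

```python
from pathlib import Path


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: report True if the raw bytes mention an /Encrypt entry."""
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: str) -> bool:
    """Read a PDF from disk and apply the same heuristic."""
    return looks_encrypted(Path(path).read_bytes())


if __name__ == "__main__":
    sample = b"%PDF-1.7\ntrailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n%%EOF"
    print(looks_encrypted(sample))  # True
```

A positive result suggests removing the password with a tool you trust, and only if you have permission to do so, before uploading.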
Common Issues
"No formulas found"
The PDF may be encrypted, have formulas stored as complex vector paths, or use non-standard encoding. Try converting the page to a PNG image and uploading that instead.
Formulas recognized but garbled
This often happens with very low DPI scans. Try using a PDF scanner app to rescan at 300 DPI before uploading.
Processing is slow
Large PDFs with many pages can take 30–60 seconds. This is normal. The result will appear when processing is complete.
Further reading: I tried to extract formulas from my professor's PDF — real-world troubleshooting →