---
title: PDF Extraction
description: Extract and convert formulas from PDF documents automatically with TexPixel
slug: pdf-extraction
date: 2026-03-25
tags: [PDF, extraction]
order: 6
---

# PDF Extraction

TexPixel can process an entire PDF and automatically extract the formulas it detects on every page. This is useful for textbooks, research papers, and other multi-page documents with mathematical content.

## How to Extract from a PDF

1. Click the upload zone or drag and drop your PDF file.
2. TexPixel detects all pages and identifies formula regions.
3. Each recognized formula is listed in the result panel.
4. Copy individual formulas or export the entire document as DOCX.

## What Gets Extracted

TexPixel identifies formulas in PDFs regardless of whether they were:

- Typeset in LaTeX (rendered as vector math)
- Embedded as images (scanned pages)
- A mix of both

For vector PDFs (generated from LaTeX or Word), recognition accuracy is typically 95–99%. For scanned or image-based PDFs, accuracy depends on image quality and follows the same guidelines as regular image uploads.

## Supported PDF Types

| Type | Description | Accuracy |
|---|---|---|
| Vector PDF | Created from LaTeX, Word, or typesetting tools | 95–99% |
| Scanned PDF (high quality) | 300 DPI scan of printed text | 90–97% |
| Scanned PDF (low quality) | < 150 DPI or poor contrast | 60–80% |
| Photo PDF | Photographed pages embedded as images | 75–90% |

## File Limits

- **Max file size:** 20 MB
- **Max pages:** 50 pages per upload (Pro plan: unlimited)
- **Processing time:** ~2–5 seconds per page

For documents exceeding these limits, split the PDF into smaller chunks before uploading.

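The limits above can be checked before uploading. The following is a minimal sketch, not part of TexPixel: the function names and the `pro` flag are illustrative, the 20 MB limit is assumed to mean 20 × 1024 × 1024 bytes and to apply on every plan, and the page count has to come from your own PDF tool.

```python
# Illustrative helpers; none of these names come from TexPixel itself.
MAX_FILE_BYTES = 20 * 1024 * 1024   # 20 MB limit (assumed to mean MiB)
MAX_PAGES_FREE = 50                 # per-upload page limit without Pro
SECONDS_PER_PAGE = (2, 5)           # documented ~2-5 seconds per page


def check_limits(size_bytes, page_count, pro=False):
    """Return a list of human-readable problems; empty means OK to upload."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 20 MB")
    if not pro and page_count > MAX_PAGES_FREE:
        problems.append("more than 50 pages")
    return problems


def plan_chunks(page_count, max_pages=MAX_PAGES_FREE):
    """Split pages 1..page_count into inclusive (first, last) ranges."""
    return [
        (first, min(first + max_pages - 1, page_count))
        for first in range(1, page_count + 1, max_pages)
    ]


def estimate_seconds(page_count):
    """Return a rough (low, high) processing-time estimate in seconds."""
    low, high = SECONDS_PER_PAGE
    return page_count * low, page_count * high
```

For a 120-page document, `plan_chunks(120)` gives the page ranges 1–50, 51–100, and 101–120, which can then be split out with any PDF tool. `estimate_seconds(50)` gives `(100, 250)`, so a full 50-page upload may take a few minutes.
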
## Exporting PDF Results

After extraction, you can export in several ways:

- **Copy individual formula:** click any recognized formula to copy its LaTeX
- **DOCX export:** download the full document with formulas as native Word equations
- **Batch copy:** copy all formulas as a list (Pro feature)

## Tips for Better PDF Results

- **Use the original PDF**, not a re-scanned copy. Vector PDFs give the best results.
- **Avoid password-protected PDFs.** These cannot be processed.
- **Crop pages** if a PDF has wide margins with no content. Smaller pages process faster.
- **Split by chapter** for very large documents to stay within the page limit.

## Common Issues

**"No formulas found"**

The PDF may be encrypted, store its formulas as complex vector paths, or use non-standard encoding. Try converting the page to a PNG image and uploading that instead.

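One way to produce that PNG is Poppler's `pdftoppm` command-line tool, called here from Python. This is a sketch under assumptions: Poppler has to be installed separately, TexPixel does not prescribe any particular converter, and the function names and the file names in the usage note are hypothetical.

```python
import subprocess


def build_pdftoppm_command(pdf_path, page, out_prefix, dpi=300):
    """Build the pdftoppm argument list that renders one page to PNG."""
    return [
        "pdftoppm",
        "-png",              # write PNG output
        "-r", str(dpi),      # resolution in DPI
        "-f", str(page),     # first page to render
        "-l", str(page),     # last page to render (the same page)
        "-singlefile",       # write out_prefix.png without a page suffix
        pdf_path,
        out_prefix,
    ]


def render_page_to_png(pdf_path, page, out_prefix, dpi=300):
    """Run pdftoppm; raises CalledProcessError if the conversion fails."""
    subprocess.run(
        build_pdftoppm_command(pdf_path, page, out_prefix, dpi), check=True
    )
    return out_prefix + ".png"
```

For example, `render_page_to_png("paper.pdf", 3, "page-3")` would write `page-3.png` at 300 DPI, which can then be uploaded like any other image.
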
**Formulas recognized but garbled**

This often happens with very low DPI scans. Rescan the pages at 300 DPI with a scanner app before uploading.

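If you are unsure what resolution a scan has, you can estimate it from the pixel width of a page image and the physical width of the paper. A minimal sketch with illustrative function names, assuming A4 paper by default (210 mm, about 8.27 inches wide):

```python
A4_WIDTH_IN = 8.27  # A4 paper is 210 mm wide, about 8.27 inches


def effective_dpi(pixel_width, page_width_in=A4_WIDTH_IN):
    """Pixels across the page per inch of paper, rounded to a whole number."""
    return round(pixel_width / page_width_in)


def below_recommended(pixel_width, page_width_in=A4_WIDTH_IN, target_dpi=300):
    """True if the scan is below the 300 DPI recommended for rescans."""
    return effective_dpi(pixel_width, page_width_in) < target_dpi
```

An A4 page image that is 1240 pixels wide works out to about 150 DPI, so it is worth rescanning. One that is 2480 pixels wide is about 300 DPI.
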
**Processing is slow**

Large PDFs with many pages can take 30–60 seconds or more, since each page takes ~2–5 seconds to process. This is normal. The results appear when processing is complete.

---

[Upload a PDF and extract formulas →](/app)