---
title: PDF Extraction
description: Extract and convert formulas from PDF documents automatically with TexPixel
slug: pdf-extraction
date: 2026-03-25
tags: [PDF, extraction]
order: 6
---

# PDF Extraction

TexPixel can process an entire PDF and automatically extract the formulas it recognizes on every page. This is useful for textbooks, research papers, and other multi-page documents with mathematical content.

## How to Extract from a PDF

1. Click the upload zone, or drag and drop your PDF file onto it.
2. TexPixel scans every page and identifies the formula regions.
3. Each recognized formula is listed in the result panel.
4. Copy individual formulas, or export the entire document as DOCX.

## What Gets Extracted

TexPixel identifies formulas in a PDF whether they are:
- Typeset in LaTeX (rendered as vector math)
- Embedded as images (scanned pages)
- A mix of both

For vector PDFs (generated from LaTeX or Word), recognition accuracy is typically 95% or higher. For scanned or image-based PDFs, accuracy follows the same image quality guidelines as regular image uploads.

## Supported PDF Types

| Type | Description | Accuracy |
|---|---|---|
| Vector PDF | Created from LaTeX, Word, or typesetting tools | 95–99% |
| Scanned PDF (high quality) | 300 DPI scan of printed text | 90–97% |
| Scanned PDF (low quality) | < 150 DPI or poor contrast | 60–80% |
| Photo PDF | Photographed pages embedded as images | 75–90% |

## File Limits

- **Max file size:** 20 MB
- **Max pages:** 50 pages per upload (Pro plan: unlimited)
- **Processing time:** ~2–5 seconds per page

For documents exceeding these limits, split the PDF into smaller chunks before uploading.

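
The limits above can be turned into a quick pre-upload check. The following is a minimal Python sketch, not part of TexPixel: the function names are hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes, and the time estimate assumes pages are processed one after another, which this page does not state. It only plans the page ranges; the actual splitting would be done with whichever PDF tool you prefer.

```python
# Limits quoted in the File Limits section (free plan).
MAX_FILE_SIZE_MB = 20
MAX_PAGES_PER_UPLOAD = 50
SECONDS_PER_PAGE = (2, 5)  # approximate lower and upper bound per page


def chunk_page_ranges(total_pages, max_pages=MAX_PAGES_PER_UPLOAD):
    """Return 1-based, inclusive (first, last) page ranges of at most max_pages each."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [
        (first, min(first + max_pages - 1, total_pages))
        for first in range(1, total_pages + 1, max_pages)
    ]


def estimate_seconds(page_count, per_page=SECONDS_PER_PAGE):
    """Rough (low, high) processing time, assuming sequential page processing."""
    low, high = per_page
    return page_count * low, page_count * high


def exceeds_size_limit(size_bytes, limit_mb=MAX_FILE_SIZE_MB):
    """True if size_bytes is above the upload limit (assumes 1 MB = 1024 * 1024 bytes)."""
    return size_bytes > limit_mb * 1024 * 1024


if __name__ == "__main__":
    # Example: plan the uploads for a hypothetical 120-page document.
    for first, last in chunk_page_ranges(120):
        low, high = estimate_seconds(last - first + 1)
        print(f"pages {first}-{last}: roughly {low}-{high} s")
```

Each chunk still has to satisfy the file size limit on its own, so it is worth running `exceeds_size_limit` on every split file (for example with the value from `os.path.getsize`) before uploading.
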
## Exporting PDF Results

After extraction, you can export the results in several ways:

- **Copy individual formula:** click any recognized formula to copy its LaTeX
- **DOCX export:** download the full document with formulas as native Word equations
- **Batch copy:** copy all formulas as a list (Pro feature)

## Tips for Better PDF Results

- **Use the original PDF**, not a re-scanned copy. Vector PDFs give the best results.
- **Avoid password-protected PDFs.** These cannot be processed.
- **Crop pages** if a PDF has wide margins with no content. Smaller pages process faster.
- **Split by chapter** for very large documents to stay within the page limit.

## Common Issues

**"No formulas found"**
The PDF may be encrypted, store its formulas as complex vector paths, or use non-standard encoding. Try converting the page to a PNG image and uploading that instead.

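
This page does not name a tool for the PNG workaround. One option, given here as an assumption and not a TexPixel recommendation, is the `pdftoppm` utility from Poppler. The Python sketch below only builds the command for rendering a single page; the helper name and file names are hypothetical, and 300 DPI is chosen to match the high-quality scan guideline.

```python
import shlex


def pdftoppm_command(pdf_path, page, output_root, dpi=300):
    """Build a pdftoppm argument list that renders one page of a PDF to PNG."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [
        "pdftoppm",
        "-png",            # write PNG instead of the default PPM
        "-r", str(dpi),    # render resolution in DPI
        "-f", str(page),   # first page to convert
        "-l", str(page),   # last page to convert (same page, so only one is rendered)
        pdf_path,
        output_root,       # pdftoppm appends the page number and .png to this prefix
    ]


if __name__ == "__main__":
    cmd = pdftoppm_command("paper.pdf", page=3, output_root="paper-page")
    print(shlex.join(cmd))
    # To actually run it (requires Poppler to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```
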
**Formulas recognized but garbled**
This often happens with very low DPI scans. Rescan the document at 300 DPI, for example with a PDF scanner app, before uploading.

**Processing is slow**
Large PDFs with many pages can take 30–60 seconds to process. This is normal; the results appear once processing is complete.

---

[Upload a PDF and extract formulas →](/app)