---
title: PDF Extraction
description: Extract and convert formulas from PDF documents automatically with TexPixel
slug: pdf-extraction
date: 2026-03-25
tags: [PDF, extraction]
order: 6
---

# PDF Extraction

TexPixel can process an entire PDF document and automatically extract every formula on every page. This is useful for textbooks, research papers, and any other multi-page document with mathematical content.

## How to Extract from a PDF

1. Click the upload zone or drag and drop your PDF file.
2. TexPixel detects all pages and identifies formula regions.
3. Each recognized formula is listed in the result panel.
4. Copy individual formulas or export the entire document as DOCX.

## What Gets Extracted

TexPixel identifies formulas in a PDF regardless of how they are stored:
- As vector math typeset in LaTeX, Word, or other typesetting tools
- As images, as in scanned pages
- As a mix of both

For vector PDFs (generated from LaTeX or Word), recognition accuracy is typically 95% or higher. For scanned or image-based PDFs, accuracy depends on image quality and follows the same guidelines as regular image uploads.

## Supported PDF Types

| Type | Description | Typical accuracy |
|---|---|---|
| Vector PDF | Created from LaTeX, Word, or typesetting tools | 95–99% |
| Scanned PDF (high quality) | 300 DPI scan of printed text | 90–97% |
| Scanned PDF (low quality) | < 150 DPI or poor contrast | 60–80% |
| Photo PDF | Photographed pages embedded as images | 75–90% |
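To guess which row of the table applies to your file, you can check whether its pages carry a text layer. The sketch below assumes the third-party `pypdf` package; `classify_pdf` and `classify_pdf_file` are hypothetical helpers. A text layer is only a rough proxy (scanned PDFs that were run through OCR also have one), and this is not how TexPixel itself classifies files.

```python
from typing import Iterable


def classify_pdf(page_texts: Iterable[str]) -> str:
    """Rough heuristic from per-page extracted text.

    Returns "vector" if every page has text, "scanned" if none does,
    "mixed" otherwise, and "empty" for a document with no pages.
    """
    has_text = [bool(text.strip()) for text in page_texts]
    if not has_text:
        return "empty"
    if all(has_text):
        return "vector"   # every page has a text layer
    if not any(has_text):
        return "scanned"  # no extractable text: pages are likely images
    return "mixed"


def classify_pdf_file(path: str) -> str:
    """Apply classify_pdf to a real file (assumes pypdf is installed)."""
    from pypdf import PdfReader  # lazy import so classify_pdf needs no dependency

    reader = PdfReader(path)
    return classify_pdf(page.extract_text() or "" for page in reader.pages)
```

For example, `classify_pdf_file("paper.pdf")` returning `"scanned"` suggests the scanned rows of the table are the relevant ones.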

## File Limits

- **Max file size:** 20 MB
- **Max pages:** 50 pages per upload (Pro plan: unlimited)
- **Processing time:** ~2–5 seconds per page

For documents exceeding these limits, split the PDF into smaller chunks before uploading.
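A quick pre-flight check against the limits above can save a failed upload. This is a minimal sketch: `check_upload`, `estimate_seconds`, and `check_file` are hypothetical helpers, the third-party `pypdf` package is assumed for reading the file, and "20 MB" is assumed to mean 20 MiB.

```python
import os

# Limits copied from this page. Assumption: "20 MB" is read as 20 MiB.
MAX_BYTES = 20 * 1024 * 1024
MAX_PAGES = 50             # per-upload limit outside the Pro plan
SECONDS_PER_PAGE = (2, 5)  # approximate processing time per page


def check_upload(size_bytes: int, pages: int, encrypted: bool,
                 pro: bool = False) -> list[str]:
    """Hypothetical helper: list the limits a PDF would break (empty list = OK)."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes / (1024 * 1024):.1f} MB (max 20 MB)")
    if not pro and pages > MAX_PAGES:
        problems.append(f"{pages} pages (max {MAX_PAGES} per upload)")
    if encrypted:
        problems.append("PDF is password-protected")
    return problems


def estimate_seconds(pages: int) -> tuple[int, int]:
    """Rough (low, high) processing time in seconds from the per-page estimate."""
    low, high = SECONDS_PER_PAGE
    return pages * low, pages * high


def check_file(path: str, pro: bool = False) -> list[str]:
    """Read size, page count, and encryption flag, then apply check_upload."""
    from pypdf import PdfReader  # lazy import: only needed for real files

    reader = PdfReader(path)
    # Note: is_encrypted is also True for PDFs that open without a password,
    # so this check may be stricter than necessary.
    encrypted = reader.is_encrypted
    # Skip the page count for encrypted files, which may need a password to read.
    pages = 0 if encrypted else len(reader.pages)
    return check_upload(os.path.getsize(path), pages, encrypted, pro)
```

For example, `estimate_seconds(50)` returns `(100, 250)`, roughly 2 to 4 minutes for a full 50-page upload.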

## Exporting PDF Results

After extraction, you can copy or export the results in several ways:

- **Copy individual formula** — click any recognized formula to copy its LaTeX
- **DOCX export** — download the full document with formulas as native Word equations
- **Batch copy** — copy all formulas as a list (Pro feature)

## Tips for Better PDF Results

- **Use the original PDF**, not a re-scanned copy — vector PDFs give the best results
- **Avoid password-protected PDFs** — these cannot be processed
- **Crop pages** if a PDF has wide margins with no content — smaller pages process faster
- **Split by chapter** for very large documents to stay within page limits
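If you prefer to split a long PDF locally, a short script is one option. This sketch assumes the third-party `pypdf` package; `page_ranges`, `write_chunks`, and the `_partN.pdf` file naming are illustrative choices, not TexPixel requirements. It splits by a fixed page count; to split by chapter instead, pass your own list of page ranges to `write_chunks`.

```python
from pathlib import Path


def page_ranges(total_pages: int, max_pages: int = 50) -> list[range]:
    """Consecutive 0-based page ranges, each at most max_pages long."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return [range(start, min(start + max_pages, total_pages))
            for start in range(0, total_pages, max_pages)]


def write_chunks(path: str, ranges: list[range]) -> list[Path]:
    """Write each range to <stem>_part<N>.pdf next to the source file."""
    from pypdf import PdfReader, PdfWriter  # lazy import: only needed for I/O

    src = Path(path)
    reader = PdfReader(src)
    outputs = []
    for n, pages in enumerate(ranges, start=1):
        writer = PdfWriter()
        for i in pages:
            writer.add_page(reader.pages[i])
        out = src.with_name(f"{src.stem}_part{n}.pdf")
        with open(out, "wb") as f:
            writer.write(f)
        outputs.append(out)
    return outputs
```

For a 120-page book, `write_chunks("book.pdf", page_ranges(120))` would produce three files of 50, 50, and 20 pages. A chunk can still exceed the 20 MB size limit if its pages contain large images, so check the output file sizes before uploading.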

## Common Issues

**"No formulas found"**

The PDF may be encrypted, store its formulas as complex vector paths, or use non-standard encoding. Try converting the affected page to a PNG image and uploading that instead.
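To convert pages to PNG yourself, you can render them at 300 DPI, the resolution recommended for scans elsewhere on this page. This sketch assumes the third-party PyMuPDF package (installed as `pymupdf`, imported here as `fitz`); `render_pages`, `pixel_size`, and the `page_001.png` file names are hypothetical.

```python
def pixel_size(width_pt: float, height_pt: float, dpi: int = 300) -> tuple[int, int]:
    """Expected image size: PDF page sizes are in points (1/72 inch).

    PyMuPDF's actual output may differ by a pixel because of rounding.
    """
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


def render_pages(path: str, dpi: int = 300) -> list[str]:
    """Render every page of the PDF to page_NNN.png in the current directory."""
    import fitz  # PyMuPDF; lazy import so pixel_size has no dependency

    outputs = []
    with fitz.open(path) as doc:
        for number, page in enumerate(doc, start=1):
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page at the given DPI
            out = f"page_{number:03d}.png"
            pix.save(out)
            outputs.append(out)
    return outputs
```

A US Letter page (612 × 792 points) rendered at 300 DPI comes out at about 2550 × 3300 pixels.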

**Formulas recognized but garbled**

This often happens with very low-DPI scans. Rescan the pages at 300 DPI, for example with a PDF scanner app, before uploading.

**Processing is slow**

Processing takes roughly 2–5 seconds per page, so a large PDF can take 30–60 seconds or, near the 50-page limit, several minutes. This is normal; the results appear when processing is complete.

---

[Upload a PDF and extract formulas →](/app)